Loads a federated version of the Google Landmark v2 dataset.
tff.simulation.datasets.gldv2.load_data(
num_worker: int = 1,
cache_dir: str = 'cache',
gld23k: bool = False,
base_url: str = GLD_SHARD_BASE_URL
)
The dataset consists of photos of various world landmarks, with images grouped by photographer to achieve a federated partitioning of the data. The dataset is downloaded and cached locally. If previously downloaded, it tries to load the dataset from cache.
The `tf.data.Dataset`s returned by `tff.simulation.datasets.ClientData.create_tf_dataset_for_client` will yield `collections.OrderedDict` objects at each iteration, with the following keys and values:

- `'image/decoded'`: A `tf.Tensor` with `dtype=tf.uint8` that corresponds to the pixels of the landmark images.
- `'class'`: A `tf.Tensor` with `dtype=tf.int64` and shape [1], corresponding to the class label of the landmark ([0, 203) for gld23k, [0, 2028) for gld160k).
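As a rough sketch of how one element might be consumed, the function below converts the `uint8` pixels to `float32` values in [0, 1] and flattens the shape-[1] label to a scalar. To keep it runnable without TensorFlow, it uses NumPy arrays as stand-ins for the `tf.Tensor` values. The name `preprocess_element`, the [0, 1] scaling, and the tiny fake image size are illustrative assumptions, not part of the TFF API. In a real pipeline the same steps would typically be expressed with `tf.cast` and `tf.reshape` inside a `tf.data.Dataset.map` call.

```python
import collections

import numpy as np


def preprocess_element(element, num_classes):
    """Hypothetical helper: normalizes pixels and extracts a scalar label.

    `element` mimics the OrderedDict yielded by the GLDv2 client datasets,
    with NumPy arrays standing in for tf.Tensor values.
    """
    pixels = element['image/decoded']
    label = element['class']
    if pixels.dtype != np.uint8:
        raise ValueError('images are expected as uint8 pixels')
    if label.shape != (1,):
        raise ValueError('labels are expected with shape [1]')

    # Scale uint8 pixel values from [0, 255] to float32 in [0, 1].
    image = pixels.astype(np.float32) / 255.0
    # Flatten the shape-[1] label to a Python int and check its range.
    class_id = int(label[0])
    if not 0 <= class_id < num_classes:
        raise ValueError(f'label {class_id} outside [0, {num_classes})')
    return image, class_id


# A fake element shaped like a gld23k example (the 4x4x3 image size is made up).
fake = collections.OrderedDict([
    ('image/decoded', np.full((4, 4, 3), 255, dtype=np.uint8)),
    ('class', np.array([202], dtype=np.int64)),
])
image, class_id = preprocess_element(fake, num_classes=203)
print(float(image.max()), class_id)  # 1.0 202
```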
Two flavors of the GLD dataset are available. When `gld23k` is `True`, a minimal version of the federated Google Landmark dataset is provided for faster iteration. The gld23k dataset contains 203 classes, 233 clients and 23,080 images. When `gld23k` is `False`, the gld160k dataset (https://arxiv.org/abs/2003.08082) is provided. The gld160k dataset contains 2,028 classes, 1,262 clients and 164,172 images.
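The two flavors differ mainly in size. The snippet below records the figures quoted above and derives two quantities that are often needed when setting up an experiment: the number of output classes for a classifier head and the mean number of images per client. The dictionary `GLD_STATS` and the helper functions are hypothetical conveniences, not part of `tff.simulation.datasets.gldv2`.

```python
# Figures quoted in the description above; GLD_STATS is a hypothetical name.
GLD_STATS = {
    'gld23k': {'classes': 203, 'clients': 233, 'images': 23_080},
    'gld160k': {'classes': 2_028, 'clients': 1_262, 'images': 164_172},
}


def flavor_name(gld23k: bool) -> str:
    """Maps the `gld23k` argument of load_data to the flavor it selects."""
    return 'gld23k' if gld23k else 'gld160k'


def num_classes(gld23k: bool) -> int:
    """Size of the label space: labels lie in [0, num_classes)."""
    return GLD_STATS[flavor_name(gld23k)]['classes']


def mean_images_per_client(gld23k: bool) -> float:
    """Total images divided by number of clients for the chosen flavor."""
    stats = GLD_STATS[flavor_name(gld23k)]
    return stats['images'] / stats['clients']


for flag in (True, False):
    print(flavor_name(flag), num_classes(flag),
          round(mean_images_per_client(flag), 2))
# gld23k 203 99.06
# gld160k 2028 130.09
```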
| Returns |
|---|
| Tuple of `(train, test)` where the tuple elements are a `tff.simulation.datasets.ClientData` and a `tf.data.Dataset`. |
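Federated simulations commonly draw a subset of the training clients each round. The sketch below shows one way to do that reproducibly. It assumes the `client_ids` list exposed by the returned `ClientData` and uses only the standard library, so the ID strings in the usage line are made up; `sample_round_clients` is a hypothetical helper, not a TFF function. With real data, each sampled ID would be passed to `train.create_tf_dataset_for_client`, while `test` is already a single `tf.data.Dataset`.

```python
import random
from typing import List, Sequence


def sample_round_clients(client_ids: Sequence[str], clients_per_round: int,
                         round_num: int, seed: int = 0) -> List[str]:
    """Hypothetical helper: picks distinct client IDs for one training round.

    Deriving the RNG seed from (seed, round_num) makes each round's
    selection reproducible across runs.
    """
    if not 0 < clients_per_round <= len(client_ids):
        raise ValueError('clients_per_round must be in [1, len(client_ids)]')
    rng = random.Random(seed * 1_000_003 + round_num)
    return rng.sample(list(client_ids), clients_per_round)


# With real data (requires TFF and a download), one would write roughly:
#   train, test = tff.simulation.datasets.gldv2.load_data(gld23k=True)
#   ids = sample_round_clients(train.client_ids, 10, round_num=0)
#   datasets = [train.create_tf_dataset_for_client(i) for i in ids]
fake_ids = [f'client_{i}' for i in range(233)]  # made-up IDs, gld23k-sized
print(sample_round_clients(fake_ids, clients_per_round=3, round_num=0))
```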