Loads a federated version of the CIFAR-100 dataset.
tff.simulation.datasets.cifar100.load_data(
cache_dir=None
)
The dataset is downloaded and cached locally. If it was downloaded previously, it is loaded from the cache instead.
The dataset is derived from the CIFAR-100 dataset. The training and testing examples are partitioned across 500 and 100 clients, respectively. No two clients share a data sample: the train clients form a true partition of the CIFAR-100 training split, and the test clients form a true partition of the CIFAR-100 testing split. The train clients have string client IDs in the range [0-499], while the test clients have string client IDs in the range [0-99].
The data partitioning is done using a hierarchical Latent Dirichlet Allocation (LDA) process, referred to as the Pachinko Allocation Method (PAM). This method uses a two-stage LDA process: each client has an associated multinomial distribution over the coarse labels of CIFAR-100 and, for each coarse label, a coarse-to-fine multinomial distribution over the fine labels under that coarse label. The coarse label multinomial is drawn from a symmetric Dirichlet with parameter 0.1, and each coarse-to-fine multinomial distribution is drawn from a symmetric Dirichlet with parameter 10. Each client has 100 samples. To generate a sample for the client, we first select a coarse label by drawing from the coarse label multinomial distribution, and then draw a fine label using the coarse-to-fine multinomial distribution for that coarse label. We then randomly draw a sample from CIFAR-100 with that fine label (without replacement). If this exhausts the set of samples with this label, we remove the label from the coarse-to-fine multinomial and renormalize the multinomial distribution.
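The two-stage sampling described above can be sketched in plain Python. This is an illustrative sketch only, not the code used to build the dataset: the function names, the toy label hierarchy, and the seed are hypothetical. Two behaviors are assumptions that the description does not spell out: a coarse label is dropped once all of its fine labels are exhausted, and each new client's multinomials are drawn only over labels that still have unassigned samples.

```python
import random


def symmetric_dirichlet(alpha, size, rng):
    """Draws one sample from a symmetric Dirichlet(alpha) with `size` entries."""
    # Normalized independent Gamma(alpha, 1) draws follow a Dirichlet distribution.
    # The tiny floor guards against a draw underflowing to exactly zero.
    draws = [max(rng.gammavariate(alpha, 1.0), 1e-300) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def pachinko_partition(fine_to_examples, coarse_to_fine, num_clients,
                       samples_per_client, coarse_alpha=0.1, fine_alpha=10.0,
                       seed=0):
    """Assigns examples to clients with a two-stage (coarse, then fine) draw."""
    rng = random.Random(seed)
    # Unassigned example ids per fine label, shuffled so pop() is a random draw.
    pool = {label: list(ids) for label, ids in fine_to_examples.items()}
    for ids in pool.values():
        rng.shuffle(ids)
    if sum(len(ids) for ids in pool.values()) < num_clients * samples_per_client:
        raise ValueError("not enough examples for the requested partition")

    # Fine labels that still have unassigned examples, grouped by coarse label.
    available = {c: [f for f in fines if pool.get(f)]
                 for c, fines in coarse_to_fine.items()}
    available = {c: fines for c, fines in available.items() if fines}

    clients = []
    for _ in range(num_clients):
        # Assumption: per-client multinomials cover only labels with data left.
        coarse_w = dict(zip(
            available, symmetric_dirichlet(coarse_alpha, len(available), rng)))
        fine_w = {
            c: dict(zip(fines, symmetric_dirichlet(fine_alpha, len(fines), rng)))
            for c, fines in available.items()
        }
        samples = []
        for _ in range(samples_per_client):
            coarse = rng.choices(list(coarse_w), weights=list(coarse_w.values()))[0]
            fine = rng.choices(
                list(fine_w[coarse]), weights=list(fine_w[coarse].values()))[0]
            samples.append(pool[fine].pop())  # draw without replacement
            if not pool[fine]:
                # Label exhausted: remove it. rng.choices rescales the remaining
                # weights by their sum, which is equivalent to renormalizing.
                del fine_w[coarse][fine]
                available[coarse].remove(fine)
                if not fine_w[coarse]:
                    # Assumption: also drop a coarse label with no fine labels left.
                    del coarse_w[coarse], fine_w[coarse], available[coarse]
        clients.append(samples)
    return clients


# Toy hierarchy (hypothetical): 4 coarse labels x 3 fine labels x 10 examples.
# CIFAR-100 itself has 20 coarse labels with 5 fine labels each.
toy_coarse_to_fine = {c: [3 * c + i for i in range(3)] for c in range(4)}
toy_fine_to_examples = {f: [10 * f + j for j in range(10)] for f in range(12)}
toy_clients = pachinko_partition(
    toy_fine_to_examples, toy_coarse_to_fine, num_clients=6, samples_per_client=20)
print([len(samples) for samples in toy_clients])
```

With a small coarse parameter such as 0.1, most of a client's mass tends to fall on a few coarse labels, while the larger fine parameter of 10 spreads samples more evenly across the fine labels under each chosen coarse label.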
Data set sizes:
- train: 50,000 examples
- test: 10,000 examples
The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values, in lexicographic order by key:
- 'coarse_label': a tf.Tensor with dtype=tf.int64 and shape [1] that corresponds to the coarse label of the associated image. Labels are in the range [0-19].
- 'image': a tf.Tensor with dtype=tf.uint8 and shape [32, 32, 3], containing the red/green/blue pixels of the image. Each pixel is a value in the range [0, 255].
- 'label': a tf.Tensor with dtype=tf.int64 and shape [1], the class label of the corresponding image. Labels are in the range [0-99].
Args | |
---|---|
cache_dir | (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory. |

Returns | |
---|---|
Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects. | |