Loads the Federated EMNIST dataset.
tff.simulation.datasets.emnist.load_data(
only_digits=True, cache_dir=None
)
Downloads and caches the dataset locally. If previously downloaded, tries to load the dataset from cache.
This dataset is derived from the Leaf repository (https://github.com/TalwalkarLab/leaf) pre-processing of the Extended MNIST dataset, which groups examples by writer. Details about Leaf were published in "LEAF: A Benchmark for Federated Settings" (https://arxiv.org/abs/1812.01097).
Data set sizes:
only_digits=True: 3,383 users, 10 label classes
- train: 341,873 examples
- test: 40,832 examples
only_digits=False: 3,400 users, 62 label classes
- train: 671,585 examples
- test: 77,483 examples
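As a quick sanity check on the sizes listed above, the following sketch computes the total number of examples, the average number of examples per user, and the share of examples that land in the test split for each setting. The `SIZES` table and the `summarize` helper are hypothetical names introduced here for illustration; only the counts themselves come from the list above.

```python
# Counts copied from the "Data set sizes" list; keyed by the only_digits flag.
SIZES = {
    True: {"users": 3383, "train": 341873, "test": 40832},
    False: {"users": 3400, "train": 671585, "test": 77483},
}


def summarize(only_digits):
    """Returns derived statistics for one only_digits setting (illustrative helper)."""
    sizes = SIZES[only_digits]
    total = sizes["train"] + sizes["test"]
    return {
        "total": total,
        "examples_per_user": total / sizes["users"],
        "test_fraction": sizes["test"] / total,
    }


if __name__ == "__main__":
    for flag in (True, False):
        print(f"only_digits={flag}: {summarize(flag)}")
```

In both settings roughly a tenth of the examples fall in the test split, and the average user contributes on the order of one to two hundred examples.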
Rather than holding out specific users, each user's examples are split across train and test so that every user has at least one example in train and one example in test. Writers with fewer than two examples are excluded from the data set.
The tf.data.Dataset objects returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values, in lexicographic order by key:
- 'label': a tf.Tensor with dtype=tf.int32 and shape [1], the class label of the corresponding pixels. Labels [0-9] correspond to the digit classes, labels [10-35] correspond to the uppercase classes (e.g., label 11 is 'B'), and labels [36-61] correspond to the lowercase classes (e.g., label 37 is 'b').
- 'pixels': a tf.Tensor with dtype=tf.float32 and shape [28, 28], containing the pixels of the handwritten digit, with values in the range [0.0, 1.0].
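The label layout described above (digits, then uppercase, then lowercase letters) can be decoded with a simple lookup string. The sketch below is one possible way to do this; `CLASSES` and `label_to_char` are hypothetical names and are not part of the TFF API.

```python
import string

# 10 digits, then 26 uppercase letters, then 26 lowercase letters: 62 classes.
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase


def label_to_char(label):
    """Maps an integer class label in [0, 61] to its character (illustrative helper)."""
    index = int(label)  # also accepts NumPy integer scalars
    if not 0 <= index < len(CLASSES):
        raise ValueError(f"label must be in [0, {len(CLASSES) - 1}], got {index}")
    return CLASSES[index]


if __name__ == "__main__":
    for label in (0, 9, 11, 37, 61):
        print(label, "->", label_to_char(label))
```

With only_digits=True only labels 0 through 9 occur, so the same helper works for both settings.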
Returns:
A tuple of (train, test), where both elements are tff.simulation.datasets.ClientData objects.
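A minimal usage sketch, assuming tensorflow_federated is installed in a version that exposes tff.simulation.datasets and that the dataset can be downloaded or is already cached. The function name `first_client_example` is hypothetical; the TFF calls it makes are load_data as documented above, plus the client_ids property and create_tf_dataset_for_client method of ClientData.

```python
def first_client_example(only_digits=True):
    """Loads Federated EMNIST and returns the first training client's first example."""
    # Imported lazily so this module can be imported without TFF installed.
    import tensorflow_federated as tff

    train, test = tff.simulation.datasets.emnist.load_data(only_digits=only_digits)

    # Each client ID corresponds to one writer.
    client_id = train.client_ids[0]
    dataset = train.create_tf_dataset_for_client(client_id)

    # Each element is an OrderedDict with 'label' and 'pixels' entries.
    example = next(iter(dataset))
    return client_id, example


if __name__ == "__main__":
    client_id, example = first_client_example()
    print("client:", client_id)
    print("label:", example["label"].numpy())
    print("pixels shape:", example["pixels"].shape)
```

The first call may take a while because the data is downloaded and cached; later calls should load from the cache.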