Loads the Federated CelebA dataset.
tff.simulation.datasets.celeba.load_data(
split_by_clients=True, cache_dir=None
)
Downloads and caches the dataset locally. If previously downloaded, tries to load the dataset from cache.
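As a minimal usage sketch, assuming the tensorflow_federated package is installed, the call above could be wrapped as follows. The helper names load_celeba_splits and first_client_dataset are hypothetical and not part of the library. The import sits inside the function so that defining the helper does not itself trigger a download.

```python
def load_celeba_splits(split_by_clients=True, cache_dir=None):
    """Hypothetical wrapper around tff.simulation.datasets.celeba.load_data."""
    # Imported lazily: the first call downloads the dataset unless it is
    # already present in the cache.
    import tensorflow_federated as tff

    train, test = tff.simulation.datasets.celeba.load_data(
        split_by_clients=split_by_clients, cache_dir=cache_dir
    )
    return train, test


def first_client_dataset(client_data):
    """Return the dataset of the first client id listed in `client_data`."""
    client_id = client_data.client_ids[0]
    return client_data.create_tf_dataset_for_client(client_id)
```

Calling load_celeba_splits() and then first_client_dataset(train) would give the tf.data.Dataset for a single client of the train split.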
This dataset is derived from the LEAF repository preprocessing of the CelebA dataset, grouping examples by celebrity id. Details about LEAF were published in "LEAF: A Benchmark for Federated Settings", and details about CelebA were published in "Deep Learning Face Attributes in the Wild".
The raw CelebA dataset contains 10,177 unique identities. During LEAF preprocessing, all clients with fewer than 5 examples are removed; this leaves 9,343 clients.
The data is available with train and test splits by clients or by examples. That is, when split by clients, ~90% of clients are selected for the train set, ~10% of clients are selected for test, and all the examples for a given user are part of the same data split. When split by examples, each client is located in both the train data and the test data, with ~90% of the examples on each client selected for train and ~10% of the examples selected for test.
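The difference between the two splitting modes can be illustrated on toy data. This is only a sketch of the idea, not the LEAF or TFF implementation: the function names, the seeded shuffling, and the rounding rule are all assumptions made for illustration.

```python
import random


def toy_split_by_clients(client_examples, test_fraction=0.1, seed=0):
    """Assign whole clients to either train or test."""
    rng = random.Random(seed)
    client_ids = sorted(client_examples)
    rng.shuffle(client_ids)
    num_test = max(1, round(len(client_ids) * test_fraction))
    test_ids = set(client_ids[:num_test])
    train = {c: list(client_examples[c]) for c in client_ids if c not in test_ids}
    test = {c: list(client_examples[c]) for c in client_ids if c in test_ids}
    return train, test


def toy_split_by_examples(client_examples, test_fraction=0.1, seed=0):
    """Keep every client in both splits and divide each client's examples."""
    rng = random.Random(seed)
    train, test = {}, {}
    for client_id in sorted(client_examples):
        examples = list(client_examples[client_id])
        rng.shuffle(examples)
        num_test = max(1, round(len(examples) * test_fraction))
        test[client_id] = examples[:num_test]
        train[client_id] = examples[num_test:]
    return train, test


# Toy data: 20 hypothetical clients with 10 examples each.
toy = {f"client_{i}": [f"img_{i}_{j}" for j in range(10)] for i in range(20)}

train_c, test_c = toy_split_by_clients(toy)
train_e, test_e = toy_split_by_examples(toy)
```

In the first mode no client id appears in both splits; in the second, every client id appears in both, with disjoint examples.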
Data set sizes:
split_by_clients=True:
- train: 8,408 clients, 180,429 total examples
- test: 935 clients, 19,859 total examples
split_by_clients=False:
- train: 9,343 clients, 177,457 total examples
- test: 9,343 clients, 22,831 total examples
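The figures above can be cross-checked with a little arithmetic: both modes cover the same 9,343 clients and the same total number of examples. The snippet below only restates the listed numbers; the variable names are illustrative.

```python
# (clients, examples) per split, copied from the list above.
by_clients = {"train": (8408, 180429), "test": (935, 19859)}
by_examples = {"train": (9343, 177457), "test": (9343, 22831)}

# Split by clients: the client sets are disjoint, so client counts add up.
total_clients = by_clients["train"][0] + by_clients["test"][0]

# Both modes partition the same pool of examples.
total_examples_by_clients = by_clients["train"][1] + by_clients["test"][1]
total_examples_by_examples = by_examples["train"][1] + by_examples["test"][1]

# Fraction of clients / examples that ends up in the train split.
train_client_fraction = by_clients["train"][0] / total_clients
train_example_fraction = by_examples["train"][1] / total_examples_by_examples

print(total_clients, total_examples_by_clients, total_examples_by_examples)
print(round(train_client_fraction, 3), round(train_example_fraction, 3))
```

The per-example train fraction comes out slightly below 90%, which is what one would expect if the test share is rounded up on clients with few examples; that explanation is an inference from the numbers, not something stated in the source.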
The tf.data.Dataset objects returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration. These objects have a key/value pair storing the image of the celebrity:
- 'image': a tf.Tensor with dtype=tf.int64 and shape [84, 84, 3], containing the red, green, and blue pixel values of the image. Each pixel is a value in the range [0, 255].
The OrderedDict objects also contain an additional 40 key/value pairs for the celebrity image attributes, each of the format:
- {attribute name}: a tf.Tensor with dtype=tf.bool and shape [1], set to True if the celebrity has this attribute in the image, or False if they don't.
The attribute names are: 'five_o_clock_shadow', 'arched_eyebrows', 'attractive', 'bags_under_eyes', 'bald', 'bangs', 'big_lips', 'big_nose', 'black_hair', 'blond_hair', 'blurry', 'brown_hair', 'bushy_eyebrows', 'chubby', 'double_chin', 'eyeglasses', 'goatee', 'gray_hair', 'heavy_makeup', 'high_cheekbones', 'male', 'mouth_slightly_open', 'mustache', 'narrow_eyes', 'no_beard', 'oval_face', 'pale_skin', 'pointy_nose', 'receding_hairline', 'rosy_cheeks', 'sideburns', 'smiling', 'straight_hair', 'wavy_hair', 'wearing_earrings', 'wearing_hat', 'wearing_lipstick', 'wearing_necklace', 'wearing_necktie', 'young'
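A common next step is to turn one such element into an (image, label) pair. The sketch below is an assumption-laden illustration: it uses NumPy on a synthetic element with the documented keys, dtypes, and shapes, which is the form elements would take when iterating a client dataset with as_numpy_iterator(). The function name to_image_and_label and the choice of 'smiling' as the label are hypothetical.

```python
import collections

import numpy as np


def to_image_and_label(element, label_key="smiling"):
    """Scale the image to [0, 1] and extract one attribute as an int label."""
    # Pixels are int64 values in [0, 255]; convert to float32 in [0, 1].
    image = np.asarray(element["image"], dtype=np.float32) / 255.0
    # Attributes are boolean arrays of shape [1]; take the single entry.
    label = int(np.asarray(element[label_key]).reshape(-1)[0])
    return image, label


# Synthetic stand-in for one dataset element (not real CelebA data).
fake_element = collections.OrderedDict(
    image=np.full((84, 84, 3), 255, dtype=np.int64),
    smiling=np.array([True]),
)

image, label = to_image_and_label(fake_element)
```

Inside a tf.data pipeline the same transformation would normally be written with TensorFlow ops in a function passed to Dataset.map, rather than with NumPy.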
| Returns |
|---|
| Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects. |