tff.simulation.datasets.TestClientData

A tff.simulation.datasets.ClientData intended for test purposes.

Inherits From: ClientData

The implementation is based on tf.data.Dataset.from_tensor_slices. This class is intended only for constructing toy federated datasets, especially to support simulation tests. Using this for large datasets is not recommended, as it requires putting all client data into the underlying TensorFlow graph (which is memory intensive).

Args
tensor_slices_dict A dictionary keyed by client_id, where values are lists, tuples, or dicts for passing to tf.data.Dataset.from_tensor_slices. Note that namedtuples and attrs classes are not explicitly supported, but a user can convert their data from those formats to a dict, and then use this class. The leaves of this dictionary must not be tf.Tensors, in order to avoid putting eager tensors into graphs.

Raises
ValueError If a client with no data is found.
TypeError If tensor_slices_dict is not a dictionary, if its value structures are namedtuples, or if its value structures are not strictly lists, strictly (standard, non-named) tuples, or strictly dictionaries.
TypeError If any leaf of tensor_slices_dict is a tf.Tensor.
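
The note on namedtuples above can be illustrated with a small conversion step. This is a minimal sketch, assuming the per-client data starts out as a hypothetical ClientExamples namedtuple of plain Python lists; the helper names to_plain_dict, build_tensor_slices_dict, and build_test_client_data are illustrative and not part of TFF. The TFF import is deferred so that the conversion helpers run without TFF installed.

```python
import collections

# Hypothetical record type, used only for illustration.
ClientExamples = collections.namedtuple("ClientExamples", ["x", "y"])


def to_plain_dict(record):
    """Converts a namedtuple of Python lists into a plain dict."""
    return dict(record._asdict())


def build_tensor_slices_dict(records_by_client):
    """Maps each client_id to a plain dict of lists (no tf.Tensor leaves)."""
    return {
        client_id: to_plain_dict(record)
        for client_id, record in records_by_client.items()
    }


def build_test_client_data(tensor_slices_dict):
    """Wraps the dictionary in a TestClientData (requires TFF installed)."""
    import tensorflow_federated as tff  # Deferred import; see the note above.

    return tff.simulation.datasets.TestClientData(tensor_slices_dict)


records = {
    "client_a": ClientExamples(x=[[1.0, 2.0], [3.0, 4.0]], y=[0, 1]),
    "client_b": ClientExamples(x=[[5.0, 6.0]], y=[1]),
}
tensor_slices_dict = build_tensor_slices_dict(records)
```

With TFF available, build_test_client_data(tensor_slices_dict) would then construct the toy federated dataset from the converted dictionary.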

Attributes
client_ids A list of string identifiers for clients in this dataset.
dataset_computation A tff.Computation accepting a client ID, returning a dataset.
element_type_structure The element type information of the client datasets, describing the elements returned by datasets in this ClientData object.
serializable_dataset_fn A callable accepting a client ID and returning a tf.data.Dataset. Note that this callable must be traceable by TF, as it will be used in the context of a tf.function.

Methods

create_tf_dataset_for_client

Creates a new tf.data.Dataset containing the client training examples.

This function will create a dataset for a given client, given that client_id is contained in the client_ids property of the ClientData. Unlike create_dataset, this method need not be serializable.

Args
client_id The string client_id for the desired client.

Returns
A tf.data.Dataset object.
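
A typical use in a test is to fetch one client's dataset and look at its first few elements. The following is a sketch under stated assumptions: peek_client_examples and demo are hypothetical helper names, and the sketch relies on a tf.data.Dataset being iterable in eager mode. The TFF import is deferred into demo so that the helper itself has no TFF dependency.

```python
import itertools


def peek_client_examples(client_data, client_id, num_examples=2):
    """Returns up to num_examples elements from one client's dataset."""
    if client_id not in client_data.client_ids:
        raise ValueError(f"Unknown client_id: {client_id!r}")
    dataset = client_data.create_tf_dataset_for_client(client_id)
    # In eager mode a tf.data.Dataset is a Python iterable, so islice works.
    return list(itertools.islice(dataset, num_examples))


def demo():
    """Builds a toy TestClientData and peeks at one client (requires TFF)."""
    import tensorflow_federated as tff  # Deferred import.

    client_data = tff.simulation.datasets.TestClientData({
        "client_a": {"x": [[1.0], [2.0], [3.0]], "y": [0, 1, 0]},
        "client_b": {"x": [[4.0]], "y": [1]},
    })
    return peek_client_examples(client_data, "client_a")
```

The membership check is a convenience added by this sketch; it makes an unknown client ID fail with a clear message before any dataset is built.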

create_tf_dataset_from_all_clients

Creates a new tf.data.Dataset containing all client examples.

This function is intended for use in training centralized, non-distributed models (num_clients=1). This can be useful as a point of comparison against federated models.

Currently, the implementation produces a dataset in which each client's examples appear together and in order, so additional shuffling should generally be performed.

Args
seed Optional, a seed to determine the order in which clients are processed in the joined dataset. The seed can be any nonnegative 32-bit integer, an array of such integers, or None.

Returns
A tf.data.Dataset object.
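
Because each client's examples arrive grouped together, a centralized baseline usually adds a shuffle step on top of the joined dataset. A minimal sketch, assuming a hypothetical helper name build_centralized_dataset and an arbitrary default buffer size; the only library calls used are create_tf_dataset_from_all_clients and tf.data.Dataset.shuffle.

```python
def build_centralized_dataset(client_data, shuffle_buffer_size=100, seed=None):
    """Joins all client examples and shuffles them for centralized training."""
    dataset = client_data.create_tf_dataset_from_all_clients(seed=seed)
    # The same seed is reused for the shuffle so a test run is repeatable.
    return dataset.shuffle(shuffle_buffer_size, seed=seed)
```

For a toy dataset, a buffer at least as large as the total number of examples gives a full shuffle; smaller buffers only mix examples locally.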

datasets

Yields the tf.data.Dataset for each client in random order.

This function is intended for use in building a static array of client data to be provided to the top-level federated computation.

Args
limit_count Optional, a maximum number of datasets to return.
seed Optional, a seed to determine the order in which clients are processed. The seed can be any nonnegative 32-bit integer, an array of such integers, or None.
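
Since datasets yields its results lazily, building the static list of client datasets mentioned above means materializing the generator. A small sketch, with sample_client_datasets as a hypothetical helper name:

```python
def sample_client_datasets(client_data, num_clients, seed=None):
    """Collects up to num_clients client datasets into a list."""
    # limit_count caps how many datasets are yielded; seed fixes the order.
    return list(client_data.datasets(limit_count=num_clients, seed=seed))
```

Passing a fixed seed keeps the selected clients stable across test runs.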

from_clients_and_tf_fn

Constructs a ClientData based on the given function.

Args
client_ids A non-empty list of strings to use as input to serializable_dataset_fn.
serializable_dataset_fn A function that takes a client_id from the above list, and returns a tf.data.Dataset. This function must be serializable and usable within the context of a tf.function and tff.Computation.

Raises
TypeError If serializable_dataset_fn is a tff.Computation.

Returns
A ClientData object.
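
As an illustration of a dataset function that stays within TF operations, the sketch below derives each client's dataset from its ID alone. The helper names validate_client_ids and make_client_data are hypothetical, the choice of exceptions in validate_client_ids is this sketch's own rather than TFF's, and the "dataset length equals ID length" rule is purely a toy assumption. TF and TFF are imported inside make_client_data so the validation helper runs without them.

```python
def validate_client_ids(client_ids):
    """Checks the documented requirement: a non-empty list of strings."""
    client_ids = list(client_ids)
    if not client_ids:
        raise ValueError("client_ids must be non-empty.")
    for client_id in client_ids:
        if not isinstance(client_id, str):
            raise TypeError(
                f"Expected str client IDs, found {type(client_id).__name__}.")
    return client_ids


def make_client_data(client_ids):
    """Builds a ClientData from a TF-only dataset function (needs TF and TFF)."""
    import tensorflow as tf  # Deferred imports.
    import tensorflow_federated as tff

    def serializable_dataset_fn(client_id):
        # client_id may be a string tensor when traced, so only TF ops are used.
        num_examples = tf.cast(tf.strings.length(client_id), tf.int64)
        return tf.data.Dataset.range(num_examples)

    return tff.simulation.datasets.ClientData.from_clients_and_tf_fn(
        validate_client_ids(client_ids), serializable_dataset_fn)
```

Keeping Python-side logic (file access, dictionary lookups on the ID) out of serializable_dataset_fn is what allows it to be traced.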

preprocess

Applies preprocess_fn to each client's data.

Args
preprocess_fn A callable accepting a tf.data.Dataset and returning a preprocessed tf.data.Dataset. This function must be traceable by TF.

Returns
A tff.simulation.datasets.ClientData.

Raises
IncompatiblePreprocessFnError If preprocess_fn is a tff.Computation.
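
A common preprocess_fn repeats and batches each client's dataset. The sketch below assumes hypothetical helper names make_preprocess_fn and preprocess_clients; the returned function is a plain Python callable built from tf.data.Dataset.repeat and tf.data.Dataset.batch, not a tff.Computation.

```python
def make_preprocess_fn(batch_size, num_epochs=1):
    """Returns a callable mapping a tf.data.Dataset to a batched dataset."""

    def preprocess_fn(dataset):
        # Only tf.data transformations are used, so the function is traceable.
        return dataset.repeat(num_epochs).batch(batch_size)

    return preprocess_fn


def preprocess_clients(client_data, batch_size, num_epochs=1):
    """Applies the repeat-and-batch preprocessing to every client."""
    return client_data.preprocess(make_preprocess_fn(batch_size, num_epochs))
```

The result of preprocess_clients is itself a ClientData, so the other methods on this page can be called on it in the same way.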

train_test_client_split

Returns a pair of (train, test) ClientData.

This method partitions the clients of client_data into two ClientData objects with disjoint sets of ClientData.client_ids. All clients in the test ClientData are guaranteed to have non-empty datasets, but the training ClientData may have clients with no data.

Args
client_data The base ClientData to split.
num_test_clients How many clients to hold out for testing. This can be at most len(client_data.client_ids) - 1, since we don't want to produce empty ClientData.
seed Optional seed to fix shuffling of clients before splitting. The seed can be any nonnegative 32-bit integer, an array of such integers, or None.

Returns
A pair (train_client_data, test_client_data), where test_client_data has num_test_clients selected at random, subject to the constraint that they each have at least one batch in their dataset.

Raises
ValueError If num_test_clients cannot be satisfied by client_data, or too many clients have empty datasets.
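
The upper bound on num_test_clients can be enforced before calling the split. The sketch below assumes hypothetical helper names choose_num_test_clients and split_clients, and a fraction-based policy that is this example's own choice; the TFF import is deferred so the arithmetic helper runs on its own.

```python
def choose_num_test_clients(num_clients, test_fraction=0.2):
    """Picks a test-set size between 1 and num_clients - 1."""
    if num_clients < 2:
        raise ValueError("Need at least 2 clients to produce a non-empty split.")
    requested = int(num_clients * test_fraction)
    # Clamp so that both the train and test sides keep at least one client.
    return max(1, min(requested, num_clients - 1))


def split_clients(client_data, test_fraction=0.2, seed=None):
    """Splits client_data into (train, test) ClientData (requires TFF)."""
    import tensorflow_federated as tff  # Deferred import.

    num_test_clients = choose_num_test_clients(
        len(client_data.client_ids), test_fraction)
    return tff.simulation.datasets.ClientData.train_test_client_split(
        client_data, num_test_clients, seed=seed)
```

Clamping does not guard against the second documented failure: if too many clients have empty datasets, the split can still raise ValueError.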