Loads the federated Shakespeare dataset.
tff.simulation.datasets.shakespeare.load_data(
cache_dir: Optional[str] = None
) -> tuple[client_data.ClientData, client_data.ClientData]
Downloads and caches the dataset locally. If previously downloaded, tries to load the dataset from cache.
This dataset is derived from the LEAF repository's (https://github.com/TalwalkarLab/leaf) preprocessing of the works of Shakespeare, as described in "LEAF: A Benchmark for Federated Settings" (https://arxiv.org/abs/1812.01097).
The data set consists of 715 users (characters of Shakespeare plays), where each example corresponds to a contiguous set of lines spoken by the character in a given play.
Data set sizes:
- train: 16,068 examples
- test: 2,356 examples
Rather than holding out specific users, each user's examples are split across train and test so that every user has at least one example in train and one example in test. Characters with fewer than two examples are excluded from the data set.
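The per-user split described above can be illustrated with a minimal sketch. This is not the LEAF preprocessing code: `split_per_user` and its `train_fraction` parameter are hypothetical names chosen for illustration, and the actual split ratio is not stated here. The sketch only shows the two stated properties, namely that users with fewer than two examples are dropped and that every remaining user appears in both splits.

```python
import collections


def split_per_user(examples_by_user, train_fraction=0.8):
    """Split each user's examples so train and test both get at least one.

    `train_fraction` is a hypothetical parameter used for illustration only.
    """
    train = collections.OrderedDict()
    test = collections.OrderedDict()
    for user, examples in examples_by_user.items():
        if len(examples) < 2:
            # A user with fewer than two examples cannot appear in both splits.
            continue
        n_train = int(len(examples) * train_fraction)
        # Clamp so that each side keeps at least one example.
        n_train = max(1, min(n_train, len(examples) - 1))
        train[user] = examples[:n_train]
        test[user] = examples[n_train:]
    return train, test
```

Because the split is per user rather than across users, the same set of client IDs is present in both returned mappings.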
The tf.data.Datasets returned by tff.simulation.datasets.ClientData.create_tf_dataset_for_client will yield collections.OrderedDict objects at each iteration, with the following keys and values:
- 'snippets': a tf.Tensor with dtype=tf.string, the snippet of contiguous text.
Args | |
---|---|
cache_dir | (Optional) directory to cache the downloaded file. If None, caches in Keras' default cache directory. |
Returns | |
---|---|
Tuple of (train, test) where the tuple elements are tff.simulation.datasets.ClientData objects. |
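A short usage sketch follows, assuming TensorFlow Federated is installed and TensorFlow runs eagerly. The helper names `summarize_clients` and `load_and_summarize` are hypothetical and not part of the library; only `load_data`, `client_ids`, and `create_tf_dataset_for_client` come from the API described here. The first call to `load_data` downloads the dataset, so the loading step is kept in its own function.

```python
import collections


def summarize_clients(client_data, max_clients=3):
    """Count examples for the first few clients of a ClientData-like object."""
    counts = collections.OrderedDict()
    for client_id in client_data.client_ids[:max_clients]:
        dataset = client_data.create_tf_dataset_for_client(client_id)
        # Each element is a mapping whose 'snippets' entry holds the text.
        counts[client_id] = sum(1 for _ in dataset)
    return counts


def load_and_summarize(cache_dir=None):
    """Load the dataset (downloads on first use) and summarize both splits."""
    # Imported lazily so the helper above can be used without TFF installed.
    import tensorflow_federated as tff

    train, test = tff.simulation.datasets.shakespeare.load_data(
        cache_dir=cache_dir)
    return summarize_clients(train), summarize_clients(test)
```

Calling `load_and_summarize()` with no arguments uses the default cache location; passing a `cache_dir` path stores the downloaded file there instead.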