Loads the named dataset into a `tf.data.Dataset`.
```python
tfds.load(
    name: str,
    *,
    split: Optional[Tree[splits_lib.SplitArg]] = None,
    data_dir: Union[None, str, os.PathLike] = None,
    batch_size: Optional[int] = None,
    shuffle_files: bool = False,
    download: bool = True,
    as_supervised: bool = False,
    decoders: Optional[TreeDict[decode.partial_decode.DecoderArg]] = None,
    read_config: Optional[read_config_lib.ReadConfig] = None,
    with_info: bool = False,
    builder_kwargs: Optional[Dict[str, Any]] = None,
    download_and_prepare_kwargs: Optional[Dict[str, Any]] = None,
    as_dataset_kwargs: Optional[Dict[str, Any]] = None,
    try_gcs: bool = False
)
```
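A minimal call looks like the following sketch (the `mnist` dataset is a real TFDS dataset, used here purely as an example):

```python
import tensorflow_datasets as tfds

# First call downloads and prepares the data; later calls reuse the cache.
ds = tfds.load('mnist', split='train', shuffle_files=True)
```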
`tfds.load` is a convenience method that:

1.  Fetches the `tfds.core.DatasetBuilder` by name:

    ```python
    builder = tfds.builder(name, data_dir=data_dir, **builder_kwargs)
    ```

2.  Generates the data (when `download=True`):

    ```python
    builder.download_and_prepare(**download_and_prepare_kwargs)
    ```

3.  Loads the `tf.data.Dataset` object:

    ```python
    ds = builder.as_dataset(
        split=split,
        as_supervised=as_supervised,
        shuffle_files=shuffle_files,
        read_config=read_config,
        decoders=decoders,
        **as_dataset_kwargs,
    )
    ```
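Taken together, the three steps above are roughly equivalent to this manual sketch (again using `mnist` purely for illustration):

```python
import tensorflow_datasets as tfds

# Roughly what tfds.load('mnist', split='train') does with default arguments.
builder = tfds.builder('mnist')
builder.download_and_prepare()
ds = builder.as_dataset(split='train')
```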
See https://www.tensorflow.org/datasets/overview#load_a_dataset for more examples.
If you'd like NumPy arrays instead of `tf.data.Dataset`s or `tf.Tensor`s, you can pass the return value to `tfds.as_numpy`.
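For example, a minimal sketch (assuming the `mnist` dataset):

```python
import tensorflow_datasets as tfds

ds = tfds.load('mnist', split='train')

# tfds.as_numpy yields each example as a dict of NumPy arrays.
for example in tfds.as_numpy(ds.take(2)):
    print(example['image'].shape, example['label'])
```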
Args | |
---|---|
`name` | `str`, the registered name of the `DatasetBuilder` (the snake case version of the class name). The config and version can also be specified in the name as follows: `'dataset_name[/config_name][:version]'`. For example, `'movielens/25m-ratings'` (for the latest version of `'25m-ratings'`), `'movielens:0.1.0'` (for the default config and version 0.1.0), or `'movielens/25m-ratings:0.1.0'`. Note that only the latest version can be generated, but old versions can be read if they are present on disk. For convenience, the `name` parameter can contain comma-separated keyword arguments for the builder. For example, `'foo_bar/a=True,b=3'` would use the `FooBar` dataset, passing the keyword arguments `a=True` and `b=3` (for builders with configs, it would be `'foo_bar/zoo/a=True,b=3'` to use the `'zoo'` config and pass the builder keyword arguments `a=True` and `b=3`). |
`split` | Which split of the data to load (e.g. `'train'`, `'test'`, `['train', 'test']`, `'train[80%:]'`, ...). See the split API guide and the usage example after this table. If `None`, returns all splits in a `Dict[Split, tf.data.Dataset]`. |
`data_dir` | Directory to read/write data. Defaults to the value of the environment variable `TFDS_DATA_DIR`, if set; otherwise falls back to `'~/tensorflow_datasets'`, the default location where datasets are stored. |
`batch_size` | `int`, if set, adds a batch dimension to examples. Note that variable-length features will be 0-padded. If `batch_size=-1`, will return the full dataset as `tf.Tensor`s. |
`shuffle_files` | `bool`, whether to shuffle the input files. Defaults to `False`. |
`download` | `bool` (optional), whether to call `tfds.core.DatasetBuilder.download_and_prepare` before calling `tfds.core.DatasetBuilder.as_dataset`. If `False`, data is expected to be in `data_dir`. If `True` and the data is already in `data_dir`, `download_and_prepare` is a no-op. |
`as_supervised` | `bool`, if `True`, the returned `tf.data.Dataset` will have a 2-tuple structure `(input, label)` according to `builder.info.supervised_keys`. If `False` (the default), the returned `tf.data.Dataset` will have a dictionary with all the features. |
`decoders` | Nested dict of `Decoder` objects which allow customizing the decoding. The structure should match the feature structure, but only customized feature keys need to be present. See the decoding guide for more info. |
`read_config` | `tfds.ReadConfig`, additional options to configure the input pipeline (e.g. seed, number of parallel reads, ...). |
`with_info` | `bool`, if `True`, `tfds.load` will return the tuple (`tf.data.Dataset`, `tfds.core.DatasetInfo`), the latter containing the info associated with the builder. |
`builder_kwargs` | `dict` (optional), keyword arguments to be passed to the `tfds.core.DatasetBuilder` constructor. `data_dir` will be passed through by default. |
`download_and_prepare_kwargs` | `dict` (optional), keyword arguments passed to `tfds.core.DatasetBuilder.download_and_prepare` if `download=True`. Allows control over where to download and extract the cached data. If not set, `cache_dir` and `manual_dir` will automatically be deduced from `data_dir`. |
`as_dataset_kwargs` | `dict` (optional), keyword arguments passed to `tfds.core.DatasetBuilder.as_dataset`. |
`try_gcs` | `bool`, if `True`, `tfds.load` will check whether the dataset exists on the public GCS bucket before building it locally. This is equivalent to passing `data_dir='gs://tfds-data/datasets'`. Warning: `try_gcs` is different from `builder_kwargs.download_config.try_download_gcs`. `try_gcs` (default: `False`) overrides `data_dir` to be the public GCS bucket. `try_download_gcs` (default: `True`) allows downloading from GCS while keeping a `data_dir` different from the public GCS bucket. So, to fully bypass GCS, use `try_gcs=False` and `download_and_prepare_kwargs={'download_config': tfds.core.download.DownloadConfig(try_download_gcs=False)}`. |
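A short sketch combining several of the arguments above (`split`, `batch_size`, `as_supervised`, `with_info`), with `mnist` standing in for any dataset name:

```python
import tensorflow_datasets as tfds

# split accepts names, lists of names, and slicing syntax.
train_ds, test_ds = tfds.load('mnist', split=['train[:80%]', 'test'])

# as_supervised=True yields (input, label) tuples instead of feature dicts;
# with_info=True additionally returns the tfds.core.DatasetInfo object.
ds, info = tfds.load(
    'mnist',
    split='train',
    batch_size=32,        # adds a batch dimension to each example
    as_supervised=True,
    with_info=True,
)

# batch_size=-1 returns the full split as tf.Tensors (small datasets only).
full = tfds.load('mnist', split='test', batch_size=-1)
```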
Returns | |
---|---|
`ds` | `tf.data.Dataset`, the dataset requested, or if `split` is `None`, a `dict<key: tfds.Split, value: tf.data.Dataset>`. If `batch_size=-1`, these will be full datasets as `tf.Tensor`s. |
`ds_info` | `tfds.core.DatasetInfo`, if `with_info` is `True`, then `tfds.load` will return a tuple `(ds, ds_info)` containing dataset information (version, features, splits, num_examples, ...). Note that the `ds_info` object documents the entire dataset, regardless of the split requested. Split-specific information is available in `ds_info.splits`. |
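For instance, a sketch of reading split-level information from the returned `ds_info` (assuming `mnist`):

```python
import tensorflow_datasets as tfds

ds, ds_info = tfds.load('mnist', split='train', with_info=True)

# ds_info describes the whole dataset; per-split details live in ds_info.splits.
print(ds_info.features)
print(ds_info.splits['train'].num_examples)
```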