tensorflow_datasets (tfds) defines a collection of datasets ready to use with TensorFlow.
Each dataset is defined as a tfds.core.DatasetBuilder, which encapsulates the logic to download the dataset and construct an input pipeline, and also contains the dataset documentation (version, splits, number of examples, etc.).
The main library entrypoints are:
- tfds.builder: fetch a tfds.core.DatasetBuilder by name
- tfds.load: convenience method to construct a builder, download the data, and create an input pipeline, returning a tf.data.Dataset
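The difference between the two entry points can be sketched as follows. This is a minimal illustration, assuming the tensorflow_datasets package is installed and that the name passed in (for example "mnist") exists in the catalog. The helper names load_with_builder and load_directly are hypothetical and not part of the library. The import sits inside each helper so that defining them triggers no download.

```python
def load_with_builder(name, split="train"):
    """Explicit route: fetch the builder, prepare the data, build the pipeline."""
    import tensorflow_datasets as tfds  # imported lazily, inside the helper

    builder = tfds.builder(name)            # look up the DatasetBuilder by name
    builder.download_and_prepare()          # download and prepare the data if needed
    return builder.as_dataset(split=split)  # construct the tf.data.Dataset


def load_directly(name, split="train"):
    """Convenience route: tfds.load performs the same steps in one call."""
    import tensorflow_datasets as tfds

    return tfds.load(name, split=split)
```

The explicit route is useful when the builder itself is needed, for instance to read builder.info before any data is loaded. Otherwise tfds.load is the shorter option.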
Documentation:
- These API docs
- Available datasets
- Colab tutorial
- Add a dataset
Modules
beam module: Beam utils.
core module: API to define datasets.
dataset_builders module: Dataset builders API.
decode module: Decoder public API.
deprecated module: Deprecated symbols.
download module: tfds.download.DownloadManager API.
features module: API defining dataset features (image, text, scalar, ...).
folder_dataset module: Utils to load data coming from third-party sources directly with TFDS.
testing module: Testing utilities.
transform module: Transform API.
typing module: TFDS typing annotations.
visualization module: Visualizer utils.
Classes
class GenerateMode: Enum for how to treat pre-existing downloads and data.
class ImageFolder: Generic image classification dataset created from manual directory.
class ReadConfig: Configures input reading pipeline.
class Split: Enum for dataset splits.
class TranslateFolder: Generic text translation dataset created from manual directory.
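As an illustration of how ReadConfig and Split fit together with tfds.load, here is a minimal sketch. It assumes tensorflow_datasets is installed and that the dataset has a train split. The helper name load_train_with_seed and the default seed value are hypothetical choices for this example.

```python
def load_train_with_seed(name, seed=42):
    """Load the train split with file shuffling driven by a fixed seed."""
    import tensorflow_datasets as tfds  # imported lazily, inside the helper

    # ReadConfig gathers options for the reading pipeline; here only the
    # seed used when shuffling input files is set.
    read_config = tfds.ReadConfig(shuffle_seed=seed)
    return tfds.load(
        name,
        split=tfds.Split.TRAIN,  # enum member for the standard "train" split
        shuffle_files=True,
        read_config=read_config,
    )
```

Fixing the seed makes the file shuffle order repeatable. It does not by itself guarantee a fully deterministic pipeline, since other reading options can also affect ordering.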
Functions
as_dataframe(...): Converts the dataset into a pandas DataFrame.
as_numpy(...): Converts a tf.data.Dataset to an iterable of NumPy arrays.
benchmark(...): Benchmarks any iterable (e.g. tf.data.Dataset).
builder(...): Fetches a tfds.core.DatasetBuilder by string name.
builder_cls(...): Fetches a tfds.core.DatasetBuilder class by string name.
builder_from_directories(...): Loads a tfds.core.DatasetBuilder from the given generated dataset paths.
builder_from_directory(...): Loads a tfds.core.DatasetBuilder from the given generated dataset path.
data_source(...): Gets a data source from the named dataset.
dataset_collection(...): Instantiates a DatasetCollectionLoader.
disable_progress_bar(...): Disables the Tqdm progress bar.
display_progress_bar(...): Controls whether the Tqdm progress bar is enabled or disabled.
enable_progress_bar(...): Enables the Tqdm progress bar.
even_splits(...): Generates a list of non-overlapping sub-splits of the same size.
is_dataset_on_gcs(...): Returns whether the dataset is available on the GCS bucket gs://tfds-data/datasets.
list_builders(...): Returns the string names of all tfds.core.DatasetBuilders.
list_dataset_collections(...): Returns the string names of all tfds.core.DatasetCollectionBuilders.
load(...): Loads the named dataset into a tf.data.Dataset.
show_examples(...): Visualizes images (and labels) from an image classification dataset.
show_statistics(...): Displays the dataset statistics in a Colab/Jupyter notebook.
split_for_jax_process(...): Returns the subsplit of the data for the process.
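A few of the functions above can be combined to discover a dataset and peek at its contents. The sketch below is only an illustration, assuming tensorflow_datasets is installed and running in eager mode. The helper names find_builders and preview_examples are hypothetical.

```python
def find_builders(substring):
    """Return the registered dataset names that contain `substring`."""
    import tensorflow_datasets as tfds  # imported lazily, inside the helper

    return [name for name in tfds.list_builders() if substring in name]


def preview_examples(name, split="train", n=3):
    """Return the first `n` examples of a split as NumPy structures."""
    import tensorflow_datasets as tfds

    ds = tfds.load(name, split=split)       # tf.data.Dataset
    return list(tfds.as_numpy(ds.take(n)))  # materialize n examples as NumPy
```

Calling preview_examples downloads and prepares the dataset on first use, so it is best tried with a small dataset.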