TFDS CLI

TFDS CLI is a command-line tool that provides various commands to easily work with TensorFlow Datasets.

Disable TF logs on import

%%capture
%env TF_CPP_MIN_LOG_LEVEL=1  # Disable logs on TF import

Installation

The CLI tool is installed with tensorflow-datasets (or tfds-nightly).

pip install -q tfds-nightly apache-beam
tfds --version

For the list of all CLI commands:

tfds --help

usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.

`tfds new`: Implementing a new Dataset

This command will help you kickstart writing your new Python dataset by creating a <dataset_name>/ directory containing default implementation files.

Usage:

tfds new my_dataset

Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.

tfds new my_dataset will create:

ls -1 my_dataset/

CITATIONS.bib
README.md
TAGS.txt
__init__.py
checksums.tsv
dummy_data/
my_dataset_dataset_builder.py
my_dataset_dataset_builder_test.py
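
To confirm that the template was generated completely, the listing above can be checked from Python. This is a minimal sketch, not part of TFDS: `missing_template_files` is a hypothetical helper, and the expected file names are copied from the listing above, so they may differ in other TFDS versions or when `--data_format` is used.

```python
from pathlib import Path

# File names copied from the listing above; other TFDS versions may differ.
EXPECTED_TEMPLATE_FILES = (
    "CITATIONS.bib",
    "README.md",
    "TAGS.txt",
    "__init__.py",
    "checksums.tsv",
    "dummy_data",
    "my_dataset_dataset_builder.py",
    "my_dataset_dataset_builder_test.py",
)


def missing_template_files(dataset_dir, expected=EXPECTED_TEMPLATE_FILES):
    """Return the expected entries that do not exist under dataset_dir."""
    root = Path(dataset_dir)
    return [name for name in expected if not (root / name).exists()]


# An empty list means every entry from the listing is present.
print(missing_template_files("my_dataset"))
```

For a dataset with a different name, pass an `expected` tuple with the two builder file names adjusted accordingly.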

An optional flag --data_format can be used to generate format-specific dataset builders (e.g., conll). If no data format is given, it will generate a template for a standard tfds.core.GeneratorBasedBuilder. Refer to the documentation for details on the available format-specific dataset builders.

See our writing dataset guide for more info.

Available options:

tfds new --help

usage: tfds new [-h] [--helpfull] [--data_format {standard,conll,conllu}]
                [--dir DIR]
                dataset_name

positional arguments:
  dataset_name          Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --data_format {standard,conll,conllu}
                        Optional format of the input data, which is used to
                        generate a format-specific template.
  --dir DIR             Path where the dataset directory will be created.
                        Defaults to current directory.
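
The help text asks for a `dataset_name` in snake_case. The sketch below shows one way to check a name before passing it to `tfds new`. The regular expression is an assumption about what counts as snake_case (lowercase letters and digits separated by single underscores, starting with a letter); the exact rule TFDS applies is not stated here, and `is_snake_case` is a hypothetical helper.

```python
import re

# Assumed definition of snake_case: starts with a lowercase letter, then
# lowercase letters or digits, with single underscores between groups.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def is_snake_case(name):
    """Return True if name matches the assumed snake_case pattern."""
    return _SNAKE_CASE.fullmatch(name) is not None


for candidate in ("my_dataset", "MyDataset", "my-dataset"):
    print(candidate, is_snake_case(candidate))
```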

`tfds build`: Download and prepare a dataset

Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:

- A path to dataset/ folder or dataset.py file (empty for current directory):
  - tfds build datasets/my_dataset/
  - cd datasets/my_dataset/ && tfds build
  - cd datasets/my_dataset/ && tfds build my_dataset
  - cd datasets/my_dataset/ && tfds build my_dataset.py
- A registered dataset:
  - tfds build mnist
  - tfds build my_dataset --imports my_project.datasets
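
When builds are launched from a script, the invocation forms listed above can be assembled programmatically. The following is a minimal sketch that only constructs and prints the command lines. `tfds_build_command` is a hypothetical helper, and it uses only the positional argument and the `--imports` flag that appear in the examples above.

```python
import shlex


def tfds_build_command(target=None, imports=None):
    """Assemble an argv list for `tfds build`.

    target:  dataset path or registered name; None means the current directory.
    imports: module to import so the dataset gets registered (--imports).
    """
    cmd = ["tfds", "build"]
    if target:
        cmd.append(target)
    if imports:
        cmd += ["--imports", imports]
    return cmd


# Print the command lines corresponding to the examples in the list above.
examples = [
    {"target": "datasets/my_dataset/"},
    {},
    {"target": "mnist"},
    {"target": "my_dataset", "imports": "my_project.datasets"},
]
for kwargs in examples:
    print(shlex.join(tfds_build_command(**kwargs)))
```

To actually run a build, the returned list could be passed to `subprocess.run(cmd, check=True)`, assuming `tfds` is installed and on the `PATH`.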