TFDS CLI is a command-line tool that provides various commands to make working with TensorFlow Datasets easy.
Disable TF logs on import
%%capture
%env TF_CPP_MIN_LOG_LEVEL=1 # Disable logs on TF import
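Outside a notebook, where the `%env` magic is not available, the same variable can be set from plain Python. This is a minimal sketch, assuming TensorFlow has not been imported yet in the process; the setting only has an effect if it is in place before the import.

```python
import os

# TF_CPP_MIN_LOG_LEVEL filters TensorFlow's C++ logs:
# "0" shows everything, "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR.
# It must be set before TensorFlow is imported.
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "1"

# Import TensorFlow / TFDS only after the variable is set, for example:
# import tensorflow_datasets as tfds
```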
Installation
The CLI tool is installed together with tensorflow-datasets (or tfds-nightly).
pip install -q tfds-nightly
tfds --version
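To check from Python which of the two distributions is installed, and at which version, a small sketch using the standard library's `importlib.metadata` could look like this. The helper name `installed_tfds_distribution` is hypothetical; it only reads package metadata and does not call the CLI.

```python
from importlib import metadata


def installed_tfds_distribution():
    """Return (distribution_name, version) of the installed TFDS package, or None.

    Both distributions ship the `tfds` CLI; tfds-nightly is checked first
    because it is the one installed above.
    """
    for dist in ("tfds-nightly", "tensorflow-datasets"):
        try:
            return dist, metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


print(installed_tfds_distribution())
```

If neither package is installed, the helper returns `None` instead of raising.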
For the list of all CLI commands:
tfds --help
usage: tfds [-h] [--helpfull] [--version] {build,new} ...

Tensorflow Datasets CLI tool

optional arguments:
  -h, --help   show this help message and exit
  --helpfull   show full help message and exit
  --version    show program's version number and exit

command:
  {build,new}
    build      Commands for downloading and preparing datasets.
    new        Creates a new dataset directory from the template.
tfds new: Implementing a new dataset
This command helps you get started writing a new Python dataset by creating a <dataset_name>/ directory that contains the default implementation files.
Usage:
tfds new my_dataset
Dataset generated at /tmpfs/src/temp/docs/my_dataset
You can start searching `TODO(my_dataset)` to complete the implementation.
Please check https://www.tensorflow.org/datasets/add_dataset for additional details.
This creates:
ls -1 my_dataset/
__init__.py
checksums.tsv
dummy_data/
my_dataset.py
my_dataset_test.py
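To confirm from a script that the scaffold is complete, the generated directory can be compared against the listing above. This is a minimal sketch using only `pathlib`; the helper name `missing_scaffold_entries` is hypothetical, and the expected names are copied from the `ls` output for `my_dataset` (other TFDS versions may generate a different set of files).

```python
from pathlib import Path


def missing_scaffold_entries(dataset_dir, name):
    """Return the expected `tfds new` entries that are absent from dataset_dir.

    The expected layout mirrors the `ls -1 my_dataset/` listing; entries
    ending in "/" must be directories, everything else must be a file.
    """
    expected = [
        "__init__.py",
        "checksums.tsv",
        "dummy_data/",
        f"{name}.py",
        f"{name}_test.py",
    ]
    root = Path(dataset_dir)
    missing = []
    for entry in expected:
        if entry.endswith("/"):
            present = (root / entry.rstrip("/")).is_dir()
        else:
            present = (root / entry).is_file()
        if not present:
            missing.append(entry)
    return missing


print(missing_scaffold_entries("my_dataset", "my_dataset"))
```

An empty list means every expected entry is present.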
See the writing datasets guide for more details.
Available options:
tfds new --help
usage: tfds new [-h] [--helpfull] [--dir DIR] dataset_name

positional arguments:
  dataset_name  Name of the dataset to be created (in snake_case)

optional arguments:
  -h, --help    show this help message and exit
  --helpfull    show full help message and exit
  --dir DIR     Path where the dataset directory will be created. Defaults to current directory.
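When datasets are created from a script, the positional name and the `--dir` flag can be assembled into an argument list for `subprocess.run`. This is a sketch under assumptions: `new_dataset_command` is a hypothetical helper, and its snake_case regex is our own approximation of the requirement stated in the help text, not the check TFDS itself performs.

```python
import re

# Our own approximation of "snake_case": lowercase letters/digits separated by
# single underscores, starting with a letter. TFDS may apply a different rule.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def new_dataset_command(dataset_name, directory=None):
    """Build the argv for `tfds new`, validating the dataset name first."""
    if not _SNAKE_CASE.fullmatch(dataset_name):
        raise ValueError(f"dataset name must be snake_case, got {dataset_name!r}")
    argv = ["tfds", "new", dataset_name]
    if directory is not None:
        # --dir: where the dataset directory is created (default: current directory).
        argv += ["--dir", str(directory)]
    return argv


# To actually run it (requires the tfds CLI on PATH):
# import subprocess
# subprocess.run(new_dataset_command("my_dataset", "datasets"), check=True)
```

Validating the name before spawning the process gives a clearer error than a failed CLI call.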
tfds build: Download and prepare a dataset
Use tfds build <my_dataset> to generate a new dataset. <my_dataset> can be:
A path to a dataset/ folder or a dataset.py file (leave it empty for the current directory):
- tfds build datasets/my_dataset/
- cd datasets/my_dataset/ && tfds build
- cd datasets/my_dataset/ && tfds build my_dataset
- cd datasets/my_dataset/ && tfds build my_dataset.py
A registered dataset:
- tfds build mnist
- tfds build my_dataset --imports my_project.datasets
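The invocations listed above can also be driven from Python. The sketch below uses a hypothetical helper, `build_dataset_command`, that only assembles flags appearing in the `tfds build --help` output (`--imports`, `--data_dir`, `--config`, `--config_idx`, `--max_examples_per_split`, `--overwrite`). It also mirrors the help text's note that `--config` and `--config_idx` are mutually exclusive. It is an illustration, not part of TFDS.

```python
def build_dataset_command(dataset=None, *, imports=None, data_dir=None,
                          config=None, config_idx=None,
                          max_examples_per_split=None, overwrite=False):
    """Assemble argv for `tfds build` from a few of its documented flags.

    `dataset` may be a registered name (e.g. "mnist"), a dataset folder,
    a dataset.py path, or None to build the dataset in the current directory.
    """
    if config is not None and config_idx is not None:
        raise ValueError("--config and --config_idx are mutually exclusive")
    argv = ["tfds", "build"]
    if dataset is not None:
        argv.append(dataset)
    if imports:
        # --imports takes a comma-separated list of modules that register datasets.
        modules = [imports] if isinstance(imports, str) else list(imports)
        argv += ["--imports", ",".join(modules)]
    if data_dir is not None:
        argv += ["--data_dir", str(data_dir)]
    if config is not None:
        argv += ["--config", config]
    if config_idx is not None:
        argv += ["--config_idx", str(config_idx)]
    if max_examples_per_split is not None:
        # Only generate the first N examples per split (handy for quick checks).
        argv += ["--max_examples_per_split", str(max_examples_per_split)]
    if overwrite:
        argv.append("--overwrite")
    return argv


print(build_dataset_command("my_dataset", imports=["my_project.datasets"]))
```

The resulting list can be passed to `subprocess.run(..., check=True)` once the `tfds` CLI is on `PATH`.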
Available options:
tfds build --help
usage: tfds build [-h] [--helpfull]
                  [--datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]]
                  [--overwrite]
                  [--max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]]
                  [--data_dir DATA_DIR] [--download_dir DOWNLOAD_DIR]
                  [--extract_dir EXTRACT_DIR] [--manual_dir MANUAL_DIR]
                  [--add_name_to_manual_dir] [--config CONFIG]
                  [--config_idx CONFIG_IDX] [--imports IMPORTS]
                  [--register_checksums] [--force_checksums_validation]
                  [--beam_pipeline_options BEAM_PIPELINE_OPTIONS]
                  [--file_format FILE_FORMAT]
                  [--exclude_datasets EXCLUDE_DATASETS]
                  [--experimental_latest_version]
                  [datasets [datasets ...]]

positional arguments:
  datasets              Name(s) of the dataset(s) to build. Default to current dir. See https://www.tensorflow.org/datasets/cli for accepted values.

optional arguments:
  -h, --help            show this help message and exit
  --helpfull            show full help message and exit
  --datasets DATASETS_KEYWORD [DATASETS_KEYWORD ...]
                        Datasets can also be provided as keyword argument.

Debug & tests:
  --pdb                 Enter post-mortem debugging mode if an exception is raised.
  --overwrite           Delete pre-existing dataset if it exists.
  --max_examples_per_split [MAX_EXAMPLES_PER_SPLIT]
                        When set, only generate the first X examples (default to 1), rather than the full dataset.If set to 0, only execute the `_split_generators` (which download the original data), but skip `_generator_examples`

Paths:
  --data_dir DATA_DIR   Where to place datasets. Default to `~/tensorflow_datasets/` or `TFDS_DATA_DIR` environement variable.
  --download_dir DOWNLOAD_DIR
                        Where to place downloads. Default to `<data_dir>/downloads/`.
  --extract_dir EXTRACT_DIR
                        Where to extract files. Default to `<download_dir>/extracted/`.
  --manual_dir MANUAL_DIR
                        Where to manually download data (required for some datasets). Default to `<download_dir>/manual/`.
  --add_name_to_manual_dir
                        If true, append the dataset name to the `manual_dir` (e.g. `<download_dir>/manual/<dataset_name>/`. Useful to avoid collisions if many datasets are generated.

Generation:
  --config CONFIG, -c CONFIG
                        Config name to build. Build all configs if not set.
  --config_idx CONFIG_IDX
                        Config id to build (`builder_cls.BUILDER_CONFIGS[config_idx]`). Mutually exclusive with `--config`.
  --imports IMPORTS, -i IMPORTS
                        Comma separated list of module to import to register datasets.
  --register_checksums  If True, store size and checksum of downloaded files.
  --force_checksums_validation
                        If True, raise an error if the checksums are not found.
  --beam_pipeline_options BEAM_PIPELINE_OPTIONS
                        A (comma-separated) list of flags to pass to `PipelineOptions` when preparing with Apache Beam. (see: https://www.tensorflow.org/datasets/beam_datasets). Example: `--beam_pipeline_options=job_name=my-job,project=my-project`
  --file_format FILE_FORMAT
                        File format to which generate the tf-examples. Available values: ['tfrecord', 'riegeli'] (see `tfds.core.FileFormat`).

Automation:
  Used by automated scripts.

  --exclude_datasets EXCLUDE_DATASETS
                        If set, generate all datasets except the one defined here. Comma separated list of datasets to exclude.
  --experimental_latest_version
                        Build the latest Version(experiments=...) available rather than default version.
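The default locations in the `Paths` section chain together, with each directory defaulting to a subfolder of the previous one. The following sketch reproduces those documented defaults so you can see where files will land. It reflects our reading of the help text, not TFDS's own resolution code. The function name `default_tfds_paths` is hypothetical, and the assumed precedence when `--data_dir` is not passed is `TFDS_DATA_DIR` if set, otherwise `~/tensorflow_datasets/`.

```python
import os
from pathlib import Path


def default_tfds_paths(dataset_name, data_dir=None, add_name_to_manual_dir=False):
    """Resolve data/download/extract/manual dirs as described in `tfds build --help`."""
    if data_dir is None:
        # Assumed precedence: TFDS_DATA_DIR if set, else ~/tensorflow_datasets/.
        data_dir = os.environ.get("TFDS_DATA_DIR", "~/tensorflow_datasets")
    data_dir = Path(data_dir).expanduser()
    download_dir = data_dir / "downloads"        # <data_dir>/downloads/
    extract_dir = download_dir / "extracted"     # <download_dir>/extracted/
    manual_dir = download_dir / "manual"         # <download_dir>/manual/
    if add_name_to_manual_dir:
        # --add_name_to_manual_dir: <download_dir>/manual/<dataset_name>/
        manual_dir = manual_dir / dataset_name
    return {
        "data_dir": data_dir,
        "download_dir": download_dir,
        "extract_dir": extract_dir,
        "manual_dir": manual_dir,
    }


for key, path in default_tfds_paths("mnist").items():
    print(f"{key}: {path}")
```

Passing an explicit `--download_dir`, `--extract_dir` or `--manual_dir` on the command line overrides the corresponding entry; the sketch only covers the defaults.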