Configuration for tfds.core.DatasetBuilder.download_and_prepare.
tfds.download.DownloadConfig(
extract_dir: Optional[epath.PathLike] = None,
manual_dir: Optional[epath.PathLike] = None,
download_mode: util.GenerateMode = tfds.GenerateMode.REUSE_DATASET_IF_EXISTS,
compute_stats: util.ComputeStatsMode = tfds.download.ComputeStatsMode.SKIP,
max_examples_per_split: Optional[int] = None,
register_checksums: bool = False,
force_checksums_validation: bool = False,
beam_runner: Optional[Any] = None,
beam_options: Optional[Any] = None,
try_download_gcs: bool = True,
verify_ssl: bool = True,
override_max_simultaneous_downloads: Optional[int] = None,
num_shards: Optional[int] = None,
min_shard_size: int = shard_utils.DEFAULT_MIN_SHARD_SIZE,
max_shard_size: int = shard_utils.DEFAULT_MAX_SHARD_SIZE
)
| Attribute | Description |
|---|---|
| `extract_dir` | `str`, directory where extracted files are stored. Defaults to " |
| `manual_dir` | `str`, read-only directory where manually downloaded or extracted data is stored. Defaults to `<download_dir>/manual`. |
| `download_mode` | `tfds.GenerateMode`, how to handle downloads or data that already exist. Defaults to `REUSE_DATASET_IF_EXISTS`, which reuses both downloads and data if they already exist. |
| `compute_stats` | `tfds.download.ComputeStatsMode`, whether to compute statistics over the generated data. Defaults to `SKIP`. |
| `max_examples_per_split` | `int`, optional maximum number of examples to write into each split (used for testing). If set to 0, only `_split_generators` is executed (the original data is downloaded) and `_generate_examples` is skipped. |
| `register_checksums` | `bool`, defaults to `False`. If `True`, checksums of downloaded files are recorded. |
| `force_checksums_validation` | `bool`, defaults to `False`. If `True`, raises an error if a URL does not have checksums. |
| `beam_runner` | Runner to pass to `beam.Pipeline`. Only used for datasets that rely on Beam for generation. |
| `beam_options` | `PipelineOptions` to pass to `beam.Pipeline`. Only used for datasets that rely on Beam for generation. |
| `try_download_gcs` | `bool`, defaults to `True`. If `True`, the prepared dataset is downloaded from GCS when available. If `False`, the dataset is downloaded and prepared from scratch. |
| `verify_ssl` | `bool`, defaults to `True`. If `True`, the certificate is verified when downloading the dataset. |
| `override_max_simultaneous_downloads` | `int`, optional maximum number of simultaneous downloads. If set, it overrides the dataset builder and downloader default values. |
| `num_shards` | Optional number of shards to create. If `None`, the number of shards is computed from the total size of the dataset and the minimum and maximum shard sizes. |
| `min_shard_size` | Optional minimum shard size in bytes. If `None`, 64 MiB is used. |
| `max_shard_size` | Optional maximum shard size in bytes. If `None`, 1 GiB is used. |
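The attributes above can be combined into a small configuration for quick test runs. The following is a minimal sketch, not an official recipe: the helper name `prepare_small_dataset`, the `DEBUG_OVERRIDES` dictionary, and the chosen values are illustrative assumptions. Only `tfds.download.DownloadConfig`, `tfds.builder`, and `download_and_prepare(download_config=...)` are TFDS calls.

```python
# Illustrative overrides; the keys are attributes from the table above,
# the values are assumptions chosen for a fast local test run.
DEBUG_OVERRIDES = {
    "max_examples_per_split": 100,  # keep each split small
    "try_download_gcs": False,      # prepare from scratch instead of using GCS
    "verify_ssl": True,             # keep certificate verification on
}


def prepare_small_dataset(name: str, data_dir: str) -> None:
    """Hypothetical helper: prepares `name` with the debug overrides."""
    # Imported lazily so the module loads even without tensorflow_datasets.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(**DEBUG_OVERRIDES)
    builder = tfds.builder(name, data_dir=data_dir)
    builder.download_and_prepare(download_config=config)
```

A call such as `prepare_small_dataset("mnist", "/tmp/tfds_debug")` (dataset name and path chosen only for illustration) would then write at most 100 examples per split.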
Methods
get_shard_config
get_shard_config() -> shard_utils.ShardConfig
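To give a feel for how `num_shards`, `min_shard_size`, and `max_shard_size` constrain one another, here is a rough sketch of one possible heuristic. It is not the algorithm TFDS uses, which this page does not describe. The function names and the "pick the fewest shards" rule are assumptions, and the two constants simply mirror the 64 MiB and 1 GiB defaults stated in the attribute descriptions.

```python
from typing import Optional, Tuple

# Assumed to match the documented defaults (64 MiB and 1 GiB).
MIN_SHARD_SIZE = 64 << 20
MAX_SHARD_SIZE = 1 << 30


def shard_count_bounds(
    total_size: int,
    min_shard_size: int = MIN_SHARD_SIZE,
    max_shard_size: int = MAX_SHARD_SIZE,
) -> Tuple[int, int]:
    """Returns (fewest, most) shard counts that respect the size limits."""
    # Fewest shards such that no shard exceeds max_shard_size (ceiling division).
    lowest = max(1, (total_size + max_shard_size - 1) // max_shard_size)
    # Most shards such that every shard still holds at least min_shard_size.
    highest = max(lowest, total_size // min_shard_size)
    return lowest, highest


def pick_num_shards(total_size: int, num_shards: Optional[int] = None) -> int:
    """Hypothetical rule: honor an explicit num_shards, else use the fewest."""
    if num_shards is not None:
        return num_shards
    lowest, _ = shard_count_bounds(total_size)
    return lowest
```

Under these assumptions a 10 GiB dataset admits between 10 and 160 shards, and an explicit `num_shards` bypasses the size-based computation entirely.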
replace
replace(
**kwargs
) -> DownloadConfig
Returns a copy with updated attributes.
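The copy semantics of `replace` can be illustrated with a stand-in class. This sketch assumes `replace` behaves like `dataclasses.replace`; the page only states that it returns a copy with updated attributes. `ToyConfig` is a hypothetical stand-in with two of the documented fields, not the real `DownloadConfig`.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ToyConfig:
    """Hypothetical stand-in for DownloadConfig with two of its fields."""

    max_examples_per_split: Optional[int] = None
    verify_ssl: bool = True

    def replace(self, **kwargs) -> "ToyConfig":
        # Assumed behavior: build a new instance, leaving `self` untouched.
        return dataclasses.replace(self, **kwargs)


base = ToyConfig()
debug = base.replace(max_examples_per_split=10)
```

Here `base` keeps `max_examples_per_split=None`, while `debug` is a separate object that carries the override and inherits every other field unchanged.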