Enum for how to treat pre-existing downloads and data.

The default mode is REUSE_DATASET_IF_EXISTS, which will reuse both raw downloads and the prepared dataset if they exist.
The generation modes:
Mode | Downloads | Dataset | Metadata
---|---|---|---
REUSE_DATASET_IF_EXISTS (default) | Reuse | Reuse | Reuse
UPDATE_DATASET_INFO | Reuse | Reuse | Fresh
REUSE_CACHE_IF_EXISTS | Reuse | Fresh | Fresh
FORCE_REDOWNLOAD | Fresh | Fresh | Fresh
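
The table above can be read as a lookup from mode to the stages that are regenerated. The following is a minimal stand-alone sketch using only Python's standard `enum` module. The names `Mode`, `PLAN`, `STAGES`, and `stages_rebuilt` are hypothetical and chosen for illustration; this is not the library's implementation.

```python
from enum import Enum, auto


class Mode(Enum):
    """Hypothetical stand-in mirroring the four modes in the table."""
    REUSE_DATASET_IF_EXISTS = auto()
    UPDATE_DATASET_INFO = auto()
    REUSE_CACHE_IF_EXISTS = auto()
    FORCE_REDOWNLOAD = auto()


# Column order of the table: downloads, dataset, metadata.
STAGES = ("downloads", "dataset", "metadata")

# True means the stage is regenerated ("Fresh"); False means "Reuse".
PLAN = {
    Mode.REUSE_DATASET_IF_EXISTS: (False, False, False),
    Mode.UPDATE_DATASET_INFO: (False, False, True),
    Mode.REUSE_CACHE_IF_EXISTS: (False, True, True),
    Mode.FORCE_REDOWNLOAD: (True, True, True),
}


def stages_rebuilt(mode):
    """Return the names of the stages that the given mode regenerates."""
    return [name for name, fresh in zip(STAGES, PLAN[mode]) if fresh]
```

For example, `stages_rebuilt(Mode.REUSE_CACHE_IF_EXISTS)` lists the dataset and metadata stages, which matches the third row of the table: cached downloads are kept while everything derived from them is rebuilt.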
UPDATE_DATASET_INFO regenerates only the DatasetInfo metadata that comes directly from the Builder metadata, that is, metadata that is neither used to prepare the data nor computed from the downloaded or prepared data.
This means that description, config_tags, etc. will be updated, but download_size, schema, splits, disable_shuffling, and file_format will not be updated.
UPDATE_DATASET_INFO will fail if the data has never been prepared.
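
The UPDATE_DATASET_INFO behaviour described above can be sketched as a toy function over plain dictionaries. This is an illustration under stated assumptions, not the library's code: `update_dataset_info`, `BUILDER_FIELDS`, and the use of `ValueError` are hypothetical, and the field list contains only the two names the text gives explicitly.

```python
# Fields the text names as refreshed from the builder. The text says "etc.",
# so this tuple is deliberately incomplete.
BUILDER_FIELDS = ("description", "config_tags")


def update_dataset_info(stored_info, builder_info):
    """Toy model: refresh builder-derived fields, keep data-derived ones.

    stored_info: dict of metadata saved with the prepared data, or None if
        the data has never been prepared.
    builder_info: dict of metadata currently declared by the builder.
    """
    if stored_info is None:
        # Assumption: the real error type is not stated in the text.
        raise ValueError("Cannot update metadata: data was never prepared.")
    updated = dict(stored_info)  # copy so the input is left untouched
    for field in BUILDER_FIELDS:
        if field in builder_info:
            updated[field] = builder_info[field]
    return updated
```

In this sketch, a changed description in `builder_info` replaces the stored one, while a stored download_size or splits entry is carried over unchanged even if `builder_info` supplies a different value.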