
rlu_dmlab_explore_object_rewards_many

Description:

RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed around the following consideration: to facilitate ease of use, we provide the datasets with a unified API, which makes it easy for practitioners to work with all data in the suite once a general pipeline has been established.

The datasets follow the RLDS format to represent steps and episodes.
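As an illustration of the step/episode layout, the sketch below pairs consecutive steps into (observation, action, reward, discount, next observation) tuples. It assumes the usual RLDS reading in which a step's action is taken after its observation and its reward results from that action. The helper name `steps_to_transitions` and the toy episode values are hypothetical and are not taken from the dataset.

```python
from typing import Any, Dict, Iterable, List, Tuple

Step = Dict[str, Any]
Transition = Tuple[Any, int, float, float, Any]


def steps_to_transitions(steps: Iterable[Step]) -> List[Transition]:
    """Pair each step with its successor inside the same episode."""
    transitions: List[Transition] = []
    previous = None
    for step in steps:
        # Never link the last step of one episode to the first step of the next.
        if previous is not None and not previous["is_last"]:
            transitions.append((
                previous["observation"],
                previous["action"],
                previous["reward"],
                previous["discount"],
                step["observation"],
            ))
        previous = step
    return transitions


# Toy three-step episode with made-up values (assumption, for illustration only).
toy_episode = [
    {"observation": "o0", "action": 1, "reward": 0.0, "discount": 1.0,
     "is_first": True, "is_last": False, "is_terminal": False},
    {"observation": "o1", "action": 2, "reward": 1.0, "discount": 1.0,
     "is_first": False, "is_last": False, "is_terminal": False},
    {"observation": "o2", "action": 0, "reward": 0.0, "discount": 0.0,
     "is_first": False, "is_last": True, "is_terminal": True},
]

transitions = steps_to_transitions(toy_episode)
print(len(transitions))  # 2
```

The same pairing can be applied to the `steps` of a real episode once its fields have been converted to Python or NumPy values.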

The DeepMind Lab dataset has several levels from the challenging, partially observable DeepMind Lab suite. It was collected by training distributed R2D2 agents (Kapturowski et al., 2018) from scratch on individual tasks. We recorded the experience across all actors during entire training runs, a few times for every task. The details of the dataset generation process are described in Gulcehre et al., 2021.

We release datasets for five different DeepMind Lab levels: seekavoid_arena_01, explore_rewards_few, explore_rewards_many, rooms_watermaze, and rooms_select_nonmatching_object. We also release snapshot datasets for the seekavoid_arena_01 level, generated from a trained R2D2 snapshot that was evaluated in the environment with different values of epsilon for the epsilon-greedy algorithm.
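For readers unfamiliar with epsilon-greedy evaluation, here is a minimal sketch of the action-selection rule. It is a generic illustration, not the R2D2 implementation used to produce the snapshots. The function name, the Q-values, and the epsilon values are hypothetical, since the source does not list the epsilons that were used.

```python
import random
from typing import Optional, Sequence


def epsilon_greedy(q_values: Sequence[float], epsilon: float,
                   rng: Optional[random.Random] = None) -> int:
    """With probability epsilon pick a uniform random action, else the greedy one."""
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical action values for a 4-action problem.
q = [0.1, 0.7, 0.3, 0.2]
greedy_action = epsilon_greedy(q, epsilon=0.0)                       # always index 1
noisy_action = epsilon_greedy(q, epsilon=1.0, rng=random.Random(0))  # uniform random
print(greedy_action)  # 1
```

A larger epsilon produces more exploratory, lower-quality behavior, which is presumably why snapshots at several epsilon values are useful for offline RL experiments.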

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you are interested in large-scale offline RL models with memory.

Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code: tfds.rl_unplugged.rlu_dmlab_explore_object_rewards_many.RluDmlabExploreObjectRewardsMany
Versions:
- 1.0.0: Initial release.
- 1.1.0: Added is_last.
- 1.2.0 (default): BGR -> RGB fix for pixel observations.
Download size: Unknown size
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'episode_id': int64,
    'episode_return': float32,
    'steps': Dataset({
        'action': int64,
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'last_action': int64,
            'last_reward': float32,
            'pixels': Image(shape=(72, 96, 3), dtype=uint8),
        }),
        'reward': float32,
    }),
})
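The following sketch shows one way to read this structure with `tfds.load`, iterating the nested `steps` dataset of each episode. It assumes TensorFlow Datasets is installed and that the chosen config is available locally; each config is on the order of 1.5 TiB, so this is not something to run casually. The helper names `builder_name` and `iter_episode_summaries` are hypothetical.

```python
from typing import Iterator, Tuple

DATASET = "rlu_dmlab_explore_object_rewards_many"


def builder_name(config: str = "training_0", version: str = "1.2.0") -> str:
    """Build a TFDS name of the form 'dataset/config:version'."""
    return f"{DATASET}/{config}:{version}"


def iter_episode_summaries(config: str = "training_0",
                           max_episodes: int = 1) -> Iterator[Tuple[int, int, float]]:
    """Yield (episode_id, number_of_steps, summed_reward) for a few episodes."""
    # Imported lazily so that builder_name() works without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(builder_name(config), split="train")
    for episode in ds.take(max_episodes):
        # 'steps' is itself a dataset nested inside each episode.
        rewards = [float(step["reward"]) for step in episode["steps"]]
        yield int(episode["episode_id"]), len(rewards), sum(rewards)


print(builder_name())
# rlu_dmlab_explore_object_rewards_many/training_0:1.2.0
```

Pinning the version in the name keeps a pipeline on the release with RGB pixel observations. Calling `iter_episode_summaries()` triggers the actual data access, so it is left uncalled here.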

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
episode_return	Tensor		float32
steps	Dataset
steps/action	Tensor		int64
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/last_action	Tensor		int64
steps/observation/last_reward	Tensor		float32
steps/observation/pixels	Image	(72, 96, 3)	uint8
steps/reward	Tensor		float32
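To give a sense of scale for the pixel observations, the short calculation below derives the uncompressed size of one frame from the documented shape and dtype. It reads the shape as (height, width, channels), which is the usual TFDS image convention. On-disk size will differ, because images are stored encoded.

```python
# Shape and dtype as documented for steps/observation/pixels.
PIXELS_SHAPE = (72, 96, 3)  # read as (height, width, channels)
BYTES_PER_UINT8 = 1

height, width, channels = PIXELS_SHAPE
bytes_per_frame = height * width * channels * BYTES_PER_UINT8
print(bytes_per_frame)  # 20736
```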

Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{gulcehre2021rbve,
    title={Regularized Behavior Value Estimation},
    author={{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
               Sergio G{\'{o}}mez Colmenarejo and
               Ziyu Wang and
               Jakub Sygnowski and
               Thomas Paine and
               Konrad Zolna and
               Yutian Chen and
               Matthew W. Hoffman and
               Razvan Pascanu and
               Nando de Freitas},
    year={2021},
    journal   = {CoRR},
    url       = {https://arxiv.org/abs/2103.09575},
    eprint={2103.09575},
    archivePrefix={arXiv},
}

rlu_dmlab_explore_object_rewards_many/training_0 (default config)

Dataset size: 1.51 TiB
Splits:

Split	Examples
`'train'`	111,370


rlu_dmlab_explore_object_rewards_many/training_1

Dataset size: 1.44 TiB
Splits:

Split	Examples
`'train'`	111,367


rlu_dmlab_explore_object_rewards_many/training_2

Dataset size: 1.48 TiB
Splits:

Split	Examples
`'train'`	111,367

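When planning storage for more than one config, the per-config figures listed above can be totalled as in this short sketch. The numbers are copied from the config listings; the dictionary layout and variable names are only illustrative.

```python
# Sizes and split counts as listed for each config.
CONFIGS = {
    "training_0": {"size_tib": 1.51, "train_examples": 111_370},
    "training_1": {"size_tib": 1.44, "train_examples": 111_367},
    "training_2": {"size_tib": 1.48, "train_examples": 111_367},
}

total_examples = sum(c["train_examples"] for c in CONFIGS.values())
# Round to the two decimals used in the listings to hide float noise.
total_size_tib = round(sum(c["size_tib"] for c in CONFIGS.values()), 2)

print(total_examples)  # 334104
print(total_size_tib)  # 4.43
```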