rlu_dmlab_explore_object_rewards_few

Description:

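RL Unplugged is a suite of benchmarks for offline reinforcement learning. It is designed around the following consideration: to facilitate ease of use, the datasets come with a unified API, so that once a general pipeline has been set up, practitioners can easily work with all of the data in the suite.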

The datasets follow the RLDS format to represent steps and episodes.
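As a rough illustration of the RLDS layout, the sketch below shows one way to load the default config and walk an episode and its nested steps. It assumes `tensorflow_datasets` is installed and the data has already been prepared locally. The helper names `load_episodes` and `summarize_episode` are hypothetical, and the mock episode is invented purely for demonstration.

```python
def load_episodes(config="training_0", split="train"):
    """Return a tf.data.Dataset of episodes (needs TFDS and the prepared data)."""
    # Imported lazily so the rest of this sketch runs without TensorFlow.
    import tensorflow_datasets as tfds
    return tfds.load(f"rlu_dmlab_explore_object_rewards_few/{config}", split=split)


def summarize_episode(episode):
    """Count the steps of one episode and sum their rewards."""
    num_steps = 0
    total_reward = 0.0
    for step in episode["steps"]:
        num_steps += 1
        total_reward += float(step["reward"])
    return {
        "episode_id": int(episode["episode_id"]),
        "num_steps": num_steps,
        "total_reward": total_reward,
    }


# Mock episode using keys from the feature structure (values are made up).
mock_episode = {
    "episode_id": 0,
    "episode_return": 1.0,
    "steps": [
        {"reward": 0.0, "is_first": True, "is_last": False, "is_terminal": False},
        {"reward": 1.0, "is_first": False, "is_last": True, "is_terminal": True},
    ],
}
summary = summarize_episode(mock_episode)
print(summary)
```

With the real data, `for episode in load_episodes().take(1)` yields episodes whose `steps` entry is a nested `tf.data.Dataset`, which can be iterated in the same way in eager mode.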

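The DeepMind Lab dataset covers several levels from the challenging, partially observable DeepMind Lab suite. It was collected by training distributed R2D2 agents (Kapturowski et al., 2018) from scratch on individual tasks. For every task, the experience of all actors was recorded over entire training runs, and this was repeated a few times. Details of the dataset generation process are described in Gulcehre et al., 2021.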

We release datasets for five DeepMind Lab levels: seekavoid_arena_01, explore_rewards_few, explore_rewards_many, rooms_watermaze, and rooms_select_nonmatching_object. We also release snapshot datasets for the seekavoid_arena_01 level, generated from a trained R2D2 snapshot that was evaluated in the environment with different values of epsilon for the epsilon-greedy algorithm.
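For readers unfamiliar with epsilon-greedy action selection, a minimal generic sketch follows. It is not the R2D2 evaluation code used to build the snapshot datasets. The function name `epsilon_greedy` and the example Q-values are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: index of the largest estimated action value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


rng = random.Random(0)      # seeded generator for reproducibility
q = [0.1, 0.7, 0.2]         # made-up action values for three actions
greedy_action = epsilon_greedy(q, epsilon=0.0, rng=rng)   # never explores
random_actions = [epsilon_greedy(q, epsilon=1.0, rng=rng) for _ in range(20)]
```

A larger epsilon makes the recorded behavior noisier, which is why datasets generated with different epsilon values differ in how closely they follow the snapshot's greedy policy.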

The DeepMind Lab dataset is fairly large-scale. We recommend trying it if you are interested in large-scale offline RL models with memory.
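To give a feel for the scale, here is a back-of-the-envelope calculation using the per-config sizes and the image shape listed further down this page. The frame figure is the raw, uncompressed size of one `uint8` observation. On-disk sizes also reflect encoding and the other fields, so the two numbers are not directly comparable.

```python
# Per-config on-disk sizes in GiB, copied from the config listings on this page.
config_sizes_gib = {"training_0": 847.00, "training_1": 877.76, "training_2": 836.43}
total_gib = sum(config_sizes_gib.values())

# One observation frame is a uint8 image of shape (72, 96, 3).
bytes_per_frame = 72 * 96 * 3
# Number of raw frames that fit into one GiB of memory.
frames_per_gib = 2**30 // bytes_per_frame

print(f"all configs: {total_gib:.2f} GiB")
print(f"raw bytes per frame: {bytes_per_frame}")
print(f"raw frames per GiB of RAM: {frames_per_gib}")
```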

Homepage: https://github.com/deepmind/deepmind-research/tree/master/rl_unplugged
Source code: tfds.rl_unplugged.rlu_dmlab_explore_object_rewards_few.RluDmlabExploreObjectRewardsFew
Versions:
- 1.0.0: Initial release.
- 1.1.0: Added is_last.
- 1.2.0 (default): BGR -> RGB fix for pixel observation.
Download size: Unknown size
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	89,144

Feature structure:

FeaturesDict({
    'episode_id': int64,
    'episode_return': float32,
    'steps': Dataset({
        'action': int64,
        'discount': float32,
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'observation': FeaturesDict({
            'last_action': int64,
            'last_reward': float32,
            'pixels': Image(shape=(72, 96, 3), dtype=uint8),
        }),
        'reward': float32,
    }),
})
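Offline RL pipelines often regroup the per-step records above into transitions. The sketch below pairs each step with its successor, following the usual RLDS reading in which a step holds the observation, the action taken from it, and the resulting reward and discount. The helper name `to_transitions` and the mock steps are hypothetical, and whether this pairing suits a given learner is an assumption to check against your own setup.

```python
def to_transitions(steps):
    """Pair each step with its successor: (obs, action, reward, discount, next_obs)."""
    steps = list(steps)
    transitions = []
    for current, following in zip(steps, steps[1:]):
        if current["is_last"]:
            continue  # defensive: never pair across an episode boundary
        transitions.append((
            current["observation"],
            current["action"],
            current["reward"],
            current["discount"],
            following["observation"],
        ))
    return transitions


# Mock steps; observations are placeholder strings instead of real feature dicts.
mock_steps = [
    {"observation": "o0", "action": 2, "reward": 0.0, "discount": 1.0, "is_last": False},
    {"observation": "o1", "action": 5, "reward": 1.0, "discount": 1.0, "is_last": False},
    {"observation": "o2", "action": 0, "reward": 0.0, "discount": 0.0, "is_last": True},
]
transitions = to_transitions(mock_steps)
```

An episode with N steps yields N - 1 transitions here, since the final step only supplies the last `next_obs`.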

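Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_id	Tensor		int64
episode_return	Tensor		float32
steps	Dataset
steps/action	Tensor		int64
steps/discount	Tensor		float32
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/observation	FeaturesDict
steps/observation/last_action	Tensor		int64
steps/observation/last_reward	Tensor		float32
steps/observation/pixels	Image	(72, 96, 3)	uint8
steps/reward	Tensor		float32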

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{gulcehre2021rbve,
    title={Regularized Behavior Value Estimation},
    author={{\c{C}}aglar G{\"{u}}l{\c{c}}ehre and
               Sergio G{\'{o}}mez Colmenarejo and
               Ziyu Wang and
               Jakub Sygnowski and
               Thomas Paine and
               Konrad Zolna and
               Yutian Chen and
               Matthew W. Hoffman and
               Razvan Pascanu and
               Nando de Freitas},
    year={2021},
    journal   = {CoRR},
    url       = {https://arxiv.org/abs/2103.09575},
    eprint={2103.09575},
    archivePrefix={arXiv},
}

rlu_dmlab_explore_object_rewards_few/training_0 (default config)

Dataset size: 847.00 GiB
Examples (tfds.as_dataframe):

rlu_dmlab_explore_object_rewards_few/training_1

Dataset size: 877.76 GiB
Examples (tfds.as_dataframe):

rlu_dmlab_explore_object_rewards_few/training_2

Dataset size: 836.43 GiB
Examples (tfds.as_dataframe):
