- Description:
Sawyer pushing and picking objects in a bin
Homepage: https://arxiv.org/abs/2206.11894
Source code:
tfds.robotics.rtx.StanfordMaskVitConvertedExternallyToRlds
Versions:
0.1.0
(default): Initial release.
Download size:
Unknown size
Dataset size:
76.17 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 9,109 |
'val' | 91 |
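A minimal loading sketch follows. It assumes the dataset is registered under the snake-case form of the builder class name (`stanford_mask_vit_converted_externally_to_rlds`) and that the data has already been prepared locally or is readable from a `data_dir` you supply. The helper name `iter_episodes` and the `SPLIT_SIZES` constant are hypothetical and exist only for illustration.

```python
# Assumed registry name, derived from the builder class name listed under "Source code".
DATASET_NAME = "stanford_mask_vit_converted_externally_to_rlds"

# Example counts copied from the splits table.
SPLIT_SIZES = {"train": 9109, "val": 91}


def iter_episodes(split="train", data_dir=None):
    """Yield episodes from one split; each episode holds a nested `steps` dataset."""
    if split not in SPLIT_SIZES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_SIZES)}"
        )
    # Imported lazily so the constants above can be used without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load(DATASET_NAME, split=split, data_dir=data_dir)
    for episode in ds:
        yield episode
```

Calling `next(iter_episodes("val"))` should return one episode dictionary. Its `steps` entry is itself a dataset, so individual steps are read by iterating over `episode["steps"]`. With a dataset size of 76.17 GiB, starting with the 91-example `val` split is the cheaper way to inspect the data.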
- Feature structure:
FeaturesDict({
    'episode_metadata': FeaturesDict({
        'file_path': Text(shape=(), dtype=string),
    }),
    'steps': Dataset({
        'action': Tensor(shape=(5,), dtype=float32, description=Robot action, consists of [3x change in end effector position, 1x gripper yaw, 1x open/close gripper (-1 means to open the gripper, 1 means close)].),
        'discount': Scalar(shape=(), dtype=float32, description=Discount if provided, default to 1.),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'language_embedding': Tensor(shape=(512,), dtype=float32, description=Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5),
        'language_instruction': Text(shape=(), dtype=string),
        'observation': FeaturesDict({
            'end_effector_pose': Tensor(shape=(5,), dtype=float32, description=Robot end effector pose, consists of [3x Cartesian position, 1x gripper yaw, 1x gripper position]. This is the state used in the MaskViT paper.),
            'finger_sensors': Tensor(shape=(1,), dtype=float32, description=1x Sawyer gripper finger sensors.),
            'high_bound': Tensor(shape=(5,), dtype=float32, description=High bound for end effector pose normalization. Consists of [3x Cartesian position, 1x gripper yaw, 1x gripper position].),
            'image': Image(shape=(480, 480, 3), dtype=uint8, description=Main camera RGB observation.),
            'low_bound': Tensor(shape=(5,), dtype=float32, description=Low bound for end effector pose normalization. Consists of [3x Cartesian position, 1x gripper yaw, 1x gripper position].),
            'state': Tensor(shape=(15,), dtype=float32, description=Robot state, consists of [7x robot joint angles, 7x robot joint velocities,1x gripper position].),
        }),
        'reward': Scalar(shape=(), dtype=float32, description=Reward if provided, 1 on final step for demos.),
    }),
})
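The 5-dimensional `action` packs three different quantities into one vector. The sketch below unpacks it into named fields following the layout given in the feature description. The `Action` class and `parse_action` helper are hypothetical names, and treating any positive gripper command as "close" is an assumption, since the description only defines the values -1 and 1.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Action:
    """Named view of one 5-element action vector (hypothetical helper type)."""

    delta_position: Tuple[float, float, float]  # 3x change in end effector position
    gripper_yaw: float                          # 1x gripper yaw
    gripper_command: float                      # -1 means open, 1 means close

    @property
    def closes_gripper(self) -> bool:
        # Assumption: values above zero are read as "close".
        return self.gripper_command > 0


def parse_action(raw: Sequence[float]) -> Action:
    """Split a raw action into position delta, yaw, and gripper command."""
    values = [float(v) for v in raw]
    if len(values) != 5:
        raise ValueError(f"expected 5 action values, got {len(values)}")
    return Action(
        delta_position=(values[0], values[1], values[2]),
        gripper_yaw=values[3],
        gripper_command=values[4],
    )
```

For a step loaded through TFDS, something like `parse_action(step["action"].numpy())` would apply, assuming eager tensors.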
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
episode_metadata | FeaturesDict | | | |
episode_metadata/file_path | Text | | string | Path to the original data file. |
steps | Dataset | | | |
steps/action | Tensor | (5,) | float32 | Robot action, consists of [3x change in end effector position, 1x gripper yaw, 1x open/close gripper (-1 means to open the gripper, 1 means close)]. |
steps/discount | Scalar | | float32 | Discount if provided, default to 1. |
steps/is_first | Tensor | | bool | |
steps/is_last | Tensor | | bool | |
steps/is_terminal | Tensor | | bool | |
steps/language_embedding | Tensor | (512,) | float32 | Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5 |
steps/language_instruction | Text | | string | Language Instruction. |
steps/observation | FeaturesDict | | | |
steps/observation/end_effector_pose | Tensor | (5,) | float32 | Robot end effector pose, consists of [3x Cartesian position, 1x gripper yaw, 1x gripper position]. This is the state used in the MaskViT paper. |
steps/observation/finger_sensors | Tensor | (1,) | float32 | 1x Sawyer gripper finger sensors. |
steps/observation/high_bound | Tensor | (5,) | float32 | High bound for end effector pose normalization. Consists of [3x Cartesian position, 1x gripper yaw, 1x gripper position]. |
steps/observation/image | Image | (480, 480, 3) | uint8 | Main camera RGB observation. |
steps/observation/low_bound | Tensor | (5,) | float32 | Low bound for end effector pose normalization. Consists of [3x Cartesian position, 1x gripper yaw, 1x gripper position]. |
steps/observation/state | Tensor | (15,) | float32 | Robot state, consists of [7x robot joint angles, 7x robot joint velocities,1x gripper position]. |
steps/reward | Scalar | | float32 | Reward if provided, 1 on final step for demos. |
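The `low_bound` and `high_bound` observations are described as bounds for end effector pose normalization, but the exact formula is not stated here. The sketch below assumes plain min-max scaling, which maps each pose component to the range [0, 1]. That choice and the helper name `normalize_pose` are assumptions, not something the dataset documentation confirms.

```python
from typing import List, Sequence


def normalize_pose(
    pose: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
) -> List[float]:
    """Min-max scale a 5-element end effector pose using per-component bounds.

    Assumed formula: (pose - low) / (high - low).
    """
    if not (len(pose) == len(low) == len(high) == 5):
        raise ValueError("pose, low, and high must each have 5 elements")
    normalized = []
    for value, lo, hi in zip(pose, low, high):
        span = hi - lo
        if span <= 0:
            raise ValueError("each high bound must be greater than its low bound")
        normalized.append((float(value) - lo) / span)
    return normalized
```

If the MaskViT pipeline uses a different convention, for example scaling to [-1, 1], only the final expression would need to change.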
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
Citation:
@inproceedings{gupta2022maskvit,
title={MaskViT: Masked Visual Pre-Training for Video Prediction},
author={Agrim Gupta and Stephen Tian and Yunzhi Zhang and Jiajun Wu and Roberto Martín-Martín and Li Fei-Fei},
booktitle={International Conference on Learning Representations},
year={2022}
}