maniskill_dataset_converted_externally_to_rlds

Description:

Simulated Franka performing various manipulation tasks.

Homepage: https://github.com/haosulab/ManiSkill2
Source code: tfds.robotics.rtx.ManiskillDatasetConvertedExternallyToRlds
Versions:
- 0.1.0 (default): Initial release.
Download size: Unknown size
Dataset size: 151.05 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	30,213
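
Because the prepared dataset is large (151.05 GiB), it can help to read only a slice of the `'train'` split while exploring. The following is a minimal sketch, assuming `tensorflow_datasets` is installed and the dataset has already been prepared under `data_dir`. The helper names `subsplit` and `load_episodes` are hypothetical and not part of TFDS; only `tfds.load` and the `'train[:N%]'` slicing syntax come from the library.

```python
from typing import Optional

DATASET_NAME = "maniskill_dataset_converted_externally_to_rlds"


def subsplit(split: str = "train", percent: int = 1) -> str:
    """Build a TFDS split string such as 'train[:1%]' (hypothetical helper)."""
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return f"{split}[:{percent}%]"


def load_episodes(percent: int = 1, data_dir: Optional[str] = None):
    """Load the first `percent` of the 'train' episodes (hypothetical helper)."""
    # Imported lazily so that `subsplit` can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    return tfds.load(DATASET_NAME, split=subsplit("train", percent), data_dir=data_dir)
```

Each element of the returned dataset is one episode. Its `'steps'` entry is itself a nested dataset, so per-step fields such as `'action'` are reached by iterating over `episode['steps']`. Slicing the split limits what is read, not what has to be prepared on disk.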

Feature structure:

FeaturesDict({
    'episode_metadata': FeaturesDict({
        'episode_id': Text(shape=(), dtype=string),
        'file_path': Text(shape=(), dtype=string),
    }),
    'steps': Dataset({
        'action': Tensor(shape=(7,), dtype=float32, description=Robot action, consists of [3x end effector delta target position, 3x end effector delta target orientation in axis-angle format, 1x gripper target position (mimic for two fingers)]. For delta target position, an action of -1 maps to a robot movement of -0.1m, and action of 1 maps to a movement of 0.1m. For delta target orientation, its encoded angle is mapped to a range of [-0.1rad, 0.1rad] for robot execution. For example, an action of [1, 0, 0] means rotating along the x-axis by 0.1 rad. For gripper target position, an action of -1 means close, and an action of 1 means open.),
        'discount': Scalar(shape=(), dtype=float32, description=Discount if provided, default to 1.),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'language_embedding': Tensor(shape=(512,), dtype=float32, description=Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5),
        'language_instruction': Text(shape=(), dtype=string),
        'observation': FeaturesDict({
            'base_pose': Tensor(shape=(7,), dtype=float32, description=Robot base pose in the world frame, consists of [x, y, z, qw, qx, qy, qz]. The first three dimensions represent xyz positions in meters. The last four dimensions are the quaternion representation of rotation.),
            'depth': Image(shape=(256, 256, 1), dtype=uint16, description=Main camera Depth observation. Divide the depth value by 2**10 to get the depth in meters.),
            'image': Image(shape=(256, 256, 3), dtype=uint8, description=Main camera RGB observation.),
            'main_camera_cam2world_gl': Tensor(shape=(4, 4), dtype=float32, description=Transformation from the main camera frame to the world frame in OpenGL/Blender convention.),
            'main_camera_extrinsic_cv': Tensor(shape=(4, 4), dtype=float32, description=Main camera extrinsic matrix in OpenCV convention.),
            'main_camera_intrinsic_cv': Tensor(shape=(3, 3), dtype=float32, description=Main camera intrinsic matrix in OpenCV convention.),
            'state': Tensor(shape=(18,), dtype=float32, description=Robot state, consists of [7x robot joint angles, 2x gripper position, 7x robot joint angle velocity, 2x gripper velocity]. Angle in radians, position in meters.),
            'target_object_or_part_final_pose': Tensor(shape=(7,), dtype=float32, description=The final pose towards which the target object or object part needs be manipulated, consists of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. An episode is considered successful if the target object or object part is manipulated to this pose.),
            'target_object_or_part_final_pose_valid': Tensor(shape=(7,), dtype=uint8, description=Whether each dimension of target_object_or_part_final_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_final_pose). "Invalid" means that there is no success check on the final pose of target object or object part in the corresponding dimensions.),
            'target_object_or_part_initial_pose': Tensor(shape=(7,), dtype=float32, description=The initial pose of the target object or object part to be manipulated, consists of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. This variable is used to specify the target object or object part when multiple objects or object parts are present in an environment),
            'target_object_or_part_initial_pose_valid': Tensor(shape=(7,), dtype=uint8, description=Whether each dimension of target_object_or_part_initial_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_initial_pose).),
            'tcp_pose': Tensor(shape=(7,), dtype=float32, description=Robot tool-center-point pose in the world frame, consists of [x, y, z, qw, qx, qy, qz]. Tool-center-point is the center between the two gripper fingers.),
            'wrist_camera_cam2world_gl': Tensor(shape=(4, 4), dtype=float32, description=Transformation from the wrist camera frame to the world frame in OpenGL/Blender convention.),
            'wrist_camera_extrinsic_cv': Tensor(shape=(4, 4), dtype=float32, description=Wrist camera extrinsic matrix in OpenCV convention.),
            'wrist_camera_intrinsic_cv': Tensor(shape=(3, 3), dtype=float32, description=Wrist camera intrinsic matrix in OpenCV convention.),
            'wrist_depth': Image(shape=(256, 256, 1), dtype=uint16, description=Wrist camera Depth observation. Divide the depth value by 2**10 to get the depth in meters.),
            'wrist_image': Image(shape=(256, 256, 3), dtype=uint8, description=Wrist camera RGB observation.),
        }),
        'reward': Scalar(shape=(), dtype=float32, description=Reward if provided, 1 on final step for demos.),
    }),
})
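
The `'action'` description above defines a normalized 7-D vector. As a rough illustration of that mapping, the sketch below converts one action into physical units. The function name `decode_action`, the output keys, and the rule that a positive gripper value means "open" are assumptions made for this example. The description does not say how rotation vectors with norm greater than 1 are handled, so the sketch only applies the stated linear scaling.

```python
import math
from typing import Dict, Sequence

# Scale factors taken from the 'action' description above.
MAX_DELTA_POS_M = 0.1    # an action of +/-1 maps to +/-0.1 m
MAX_DELTA_ROT_RAD = 0.1  # a unit-norm rotation vector maps to 0.1 rad


def decode_action(action: Sequence[float]) -> Dict[str, object]:
    """Convert a normalized 7-D action into physical units (illustrative only)."""
    if len(action) != 7:
        raise ValueError(f"expected 7 values, got {len(action)}")
    dx, dy, dz, rx, ry, rz, grip = (float(a) for a in action)

    # Translation: linear scaling from [-1, 1] to [-0.1 m, 0.1 m].
    delta_pos_m = (dx * MAX_DELTA_POS_M, dy * MAX_DELTA_POS_M, dz * MAX_DELTA_POS_M)

    # Rotation: axis-angle vector whose norm encodes the rotation angle.
    norm = math.sqrt(rx * rx + ry * ry + rz * rz)
    if norm > 0.0:
        axis = (rx / norm, ry / norm, rz / norm)
    else:
        axis = (0.0, 0.0, 0.0)  # zero vector: no rotation
    angle_rad = norm * MAX_DELTA_ROT_RAD

    return {
        "delta_pos_m": delta_pos_m,
        "rot_axis": axis,
        "rot_angle_rad": angle_rad,
        # Assumption: positive values are treated as "open", all others as "close".
        "gripper_open": grip > 0.0,
    }
```

For instance, `decode_action([1, 0, 0, 1, 0, 0, -1])` describes a 0.1 m move along x, a 0.1 rad rotation about the x-axis, and a closed gripper, which matches the example given in the description.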

Feature documentation:

Feature	Class	Shape	Dtype	Description
	FeaturesDict
episode_metadata	FeaturesDict
episode_metadata/episode_id	Text		string	Episode ID.
episode_metadata/file_path	Text		string	Path to the original data file.
steps	Dataset
steps/action	Tensor	(7,)	float32	Robot action, consisting of [3x end effector delta target position, 3x end effector delta target orientation in axis-angle format, 1x gripper target position (mimic for two fingers)]. For delta target position, an action of -1 maps to a robot movement of -0.1 m, and an action of 1 maps to a movement of 0.1 m. For delta target orientation, the encoded angle is mapped to a range of [-0.1 rad, 0.1 rad] for robot execution. For example, an action of [1, 0, 0] means rotating about the x-axis by 0.1 rad. For gripper target position, an action of -1 means close, and an action of 1 means open.
steps/discount	Scalar		float32	Discount if provided, defaults to 1.
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/language_embedding	Tensor	(512,)	float32	Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5
steps/language_instruction	Text		string	Language instruction.
steps/observation	FeaturesDict
steps/observation/base_pose	Tensor	(7,)	float32	Robot base pose in the world frame, consisting of [x, y, z, qw, qx, qy, qz]. The first three dimensions are the xyz position in meters. The last four dimensions are the quaternion representation of the rotation.
steps/observation/depth	Image	(256, 256, 1)	uint16	Main camera depth observation. Divide the depth value by 2**10 to get the depth in meters.
steps/observation/image	Image	(256, 256, 3)	uint8	Main camera RGB observation.
steps/observation/main_camera_cam2world_gl	Tensor	(4, 4)	float32	Transformation from the main camera frame to the world frame in OpenGL/Blender convention.
steps/observation/main_camera_extrinsic_cv	Tensor	(4, 4)	float32	Main camera extrinsic matrix in OpenCV convention.
steps/observation/main_camera_intrinsic_cv	Tensor	(3, 3)	float32	Main camera intrinsic matrix in OpenCV convention.
steps/observation/state	Tensor	(18,)	float32	Robot state, consisting of [7x robot joint angles, 2x gripper position, 7x robot joint angle velocity, 2x gripper velocity]. Angles in radians, positions in meters.
steps/observation/target_object_or_part_final_pose	Tensor	(7,)	float32	The final pose to which the target object or object part needs to be manipulated, consisting of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. An episode is considered successful if the target object or object part is manipulated to this pose.
steps/observation/target_object_or_part_final_pose_valid	Tensor	(7,)	uint8	Whether each dimension of target_object_or_part_final_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_final_pose). "Invalid" means that there is no success check on the final pose of the target object or object part in the corresponding dimensions.
steps/observation/target_object_or_part_initial_pose	Tensor	(7,)	float32	The initial pose of the target object or object part to be manipulated, consisting of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. This variable is used to specify the target object or object part when multiple objects or object parts are present in an environment.
steps/observation/target_object_or_part_initial_pose_valid	Tensor	(7,)	uint8	Whether each dimension of target_object_or_part_initial_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_initial_pose).
steps/observation/tcp_pose	Tensor	(7,)	float32	Robot tool-center-point pose in the world frame, consisting of [x, y, z, qw, qx, qy, qz]. The tool-center-point is the center between the two gripper fingers.
steps/observation/wrist_camera_cam2world_gl	Tensor	(4, 4)	float32	Transformation from the wrist camera frame to the world frame in OpenGL/Blender convention.
steps/observation/wrist_camera_extrinsic_cv	Tensor	(4, 4)	float32	Wrist camera extrinsic matrix in OpenCV convention.
steps/observation/wrist_camera_intrinsic_cv	Tensor	(3, 3)	float32	Wrist camera intrinsic matrix in OpenCV convention.
steps/observation/wrist_depth	Image	(256, 256, 1)	uint16	Wrist camera depth observation. Divide the depth value by 2**10 to get the depth in meters.
steps/observation/wrist_image	Image	(256, 256, 3)	uint8	Wrist camera RGB observation.
steps/reward	Scalar		float32	Reward if provided, 1 on final step for demos.
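
Several observation fields in the table above need a small amount of post-processing before use: depth values are stored as scaled integers, the 18-D state packs four groups of values, and the target poses come with per-dimension validity flags. The sketch below shows one way to handle each in plain Python. The function names and dictionary keys are hypothetical, and the choice to represent ignored pose dimensions as `None` is an assumption of this example.

```python
from typing import Dict, List, Optional, Sequence

DEPTH_SCALE = 2 ** 10  # raw depth units per meter, per the depth description


def depth_to_meters(raw_depth: int) -> float:
    """Convert one raw uint16 depth value to meters."""
    return raw_depth / DEPTH_SCALE


def split_state(state: Sequence[float]) -> Dict[str, List[float]]:
    """Split the 18-D robot state into its four documented groups."""
    if len(state) != 18:
        raise ValueError(f"expected 18 values, got {len(state)}")
    s = [float(v) for v in state]
    return {
        "joint_angles_rad": s[0:7],     # 7x robot joint angles
        "gripper_pos_m": s[7:9],        # 2x gripper position
        "joint_velocities": s[9:16],    # 7x robot joint angle velocity
        "gripper_velocities": s[16:18], # 2x gripper velocity
    }


def mask_pose(pose: Sequence[float], valid: Sequence[int]) -> List[Optional[float]]:
    """Keep pose dimensions flagged 1 (valid); replace the rest with None."""
    if len(pose) != 7 or len(valid) != 7:
        raise ValueError("pose and valid must both have 7 entries")
    return [float(p) if v == 1 else None for p, v in zip(pose, valid)]
```

With array inputs the same operations apply elementwise, for example dividing the whole `(256, 256, 1)` depth image by `2**10` after casting it to a floating-point type.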

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:

@inproceedings{gu2023maniskill2,
  title={ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills},
  author={Gu, Jiayuan and Xiang, Fanbo and Li, Xuanlin and Ling, Zhan and Liu, Xiqiang and Mu, Tongzhou and Tang, Yihe and Tao, Stone and Wei, Xinyue and Yao, Yunchao and Yuan, Xiaodi and Xie, Pengwei and Huang, Zhiao and Chen, Rui and Su, Hao},
  booktitle={International Conference on Learning Representations},
  year={2023}
}