maniskill_dataset_converted_externally_to_rlds

Description:

Simulated Franka robot performing various manipulation tasks.

Homepage: https://github.com/haosulab/ManiSkill2
Source code: tfds.robotics.rtx.ManiskillDatasetConvertedExternallyToRlds
Versions:
- 0.1.0 (default): Initial release.
Download size: Unknown size
Dataset size: 151.05 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	30,213

Feature structure:

FeaturesDict({
    'episode_metadata': FeaturesDict({
        'episode_id': Text(shape=(), dtype=string),
        'file_path': Text(shape=(), dtype=string),
    }),
    'steps': Dataset({
        'action': Tensor(shape=(7,), dtype=float32, description=Robot action, consists of [3x end effector delta target position, 3x end effector delta target orientation in axis-angle format, 1x gripper target position (mimic for two fingers)]. For delta target position, an action of -1 maps to a robot movement of -0.1m, and action of 1 maps to a movement of 0.1m. For delta target orientation, its encoded angle is mapped to a range of [-0.1rad, 0.1rad] for robot execution. For example, an action of [1, 0, 0] means rotating along the x-axis by 0.1 rad. For gripper target position, an action of -1 means close, and an action of 1 means open.),
        'discount': Scalar(shape=(), dtype=float32, description=Discount if provided, default to 1.),
        'is_first': bool,
        'is_last': bool,
        'is_terminal': bool,
        'language_embedding': Tensor(shape=(512,), dtype=float32, description=Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5),
        'language_instruction': Text(shape=(), dtype=string),
        'observation': FeaturesDict({
            'base_pose': Tensor(shape=(7,), dtype=float32, description=Robot base pose in the world frame, consists of [x, y, z, qw, qx, qy, qz]. The first three dimensions represent xyz positions in meters. The last four dimensions are the quaternion representation of rotation.),
            'depth': Image(shape=(256, 256, 1), dtype=uint16, description=Main camera Depth observation. Divide the depth value by 2**10 to get the depth in meters.),
            'image': Image(shape=(256, 256, 3), dtype=uint8, description=Main camera RGB observation.),
            'main_camera_cam2world_gl': Tensor(shape=(4, 4), dtype=float32, description=Transformation from the main camera frame to the world frame in OpenGL/Blender convention.),
            'main_camera_extrinsic_cv': Tensor(shape=(4, 4), dtype=float32, description=Main camera extrinsic matrix in OpenCV convention.),
            'main_camera_intrinsic_cv': Tensor(shape=(3, 3), dtype=float32, description=Main camera intrinsic matrix in OpenCV convention.),
            'state': Tensor(shape=(18,), dtype=float32, description=Robot state, consists of [7x robot joint angles, 2x gripper position, 7x robot joint angle velocity, 2x gripper velocity]. Angle in radians, position in meters.),
            'target_object_or_part_final_pose': Tensor(shape=(7,), dtype=float32, description=The final pose towards which the target object or object part needs be manipulated, consists of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. An episode is considered successful if the target object or object part is manipulated to this pose.),
            'target_object_or_part_final_pose_valid': Tensor(shape=(7,), dtype=uint8, description=Whether each dimension of target_object_or_part_final_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_final_pose). "Invalid" means that there is no success check on the final pose of target object or object part in the corresponding dimensions.),
            'target_object_or_part_initial_pose': Tensor(shape=(7,), dtype=float32, description=The initial pose of the target object or object part to be manipulated, consists of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. This variable is used to specify the target object or object part when multiple objects or object parts are present in an environment),
            'target_object_or_part_initial_pose_valid': Tensor(shape=(7,), dtype=uint8, description=Whether each dimension of target_object_or_part_initial_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_initial_pose).),
            'tcp_pose': Tensor(shape=(7,), dtype=float32, description=Robot tool-center-point pose in the world frame, consists of [x, y, z, qw, qx, qy, qz]. Tool-center-point is the center between the two gripper fingers.),
            'wrist_camera_cam2world_gl': Tensor(shape=(4, 4), dtype=float32, description=Transformation from the wrist camera frame to the world frame in OpenGL/Blender convention.),
            'wrist_camera_extrinsic_cv': Tensor(shape=(4, 4), dtype=float32, description=Wrist camera extrinsic matrix in OpenCV convention.),
            'wrist_camera_intrinsic_cv': Tensor(shape=(3, 3), dtype=float32, description=Wrist camera intrinsic matrix in OpenCV convention.),
            'wrist_depth': Image(shape=(256, 256, 1), dtype=uint16, description=Wrist camera Depth observation. Divide the depth value by 2**10 to get the depth in meters.),
            'wrist_image': Image(shape=(256, 256, 3), dtype=uint8, description=Wrist camera RGB observation.),
        }),
        'reward': Scalar(shape=(), dtype=float32, description=Reward if provided, 1 on final step for demos.),
    }),
})
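
The action encoding described in the 'action' feature above can be turned into metric quantities with a few lines of NumPy. This is a minimal sketch, not the ManiSkill2 controller: `decode_action` and the constant names are hypothetical, the scaling is assumed to be linear, and the clipping of out-of-range values is an assumption that the feature description does not state.

```python
import numpy as np

POS_SCALE_M = 0.1    # feature description: action of +/-1 maps to +/-0.1 m
ROT_SCALE_RAD = 0.1  # feature description: [1, 0, 0] means 0.1 rad about x


def decode_action(action):
    """Split a 7-D action into metric deltas (hypothetical helper)."""
    action = np.asarray(action, dtype=np.float32)
    if action.shape != (7,):
        raise ValueError(f"expected shape (7,), got {action.shape}")

    # Delta target position: assumed linear scaling, clipped to [-1, 1].
    delta_pos_m = np.clip(action[:3], -1.0, 1.0) * POS_SCALE_M

    # Delta target orientation in axis-angle form.
    # Assumption: vectors with norm above 1 are scaled back to unit norm,
    # so the encoded angle never exceeds 0.1 rad.
    rot = action[3:6]
    norm = float(np.linalg.norm(rot))
    if norm > 1.0:
        rot = rot / norm
    delta_rot_rad = rot * ROT_SCALE_RAD

    # Gripper target: -1 means close, 1 means open; returned unchanged.
    gripper = float(action[6])
    return delta_pos_m, delta_rot_rad, gripper


delta_pos_m, delta_rot_rad, gripper = decode_action([1, -1, 0, 1, 0, 0, -1])
```

Here the made-up input requests a 0.1 m move along +x, a 0.1 m move along -y, a 0.1 rad rotation about the x-axis, and a closed gripper.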

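The 'depth' and 'wrist_depth' features store uint16 values that, per their descriptions, are divided by 2**10 to obtain meters. A minimal sketch of that conversion follows; `depth_to_meters` and the constant name are hypothetical, and the input array is made up for illustration.

```python
import numpy as np

DEPTH_UNITS_PER_METER = 2 ** 10  # from the feature description


def depth_to_meters(depth_raw):
    """Convert a raw uint16 depth image to float32 meters (hypothetical helper)."""
    depth_raw = np.asarray(depth_raw)
    # Cast first so the division is done in floating point.
    return depth_raw.astype(np.float32) / DEPTH_UNITS_PER_METER


# Made-up frame: every pixel holds the raw value 1536.
raw = np.full((256, 256, 1), 1536, dtype=np.uint16)
depth_m = depth_to_meters(raw)  # every pixel becomes 1.5
```

With this encoding one raw unit is 1/1024 m, a little under 1 mm, and the largest value a uint16 can hold corresponds to 65535 / 1024, just under 64 m.
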
Feature documentation:

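Feature	Class	Shape	Dtype	Description
	FeaturesDict
episode_metadata	FeaturesDict
episode_metadata/episode_id	Text		string	Episode ID.
episode_metadata/file_path	Text		string	Path to the original data file.
steps	Dataset
steps/action	Tensor	(7,)	float32	Robot action, consisting of [3x end effector delta target position, 3x end effector delta target orientation in axis-angle format, 1x gripper target position (mimic for two fingers)]. For delta target position, an action of -1 maps to a robot movement of -0.1 m, and an action of 1 maps to a movement of 0.1 m. For delta target orientation, the encoded angle is mapped to a range of [-0.1 rad, 0.1 rad] for robot execution. For example, an action of [1, 0, 0] means rotating about the x-axis by 0.1 rad. For gripper target position, an action of -1 means close, and an action of 1 means open.
steps/discount	Scalar		float32	Discount if provided, default to 1.
steps/is_first	Tensor		bool
steps/is_last	Tensor		bool
steps/is_terminal	Tensor		bool
steps/language_embedding	Tensor	(512,)	float32	Kona language embedding. See https://tfhub.dev/google/universal-sentence-encoder-large/5
steps/language_instruction	Text		string	Language instruction.
steps/observation	FeaturesDict
steps/observation/base_pose	Tensor	(7,)	float32	Robot base pose in the world frame, consisting of [x, y, z, qw, qx, qy, qz]. The first three dimensions are the xyz position in meters. The last four dimensions are the quaternion representation of the rotation.
steps/observation/depth	Image	(256, 256, 1)	uint16	Main camera depth observation. Divide the depth value by 2**10 to get the depth in meters.
steps/observation/image	Image	(256, 256, 3)	uint8	Main camera RGB observation.
steps/observation/main_camera_cam2world_gl	Tensor	(4, 4)	float32	Transformation from the main camera frame to the world frame in OpenGL/Blender convention.
steps/observation/main_camera_extrinsic_cv	Tensor	(4, 4)	float32	Main camera extrinsic matrix in OpenCV convention.
steps/observation/main_camera_intrinsic_cv	Tensor	(3, 3)	float32	Main camera intrinsic matrix in OpenCV convention.
steps/observation/state	Tensor	(18,)	float32	Robot state, consisting of [7x robot joint angles, 2x gripper position, 7x robot joint angle velocity, 2x gripper velocity]. Angles in radians, positions in meters.
steps/observation/target_object_or_part_final_pose	Tensor	(7,)	float32	The final pose to which the target object or object part needs to be manipulated, consisting of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. An episode is considered successful if the target object or object part is manipulated to this pose.
steps/observation/target_object_or_part_final_pose_valid	Tensor	(7,)	uint8	Whether each dimension of target_object_or_part_final_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_final_pose). "Invalid" means that there is no success check on the final pose of the target object or object part in the corresponding dimensions.
steps/observation/target_object_or_part_initial_pose	Tensor	(7,)	float32	The initial pose of the target object or object part to be manipulated, consisting of [x, y, z, qw, qx, qy, qz]. The pose is represented in the world frame. This variable is used to specify the target object or object part when multiple objects or object parts are present in an environment.
steps/observation/target_object_or_part_initial_pose_valid	Tensor	(7,)	uint8	Whether each dimension of target_object_or_part_initial_pose is valid in an environment. 1 = valid; 0 = invalid (in which case one should ignore the corresponding dimensions in target_object_or_part_initial_pose).
steps/observation/tcp_pose	Tensor	(7,)	float32	Robot tool-center-point pose in the world frame, consisting of [x, y, z, qw, qx, qy, qz]. The tool-center-point is the center between the two gripper fingers.
steps/observation/wrist_camera_cam2world_gl	Tensor	(4, 4)	float32	Transformation from the wrist camera frame to the world frame in OpenGL/Blender convention.
steps/observation/wrist_camera_extrinsic_cv	Tensor	(4, 4)	float32	Wrist camera extrinsic matrix in OpenCV convention.
steps/observation/wrist_camera_intrinsic_cv	Tensor	(3, 3)	float32	Wrist camera intrinsic matrix in OpenCV convention.
steps/observation/wrist_depth	Image	(256, 256, 1)	uint16	Wrist camera depth observation. Divide the depth value by 2**10 to get the depth in meters.
steps/observation/wrist_image	Image	(256, 256, 3)	uint8	Wrist camera RGB observation.
steps/reward	Scalar		float32	Reward if provided, 1 on final step for demos.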
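
The `*_valid` features mark which dimensions of the corresponding target pose should be ignored. One way to respect that, sketched here under assumptions, is to replace the invalid dimensions with NaN so they cannot be used by accident. The helper name `mask_invalid_dims` is hypothetical and the pose values are made up; this is not the dataset's own success check.

```python
import numpy as np


def mask_invalid_dims(pose, valid):
    """Return the pose with invalid dimensions set to NaN (hypothetical helper)."""
    pose = np.asarray(pose, dtype=np.float32)
    valid = np.asarray(valid).astype(bool)
    if pose.shape != (7,) or valid.shape != (7,):
        raise ValueError("pose and valid must both have shape (7,)")
    # Keep entries flagged 1; mark entries flagged 0 as NaN.
    return np.where(valid, pose, np.float32(np.nan))


# Made-up example: only the xyz position is checked, the quaternion is ignored.
pose = [0.1, 0.2, 0.3, 1.0, 0.0, 0.0, 0.0]   # [x, y, z, qw, qx, qy, qz]
valid = [1, 1, 1, 0, 0, 0, 0]
masked = mask_invalid_dims(pose, valid)
```

NaN is used rather than zero because zero is a legitimate coordinate value, whereas NaN propagates through later arithmetic and makes an accidental use of an invalid dimension visible.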

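The 18-dimensional 'state' vector packs four groups of values. The sketch below slices it into named parts, assuming the groups appear in the order listed in the feature description. `split_state` and the dictionary keys are hypothetical names, and the units of the velocity entries are not stated in the source.

```python
import numpy as np


def split_state(state):
    """Slice the 18-D robot state into its four groups (hypothetical helper)."""
    state = np.asarray(state, dtype=np.float32)
    if state.shape != (18,):
        raise ValueError(f"expected shape (18,), got {state.shape}")
    # Assumption: layout follows the order given in the feature description.
    return {
        "joint_angles_rad": state[0:7],       # 7x robot joint angles
        "gripper_position_m": state[7:9],     # 2x gripper position
        "joint_velocities": state[9:16],      # 7x robot joint angle velocity
        "gripper_velocities": state[16:18],   # 2x gripper velocity
    }


# Made-up input: the index of each entry, to make the slicing easy to follow.
parts = split_state(np.arange(18))
```
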
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:

@inproceedings{gu2023maniskill2,
  title={ManiSkill2: A Unified Benchmark for Generalizable Manipulation Skills},
  author={Gu, Jiayuan and Xiang, Fanbo and Li, Xuanlin and Ling, Zhan and Liu, Xiqiang and Mu, Tongzhou and Tang, Yihe and Tao, Stone and Wei, Xinyue and Yao, Yunchao and Yuan, Xiaodi and Xie, Pengwei and Huang, Zhiao and Chen, Rui and Su, Hao},
  booktitle={International Conference on Learning Representations},
  year={2023}
}