tao

Description:

The TAO dataset is a large-scale video object detection dataset consisting of 2,907 high-resolution videos and 833 object categories. Note that storing this dataset requires at least 300 GB of free disk space.
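Because of the storage requirement above, it can help to check free disk space before starting a download. The following is a minimal sketch using only the Python standard library; the helper name `has_free_space` and the decimal interpretation of "GB" are assumptions made for illustration, not part of TFDS.

```python
import shutil

# Decimal gigabyte, assumed to match the "300 GB" figure in the description.
GB = 10**9


def has_free_space(path: str, required_bytes: int = 300 * GB) -> bool:
    """Return True if the filesystem holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes
```

Call it with the directory where the dataset will be stored, for example `has_free_space(".")`.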

Additional documentation: Explore on Papers With Code
Homepage: https://taodataset.org/
Source code: tfds.video.tao.Tao
Versions:
- 1.0.0 (default): No release notes.
- 1.1.0: Added test split.
Download size: 113.96 GiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/).
Some TAO files (HACS and AVA videos) must be downloaded manually because they require a MOT login. Follow the instructions at https://motchallenge.net/tao_download.php to download this data.

Download this data and move the resulting .zip files to ~/tensorflow_datasets/downloads/manual/.

If the data requiring manual download is not present, it is skipped and only the data that does not require manual download is used.
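The manual download steps above might be wired up as in the sketch below. The helper names `list_manual_archives` and `load_tao` are hypothetical; `tfds.download.DownloadConfig(manual_dir=...)` and the `download_and_prepare_kwargs` argument of `tfds.load` are existing TFDS APIs. The TFDS import is deferred so that the archive check can run without TensorFlow installed.

```python
from pathlib import Path
from typing import List

# Default manual directory as stated in the instructions above.
DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"


def list_manual_archives(manual_dir: Path = DEFAULT_MANUAL_DIR) -> List[str]:
    """Return the sorted names of the .zip files found in `manual_dir`."""
    if not manual_dir.is_dir():
        return []
    return sorted(p.name for p in manual_dir.glob("*.zip"))


def load_tao(manual_dir: Path = DEFAULT_MANUAL_DIR, split: str = "train"):
    """Prepare (if needed) and load TAO, pointing TFDS at `manual_dir`."""
    # Imported lazily: only this function needs tensorflow_datasets.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    return tfds.load(
        "tao",
        split=split,
        download_and_prepare_kwargs={"download_config": config},
    )
```

Running `list_manual_archives()` first shows whether the manually downloaded archives are in place; if the list is empty, the build proceeds with only the automatically downloadable data, as described above.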

Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	500
`'validation'`	988

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{Dave_2020,
   title={TAO: A Large-Scale Benchmark for Tracking Any Object},
   ISBN={9783030585587},
   ISSN={1611-3349},
   url={http://dx.doi.org/10.1007/978-3-030-58558-7_26},
   DOI={10.1007/978-3-030-58558-7_26},
   journal={Lecture Notes in Computer Science},
   publisher={Springer International Publishing},
   author={Dave, Achal and Khurana, Tarasha and Tokmakov, Pavel and Schmid, Cordelia and Ramanan, Deva},
   year={2020},
   pages={436-454}
}

tao/480_640 (default config)

Config description: All images are bilinearly resized to 480 x 640.
Dataset size: 482.30 GiB
Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})
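As an illustration of how the `tracks/bboxes` feature relates to the 480 x 640 frames, the sketch below scales boxes to pixel coordinates. It relies on the TFDS convention that `BBoxFeature` values are normalized `(ymin, xmin, ymax, xmax)` floats; the function name `bboxes_to_pixels` and the hand-written `track` dictionary are illustrative stand-ins, not data from the dataset.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def bboxes_to_pixels(bboxes: Sequence[Box], height: int, width: int) -> List[Box]:
    """Scale normalized (ymin, xmin, ymax, xmax) boxes to pixel coordinates."""
    return [
        (ymin * height, xmin * width, ymax * height, xmax * width)
        for ymin, xmin, ymax, xmax in bboxes
    ]


# Hypothetical stand-in for one track; real values come from the dataset.
track = {
    "bboxes": [(0.25, 0.5, 0.75, 1.0), (0.0, 0.0, 0.5, 0.25)],
    "frames": [0, 30],
    "track_id": 7,
}

pixel_boxes = bboxes_to_pixels(track["bboxes"], height=480, width=640)
```

For the full-resolution config, the same helper could be used with the per-video `metadata/height` and `metadata/width` values instead of the fixed 480 and 640.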

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/dataset	Tensor		string
metadata/height	Tensor		int32
metadata/neg_category_ids	Tensor	(None,)	int32
metadata/not_exhaustive_category_ids	Tensor	(None,)	int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/scale_category	Tensor		string
tracks/track_id	Tensor		int32
video	Video(Image)	(None, 480, 640, 3)	uint8

Examples (tfds.as_dataframe):

tao/full_resolution

Config description: The full-resolution version of the dataset.
Dataset size: 171.24 GiB
Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'dataset': string,
        'height': int32,
        'neg_category_ids': Tensor(shape=(None,), dtype=int32),
        'not_exhaustive_category_ids': Tensor(shape=(None,), dtype=int32),
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=363),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'scale_category': string,
        'track_id': int32,
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})
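The feature structure is track-centric: each track carries its own `frames` and `bboxes` sequences. A frame-centric view can be sketched as below, assuming that `frames[i]` is the index of the video frame that `bboxes[i]` belongs to (this alignment is an assumption, not something stated on this page). The function name `group_boxes_by_frame` is hypothetical, and the input is a plain list of per-track dictionaries for readability; TFDS itself returns a `Sequence` of dictionaries as a dictionary of sequences, so a conversion step would be needed first.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def group_boxes_by_frame(tracks: List[dict]) -> Dict[int, List[Tuple[int, Box]]]:
    """Map each frame index to the (track_id, bbox) pairs visible in it."""
    per_frame = defaultdict(list)
    for track in tracks:
        # Assumption: frames and bboxes are aligned element by element.
        for frame, bbox in zip(track["frames"], track["bboxes"]):
            per_frame[frame].append((track["track_id"], bbox))
    return dict(per_frame)


# Hypothetical stand-in tracks; real values come from the dataset.
example_tracks = [
    {"track_id": 1, "frames": [0, 30], "bboxes": [(0.1, 0.1, 0.5, 0.5), (0.2, 0.2, 0.6, 0.6)]},
    {"track_id": 2, "frames": [30], "bboxes": [(0.0, 0.0, 0.3, 0.3)]},
]

boxes_by_frame = group_boxes_by_frame(example_tracks)
```

With this layout, all annotations for a given frame can be looked up by frame index, which is convenient when drawing boxes onto individual frames of `video`.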

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/dataset	Tensor		string
metadata/height	Tensor		int32
metadata/neg_category_ids	Tensor	(None,)	int32
metadata/not_exhaustive_category_ids	Tensor	(None,)	int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/scale_category	Tensor		string
tracks/track_id	Tensor		int32
video	Video(Image)	(None, None, None, 3)	uint8

Examples (tfds.as_dataframe):