youtube_vis

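Description:

Youtube-vis is a video instance segmentation dataset. It contains 2,883 high-resolution YouTube videos, a per-pixel category label set covering 40 common object classes such as person, animals and vehicles, 4,883 unique video instances, and 131k high-quality manual annotations.

The YouTube-VIS dataset is split into 2,238 training videos, 302 validation videos and 343 test videos.

No files were removed or altered during preprocessing.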

Additional Documentation: Explore on Papers With Code
Homepage: https://youtube-vos.org/dataset/vis/
Source code: tfds.video.youtube_vis.YoutubeVis
Versions:
- 1.0.0 (default): Initial release.
Download size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/).
Download all files for the 2019 version of the dataset (test_all_frames.zip, test.json, train_all_frames.zip, train.json, valid_all_frames.zip, valid.json) from the youtube-vis website and move them to ~/tensorflow_datasets/downloads/manual/.

The dataset landing page is located at https://youtube-vos.org/dataset/vis/ and redirects to a page on https://competitions.codalab.org where the 2019 version of the dataset can be downloaded. You need to create an account on codalab to download the data. At the time of writing, you also need to bypass a "Connection not Secure" warning when accessing codalab.
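Before building the dataset, it can help to confirm that all six files are in place. The following is a minimal sketch using only the Python standard library; the helper name `missing_manual_files` is hypothetical and not part of TFDS, and the default path is the one stated in the instructions above.

```python
from pathlib import Path

# File names listed in the manual download instructions (2019 version).
EXPECTED_FILES = (
    "test_all_frames.zip",
    "test.json",
    "train_all_frames.zip",
    "train.json",
    "valid_all_frames.zip",
    "valid.json",
)

# Default manual_dir stated in the instructions.
DEFAULT_MANUAL_DIR = Path.home() / "tensorflow_datasets" / "downloads" / "manual"


def missing_manual_files(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the expected file names that are not present in manual_dir.

    Hypothetical helper for illustration; it is not a TFDS function.
    """
    manual_dir = Path(manual_dir)
    return [name for name in EXPECTED_FILES if not (manual_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_manual_files()
    if missing:
        print("Missing from", DEFAULT_MANUAL_DIR, ":", ", ".join(missing))
    else:
        print("All six files found in", DEFAULT_MANUAL_DIR)
```

If you use a custom download_config.manual_dir, pass that directory to the helper instead of relying on the default.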

Auto-cached (documentation): No
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{DBLP:journals/corr/abs-1905-04804,
  author    = {Linjie Yang and
               Yuchen Fan and
               Ning Xu},
  title     = {Video Instance Segmentation},
  journal   = {CoRR},
  volume    = {abs/1905.04804},
  year      = {2019},
  url       = {http://arxiv.org/abs/1905.04804},
  archivePrefix = {arXiv},
  eprint    = {1905.04804},
  timestamp = {Tue, 28 May 2019 12:48:08 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1905-04804.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

youtube_vis/full (default config)

Config description: The full-resolution version of the dataset, with all frames included, including those without labels.
Dataset size: 33.31 GiB
Splits:

Split	Examples
`'test'`	343
`'train'`	2,238
`'validation'`	302

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(None, None, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})
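TFDS `BBoxFeature` values are stored as normalized `[ymin, xmin, ymax, xmax]` floats in the range 0 to 1, so drawing a box on a frame requires scaling by the frame size. The sketch below shows that conversion in plain Python; the helper name `bbox_to_pixels` and the sample values are hypothetical, and in practice the height, width and boxes would come from `example["metadata"]` and `example["tracks"]["bboxes"]`.

```python
def bbox_to_pixels(bbox, height, width):
    """Scale a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates."""
    ymin, xmin, ymax, xmax = (float(v) for v in bbox)
    return (
        round(ymin * height),
        round(xmin * width),
        round(ymax * height),
        round(xmax * width),
    )


# Made-up values standing in for one decoded example (not real dataset data).
metadata = {"height": 720, "width": 1280}
track_bboxes = [(0.25, 0.5, 0.75, 1.0), (0.0, 0.0, 0.5, 0.25)]

pixel_boxes = [
    bbox_to_pixels(b, metadata["height"], metadata["width"]) for b in track_bboxes
]
print(pixel_boxes)
```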

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, None, None, 1)	uint8
video	Video(Image)	(None, None, None, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/480_640_full

Config description: All images are bilinearly resized to 480 x 640, with all frames included.
Dataset size: 130.02 GiB
Splits:

Split	Examples
`'test'`	343
`'train'`	2,238
`'validation'`	302

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(480, 640, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})
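The fixed (480, 640) shapes above come from the bilinear resizing named in the config description. As an illustration of what bilinear interpolation does, here is a small pure-Python sketch on a single-channel grid. It assumes half-pixel-centre sampling with edge clamping; the builder's actual resize routine is not shown on this page and may treat edges differently, so this is not a reproduction of the TFDS code.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D grid (list of rows) with bilinear interpolation.

    Illustrative only: assumes half-pixel centres and clamps at the borders.
    """
    in_h, in_w = len(image), len(image[0])
    out = []
    for y in range(out_h):
        # Map the output pixel centre back into input coordinates.
        src_y = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(src_y)
        y1 = min(y0 + 1, in_h - 1)
        wy = src_y - y0
        row = []
        for x in range(out_w):
            src_x = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(src_x)
            x1 = min(x0 + 1, in_w - 1)
            wx = src_x - x0
            # Blend horizontally on the two neighbouring rows, then vertically.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Upsample a 1 x 2 grid to 1 x 4: interior values are blends of the two inputs.
print(bilinear_resize([[0, 10]], 1, 4))
```

Because every output value is a weighted average of up to four neighbours, resizing changes pixel values as well as the frame size, which is worth keeping in mind when comparing against the native-resolution configs.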

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, 480, 640, 1)	uint8
video	Video(Image)	(None, 480, 640, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/480_640_only_frames_with_labels

Config description: All images are bilinearly resized to 480 x 640, with only frames that have labels included.
Dataset size: 26.27 GiB
Splits:

Split	Examples
`'test'`	343
`'train'`	2,238
`'validation'`	302

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(480, 640, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, 480, 640, 1)	uint8
video	Video(Image)	(None, 480, 640, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/only_frames_with_labels

Config description: Only images with labels are included, at their native resolution.
Dataset size: 6.91 GiB
Splits:

Split	Examples
`'test'`	343
`'train'`	2,238
`'validation'`	302

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(None, None, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, None, None, 1)	uint8
video	Video(Image)	(None, None, None, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/full_train_split

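Config description: The full-resolution version of the dataset, with all frames included, including those without labels. The val and test splits are created from the training data.
Dataset size: 26.09 GiB
Splits:

Split	Examples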
`'test'`	200
`'train'`	1,838
`'validation'`	200
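The three split sizes in the table above add up to the 2,238 videos of the original training set, which is consistent with the val and test splits being carved out of the training data. A short arithmetic check, using only numbers copied from this page:

```python
# Split sizes copied from the tables on this page.
ORIGINAL_TRAIN = 2238
TRAIN_SPLIT_CONFIG = {"train": 1838, "validation": 200, "test": 200}

# The re-split sizes should account for every original training video.
total = sum(TRAIN_SPLIT_CONFIG.values())
assert total == ORIGINAL_TRAIN

# Share of the original training videos that lands in each new split.
fractions = {name: count / total for name, count in TRAIN_SPLIT_CONFIG.items()}
for name, frac in fractions.items():
    print(f"{name}: {TRAIN_SPLIT_CONFIG[name]} videos ({frac:.1%})")
```

This works out to roughly 82% of the original training videos for training and about 9% each for validation and test.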

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(None, None, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, None, None, 1)	uint8
video	Video(Image)	(None, None, None, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/480_640_full_train_split

Config description: All images are bilinearly resized to 480 x 640, with all frames included. The val and test splits are created from the training data.
Dataset size: 101.57 GiB
Splits:

Split	Examples
`'test'`	200
`'train'`	1,838
`'validation'`	200

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(480, 640, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, 480, 640, 1)	uint8
video	Video(Image)	(None, 480, 640, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/480_640_only_frames_with_labels_train_split

Config description: All images are bilinearly resized to 480 x 640, with only frames that have labels included. The val and test splits are created from the training data.
Dataset size: 20.55 GiB
Splits:

Split	Examples
`'test'`	200
`'train'`	1,838
`'validation'`	200

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(480, 640, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(480, 640, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, 480, 640, 1)	uint8
video	Video(Image)	(None, 480, 640, 3)	uint8

Examples (tfds.as_dataframe):

youtube_vis/only_frames_with_labels_train_split

Config description: Only images with labels are included, at their native resolution. The val and test splits are created from the training data.
Dataset size: 5.46 GiB
Splits:

Split	Examples
`'test'`	200
`'train'`	1,838
`'validation'`	200

Feature structure:

FeaturesDict({
    'metadata': FeaturesDict({
        'height': int32,
        'num_frames': int32,
        'video_name': string,
        'width': int32,
    }),
    'tracks': Sequence({
        'areas': Sequence(float32),
        'bboxes': Sequence(BBoxFeature(shape=(4,), dtype=float32)),
        'category': ClassLabel(shape=(), dtype=int64, num_classes=40),
        'frames': Sequence(int32),
        'is_crowd': bool,
        'segmentations': Video(Image(shape=(None, None, 1), dtype=uint8)),
    }),
    'video': Video(Image(shape=(None, None, 3), dtype=uint8)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
metadata	FeaturesDict
metadata/height	Tensor		int32
metadata/num_frames	Tensor		int32
metadata/video_name	Tensor		string
metadata/width	Tensor		int32
tracks	Sequence
tracks/areas	Sequence(Tensor)	(None,)	float32
tracks/bboxes	Sequence(BBoxFeature)	(None, 4)	float32
tracks/category	ClassLabel		int64
tracks/frames	Sequence(Tensor)	(None,)	int32
tracks/is_crowd	Tensor		bool
tracks/segmentations	Video(Image)	(None, None, None, 1)	uint8
video	Video(Image)	(None, None, None, 3)	uint8

Examples (tfds.as_dataframe):
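
The eight config names on this page combine three choices: native or 480 x 640 resolution, all frames or only labeled frames, and the original splits or splits carved from the training data. The sketch below builds a config name from those choices and previews a few examples with `tfds.as_dataframe`. The helper names `config_name` and `preview` are hypothetical; `preview` assumes `tensorflow_datasets` is installed and the manually downloaded files are already in the manual directory.

```python
def config_name(resized=False, only_labeled=False, train_split=False):
    """Build a config name such as '480_640_only_frames_with_labels_train_split'.

    Hypothetical helper; it only mirrors the naming pattern seen on this page.
    """
    parts = []
    if resized:
        parts.append("480_640")
    parts.append("only_frames_with_labels" if only_labeled else "full")
    if train_split:
        parts.append("train_split")
    return "_".join(parts)


def preview(config, split="validation", n=2):
    """Load n examples of one config and return them as a pandas DataFrame."""
    # Imported lazily so config_name() works without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(f"youtube_vis/{config}", split=split, with_info=True)
    return tfds.as_dataframe(ds.take(n), info)


# The config documented in this section, at 5.46 GiB the smallest on this page.
smallest = config_name(only_labeled=True, train_split=True)
print(smallest)
```

Calling `preview(smallest)` triggers dataset preparation on first use, so starting with the smallest config keeps the initial build short.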