summscreen

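Description:

SummScreen summarization dataset, non-anonymized, non-tokenized version.

The train/validation/test splits and the filtering are based on the final tokenized dataset, but the transcripts and summaries provided here are the untokenized text.

There are two features:

transcript: the full episode transcript, with each line of dialogue separated by a newline
recap: a recap or summary of the episode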
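
Because each line of dialogue in a transcript is separated by a newline, a transcript string can be broken into dialogue lines with ordinary string handling. The sketch below is only an illustration: the helper name split_transcript and the sample text are made up for this example and are not part of the dataset or of TFDS.

```python
from typing import List


def split_transcript(transcript: str) -> List[str]:
    """Split a newline-delimited transcript into non-empty dialogue lines."""
    return [line.strip() for line in transcript.splitlines() if line.strip()]


# Made-up transcript text, used only to illustrate the newline-delimited format.
sample_transcript = (
    "ALICE: Where were you last night?\n"
    "BOB: At the lab.\n"
    "\n"
    "ALICE: All night?"
)

lines = split_transcript(sample_transcript)
for line in lines:
    print(line)
```

Blank lines are dropped here, which is a choice made for this sketch; keep them if scene breaks matter for your use.
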
Homepage: https://github.com/mingdachen/SummScreen
Source code: tfds.datasets.summscreen.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 841.27 MiB
Supervised keys (see the as_supervised doc): ('transcript', 'recap')
Figure (tfds.show_examples): Not supported.
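
The supervised keys mean that loading with as_supervised=True should yield (transcript, recap) pairs instead of a feature dictionary. A minimal sketch of that call follows. The helpers dataset_name and load_pairs are hypothetical names introduced here; tensorflow_datasets is imported lazily inside load_pairs because calling it downloads and prepares the data.

```python
DATASET = "summscreen"
CONFIGS = ("fd", "tms")  # the two config names listed on this page


def dataset_name(config: str = "fd") -> str:
    """Build the 'dataset/config' string passed to tfds.load."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"{DATASET}/{config}"


def load_pairs(config: str = "fd", split: str = "train"):
    """Return a tf.data.Dataset of (transcript, recap) string pairs.

    Assumes tensorflow_datasets is installed. The import is kept inside the
    function so that defining it does not trigger the download.
    """
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split=split, as_supervised=True)


print(dataset_name("tms"))  # prints summscreen/tms
```

With the data prepared, something like `for transcript, recap in load_pairs().take(1):` iterates over scalar string tensors, which can be turned into Python strings with `.numpy().decode("utf-8")`.
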
Citation:

@article{DBLP:journals/corr/abs-2104-07091,
  author    = {Mingda Chen and
               Zewei Chu and
               Sam Wiseman and
               Kevin Gimpel},
  title     = {SummScreen: {A} Dataset for Abstractive Screenplay Summarization},
  journal   = {CoRR},
  volume    = {abs/2104.07091},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.07091},
  archivePrefix = {arXiv},
  eprint    = {2104.07091},
  timestamp = {Mon, 19 Apr 2021 16:45:47 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-07091.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

summscreen/fd (default config)

Config description: ForeverDreaming
Dataset size: 132.99 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	337
`'train'`	3,673
`'validation'`	338
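
As a quick sanity check on the split table, the share of each split can be computed from the example counts. The snippet below is a small arithmetic sketch; the dictionary name FD_SPLITS and the helper split_fractions are introduced here for illustration only.

```python
# Example counts copied from the summscreen/fd split table.
FD_SPLITS = {"test": 337, "train": 3673, "validation": 338}


def split_fractions(counts):
    """Return each split's share of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(FD_SPLITS.values())
fractions = split_fractions(FD_SPLITS)

print(f"total examples: {total}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The counts sum to 4,348 examples, with the train split holding roughly 84% of them.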

Feature structure:

FeaturesDict({
    'episode_number': Text(shape=(), dtype=string),
    'episode_title': Text(shape=(), dtype=string),
    'recap': Text(shape=(), dtype=string),
    'show_title': Text(shape=(), dtype=string),
    'transcript': Text(shape=(), dtype=string),
    'transcript_author': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
episode_number	Text	string
episode_title	Text	string
recap	Text	string
show_title	Text	string
transcript	Text	string
transcript_author	Text	string

Examples (tfds.as_dataframe):
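
No rendered example table survives on this page, but a preview can be rebuilt locally. The sketch below assumes tensorflow_datasets (and pandas, which tfds.as_dataframe relies on) is installed; preview_dataframe is a hypothetical helper name, and calling it downloads and prepares the dataset.

```python
def preview_dataframe(name: str = "summscreen/fd", split: str = "train", n: int = 4):
    """Return a DataFrame holding the first n examples of the given split.

    The import is done lazily so that defining this function does not
    require the dataset to be downloaded.
    """
    import tensorflow_datasets as tfds

    ds, info = tfds.load(name, split=split, with_info=True)
    # Passing the DatasetInfo lets tfds decode and label columns by feature.
    return tfds.as_dataframe(ds.take(n), info)
```

Calling `preview_dataframe()` in a notebook should display one row per episode, with one column per feature in the structure listed above.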

summscreen/tms

Config description: TVMegaSite
Dataset size: 592.53 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	1,793
`'train'`	18,915
`'validation'`	1,795

Feature structure:

FeaturesDict({
    'episode_summary': Text(shape=(), dtype=string),
    'recap': Text(shape=(), dtype=string),
    'recap_author': Text(shape=(), dtype=string),
    'show_title': Text(shape=(), dtype=string),
    'transcript': Text(shape=(), dtype=string),
    'transcript_author': Tensor(shape=(None,), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_summary	Text		string
recap	Text		string
recap_author	Text		string
show_title	Text		string
transcript	Text		string
transcript_author	Tensor	(None,)	string
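
In this config, transcript_author is a variable-length tensor of strings rather than a single Text value, so code that decodes examples has to handle both shapes. The sketch below assumes examples arrive as dictionaries of byte strings, as they typically do when iterating with tfds.as_numpy. The helper decode_example and the sample values are made up for illustration.

```python
def decode_example(example):
    """Decode the byte strings of one example dict into Python str values."""
    decoded = {}
    for key, value in example.items():
        if isinstance(value, bytes):
            # Scalar Text feature.
            decoded[key] = value.decode("utf-8")
        else:
            # Variable-length field such as transcript_author:
            # a sequence of byte strings.
            decoded[key] = [item.decode("utf-8") for item in value]
    return decoded


# Hand-written stand-in for one summscreen/tms example; all values are invented.
fake_example = {
    "episode_summary": b"A short summary.",
    "recap": b"A longer recap of the episode.",
    "recap_author": b"Recap Writer",
    "show_title": b"Example Show",
    "transcript": b"ALICE: Hello.\nBOB: Hi.",
    "transcript_author": [b"Author One", b"Author Two"],
}

decoded = decode_example(fake_example)
print(decoded["show_title"])
print(decoded["transcript_author"])
```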

Examples (tfds.as_dataframe):
