irc_disentanglement

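Description:

The IRC Disentanglement dataset contains over 77,563 messages from the Ubuntu IRC channel.

Features include the message id, message text, and timestamp. The target is the list of messages that the current message replies to. Each record contains the list of messages from one day of IRC chat.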
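
Because the target is a list of reply links, conversations can be recovered by grouping messages that are connected through those links. The following is a minimal sketch under stated assumptions, not the dataset authors' method: the sample messages, the `group_conversations` helper, and the handling of self-links or empty parent lists as conversation starts are all hypothetical.

```python
from collections import defaultdict


def group_conversations(messages):
    """Group message ids into conversations via their reply links.

    `messages` is a list of dicts with 'id' and 'parents' keys (a
    hypothetical in-memory form of one day of chat). Messages linked by a
    parent relation, directly or transitively, end up in the same group.
    """
    root = {m["id"]: m["id"] for m in messages}

    def find(x):
        # Walk up to the representative id, shortening the path on the way.
        while root[x] != x:
            root[x] = root[root[x]]
            x = root[x]
        return x

    for m in messages:
        for p in m["parents"]:
            if p in root:  # assumption: skip parents that are not in this day
                root[find(m["id"])] = find(p)

    groups = defaultdict(list)
    for m in messages:
        groups[find(m["id"])].append(m["id"])
    return sorted(groups.values())


# Made-up sample data, not taken from the dataset.
sample_day = [
    {"id": "1", "parents": ["1"], "text": "how do I mount a usb drive?"},
    {"id": "2", "parents": [], "text": "anyone using wifi on a thinkpad?"},
    {"id": "3", "parents": ["1"], "text": "try lsblk first"},
    {"id": "4", "parents": ["2"], "text": "yes, works fine here"},
]
conversations = group_conversations(sample_day)
```

Union-find is used here only because a message may list several parents; for strictly single-parent data, following each message to its first ancestor would give the same grouping.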

Additional Documentation: Explore on Papers With Code
Homepage: https://jkk.name/irc-disentanglement
Source code: tfds.datasets.irc_disentanglement.Builder
Versions:
- 2.0.0 (default): No release notes.
Download size: 113.53 MiB
Dataset size: 26.59 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	10
`'train'`	153
`'validation'`	10
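
As a rough illustration of how these splits might be requested, the sketch below wraps `tfds.load` in a small helper. The `load_split` function and the `SPLIT_SIZES` constant are hypothetical names introduced here; the counts are copied from the table above, and the import is deferred so the helper can be defined without TensorFlow Datasets installed.

```python
# Example counts per split, copied from the table above.
SPLIT_SIZES = {"test": 10, "train": 153, "validation": 10}


def load_split(split="train"):
    """Load one split of irc_disentanglement with TensorFlow Datasets."""
    if split not in SPLIT_SIZES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_SIZES)}"
        )
    # Imported lazily: only needed when a split is actually loaded.
    import tensorflow_datasets as tfds

    return tfds.load("irc_disentanglement", split=split)


# Each example is one day of chat, so this is the number of days covered.
total_days = sum(SPLIT_SIZES.values())
```

Calling `load_split("train")` triggers the download on first use, so expect it to fetch roughly the download size listed above.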

Feature structure:

FeaturesDict({
    'day': Sequence({
        'id': Text(shape=(), dtype=string),
        'parents': Sequence(Text(shape=(), dtype=string)),
        'text': Text(shape=(), dtype=string),
        'timestamp': Text(shape=(), dtype=string),
    }),
})
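
To make the nesting concrete, the sketch below turns one record into a list of per-message dictionaries. It assumes a decoded record holds each field of `day` as a parallel list with one entry per message; the `day_to_messages` helper and the sample values (including the id and timestamp formats) are hypothetical, and real records may hold byte strings or tensors rather than plain Python strings.

```python
def day_to_messages(record):
    """Convert a column-oriented 'day' record into one dict per message."""
    day = record["day"]
    return [
        {"id": msg_id, "parents": list(parents), "text": text, "timestamp": ts}
        for msg_id, parents, text, ts in zip(
            day["id"], day["parents"], day["text"], day["timestamp"]
        )
    ]


# Hypothetical record shaped like the FeaturesDict above; values are invented.
record = {
    "day": {
        "id": ["1001", "1002"],
        "parents": [["1001"], ["1001"]],
        "text": ["hello", "hi there"],
        "timestamp": ["12:00", "12:01"],
    }
}
messages = day_to_messages(record)
```

Working with per-message dictionaries is a convenience choice for inspection; batch processing would usually keep the column layout.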

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
day	Sequence
day/id	Text		string
day/parents	Sequence(Text)	(None,)	string
day/text	Text		string
day/timestamp	Text		string

Supervised keys (see the as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):

Citation:

@InProceedings{acl19disentangle,
  author    = {Jonathan K. Kummerfeld and Sai R. Gouravajhala and Joseph Peper and Vignesh Athreya and Chulaka Gunasekara and Jatin Ganhotra and Siva Sankalp Patel and Lazaros Polymenakos and Walter S. Lasecki},
  title     = {A Large-Scale Corpus for Conversation Disentanglement},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  location  = {Florence, Italy},
  month     = {July},
  year      = {2019},
  doi       = {10.18653/v1/P19-1374},
  pages     = {3846--3856},
  url       = {https://aclweb.org/anthology/papers/P/P19/P19-1374/},
  arxiv     = {https://arxiv.org/abs/1810.11118},
  software  = {https://jkk.name/irc-disentanglement},
  data      = {https://jkk.name/irc-disentanglement},
}
