
samsum

Description:

SAMSum Corpus contains over 16k chat dialogues with manually annotated summaries.

There are three features:

dialogue: text of dialogue.
summary: human written summary of the dialogue.
id: id of an example.
Additional Documentation: Explore on Papers With Code
Homepage: https://arxiv.org/src/1911.12237v2/anc
Source code: tfds.datasets.samsum.Builder
Versions:
- 1.0.0 (default): No release notes.
Download size: Unknown size
Dataset size: 10.71 MiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Download https://arxiv.org/src/1911.12237v2/anc/corpus.7z, decompress it, and place train.json, val.json and test.json in the manual folder.
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	819
`'train'`	14,732
`'validation'`	818
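
The manual download step and the split table above can be combined into a small loading helper. The following is a minimal sketch, not the builder's own code: the helper names `missing_manual_files` and `load_samsum` are hypothetical, and it assumes the three JSON files have already been extracted into the manual directory. The TFDS import is deferred so that the file check can run without TFDS installed.

```python
from pathlib import Path

# File names taken from the manual download instructions.
REQUIRED_FILES = ("train.json", "val.json", "test.json")

# Example counts copied from the split table.
EXPECTED_SPLIT_SIZES = {"train": 14732, "validation": 818, "test": 819}


def missing_manual_files(manual_dir):
    """Return the required file names that are not present in manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_samsum(manual_dir, split="train"):
    """Load one samsum split, building it from manually downloaded files.

    Hypothetical helper: it only wraps tfds.load with a DownloadConfig
    that points at manual_dir.
    """
    missing = missing_manual_files(manual_dir)
    if missing:
        raise FileNotFoundError(f"Missing files in {manual_dir}: {missing}")

    # Imported lazily so the file check above works without TFDS installed.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    return tfds.load(
        "samsum",
        split=split,
        download_and_prepare_kwargs={"download_config": config},
    )
```

Calling `load_samsum("~/tensorflow_datasets/downloads/manual", split="validation")` should then yield a dataset whose size can be compared against `EXPECTED_SPLIT_SIZES["validation"]`.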

Feature structure:

FeaturesDict({
    'dialogue': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
dialogue	Text	string
id	Text	string
summary	Text	string

Supervised keys (See as_supervised doc): ('dialogue', 'summary')
Figure (tfds.show_examples): Not supported.
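
The supervised keys listed above determine what `as_supervised=True` returns: each example becomes an `(input, target)` pair of `(dialogue, summary)`, and the `id` feature is dropped. A minimal sketch of that mapping follows; the function names `to_supervised` and `load_supervised` are hypothetical, and the sample record in the usage note is invented for illustration rather than taken from the corpus.

```python
# Supervised keys as listed for this dataset: (input, target).
SUPERVISED_KEYS = ("dialogue", "summary")


def to_supervised(example):
    """Map a feature dict to an (input, target) tuple, dropping other keys."""
    input_key, target_key = SUPERVISED_KEYS
    return example[input_key], example[target_key]


def load_supervised(split="train"):
    """Load samsum as (dialogue, summary) pairs.

    Assumes the dataset has already been prepared from the manual download.
    """
    # Imported lazily so to_supervised can be used without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("samsum", split=split, as_supervised=True)
```

For a made-up record such as `{"dialogue": "A: hi", "id": "0", "summary": "A greets."}`, `to_supervised` returns the dialogue and summary strings as a pair, which mirrors the structure of the elements produced by `load_supervised`.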

Citation:

@article{gliwa2019samsum,
  title={SAMSum Corpus: A Human-annotated Dialogue Dataset for Abstractive Summarization},
  author={Gliwa, Bogdan and Mochol, Iwona and Biesek, Maciej and Wawer, Aleksander},
  journal={arXiv preprint arXiv:1911.12237},
  year={2019}
}