asset

Description:

ASSET is a dataset for evaluating Sentence Simplification systems with multiple rewriting transformations, as described in "ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations." The corpus is composed of 2000 validation and 359 test original sentences that were each simplified 10 times by different annotators. The corpus also contains human judgments of meaning preservation, fluency and simplicity for the outputs of several automatic text simplification systems.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/facebookresearch/asset
Source code: tfds.datasets.asset.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 3.47 MiB
Auto-cached (documentation): Yes
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{alva-manchego-etal-2020-asset,
    title = "{ASSET}: {A} Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations",
    author = "Alva-Manchego, Fernando  and
      Martin, Louis  and
      Bordes, Antoine  and
      Scarton, Carolina  and
      Sagot, Benoit  and
      Specia, Lucia",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.424",
    pages = "4668--4679",
}

asset/simplification (default config)

Config description: A set of original sentences aligned with 10 possible simplifications for each.
Dataset size: 2.64 MiB
Splits:

Split	Examples
`'test'`	359
`'validation'`	2,000

Feature structure:

FeaturesDict({
    'original': Text(shape=(), dtype=string),
    'simplifications': Sequence(Text(shape=(), dtype=string)),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
original	Text		string
simplifications	Sequence(Text)	(None,)	string
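
As a minimal sketch of how this config might be consumed, the snippet below loads a split with `tfds.load` and expands each example into (original, simplification) pairs. The helper names `load_asset_simplification`, `to_pairs` and `print_first_pairs` are hypothetical and not part of TFDS; the sketch also assumes strings arrive as UTF-8 bytes, as they do when iterating with `tfds.as_numpy`.

```python
def load_asset_simplification(split="validation"):
    """Return an iterable of NumPy examples for one split ('validation' or 'test')."""
    # Imported lazily so that to_pairs() can be used without TFDS installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("asset/simplification", split=split)
    return tfds.as_numpy(ds)


def to_pairs(example):
    """Expand one example into a list of (original, simplification) string pairs."""
    original = example["original"].decode("utf-8")
    return [
        (original, simplification.decode("utf-8"))
        for simplification in example["simplifications"]
    ]


def print_first_pairs(split="test"):
    """Print every pair of the first example in the split (downloads the data)."""
    for example in load_asset_simplification(split):
        for original, simplification in to_pairs(example):
            print(original, "->", simplification)
        break
```

Calling `print_first_pairs()` triggers the download on first use. Flattening to pairs is only one option; keeping the references grouped per original sentence is usually more convenient for multi-reference evaluation.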

asset/ratings

Config description: Human ratings of automatically produced text simplification.
Dataset size: 1.44 MiB
Splits:

Split	Examples
`'full'`	4,500

Feature structure:

FeaturesDict({
    'aspect': ClassLabel(shape=(), dtype=int64, num_classes=3),
    'original': Text(shape=(), dtype=string),
    'original_sentence_id': int32,
    'rating': int32,
    'simplification': Text(shape=(), dtype=string),
    'worker_id': int32,
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
aspect	ClassLabel	int64
original	Text	string
original_sentence_id	Tensor	int32
rating	Tensor	int32
simplification	Text	string
worker_id	Tensor	int32
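
The ratings config can be summarized per aspect. The sketch below is one possible way to do this, assuming the examples are iterated with `tfds.as_numpy` and that the `aspect` label is turned into a name with the `int2str` method of the `ClassLabel` feature. The helpers `load_ratings` and `mean_rating_by_aspect` are hypothetical names introduced for illustration.

```python
from collections import defaultdict


def load_ratings():
    """Return (examples, int2str) for the single 'full' split of asset/ratings."""
    # Imported lazily so that mean_rating_by_aspect() works without TFDS installed.
    import tensorflow_datasets as tfds

    ds, info = tfds.load("asset/ratings", split="full", with_info=True)
    return tfds.as_numpy(ds), info.features["aspect"].int2str


def mean_rating_by_aspect(examples, int2str=str):
    """Average the 'rating' field for each value of the 'aspect' label."""
    totals = defaultdict(lambda: [0, 0])  # aspect name -> [sum of ratings, count]
    for example in examples:
        name = int2str(int(example["aspect"]))
        totals[name][0] += int(example["rating"])
        totals[name][1] += 1
    return {name: total / count for name, (total, count) in totals.items()}
```

A typical call would be `examples, int2str = load_ratings()` followed by `mean_rating_by_aspect(examples, int2str)`. With the default `int2str=str`, the keys are the raw class indices as strings.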
