dart

Description:

DART (DAta Record to Text generation) contains RDF entity-relation triples annotated with sentence descriptions that cover all facts in the triple set. DART was constructed using existing datasets such as WikiTableQuestions, WikiSQL, WebNLG and Cleaned E2E. The tables from WikiTableQuestions and WikiSQL were transformed to subject-predicate-object triples, and their text annotations were mainly collected from MTurk. The meaning representations in E2E were also transformed to triples and their descriptions were reused; those that couldn't be transformed were dropped.

The dataset splits of E2E and WebNLG are kept. For WikiTableQuestions and WikiSQL, Jaccard similarity is used to keep similar tables in the same set (train/dev/test).
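
As a rough illustration of the split strategy described above, the sketch below computes Jaccard similarity between sets of column headers and groups tables that are similar enough, so that each group could be assigned to a single split. Comparing column headers, the 0.5 threshold, the union-find grouping, and the names `jaccard` and `group_similar` are all assumptions made for this example; the actual procedure used to build DART may differ.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two iterables, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


def group_similar(tables, threshold=0.5):
    """Group table ids whose header sets have Jaccard similarity >= threshold.

    `tables` maps a (hypothetical) table id to its column headers. Groups are
    the connected components of the "similar enough" graph, found with a
    small union-find structure.
    """
    parent = {tid: tid for tid in tables}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for t1, t2 in combinations(tables, 2):
        if jaccard(tables[t1], tables[t2]) >= threshold:
            parent[find(t1)] = find(t2)

    groups = {}
    for tid in tables:
        groups.setdefault(find(tid), []).append(tid)
    return sorted(sorted(g) for g in groups.values())


# Made-up tables, for illustration only.
tables = {
    "t1": ["player", "team", "points"],
    "t2": ["player", "team", "assists"],
    "t3": ["city", "population"],
}
print(group_similar(tables))
```

Each resulting group would then be placed as a whole into train, dev, or test, which keeps near-duplicate tables from leaking across splits.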

This dataset is constructed following a standardized table format.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/Yale-LILY/dart
Source code: tfds.structured.dart.Dart
Versions:
- 0.1.0 (default): No release notes.
Download size: 249.71 MiB
Dataset size: 38.83 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	12,552
`'train'`	62,659
`'validation'`	6,980

Feature structure:

FeaturesDict({
    'input_text': FeaturesDict({
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'target_text': string,
})
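
A sequence of dictionaries such as `input_text/table` is typically returned as one list (or array) per key, with one entry per table cell. The sketch below regroups such flat cell lists into one dictionary per row. The helper names `_text` and `table_to_rows` are hypothetical, and the sample record is hand-written to match the structure above; its values are made up rather than taken from the dataset.

```python
from collections import defaultdict


def _text(value):
    """Decode bytes (as NumPy-converted string features arrive) to str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def table_to_rows(table):
    """Regroup the flat cell lists of `input_text/table` into one dict per row."""
    rows = defaultdict(dict)
    cells = zip(table["row_number"], table["column_header"], table["content"])
    for row, header, content in cells:
        rows[int(row)][_text(header)] = _text(content)
    # Return rows ordered by their row number.
    return [rows[r] for r in sorted(rows)]


# Hand-written record shaped like the feature structure (values are made up).
example = {
    "input_text": {
        "table": {
            "column_header": [b"subject", b"predicate", b"object"],
            "content": [b"Mars Hill College", b"JOINED", b"1973"],
            "row_number": [1, 1, 1],
        }
    },
    "target_text": b"Mars Hill College joined in 1973.",
}

rows = table_to_rows(example["input_text"]["table"])
print(rows)
```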

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
input_text	FeaturesDict
input_text/table	Sequence
input_text/table/column_header	Tensor	string
input_text/table/content	Tensor	string
input_text/table/row_number	Tensor	int16
target_text	Tensor	string

Supervised keys (See as_supervised doc): ('input_text', 'target_text')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
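
The supervised keys listed above mean that loading with `as_supervised=True` yields `(input_text, target_text)` pairs instead of full feature dictionaries. The sketch below mirrors that selection in plain Python with a hypothetical helper, `to_supervised`, and wraps the real `tfds.load` call in a function, `load_supervised`, that is defined but not called here, because it needs `tensorflow_datasets` installed and downloads the data on first use. The sample record is made up.

```python
def to_supervised(example, keys=("input_text", "target_text")):
    """Return the (input, target) pair selected by the supervised keys."""
    input_key, target_key = keys
    return example[input_key], example[target_key]


def load_supervised(split="train"):
    """Load DART as (input_text, target_text) tuples.

    Not called in this sketch: it requires tensorflow_datasets and
    triggers a download the first time it runs.
    """
    import tensorflow_datasets as tfds

    return tfds.load("dart", split=split, as_supervised=True)


# Made-up record with the same top-level keys as the dataset features.
example = {
    "input_text": {
        "table": {"column_header": ["subject"], "content": ["x"], "row_number": [1]}
    },
    "target_text": "made-up sentence",
}
features, label = to_supervised(example)
print(label)
```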

Citation:

@article{radev2020dart,
  title={DART: Open-Domain Structured Data Record to Text Generation},
  author={Dragomir Radev and Rui Zhang and Amrit Rau and Abhinand Sivaprasad and Chiachun Hsieh and Nazneen Fatema Rajani and Xiangru Tang and Aadit Vyas and Neha Verma and Pranav Krishna and Yangxiaokang Liu and Nadia Irwanto and Jessica Pan and Faiaz Rahman and Ahmad Zaidi and Murori Mutuma and Yasin Tarabar and Ankit Gupta and Tao Yu and Yi Chern Tan and Xi Victoria Lin and Caiming Xiong and Richard Socher},
  journal={arXiv preprint arXiv:2007.02871},
  year={2020}