- Description:
DART (DAta Record to Text generation) contains RDF entity-relation triple sets annotated with sentence descriptions that cover all facts in each triple set. DART was constructed from existing datasets such as WikiTableQuestions, WikiSQL, WebNLG, and Cleaned E2E. The tables from WikiTableQuestions and WikiSQL were transformed into subject-predicate-object triples, and their text annotations were mainly collected through MTurk. The meaning representations in E2E were also transformed into triples and their descriptions were reused; those that could not be transformed were dropped.
The dataset splits of E2E and WebNLG are kept. For WikiTableQuestions and WikiSQL, Jaccard similarity is used to keep similar tables in the same set (train/dev/test).
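The idea of keeping similar tables together can be illustrated with a small sketch. This is not the authors' procedure, which the description does not spell out: it assumes each table is represented by its set of column headers, uses a hypothetical threshold of 0.5, and groups tables greedily so that each group could then be assigned to a single split.

```python
from typing import Iterable, List


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(set_a & set_b) / len(union)


def group_similar_tables(headers: List[List[str]], threshold: float = 0.5) -> List[List[int]]:
    """Greedily group table indices whose headers are similar.

    Assumption: a table joins the first group whose first member has a
    Jaccard similarity of at least `threshold` (a hypothetical value).
    """
    groups: List[List[int]] = []
    for idx, cols in enumerate(headers):
        for group in groups:
            if jaccard(cols, headers[group[0]]) >= threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Invented column headers, for illustration only.
tables = [
    ["player", "team", "year"],
    ["player", "team", "season"],
    ["city", "population"],
]
groups = group_similar_tables(tables)
print(groups)
```

Each resulting group would be placed in one split as a whole, so near-duplicate tables never straddle train and test.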
This dataset is constructed following a standardized table format.
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/Yale-LILY/dart
Source code: tfds.structured.dart.Dart
Versions:
0.1.0 (default): No release notes.
Download size: 249.71 MiB
Dataset size: 38.83 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 12,552 |
'train' | 62,659 |
'validation' | 6,980 |
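As a quick sanity check on the split sizes, the short sketch below computes the total number of examples and each split's share. The counts are copied from the table; everything else is plain arithmetic.

```python
# Split sizes as listed in the table.
split_sizes = {"train": 62_659, "validation": 6_980, "test": 12_552}

total = sum(split_sizes.values())
fractions = {name: count / total for name, count in split_sizes.items()}

print(f"total: {total:,} examples")
for name, frac in fractions.items():
    print(f"{name}: {split_sizes[name]:,} examples ({frac:.1%})")
```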
- Feature structure:
FeaturesDict({
    'input_text': FeaturesDict({
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'target_text': string,
})
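To make the nesting concrete, here is a minimal sketch that packs a list of subject-predicate-object triples into a plain Python dict with the same shape. The helper `triples_to_example`, the header names `subject`/`predicate`/`object`, and the sample triples are assumptions made for illustration, not taken from the dataset builder. The `table` sequence is laid out as a dict of parallel lists, which is how a TFDS `Sequence` of a feature dict is exposed.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def triples_to_example(triples: List[Triple], target_text: str) -> Dict:
    """Pack triples into a nested dict mirroring the feature structure.

    Assumption: each triple becomes one table row with three cells whose
    headers are 'subject', 'predicate', and 'object'.
    """
    headers = ("subject", "predicate", "object")  # assumed header names
    column_header: List[str] = []
    content: List[str] = []
    row_number: List[int] = []
    for row, triple in enumerate(triples):
        for header, value in zip(headers, triple):
            column_header.append(header)
            content.append(value)
            row_number.append(row)
    return {
        "input_text": {
            "table": {
                "column_header": column_header,
                "content": content,
                "row_number": row_number,
            }
        },
        "target_text": target_text,
    }


# Invented triples and description, for illustration only.
example = triples_to_example(
    [("Blue Cafe", "food", "Italian"), ("Blue Cafe", "area", "riverside")],
    "Blue Cafe serves Italian food in the riverside area.",
)
print(example["input_text"]["table"]["row_number"])
```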
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
input_text | FeaturesDict | | | |
input_text/table | Sequence | | | |
input_text/table/column_header | Tensor | | string | |
input_text/table/content | Tensor | | string | |
input_text/table/row_number | Tensor | | int16 | |
target_text | Tensor | | string | |
Supervised keys (See as_supervised doc): ('input_text', 'target_text')
Figure (tfds.show_examples): Not supported.
- Citation:
@article{radev2020dart,
title={DART: Open-Domain Structured Data Record to Text Generation},
author={Dragomir Radev and Rui Zhang and Amrit Rau and Abhinand Sivaprasad and Chiachun Hsieh and Nazneen Fatema Rajani and Xiangru Tang and Aadit Vyas and Neha Verma and Pranav Krishna and Yangxiaokang Liu and Nadia Irwanto and Jessica Pan and Faiaz Rahman and Ahmad Zaidi and Murori Mutuma and Yasin Tarabar and Ankit Gupta and Tao Yu and Yi Chern Tan and Xi Victoria Lin and Caiming Xiong and Richard Socher},
journal={arXiv preprint arXiv:2007.02871},
year={2020}