asqa

  • Description:

ASQA is the first long-form question answering dataset that focuses on ambiguous factoid questions. Unlike previous long-form answer datasets, each question is annotated with both long-form answers and extractive question-answer pairs, which should be answerable from the generated passage. A generated long-form answer is evaluated using both ROUGE and QA accuracy. We show that these evaluation metrics correlate well with human judgment. In this repository we release the ASQA dataset together with the evaluation code: https://github.com/google-research/language/tree/master/language/asqa
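
The QA-based side of the evaluation asks whether a generated passage answers each disambiguated question. As a rough illustration only, the sketch below uses a plain string-match proxy: a QA pair counts as covered when any of its short answers appears in the normalized long answer. This is a simplification under stated assumptions, not the released evaluation code, and it does not compute ROUGE. The names `normalize` and `short_answer_recall` and the sample data are hypothetical.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def short_answer_recall(long_answer: str, qa_pairs: list) -> float:
    """Fraction of QA pairs with at least one short answer found in the long answer."""
    if not qa_pairs:
        return 0.0
    passage = normalize(long_answer)
    hits = 0
    for pair in qa_pairs:
        for answer in pair["short_answers"]:
            candidate = normalize(answer)
            # Skip answers that normalize to an empty string; they would match anything.
            if candidate and candidate in passage:
                hits += 1
                break
    return hits / len(qa_pairs)


# Hypothetical example, not taken from the dataset.
qa_pairs = [
    {"question": "Who played the lead in the 1990 film?",
     "short_answers": ["Jane Doe"]},
    {"question": "Who played the lead in the 2015 remake?",
     "short_answers": ["John Roe", "J. Roe"]},
]
long_answer = "The 1990 film starred Jane Doe, while the remake had a different cast."
print(short_answer_recall(long_answer, qa_pairs))
```

Substring matching is deliberately crude: it can reward a passage that merely mentions an answer string without attributing it to the right question, which is one reason a model-based QA check is a better fit for real evaluation.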

  • Splits:

| Split | Examples |
|---|---|
| 'dev' | 948 |
| 'train' | 4,353 |

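
A minimal sketch of loading one of these splits, assuming the dataset is registered in TensorFlow Datasets under the name `asqa` and that the `tensorflow_datasets` package is installed. The helper `load_asqa` and the `SPLIT_SIZES` constant are hypothetical conveniences, not part of any library.

```python
# Split sizes as listed for this dataset.
SPLIT_SIZES = {"dev": 948, "train": 4353}


def load_asqa(split: str = "train"):
    """Load one ASQA split via TFDS (assumes `tensorflow_datasets` is installed)."""
    if split not in SPLIT_SIZES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(SPLIT_SIZES)}"
        )
    # Imported lazily so this module can be inspected without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("asqa", split=split)


# Usage (downloads and prepares the data on the first call):
#   dev = load_asqa("dev")
#   for example in dev.take(1):
#       print(example["ambiguous_question"])
```

Note that there is no held-out test split listed here, so `dev` is the only split available for evaluation.
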
  • Feature structure:
FeaturesDict({
    'ambiguous_question': Text(shape=(), dtype=string),
    'annotations': Sequence({
        'knowledge': Sequence({
            'content': Text(shape=(), dtype=string),
            'wikipage': Text(shape=(), dtype=string),
        }),
        'long_answer': Text(shape=(), dtype=string),
    }),
    'qa_pairs': Sequence({
        'context': Text(shape=(), dtype=string),
        'question': Text(shape=(), dtype=string),
        'short_answers': Sequence(Text(shape=(), dtype=string)),
        'wikipage': Text(shape=(), dtype=string),
    }),
    'sample_id': int64,
    'wikipages': Sequence({
        'title': Text(shape=(), dtype=string),
        'url': Text(shape=(), dtype=string),
    }),
})
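
To make the nesting concrete, here is a sketch of one record written as a plain nested Python structure that mirrors the feature names above. Every value is invented for illustration, and the helper functions `disambiguations` and `cited_wikipages` are hypothetical. This shows the logical shape of a record only; the tensor layout actually returned by a data pipeline may differ.

```python
# Hypothetical record mirroring the feature structure; all values are invented.
record = {
    "ambiguous_question": "Who played the lead in the film?",
    "annotations": [
        {
            "knowledge": [
                {"content": "A passage from Wikipedia.",
                 "wikipage": "Example Film (1990)"},
            ],
            "long_answer": "The 1990 film starred Jane Doe; "
                           "the 2015 remake starred John Roe.",
        },
    ],
    "qa_pairs": [
        {
            "context": "Additional context.",
            "question": "Who played the lead in the 1990 film?",
            "short_answers": ["Jane Doe"],
            "wikipage": "Example Film (1990)",
        },
        {
            "context": "",
            "question": "Who played the lead in the 2015 remake?",
            "short_answers": ["John Roe", "J. Roe"],
            "wikipage": "Example Film (2015)",
        },
    ],
    "sample_id": 1,
    "wikipages": [
        {"title": "Example Film (1990)", "url": "https://example.org/film-1990"},
    ],
}


def disambiguations(rec: dict) -> list:
    """Return (question, short_answers) tuples, one per QA pair."""
    return [(qa["question"], list(qa["short_answers"])) for qa in rec["qa_pairs"]]


def cited_wikipages(rec: dict) -> set:
    """Collect every non-empty Wikipedia page title mentioned in the record."""
    titles = {page["title"] for page in rec["wikipages"]}
    titles.update(qa["wikipage"] for qa in rec["qa_pairs"] if qa["wikipage"])
    for annotation in rec["annotations"]:
        titles.update(k["wikipage"] for k in annotation["knowledge"] if k["wikipage"])
    return titles


for question, answers in disambiguations(record):
    print(question, "->", answers)
```

Each entry of `qa_pairs` carries a list of acceptable `short_answers`, so code that scores a long answer should treat a match on any one of them as sufficient.
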
  • Feature documentation:

| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| ambiguous_question | Text | | string | Disambiguated question from AmbigQA. |
| annotations | Sequence | | | Long-form answers to the ambiguous question constructed by ASQA annotators. |
| annotations/knowledge | Sequence | | | List of additional knowledge pieces. |
| annotations/knowledge/content | Text | | string | A passage from Wikipedia. |
| annotations/knowledge/wikipage | Text | | string | Title of the Wikipedia page the passage was taken from. |
| annotations/long_answer | Text | | string | Annotation. |
| qa_pairs | Sequence | | | Q&A pairs from AmbigQA which are used for disambiguation. |
| qa_pairs/context | Text | | string | Additional context provided. |
| qa_pairs/question | Text | | string | |
| qa_pairs/short_answers | Sequence(Text) | (None,) | string | List of short answers from AmbigQA. |
| qa_pairs/wikipage | Text | | string | Title of the Wikipedia page the additional context was taken from. |
| sample_id | Tensor | | int64 | |
| wikipages | Sequence | | | List of Wikipedia pages visited by AmbigQA annotators. |
| wikipages/title | Text | | string | Title of the Wikipedia page. |
| wikipages/url | Text | | string | Link to the Wikipedia page. |

  • Citation:
@misc{https://doi.org/10.48550/arxiv.2204.06092,
doi = {10.48550/ARXIV.2204.06092},
url = {https://arxiv.org/abs/2204.06092},
author = {Stelmakh, Ivan and Luan, Yi and Dhingra, Bhuwan and Chang, Ming-Wei},
keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
title = {ASQA: Factoid Questions Meet Long-Form Answers},
publisher = {arXiv},
year = {2022},
copyright = {arXiv.org perpetual, non-exclusive license}
}