natural_questions

Description:

The NQ corpus contains questions from real users, and it requires QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. The inclusion of real user questions, and the requirement that solutions read an entire page to find the answer, make NQ a more realistic and challenging task than prior QA datasets.

Additional documentation: Explore on Papers With Code
Homepage: https://ai.google.com/research/NaturalQuestions/dataset
Source code: tfds.datasets.natural_questions.Builder
Versions:
- 0.0.2: No release notes.
- 0.1.0 (default): No release notes.
Download size: 41.97 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'train'`	307,373
`'validation'`	7,830

Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:

@article{47761,
title = {Natural Questions: a Benchmark for Question Answering Research},
author = {Tom Kwiatkowski and Jennimaria Palomaki and Olivia Redfield and Michael Collins and Ankur Parikh and Chris Alberti and Danielle Epstein and Illia Polosukhin and Matthew Kelcey and Jacob Devlin and Kenton Lee and Kristina N. Toutanova and Llion Jones and Ming-Wei Chang and Andrew Dai and Jakob Uszkoreit and Quoc Le and Slav Petrov},
year = {2019},
journal = {Transactions of the Association of Computational Linguistics}
}

natural_questions/default (default config)

Config description: Default natural_questions config
Dataset size: 90.26 GiB
Feature structure:

FeaturesDict({
    'annotations': Sequence({
        'id': string,
        'long_answer': FeaturesDict({
            'end_byte': int64,
            'end_token': int64,
            'start_byte': int64,
            'start_token': int64,
        }),
        'short_answers': Sequence({
            'end_byte': int64,
            'end_token': int64,
            'start_byte': int64,
            'start_token': int64,
            'text': Text(shape=(), dtype=string),
        }),
        'yes_no_answer': ClassLabel(shape=(), dtype=int64, num_classes=2),
    }),
    'document': FeaturesDict({
        'html': Text(shape=(), dtype=string),
        'title': Text(shape=(), dtype=string),
        'tokens': Sequence({
            'is_html': bool,
            'token': Text(shape=(), dtype=string),
        }),
        'url': Text(shape=(), dtype=string),
    }),
    'id': string,
    'question': FeaturesDict({
        'text': Text(shape=(), dtype=string),
        'tokens': Sequence(string),
    }),
})
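To show how the `long_answer` offsets relate to `document/tokens`, here is a minimal sketch that recovers one annotator's long answer as plain text. It assumes the example has already been converted to plain Python values (lists and decoded `str`, not tensors or bytes) and follows the TFDS convention that a `Sequence` of dicts is exposed as a dict of per-field sequences. Two further assumptions, taken from the original NQ release rather than from this page: a negative `start_token` marks "no long answer", and `end_token` is exclusive. The function name `long_answer_text` and the toy data are hypothetical.

```python
from typing import Any, Dict, Optional


def long_answer_text(example: Dict[str, Any], annotation_index: int = 0) -> Optional[str]:
    """Return one annotator's long answer as plain text, or None if there is none."""
    long_answer = example["annotations"]["long_answer"]
    start = long_answer["start_token"][annotation_index]
    end = long_answer["end_token"][annotation_index]
    # Assumption: a negative start_token means "no long answer" (original NQ convention).
    if start < 0 or end <= start:
        return None
    tokens = example["document"]["tokens"]
    # Assumption: end_token is exclusive, so the span is tokens[start:end].
    span = zip(tokens["token"][start:end], tokens["is_html"][start:end])
    # Keep only text tokens; HTML markup tokens such as "<P>" are dropped.
    return " ".join(tok for tok, is_html in span if not is_html)


# Made-up toy example, shaped like the FeaturesDict above (one annotation).
toy_example = {
    "annotations": {"long_answer": {"start_token": [0], "end_token": [6]}},
    "document": {
        "tokens": {
            "token": ["<P>", "Paris", "is", "the", "capital", "</P>", "<P>"],
            "is_html": [True, False, False, False, False, True, True],
        }
    },
}
print(long_answer_text(toy_example))  # Paris is the capital
```

Filtering on `is_html` is what turns the raw token span back into readable text; `start_byte`/`end_byte` could be used against `document/html` instead if the original markup is wanted.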

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
annotations	Sequence
annotations/id	Tensor		string
annotations/long_answer	FeaturesDict
annotations/long_answer/end_byte	Tensor		int64
annotations/long_answer/end_token	Tensor		int64
annotations/long_answer/start_byte	Tensor		int64
annotations/long_answer/start_token	Tensor		int64
annotations/short_answers	Sequence
annotations/short_answers/end_byte	Tensor		int64
annotations/short_answers/end_token	Tensor		int64
annotations/short_answers/start_byte	Tensor		int64
annotations/short_answers/start_token	Tensor		int64
annotations/short_answers/text	Text		string
annotations/yes_no_answer	ClassLabel		int64
document	FeaturesDict
document/html	Text		string
document/title	Text		string
document/tokens	Sequence
document/tokens/is_html	Tensor		bool
document/tokens/token	Text		string
document/url	Text		string
id	Tensor		string
question	FeaturesDict
question/text	Text		string
question/tokens	Sequence(Tensor)	(None,)	string
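Because `annotations/short_answers` is a `Sequence` nested inside the `annotations` `Sequence`, its `text` field is a list of lists: one inner list per annotation, possibly empty. A minimal sketch, assuming plain Python values and the dict-of-sequences layout described above, that gathers the distinct short-answer strings across all annotators into a reference set; the helper name and toy data are hypothetical.

```python
from typing import Any, Dict, List


def reference_short_answers(example: Dict[str, Any]) -> List[str]:
    """Collect the distinct, non-empty short-answer strings from all annotations."""
    references: List[str] = []
    # annotations/short_answers/text holds one inner list per annotation.
    for per_annotation in example["annotations"]["short_answers"]["text"]:
        for text in per_annotation:
            text = text.strip()
            if text and text not in references:
                references.append(text)  # keep first-seen order
    return references


# Made-up toy example with three annotations; the second gave no short answer.
toy_example = {
    "annotations": {
        "short_answers": {"text": [["Paris"], [], ["Paris", "the city of Paris"]]}
    }
}
print(reference_short_answers(toy_example))  # ['Paris', 'the city of Paris']
```

This ignores `annotations/yes_no_answer`; questions answered only with yes/no would need that field handled separately.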

Examples (tfds.as_dataframe):

natural_questions/longt5

Config description: natural_questions preprocessed as in the longT5 benchmark
Dataset size: 8.91 GiB
Feature structure:

FeaturesDict({
    'all_answers': Sequence(Text(shape=(), dtype=string)),
    'answer': Text(shape=(), dtype=string),
    'context': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'question': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
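The `longt5` config flattens each example into plain text fields, which makes it easy to feed to a text-to-text model. The sketch below builds an (input, target) pair from one example. The `question: ... title: ... context: ...` template and the character-based truncation are hypothetical choices for illustration, not the preprocessing used in the longT5 benchmark; the function name and toy data are also hypothetical.

```python
from typing import Dict, Tuple


def to_text_pair(example: Dict[str, str], max_context_chars: int = 4000) -> Tuple[str, str]:
    """Build a hypothetical (input, target) text pair from a longt5 example."""
    # Truncating by characters is a simplification; a real pipeline would
    # usually truncate by model tokens instead.
    context = example["context"][:max_context_chars]
    source = f"question: {example['question']} title: {example['title']} context: {context}"
    return source, example["answer"]


# Made-up toy example with the same keys as the FeaturesDict above.
toy_example = {
    "id": "0",
    "question": "capital of france",
    "title": "France",
    "context": "Paris is the capital of France.",
    "answer": "Paris",
    "all_answers": ["Paris"],
}
source, target = to_text_pair(toy_example, max_context_chars=5)
print(source)  # question: capital of france title: France context: Paris
print(target)  # Paris
```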

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
all_answers	Sequence(Text)	(None,)	string
answer	Text		string
context	Text		string
id	Text		string
question	Text		string
title	Text		string
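Since `all_answers` holds every acceptable answer string while `answer` holds a single one, a natural use of `all_answers` is scoring a prediction against any of them. A minimal sketch of exact match using SQuAD-style answer normalization (lowercase, drop punctuation and English articles, collapse whitespace); this normalization is a swapped-in, commonly used technique and is not claimed to be the official metric for this config.

```python
import re
import string
from typing import Sequence

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, squeeze spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, all_answers: Sequence[str]) -> bool:
    """True if the normalized prediction equals any normalized reference answer."""
    predicted = normalize_answer(prediction)
    return any(predicted == normalize_answer(answer) for answer in all_answers)


print(exact_match("The Eiffel Tower.", ["Eiffel Tower", "La tour Eiffel"]))  # True
print(exact_match("Paris", []))  # False
```

An example with an empty `all_answers` list can never be matched here; how such examples should count is a separate evaluation decision.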

Examples (tfds.as_dataframe):