- Description:
DocNLI is a large-scale dataset for document-level natural language inference (NLI). It is built by converting a broad range of NLP problems into the NLI format and covers multiple genres of text. Premises are always full documents, whereas hypotheses range in length from single sentences to passages of hundreds of words. Unlike some existing sentence-level NLI datasets, DocNLI contains relatively few annotation artifacts.
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/salesforce/DocNLI/
Source code: tfds.text.docnli.DocNLI
Versions:
1.0.0 (default): Initial release.
Download size: 313.89 MiB
Dataset size: 3.07 GiB
Auto-cached: No
Splits:
Split | Examples |
---|---|
'test' | 267,086 |
'train' | 942,314 |
'validation' | 234,258 |
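The split sizes above are easier to compare as proportions. The short sketch below simply adds up the three counts copied from the table (1,443,658 examples in total) and reports each split's share, which comes out at roughly 65% train, 16% validation and 18.5% test. Nothing here touches TFDS itself; `SPLIT_SIZES` and `split_fractions` are illustrative names, not part of the library.

```python
# Example counts per split, copied by hand from the Splits table.
SPLIT_SIZES = {
    "train": 942_314,
    "validation": 234_258,
    "test": 267_086,
}


def split_fractions(sizes: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"total: {sum(SPLIT_SIZES.values()):,}")
    for name, frac in split_fractions(SPLIT_SIZES).items():
        # Print each split's share as a percentage with one decimal place.
        print(f"{name:>10}: {frac:.1%}")
```

Note that the validation and test splits together hold about a third of the data, which is a larger held-out fraction than many sentence-level NLI datasets use.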
- Feature structure:
FeaturesDict({
'hypothesis': Text(shape=(), dtype=string),
'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
'premise': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
hypothesis | Text | () | string | |
label | ClassLabel | () | int64 | |
premise | Text | () | string | |
Supervised keys (see the as_supervised documentation): None
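Because no supervised keys are defined, `tfds.load(..., as_supervised=True)` cannot produce `(input, label)` tuples for this dataset; examples arrive as feature dicts with `premise`, `hypothesis` and `label` keys. The sketch below shows one way to build `((premise, hypothesis), label)` pairs yourself. It assumes TensorFlow Datasets is installed and that the builder is registered under the name `"doc_nli"` (the snake_case form TFDS usually derives from the class name `DocNLI`); verify the name against your TFDS version. `to_pair` and `iter_pairs` are hypothetical helpers, and the integer label is kept as-is rather than mapped to class names, since those names are not listed on this page.

```python
from typing import Iterator, Mapping, Tuple

Pair = Tuple[Tuple[str, str], int]


def to_pair(example: Mapping[str, object]) -> Pair:
    """Turn one DocNLI feature dict into ((premise, hypothesis), label)."""

    def _text(value: object) -> str:
        # tfds.as_numpy yields Text features as bytes; decode them to str.
        if isinstance(value, bytes):
            return value.decode("utf-8")
        return str(value)

    premise = _text(example["premise"])
    hypothesis = _text(example["hypothesis"])
    # ClassLabel values are int64 scalars; int() also accepts NumPy integers.
    return (premise, hypothesis), int(example["label"])


def iter_pairs(split: str = "validation") -> Iterator[Pair]:
    """Yield ((premise, hypothesis), label) tuples from one DocNLI split.

    Assumption: the dataset is registered as "doc_nli". The first call
    downloads roughly 314 MiB and prepares about 3 GiB on disk.
    """
    # Imported lazily so that to_pair can be used without TensorFlow installed.
    import tensorflow_datasets as tfds

    ds = tfds.load("doc_nli", split=split)
    for example in tfds.as_numpy(ds):
        yield to_pair(example)
```

Keeping the conversion in a separate pure function makes it easy to test without downloading the data, and lets the same helper be reused with `tf.data` pipelines or other loaders.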
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{yin-etal-2021-docnli,
  title     = {DocNLI: A Large-scale Dataset for Document-level Natural Language Inference},
  author    = {Wenpeng Yin and Dragomir Radev and Caiming Xiong},
  booktitle = {Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021},
  month     = aug,
  year      = {2021},
  address   = {Bangkok, Thailand},
  publisher = {Association for Computational Linguistics},
}