doc_nli

Description:

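DocNLI is a large-scale dataset for document-level natural language inference (NLI). It is converted from a broad range of NLP problems and covers multiple genres of text. The premises are always full documents, whereas the hypotheses vary in length from a single sentence to passages of hundreds of words. In contrast to some existing sentence-level NLI datasets, DocNLI contains relatively few annotation artifacts.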

Additional documentation: Explore on Papers With Code
Homepage: https://github.com/salesforce/DocNLI/
Source code: tfds.text.docnli.DocNLI
Versions:
- 1.0.0 (default): Initial release.
Download size: 313.89 MiB
Dataset size: 3.07 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	267,086
`'train'`	942,314
`'validation'`	234,258
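As a quick sanity check on the split table, the counts can be turned into proportions with plain arithmetic. The numbers below are copied from the table; `SPLIT_SIZES` and `split_fractions` are illustrative names only, not part of TFDS.

```python
# Example counts per split, copied from the DocNLI split table.
SPLIT_SIZES = {
    "train": 942_314,
    "validation": 234_258,
    "test": 267_086,
}


def split_fractions(sizes):
    """Return each split's share of the total number of examples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    print(f"total examples: {sum(SPLIT_SIZES.values()):,}")
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name:>10}: {frac:.1%}")
```

Running it shows 1,443,658 examples in total, with the train split holding roughly 65% of them.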

Feature structure:

FeaturesDict({
    'hypothesis': Text(shape=(), dtype=string),
    'label': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'premise': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
hypothesis	Text	string
label	ClassLabel	int64
premise	Text	string
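The feature structure amounts to two text fields and an `int64` label with two classes. As an illustration, the hypothetical helper `check_example` below (not part of TFDS) validates one example held as a plain Python dict, for instance one taken from `tfds.as_numpy`, which yields text features as UTF-8 `bytes`. It assumes labels are integers in `[0, 2)`, as implied by `num_classes=2`.

```python
import numbers

NUM_CLASSES = 2  # from ClassLabel(num_classes=2) in the feature structure


def check_example(example):
    """Hypothetical helper: validate one DocNLI-style example dict.

    Expects exactly the keys 'premise', 'hypothesis' (text) and 'label'
    (integer in [0, NUM_CLASSES)). Text may be str or UTF-8 bytes.
    Returns a copy with text decoded to str and the label as a Python int.
    """
    expected = {"premise", "hypothesis", "label"}
    if set(example) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(example)}")

    decoded = {}
    for key in ("premise", "hypothesis"):
        value = example[key]
        if isinstance(value, bytes):
            value = value.decode("utf-8")
        if not isinstance(value, str):
            raise TypeError(f"{key!r} must be text, got {type(value).__name__}")
        decoded[key] = value

    label = example["label"]
    # numbers.Integral also accepts NumPy integer types such as np.int64.
    if isinstance(label, bool) or not isinstance(label, numbers.Integral):
        raise TypeError(f"'label' must be an integer, got {type(label).__name__}")
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"'label' must be in [0, {NUM_CLASSES}), got {label}")
    decoded["label"] = int(label)
    return decoded
```

For example, `check_example({"premise": b"A long document.", "hypothesis": "A claim.", "label": 1})` returns the same record with the premise decoded to `str`.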

Supervised keys (see as_supervised documentation): None
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
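Because no supervised keys are defined, there is no built-in `(input, label)` view through `as_supervised=True`; one option is to build the pairs yourself with `tf.data.Dataset.map`. Premises are full documents, so a raw `tfds.as_dataframe` table can also be hard to read, and shortening the text columns helps. The sketch below assumes the catalog name `doc_nli` and an environment with `tensorflow_datasets` (plus pandas for the preview). `to_supervised`, `shorten`, `load_supervised` and `preview` are hypothetical helper names, not TFDS APIs. The TFDS imports are deferred so the pure-Python helpers work without the library.

```python
def to_supervised(example):
    """Map one DocNLI example dict to ((premise, hypothesis), label).

    Works on plain dicts and on the dict of tensors that tf.data passes to map().
    """
    return (example["premise"], example["hypothesis"]), example["label"]


def shorten(text, max_chars=80):
    """Truncate long text (str or UTF-8 bytes) for display, marking cuts with '...'."""
    if isinstance(text, bytes):
        text = text.decode("utf-8", errors="replace")
    text = " ".join(text.split())  # collapse newlines and runs of whitespace
    if len(text) <= max_chars:
        return text
    return text[: max_chars - 3] + "..."


def load_supervised(split="train"):
    """Load a split as ((premise, hypothesis), label) pairs; needs tensorflow_datasets."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading real data

    ds = tfds.load("doc_nli", split=split)
    return ds.map(to_supervised)


def preview(split="validation", n=5, max_chars=80):
    """Return the first n examples as a DataFrame with shortened text columns."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading real data

    ds, info = tfds.load("doc_nli", split=split, with_info=True)
    df = tfds.as_dataframe(ds.take(n), info)
    for column in ("premise", "hypothesis"):
        df[column] = df[column].map(lambda t: shorten(t, max_chars))
    return df
```

With the library installed, `preview()` would return a small DataFrame whose `premise` and `hypothesis` cells are cut to 80 characters, and `load_supervised("train")` would yield elements shaped `((premise, hypothesis), label)`. The class names behind the 0/1 labels can be read from `info.features["label"].names` rather than assumed.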

Citation:

@inproceedings{yin-etal-2021-docnli,
    title={DocNLI: A Large-scale Dataset for Document-level Natural Language Inference},
    author={Wenpeng Yin and Dragomir Radev and Caiming Xiong},
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
}