dart

Description:

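DART (DAta Record to Text generation) contains RDF entity-relation triple sets annotated with sentence descriptions that cover all facts in each triple set. DART was constructed from existing datasets: WikiTableQuestions, WikiSQL, WebNLG, and Cleaned E2E. Tables from WikiTableQuestions and WikiSQL were converted to subject-predicate-object triples, and their text annotations were collected mainly through MTurk. The meaning representations in E2E were also converted to triples and paired with their existing descriptions; those that could not be converted were dropped.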
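As a rough illustration of what converting a table to subject-predicate-object triples can look like, here is a minimal sketch. The description above does not spell out the conversion procedure, so the rule used here (one column supplies the subject, every other header becomes a predicate) is an assumption, and `row_to_triples`, `subject_column`, and the sample row are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(row: Dict[str, str], subject_column: str) -> List[Triple]:
    """Turn one table row into (subject, predicate, object) triples.

    Assumption (not from the source): the value in `subject_column` is the
    subject, each remaining column header is a predicate, and the cell
    value is the object. Empty cells are skipped.
    """
    subject = row[subject_column]
    return [
        (subject, header, value)
        for header, value in row.items()
        if header != subject_column and value != ""
    ]


# Made-up row, for illustration only.
row = {"Name": "Ada Lovelace", "Born": "1815", "Field": "Mathematics"}
triples = row_to_triples(row, subject_column="Name")
for triple in triples:
    print(triple)
```

A real conversion would likely need more care, for example when no single column identifies the row.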

The dataset splits of E2E and WebNLG are kept. For WikiTableQuestions and WikiSQL, Jaccard similarity is used to keep similar tables in the same split (train/dev/test).
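For reference, Jaccard similarity between two sets is the size of their intersection divided by the size of their union. The sketch below computes it over table column headers. Which table elements are actually compared, and what threshold decides that two tables are "similar", is not stated here, so both the choice of headers and the sample data are assumptions.

```python
from typing import Iterable


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Return |A ∩ B| / |A ∪ B| for two collections of strings."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        # Two empty sets: define the similarity as 0.0 to avoid dividing by zero.
        return 0.0
    return len(set_a & set_b) / len(union)


# Made-up column headers for two tables (assumption: headers are compared).
headers_1 = ["Player", "Team", "Goals"]
headers_2 = ["Player", "Team", "Assists"]
score = jaccard_similarity(headers_1, headers_2)
print(score)  # 2 shared headers out of 4 distinct ones -> 0.5
```

Tables whose score exceeds some chosen threshold could then be assigned to the same split, so that near-duplicate tables do not leak between training and evaluation data.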

This dataset is organized following a standardized table format.

Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/Yale-LILY/dart
Source code: tfds.structured.dart.Dart
Versions:
- 0.1.0 (default): No release notes.
Download size: 249.71 MiB
Dataset size: 38.83 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	12,552
`'train'`	62,659
`'validation'`	6,980

Feature structure:

FeaturesDict({
    'input_text': FeaturesDict({
        'table': Sequence({
            'column_header': string,
            'content': string,
            'row_number': int16,
        }),
    }),
    'target_text': string,
})
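In the feature structure above, `table` is a sequence of cells, which is typically exposed as three parallel lists (`column_header`, `content`, `row_number`). The following sketch regroups those lists into one dictionary per row. The sample values and the `subject`/`predicate`/`object` headers are made up for illustration, and `table_to_rows` is a hypothetical helper. In a real pipeline the values would arrive as tensors or NumPy arrays (strings as bytes) rather than plain Python lists.

```python
from collections import defaultdict
from typing import Dict, List


def table_to_rows(table: Dict[str, list]) -> List[Dict[str, str]]:
    """Group the parallel cell lists of `input_text/table` by `row_number`."""
    rows = defaultdict(dict)
    for header, content, row_number in zip(
        table["column_header"], table["content"], table["row_number"]
    ):
        rows[row_number][header] = content
    # Return the rows ordered by their row number.
    return [rows[i] for i in sorted(rows)]


# Made-up example shaped like the FeaturesDict above (values are assumptions).
example = {
    "input_text": {
        "table": {
            "column_header": ["subject", "predicate", "object"] * 2,
            "content": ["Alice", "WORKS_AT", "Acme", "Acme", "LOCATED_IN", "Paris"],
            "row_number": [0, 0, 0, 1, 1, 1],
        }
    },
    "target_text": "Alice works at Acme, which is located in Paris.",
}

rows = table_to_rows(example["input_text"]["table"])
for r in rows:
    print(r)
```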

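Feature documentation:

Feature	Class	Dtype
	FeaturesDict
input_text	FeaturesDict
input_text/table	Sequence
input_text/table/column_header	Tensor	string
input_text/table/content	Tensor	string
input_text/table/row_number	Tensor	int16
target_text	Tensor	string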

Supervised keys (see as_supervised doc): ('input_text', 'target_text')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
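The metadata above can be exercised with a short loading sketch. It assumes the `tensorflow_datasets` package is installed and that the dataset is registered under the name `dart`. The helper names `load_dart` and `preview` are hypothetical, and the first call downloads and prepares the data. `SPLIT_SIZES` simply restates the split table from this page.

```python
# Split sizes as listed in the split table on this page.
SPLIT_SIZES = {"train": 62_659, "validation": 6_980, "test": 12_552}


def load_dart(split: str = "train", as_supervised: bool = False):
    """Load one DART split as a tf.data.Dataset.

    With as_supervised=True, each element is an (input_text, target_text)
    pair, following the supervised keys listed above.
    """
    # Imported lazily so this file can be imported without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load("dart", split=split, as_supervised=as_supervised)


def preview(split: str = "train", n: int = 4):
    """Return the first `n` examples of a split as a pandas DataFrame."""
    import tensorflow_datasets as tfds

    ds, info = tfds.load("dart", split=split, with_info=True)
    return tfds.as_dataframe(ds.take(n), info)


if __name__ == "__main__":
    print(sum(SPLIT_SIZES.values()), "examples in total")
    print(preview("validation"))
```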

Citation:

@article{radev2020dart,
  title={DART: Open-Domain Structured Data Record to Text Generation},
  author={Dragomir Radev and Rui Zhang and Amrit Rau and Abhinand Sivaprasad and Chiachun Hsieh and Nazneen Fatema Rajani and Xiangru Tang and Aadit Vyas and Neha Verma and Pranav Krishna and Yangxiaokang Liu and Nadia Irwanto and Jessica Pan and Faiaz Rahman and Ahmad Zaidi and Murori Mutuma and Yasin Tarabar and Ankit Gupta and Tao Yu and Yi Chern Tan and Xi Victoria Lin and Caiming Xiong and Richard Socher},
  journal={arXiv preprint arXiv:2007.02871},
  year={2020}
}
