salient_span_wikipedia

Description:

Wikipedia sentences with labeled salient spans.

Homepage: https://www.tensorflow.org/datasets/catalog/salient_span_wikipedia
Source code: tfds.datasets.salient_span_wikipedia.Builder
Versions:
- 1.0.0 (default): No release notes.
Download size: Unknown size
Auto-cached (documentation): No
Supervised keys (see as_supervised doc): None
Figure (tfds.show_examples): Not supported.
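
As a minimal sketch of how the metadata above maps onto a load call, the helpers below build the fully qualified `name/config:version` string that `tfds.load` accepts and defer the TensorFlow Datasets import until a split is actually requested. The helper names `qualified_name` and `load_split` are hypothetical, and the sketch assumes the dataset has already been prepared locally, since the download size is unknown and the dataset is not auto-cached.

```python
# Hypothetical helpers; only tfds.load is a TensorFlow Datasets API.
DATASET = "salient_span_wikipedia"
CONFIGS = ("sentences", "documents")  # the two configs listed on this page
VERSION = "1.0.0"


def qualified_name(config: str = "sentences") -> str:
    """Build a 'name/config:version' string for tfds.load."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    return f"{DATASET}/{config}:{VERSION}"


def load_split(config: str = "sentences", split: str = "train"):
    """Load one split as a tf.data.Dataset of feature dictionaries."""
    # Imported lazily so qualified_name works without TensorFlow installed.
    import tensorflow_datasets as tfds

    # Supervised keys are None, so as_supervised stays at its default (False).
    return tfds.load(qualified_name(config), split=split)
```

Calling `load_split("documents")` would request the `'train'` split of the documents config; `'train'` is the only split either config provides.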
Citation:

@article{guu2020realm,
    title={REALM: Retrieval-Augmented Language Model Pre-Training},
    author={Kelvin Guu and Kenton Lee and Zora Tung and Panupong Pasupat and Ming-Wei Chang},
    year={2020},
    journal = {arXiv e-prints},
    archivePrefix = {arXiv},
    eprint={2002.08909},
}

salient_span_wikipedia/sentences (default config)

Config description: Examples are individual sentences containing entities.
Dataset size: 20.57 GiB
Splits:

Split	Examples
`'train'`	82,291,706

Feature structure:

FeaturesDict({
    'spans': Sequence({
        'limit': int32,
        'start': int32,
        'type': string,
    }),
    'text': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
spans	Sequence
spans/limit	Tensor	int32
spans/start	Tensor	int32
spans/type	Tensor	string
text	Text	string
title	Text	string
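
The feature structure above can be read as "each span is a `[start, limit)` slice of `text` with a `type` label". The sketch below recovers the labeled substrings from one example. It assumes the offsets are half-open and index into the UTF-8 bytes of `text`, which this page does not state, so the slicing should be adjusted if the offsets turn out to be character based. The function name `extract_spans`, the sample sentence, and the type labels `PERSON` and `DATE` are made up for illustration.

```python
from typing import Dict, List, Tuple


def extract_spans(example: Dict) -> List[Tuple[str, str]]:
    """Return (span_text, span_type) pairs for one example."""
    text = example["text"]
    # Assumption: start/limit are byte offsets into the UTF-8 encoded text.
    raw = text if isinstance(text, bytes) else text.encode("utf-8")
    # A Sequence of dicts is exposed as a dict of parallel lists.
    spans = example["spans"]
    pairs = []
    for start, limit, kind in zip(spans["start"], spans["limit"], spans["type"]):
        kind = kind.decode("utf-8") if isinstance(kind, bytes) else kind
        pairs.append((raw[start:limit].decode("utf-8"), kind))
    return pairs


# Made-up example for illustration; not taken from the dataset.
example = {
    "title": "Ada Lovelace",
    "text": "Ada Lovelace was born in 1815.",
    "spans": {"start": [0, 25], "limit": [12, 29], "type": ["PERSON", "DATE"]},
}
print(extract_spans(example))
```

The same function accepts the `bytes` values produced by `tfds.as_numpy`, since it only encodes `text` when it is a `str`.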

Examples (tfds.as_dataframe):

salient_span_wikipedia/documents

Config description: Examples are whole documents.
Dataset size: 16.52 GiB
Splits:

Split	Examples
`'train'`	13,353,718

Feature structure:

FeaturesDict({
    'sentences': Sequence({
        'limit': int32,
        'start': int32,
    }),
    'spans': Sequence({
        'limit': int32,
        'start': int32,
        'type': string,
    }),
    'text': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})

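Feature documentation:

Feature	Class	Dtype
	FeaturesDict
sentences	Sequence
sentences/limit	Tensor	int32
sentences/start	Tensor	int32
spans	Sequence
spans/limit	Tensor	int32
spans/start	Tensor	int32
spans/type	Tensor	string
text	Text	string
title	Text	string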
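
To illustrate how the `sentences` and `spans` features of a document relate, the sketch below cuts a document example into sentence-level examples shaped like those of the sentences config. It is not the builder's actual procedure. It assumes that sentence and span offsets are half-open byte offsets into the same document `text`, that a span belongs to the sentence that fully contains it, and that sentences without any span are skipped (mirroring "sentences containing entities"). The function name `iter_sentences` and the sample document are hypothetical.

```python
from typing import Dict, Iterator


def iter_sentences(doc: Dict) -> Iterator[Dict]:
    """Yield sentence-level examples with spans re-based to each sentence."""
    text = doc["text"]
    # Assumption: all offsets are byte offsets into the UTF-8 encoded text.
    raw = text if isinstance(text, bytes) else text.encode("utf-8")
    sents, spans = doc["sentences"], doc["spans"]
    for s_start, s_limit in zip(sents["start"], sents["limit"]):
        local = {"start": [], "limit": [], "type": []}
        for start, limit, kind in zip(spans["start"], spans["limit"], spans["type"]):
            # Keep only spans that lie entirely inside this sentence.
            if s_start <= start and limit <= s_limit:
                local["start"].append(start - s_start)
                local["limit"].append(limit - s_start)
                local["type"].append(kind)
        if local["start"]:  # skip sentences that contain no span
            yield {
                "title": doc["title"],
                "text": raw[s_start:s_limit].decode("utf-8"),
                "spans": local,
            }


# Made-up document for illustration; not taken from the dataset.
doc = {
    "title": "Ada Lovelace",
    "text": "Ada Lovelace was born in 1815. She wrote notes. "
            "Babbage designed the engine.",
    "sentences": {"start": [0, 31, 48], "limit": [30, 47, 76]},
    "spans": {"start": [0, 25, 48], "limit": [12, 29, 55],
              "type": ["PERSON", "DATE", "PERSON"]},
}
for sentence in iter_sentences(doc):
    print(sentence["text"], sentence["spans"])
```

Re-basing the offsets by subtracting the sentence start keeps each yielded example self-contained, so its spans can be sliced directly from its own `text`.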

Examples (tfds.as_dataframe):
