gov_report

Description:

The Government Report dataset consists of reports written by government research agencies, including the Congressional Research Service and the U.S. Government Accountability Office.

Additional documentation: Explore on Papers With Code
Homepage: https://gov-report-data.github.io/
Source code: tfds.summarization.gov_report.GovReport
Versions:
- 1.0.0 (default): Initial release.
Download size: 320.59 MiB
Auto-cached (documentation): No
Figure (tfds.show_examples): Not supported.
Citation:

@inproceedings{
anonymous2022efficiently,
title={Efficiently Modeling Long Sequences with Structured State Spaces},
author={Anonymous},
booktitle={Submitted to The Tenth International Conference on Learning Representations },
year={2022},
url={https://openreview.net/forum?id=uYLFoz1vlAC},
note={under review}
}
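
The configs documented below follow a `gov_report/<source>_<format>` naming pattern. A minimal sketch of picking one and loading it with `tfds.load` might look like the following. The helper names `config_name` and `load_gov_report` are illustrative and not part of TFDS, and the import is deferred so that the name helper works even without TensorFlow Datasets installed.

```python
SOURCES = ("crs", "gao")
FORMATS = ("whitespace", "html", "json")


def config_name(source: str = "crs", fmt: str = "whitespace") -> str:
    """Build a TFDS dataset name such as 'gov_report/crs_whitespace'."""
    if source not in SOURCES:
        raise ValueError(f"source must be one of {SOURCES}, got {source!r}")
    if fmt not in FORMATS:
        raise ValueError(f"fmt must be one of {FORMATS}, got {fmt!r}")
    return f"gov_report/{source}_{fmt}"


def load_gov_report(source: str = "crs", fmt: str = "whitespace", split: str = "train"):
    """Load one split of the chosen config (triggers a download on first use)."""
    import tensorflow_datasets as tfds  # deferred: only needed when loading

    return tfds.load(config_name(source, fmt), split=split)


print(config_name())              # the default config
print(config_name("gao", "json"))
```

Calling `load_gov_report("gao", "html", split="validation")` would then request the `gov_report/gao_html` validation split.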

gov_report/crs_whitespace (default config)

Config description: CRS reports with summaries. Structures are flattened and joined by whitespace. This is the format used by the original paper.
Dataset size: 349.76 MiB
Splits:

Split	Examples
`'test'`	362
`'train'`	6,514
`'validation'`	362
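
As a quick check on the split table above, the counts work out to roughly a 90/5/5 split. The short sketch below does that arithmetic; the dictionary simply copies the numbers from the table, and the variable names are illustrative.

```python
# Example counts copied from the split table for gov_report/crs_whitespace.
split_counts = {"test": 362, "train": 6514, "validation": 362}

total = sum(split_counts.values())
split_share = {name: count / total for name, count in split_counts.items()}

for name, share in split_share.items():
    print(f"{name}: {split_counts[name]} examples ({share:.1%})")
```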

Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'reports': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
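
Every feature in this structure is a scalar `Text` field, and when TFDS examples are iterated as NumPy values such fields arrive as `bytes`. A small sketch of decoding them to `str` is shown below. The function name `decode_text_fields` is illustrative, and the sample record is made up to match the field names above rather than taken from the dataset.

```python
def decode_text_fields(example: dict) -> dict:
    """Return a copy of the example with bytes values decoded as UTF-8."""
    return {
        key: value.decode("utf-8") if isinstance(value, bytes) else value
        for key, value in example.items()
    }


# Hypothetical record shaped like the FeaturesDict above (values are made up).
raw = {
    "id": b"example-id",
    "released_date": b"2000-01-01",
    "reports": b"Body of the report.",
    "summary": b"Short summary.",
    "title": b"Example title",
}
decoded = decode_text_fields(raw)
print(decoded["title"])
```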

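Feature documentation:

Feature	Class	Dtype
	FeaturesDict
id	Text	string
released_date	Text	string
reports	Text	string
summary	Text	string
title	Text	string

Supervised keys (see the as_supervised documentation): ('reports', 'summary')
Examples (tfds.as_dataframe):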

gov_report/gao_whitespace

Config description: GAO reports with highlights. Structures are flattened and joined by whitespace. This is the format used by the original paper.
Dataset size: 690.24 MiB
Splits:

Split	Examples
`'test'`	611
`'train'`	11,005
`'validation'`	612

Feature structure:

FeaturesDict({
    'fastfact': Text(shape=(), dtype=string),
    'highlight': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'published_date': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'report': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
})

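Feature documentation:

Feature	Class	Dtype
	FeaturesDict
fastfact	Text	string
highlight	Text	string
id	Text	string
published_date	Text	string
released_date	Text	string
report	Text	string
title	Text	string
url	Text	string

Supervised keys (see the as_supervised documentation): ('report', 'highlight')
Examples (tfds.as_dataframe):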
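
The supervised keys listed for the CRS and GAO configs above are what `as_supervised=True` uses to turn each example dict into an `(input, target)` tuple. A rough pure-Python sketch of that mapping follows. `SUPERVISED_KEYS` and `to_supervised` are illustrative names, not TFDS APIs, and the sample records are made up.

```python
# Supervised key pairs as listed for the CRS and GAO configs.
SUPERVISED_KEYS = {
    "crs": ("reports", "summary"),
    "gao": ("report", "highlight"),
}


def to_supervised(example: dict, source: str) -> tuple:
    """Pick the (input, target) pair for a config family ('crs' or 'gao')."""
    input_key, target_key = SUPERVISED_KEYS[source]
    return example[input_key], example[target_key]


# Hypothetical records; only the two supervised fields are filled in.
crs_example = {"reports": "Full CRS report text.", "summary": "CRS summary."}
gao_example = {"report": "Full GAO report text.", "highlight": "GAO highlight."}

crs_pair = to_supervised(crs_example, "crs")
gao_pair = to_supervised(gao_example, "gao")
print(crs_pair)
print(gao_pair)
```

Note that the input field is named `reports` in the CRS configs but `report` in the GAO configs, so code shared between the two families needs a lookup of this kind.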

gov_report/crs_html

Config description: CRS reports with summaries. Structures are flattened and joined by newlines, with HTML tags added. Tags are added only for section_title, in a format like <h2>xxx<h2>.
Dataset size: 351.25 MiB
Splits:

Split	Examples
`'test'`	362
`'train'`	6,514
`'validation'`	362

Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'reports': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
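
In this config the `reports` text carries heading tags around section titles, as the config description notes. The sketch below pulls those headings out with a regular expression. It assumes the tags look like the `<h2>xxx<h2>` pattern quoted in the description, that other heading levels follow the same shape, and that a closing tag might carry a slash; none of this is confirmed beyond that one quoted example. The sample text is made up.

```python
import re

# Matches '<h2>Title<h2>' as quoted in the config description. The optional
# slash also accepts '</h2>' (an assumption, in case closing tags differ).
HEADING = re.compile(r"<h(\d)>(.*?)</?h\1>")


def extract_headings(report: str) -> list:
    """Return (level, title) pairs for each tagged section title."""
    return [(int(m.group(1)), m.group(2).strip()) for m in HEADING.finditer(report)]


# Hypothetical newline-joined report text.
sample = "<h2>Background<h2>\nFirst paragraph.\n<h2>Findings<h2>\nSecond paragraph."
headings = extract_headings(sample)
print(headings)
```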

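Feature documentation:

Feature	Class	Dtype
	FeaturesDict
id	Text	string
released_date	Text	string
reports	Text	string
summary	Text	string
title	Text	string

Supervised keys (see the as_supervised documentation): ('reports', 'summary')
Examples (tfds.as_dataframe):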

gov_report/gao_html

Config description: GAO reports with highlights. Structures are flattened and joined by newlines, with HTML tags added. Tags are added only for section_title, in a format like <h2>xxx<h2>.
Dataset size: 692.72 MiB
Splits:

Split	Examples
`'test'`	611
`'train'`	11,005
`'validation'`	612

Feature structure:

FeaturesDict({
    'fastfact': Text(shape=(), dtype=string),
    'highlight': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'published_date': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'report': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
})

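Feature documentation:

Feature	Class	Dtype
	FeaturesDict
fastfact	Text	string
highlight	Text	string
id	Text	string
published_date	Text	string
released_date	Text	string
report	Text	string
title	Text	string
url	Text	string

Supervised keys (see the as_supervised documentation): ('report', 'highlight')
Examples (tfds.as_dataframe):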

gov_report/crs_json

Config description: CRS reports with summaries. Structures are represented as raw JSON.
Dataset size: 361.92 MiB
Splits:

Split	Examples
`'test'`	362
`'train'`	6,514
`'validation'`	362

Feature structure:

FeaturesDict({
    'id': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'reports': Text(shape=(), dtype=string),
    'summary': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
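
In the JSON configs, `reports` is still a single string, so it has to be parsed before the structure can be used. The sketch below parses such a string and flattens it with whitespace, loosely mirroring what the whitespace configs describe. The nested schema used here (`section_title`, `paragraphs`, `subsections`) is an assumption for illustration and should be checked against a real example; `flatten_sections` is a hypothetical helper and the sample document is made up.

```python
import json


def flatten_sections(node: dict) -> list:
    """Depth-first list of section titles and paragraphs.

    Assumes each node has optional 'section_title', 'paragraphs' and
    'subsections' keys (a hypothetical schema, not confirmed by the docs).
    """
    parts = []
    if node.get("section_title"):
        parts.append(node["section_title"])
    parts.extend(node.get("paragraphs", []))
    for child in node.get("subsections", []):
        parts.extend(flatten_sections(child))
    return parts


# Hypothetical raw JSON string standing in for the 'reports' field.
raw = json.dumps({
    "section_title": "",
    "paragraphs": ["Intro."],
    "subsections": [
        {"section_title": "Background", "paragraphs": ["Details."], "subsections": []},
    ],
})

text = " ".join(flatten_sections(json.loads(raw)))
print(text)
```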

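Feature documentation:

Feature	Class	Dtype
	FeaturesDict
id	Text	string
released_date	Text	string
reports	Text	string
summary	Text	string
title	Text	string

Supervised keys (see the as_supervised documentation): ('reports', 'summary')
Examples (tfds.as_dataframe):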

gov_report/gao_json

Config description: GAO reports with highlights. Structures are represented as raw JSON.
Dataset size: 712.82 MiB
Splits:

Split	Examples
`'test'`	611
`'train'`	11,005
`'validation'`	612

Feature structure:

FeaturesDict({
    'fastfact': Text(shape=(), dtype=string),
    'highlight': Text(shape=(), dtype=string),
    'id': Text(shape=(), dtype=string),
    'published_date': Text(shape=(), dtype=string),
    'released_date': Text(shape=(), dtype=string),
    'report': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
    'url': Text(shape=(), dtype=string),
})

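Feature documentation:

Feature	Class	Dtype
	FeaturesDict
fastfact	Text	string
highlight	Text	string
id	Text	string
published_date	Text	string
released_date	Text	string
report	Text	string
title	Text	string
url	Text	string

Supervised keys (see the as_supervised documentation): ('report', 'highlight')
Examples (tfds.as_dataframe):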
