summscreen

Description:

SummScreen summarization dataset, non-anonymized, non-tokenized version.

Train/validation/test splits and filtering are based on the final tokenized dataset, but the transcripts and recaps provided are based on the untokenized text.

There are two features:

transcript: Full episode transcripts, with each line of dialogue separated by a newline
recap: Recaps or summaries of episodes
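Each line of dialogue in `transcript` is newline-separated, so recovering individual lines comes down to a plain string split. The sketch below assumes the text is UTF-8 and that blank lines carry no dialogue. `transcript_lines` is a hypothetical helper and not part of TFDS. It also accepts `bytes`, because string features read through `tfds.as_numpy` arrive as bytes.

```python
from __future__ import annotations


def transcript_lines(transcript: str | bytes) -> list[str]:
    """Split a SummScreen transcript into its dialogue lines (hypothetical helper)."""
    # String features come back as bytes from tfds.as_numpy; assume UTF-8.
    if isinstance(transcript, bytes):
        transcript = transcript.decode("utf-8")
    # Lines of dialogue are newline-separated; drop blank lines and trailing spaces.
    return [line.rstrip() for line in transcript.split("\n") if line.strip()]
```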
Homepage: https://github.com/mingdachen/SummScreen
Source code: tfds.datasets.summscreen.Builder
Versions:
- 1.0.0 (default): Initial release.
Download size: 841.27 MiB
Supervised keys (see as_supervised doc): ('transcript', 'recap')
Figure (tfds.show_examples): Not supported.
Citation:

@article{DBLP:journals/corr/abs-2104-07091,
  author    = {Mingda Chen and
               Zewei Chu and
               Sam Wiseman and
               Kevin Gimpel},
  title     = {SummScreen: {A} Dataset for Abstractive Screenplay Summarization},
  journal   = {CoRR},
  volume    = {abs/2104.07091},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.07091},
  archivePrefix = {arXiv},
  eprint    = {2104.07091},
  timestamp = {Mon, 19 Apr 2021 16:45:47 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-07091.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
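Because the supervised keys are ('transcript', 'recap'), loading with `as_supervised` yields `(transcript, recap)` pairs instead of the full feature dict. The plain-Python function below only mimics that selection on an already-decoded example dict. `SUPERVISED_KEYS` and `to_supervised` are hypothetical names for illustration and are not part of the TFDS API.

```python
# Hypothetical names; they mirror the supervised keys listed for this dataset.
SUPERVISED_KEYS = ("transcript", "recap")


def to_supervised(example: dict) -> tuple:
    """Return (input, target) from a SummScreen feature dict, dropping other keys."""
    return tuple(example[key] for key in SUPERVISED_KEYS)
```

With TFDS itself, the equivalent is to pass `as_supervised=True` to `tfds.load`, for example `tfds.load("summscreen/fd", split="train", as_supervised=True)`.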

summscreen/fd (default config)

Config description: ForeverDreaming
Dataset size: 132.99 MiB
Auto-cached (documentation): Yes
Splits:

Split	Examples
`'test'`	337
`'train'`	3,673
`'validation'`	338

Feature structure:

FeaturesDict({
    'episode_number': Text(shape=(), dtype=string),
    'episode_title': Text(shape=(), dtype=string),
    'recap': Text(shape=(), dtype=string),
    'show_title': Text(shape=(), dtype=string),
    'transcript': Text(shape=(), dtype=string),
    'transcript_author': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
episode_number	Text	string
episode_title	Text	string
recap	Text	string
show_title	Text	string
transcript	Text	string
transcript_author	Text	string

Examples (tfds.as_dataframe):

summscreen/tms

Config description: TVMegaSite
Dataset size: 592.53 MiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'test'`	1,793
`'train'`	18,915
`'validation'`	1,795
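The split counts listed for the two configs can be compared directly. In this small sketch, the `SPLITS` table is copied from the counts given in this document, and `split_fractions` is a hypothetical helper.

```python
from __future__ import annotations

# Example counts per split, copied from the tables for each config.
SPLITS = {
    "summscreen/fd": {"train": 3673, "validation": 338, "test": 337},
    "summscreen/tms": {"train": 18915, "validation": 1795, "test": 1793},
}


def split_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Return each split's share of the config's total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for config, counts in SPLITS.items():
        shares = {k: round(v, 3) for k, v in split_fractions(counts).items()}
        print(config, sum(counts.values()), shares)
```

Both configs come out at roughly 84% train, with validation and test each close to 8%. In total, fd has 4,348 examples and tms has 22,503.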

Feature structure:

FeaturesDict({
    'episode_summary': Text(shape=(), dtype=string),
    'recap': Text(shape=(), dtype=string),
    'recap_author': Text(shape=(), dtype=string),
    'show_title': Text(shape=(), dtype=string),
    'transcript': Text(shape=(), dtype=string),
    'transcript_author': Tensor(shape=(None,), dtype=string),
})
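`transcript_author` is not the same type in both configs. In `summscreen/fd` it is a scalar `Text`, while in `summscreen/tms` it is a variable-length string tensor of shape `(None,)`. Code that handles both configs may want to normalize it to a list. The sketch below assumes values come either from `tfds.as_numpy` (bytes, or arrays of bytes) or are already `str`. `normalize_authors` is a hypothetical helper, not part of TFDS.

```python
from __future__ import annotations

from typing import Iterable


def normalize_authors(value: str | bytes | Iterable[str | bytes]) -> list[str]:
    """Return transcript_author as a list of str for either config (hypothetical)."""
    if isinstance(value, (str, bytes)):
        # fd: a single scalar string (numpy.bytes_ / numpy.str_ are subclasses too).
        items = [value]
    else:
        # tms: a (None,)-shaped sequence of strings, possibly empty.
        items = list(value)
    # Decode bytes as UTF-8 so callers always receive str.
    return [v.decode("utf-8") if isinstance(v, bytes) else str(v) for v in items]
```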

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
episode_summary	Text		string
recap	Text		string
recap_author	Text		string
show_title	Text		string
transcript	Text		string
transcript_author	Tensor	(None,)	string

Examples (tfds.as_dataframe):
