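libritts

Description:

LibriTTS is a multi-speaker English corpus of approximately 585 hours of read English speech at a 24kHz sampling rate, prepared by Heiga Zen with the assistance of Google Speech and Google Brain team members. The LibriTTS corpus is designed for TTS research. It is derived from the original materials of the LibriSpeech corpus (mp3 audio files from LibriVox and text files from Project Gutenberg). The main differences from the LibriSpeech corpus are listed below: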

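The audio files are at a 24kHz sampling rate.
The speech is split at sentence breaks.
Both original and normalized texts are included.
Contextual information (e.g., neighbouring sentences) can be extracted.
Utterances with significant background noise are excluded.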
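
The 24kHz sampling rate determines how sample counts map to time. The sketch below is a minimal illustration, not part of the dataset tooling: the helper names are hypothetical, and the 16kHz target is used only because that is the LibriSpeech sampling rate.

```python
SAMPLE_RATE_HZ = 24_000  # sampling rate stated in the description


def duration_seconds(num_samples: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Length of an utterance in seconds, given its sample count."""
    return num_samples / sample_rate_hz


def resampled_length(
    num_samples: int,
    target_rate_hz: int,
    source_rate_hz: int = SAMPLE_RATE_HZ,
) -> int:
    """Approximate sample count after resampling to target_rate_hz."""
    return round(num_samples * target_rate_hz / source_rate_hz)


if __name__ == "__main__":
    n = 120_000  # hypothetical utterance length in samples
    print(duration_seconds(n))          # 5.0
    print(resampled_length(n, 16_000))  # 80000
```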

Additional documentation: Explore on Papers With Code
Homepage: http://www.openslr.org/60
Source code: tfds.datasets.libritts.Builder
Versions:
- 1.0.1 (default): No release notes.
Download size: 78.42 GiB
Dataset size: 271.41 GiB
Auto-cached (documentation): No
Splits:

Split	Examples
`'dev_clean'`	5,736
`'dev_other'`	4,613
`'test_clean'`	4,837
`'test_other'`	5,120
`'train_clean100'`	33,236
`'train_clean360'`	116,500
`'train_other500'`	205,044
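
The split names in the table can be passed to `tfds.load`, and several splits can be concatenated with `+`. The following is a sketch under assumptions: `combine_splits`, `total_examples`, and `load_split` are hypothetical helpers, and the TFDS import is deferred so that the bookkeeping functions run without the library or the 78.42 GiB download.

```python
# Example counts copied from the split table.
SPLIT_SIZES = {
    "dev_clean": 5_736,
    "dev_other": 4_613,
    "test_clean": 4_837,
    "test_other": 5_120,
    "train_clean100": 33_236,
    "train_clean360": 116_500,
    "train_other500": 205_044,
}


def combine_splits(*names: str) -> str:
    """Join split names with '+', the TFDS syntax for concatenating splits."""
    unknown = [name for name in names if name not in SPLIT_SIZES]
    if unknown:
        raise ValueError(f"unknown split(s): {unknown}")
    return "+".join(names)


def total_examples(*names: str) -> int:
    """Sum the example counts of the named splits."""
    return sum(SPLIT_SIZES[name] for name in names)


def load_split(split: str):
    """Load a split with TFDS; the first call downloads and prepares the corpus."""
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    return tfds.load("libritts", split=split)


if __name__ == "__main__":
    train = ("train_clean100", "train_clean360", "train_other500")
    print(combine_splits(*train))
    print(total_examples(*train))  # 354780
```

Calling `load_split(combine_splits("train_clean100", "train_clean360"))` would then request both clean training splits as a single dataset.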

Feature structure:

FeaturesDict({
    'chapter_id': int64,
    'id': string,
    'speaker_id': int64,
    'speech': Audio(shape=(None,), dtype=int64),
    'text_normalized': Text(shape=(), dtype=string),
    'text_original': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
chapter_id	Tensor		int64
id	Tensor		string
speaker_id	Tensor		int64
speech	Audio	(None,)	int64
text_normalized	Text		string
text_original	Text		string
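
When records are consumed as plain Python or NumPy values, it can help to check that each one carries the features listed in the table and to decode the string fields. The sketch below assumes string features arrive as UTF-8 `bytes`, which is typical when iterating with `tfds.as_numpy`; the helper names and the stand-in record are hypothetical.

```python
from typing import Any, Dict, List, Mapping

# Feature names and dtypes copied from the feature table.
EXPECTED_FEATURES = {
    "chapter_id": "int64",
    "id": "string",
    "speaker_id": "int64",
    "speech": "int64",
    "text_normalized": "string",
    "text_original": "string",
}


def missing_features(example: Mapping[str, Any]) -> List[str]:
    """Names of expected features that are absent from the example."""
    return sorted(set(EXPECTED_FEATURES) - set(example))


def decode_text_fields(example: Mapping[str, Any]) -> Dict[str, Any]:
    """Copy the example, decoding bytes values of string features as UTF-8."""
    decoded = dict(example)
    for name, dtype in EXPECTED_FEATURES.items():
        value = decoded.get(name)
        if dtype == "string" and isinstance(value, bytes):
            decoded[name] = value.decode("utf-8")
    return decoded


# Hypothetical stand-in record; real values come from the dataset.
fake_example = {
    "chapter_id": 1,
    "id": b"example-id",
    "speaker_id": 2,
    "speech": [0, 12, -7, 3],
    "text_normalized": b"hello world",
    "text_original": b"Hello, world!",
}

if __name__ == "__main__":
    print(missing_features(fake_example))
    print(decode_text_fields(fake_example)["text_original"])
```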

Supervised keys (see the as_supervised documentation): ('text_normalized', 'speech')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe):
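
A preview of this kind, and the supervised pairing listed above, could be produced roughly as follows. This is a sketch rather than the catalog's own code: `to_supervised` and `preview_dataframe` are hypothetical helpers, the `dev_clean` default is an arbitrary choice, and calling `preview_dataframe` requires TensorFlow Datasets, pandas, and the prepared corpus.

```python
from typing import Any, Mapping, Tuple

SUPERVISED_KEYS = ("text_normalized", "speech")  # as listed in the dataset card


def to_supervised(example: Mapping[str, Any]) -> Tuple[Any, Any]:
    """Pick the (input, target) pair that as_supervised=True would yield."""
    input_key, target_key = SUPERVISED_KEYS
    return example[input_key], example[target_key]


def preview_dataframe(split: str = "dev_clean", rows: int = 4):
    """Build a small pandas preview; the first call downloads the corpus."""
    import tensorflow_datasets as tfds  # imported lazily: heavy dependency

    ds, info = tfds.load("libritts", split=split, with_info=True)
    return tfds.as_dataframe(ds.take(rows), info)
```

For a TTS model the text is the input and the waveform is the target, which matches the order of the supervised keys.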

Citation:

@inproceedings{zen2019libritts,
  title = {LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech},
  author = {H. Zen and V. Dang and R. Clark and Y. Zhang and R. J. Weiss and Y. Jia and Z. Chen and Y. Wu},
  booktitle = {Proc. Interspeech},
  month = sep,
  year = {2019},
  doi = {10.21437/Interspeech.2019-2441},
}