vctk

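Description:

This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive.

Note that the 'p315' text was lost due to a hard disk error.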

Additional documentation: Explore on Papers With Code
Homepage: https://doi.org/10.7488/ds/2645
Source code: tfds.audio.Vctk
Versions:
- 1.0.0: VCTK release 0.92.0.
- 1.0.1 (default): Fixes the speech data type to dtype=tf.int16.
Download size: 10.94 GiB
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'accent': ClassLabel(shape=(), dtype=int64, num_classes=13),
    'gender': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'id': string,
    'speaker': ClassLabel(shape=(), dtype=int64, num_classes=110),
    'speech': Audio(shape=(None,), dtype=int16),
    'text': Text(shape=(), dtype=string),
})
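
The structure above can be checked against a loaded example. The following is a minimal sketch, assuming `tensorflow_datasets` and TensorFlow are installed and that the roughly 11 GiB download is acceptable. The helper names `vctk_name` and `show_first_example` are illustrative and not part of TFDS.

```python
def vctk_name(config="mic1", version="1.0.1"):
    """Build a TFDS name string such as 'vctk/mic1:1.0.1' (hypothetical helper)."""
    if config not in ("mic1", "mic2"):
        raise ValueError(f"unknown VCTK config: {config!r}")
    return f"vctk/{config}:{version}"


def show_first_example(config="mic1"):
    """Load the 'train' split and print the fields of one example."""
    # Imported lazily so that defining these helpers needs no heavy dependency.
    import tensorflow_datasets as tfds

    ds, info = tfds.load(vctk_name(config), split="train", with_info=True)
    print(info.features)  # should match the FeaturesDict shown above
    for example in ds.take(1):
        # 'speech' is a 1-D int16 tensor of variable length; 'text' is a scalar string.
        print(example["id"].numpy().decode("utf-8"), example["speech"].shape)
        print(example["text"].numpy().decode("utf-8"))
```

Calling `show_first_example()` triggers the download and preparation step on first use, so it is best run once on a machine with enough disk space for the prepared dataset.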

Feature documentation:

Feature	Class	Shape	Dtype
	FeaturesDict
accent	ClassLabel		int64
gender	ClassLabel		int64
id	Tensor		string
speaker	ClassLabel		int64
speech	Audio	(None,)	int16
text	Text		string
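
Because `speech` is stored as int16 samples, models that expect floating-point waveforms need a conversion step. The sketch below assumes NumPy is available. Dividing by 32768 is a common convention for 16-bit PCM, not something this page prescribes, and `int16_to_float32` is a hypothetical helper name.

```python
import numpy as np


def int16_to_float32(speech):
    """Scale int16 PCM samples to float32 values in [-1.0, 1.0)."""
    samples = np.asarray(speech)
    if samples.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {samples.dtype}")
    # int16 spans -32768..32767, so dividing by 32768 keeps values below 1.0.
    return samples.astype(np.float32) / 32768.0
```

With a loaded example, `int16_to_float32(example["speech"].numpy())` would return the waveform as a float32 array of the same length.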

Supervised keys (see as_supervised doc): ('text', 'speech')
Figure (tfds.show_examples): Not supported.
Citation:

@misc{yamagishi2019vctk,
  author={Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten},
  title={ {CSTR VCTK Corpus}: English Multi-speaker Corpus for {CSTR} Voice Cloning Toolkit (version 0.92)},
  publisher={University of Edinburgh. The Centre for Speech Technology Research (CSTR)},
  year=2019,
  doi={10.7488/ds/2645},
}
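
The supervised keys listed above, `('text', 'speech')`, determine what `tfds.load` yields when `as_supervised=True`: pairs with the text first and the audio second, which suits text-to-speech. The sketch below shows that, plus a swap for speech recognition. It assumes `tensorflow_datasets` is installed; `to_asr_pair`, `load_supervised`, and `load_for_asr` are hypothetical helper names.

```python
def to_asr_pair(text, speech):
    """Swap a (text, speech) pair into (speech, text) for speech recognition."""
    return speech, text


def load_supervised(config="mic1"):
    """Return a dataset of (text, speech) tuples, in supervised-key order."""
    # Imported lazily so that defining these helpers needs no heavy dependency.
    import tensorflow_datasets as tfds

    return tfds.load(f"vctk/{config}", split="train", as_supervised=True)


def load_for_asr(config="mic1"):
    """Return a dataset of (speech, text) tuples: audio as input, text as target."""
    # Dataset.map unpacks each tuple element into the function's arguments.
    return load_supervised(config).map(to_asr_pair)
```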

vctk/mic1 (default config)

Config description: Audio recorded using an omni-directional microphone (DPA 4035). Contains very low frequency noises.
```
      This is the same audio released in previous versions of VCTK:
      https://doi.org/10.7488/ds/1994
```
Dataset size: 39.87 GiB
Splits:

Split	Examples
`'train'`	44,455


vctk/mic2

Config description: Audio recorded using a small diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800).
```
      Two speakers, p280 and p315 had technical issues of the audio
      recordings using MKH 800.
```
Dataset size: 38.86 GiB
Splits:

Split	Examples
`'train'`	43,873
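
The two configs differ slightly in size and example count. As a rough illustration, the sketch below works out the gap in example counts and the mean prepared size per example from the figures on this page (39.87 GiB and 44,455 examples for mic1; 38.86 GiB and 43,873 examples for mic2). The names `CONFIGS`, `example_gap`, and `mean_example_mib` are illustrative only.

```python
# Figures copied from the dataset size and split tables for each config.
CONFIGS = {
    "mic1": {"size_gib": 39.87, "examples": 44_455},
    "mic2": {"size_gib": 38.86, "examples": 43_873},
}


def example_gap():
    """Number of 'train' examples present in mic1 but not in mic2."""
    return CONFIGS["mic1"]["examples"] - CONFIGS["mic2"]["examples"]


def mean_example_mib(config):
    """Average prepared size per example, in MiB (1 GiB = 1024 MiB)."""
    c = CONFIGS[config]
    return c["size_gib"] * 1024 / c["examples"]


print(example_gap())
for name in CONFIGS:
    print(name, round(mean_example_mib(name), 3), "MiB per example")
```

Both configs come out at a little under 1 MiB per example, which can help when estimating the disk space needed for a subset of the split.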
