- Description:
This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive.
Note that the 'p315' text was lost due to a hard disk error.
Homepage: https://doi.org/10.7488/ds/2645
Source code:
tfds.audio.Vctk
Versions:
1.0.0: VCTK release 0.92.0.
1.0.1 (default): Fix speech data type with dtype=tf.int16.
Download size:
10.94 GiB
Auto-cached (documentation): No
Feature structure:
FeaturesDict({
    'accent': ClassLabel(shape=(), dtype=int64, num_classes=13),
    'gender': ClassLabel(shape=(), dtype=int64, num_classes=2),
    'id': string,
    'speaker': ClassLabel(shape=(), dtype=int64, num_classes=110),
    'speech': Audio(shape=(None,), dtype=int16),
    'text': Text(shape=(), dtype=string),
})
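Because `speech` is a variable-length `int16` waveform, downstream code often rescales it to floating point before further processing. The following is a minimal sketch of that step in plain Python, so it runs without TensorFlow. The helper names are hypothetical, and the 48 kHz sample rate is an assumption that this entry does not state; check the dataset's `Audio` feature metadata before relying on it.

```python
# Assumption: 48 kHz sampling rate; it is not stated in this catalog entry.
ASSUMED_SAMPLE_RATE_HZ = 48_000
INT16_SCALE = 32768.0  # 2**15, the magnitude of the most negative int16 value


def pcm16_to_float(samples):
    """Rescale 16-bit PCM integers to floats in [-1.0, 1.0)."""
    return [s / INT16_SCALE for s in samples]


def duration_seconds(samples, sample_rate_hz=ASSUMED_SAMPLE_RATE_HZ):
    """Length of a mono waveform in seconds."""
    return len(samples) / sample_rate_hz


# Toy stand-in for example['speech']; real waveforms are much longer.
waveform = [0, 16384, -32768, 32767]
print(pcm16_to_float(waveform))
print(duration_seconds([0] * 96_000))
```

On real data the same scaling is usually applied to a whole array at once (for example with NumPy or `tf.cast`) rather than element by element.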
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
accent | ClassLabel | | int64 | |
gender | ClassLabel | | int64 | |
id | Tensor | | string | |
speaker | ClassLabel | | int64 | |
speech | Audio | (None,) | int16 | |
text | Text | | string | |
Supervised keys (see as_supervised doc): ('text', 'speech')
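With these supervised keys, loading with `as_supervised=True` yields `(text, speech)` pairs instead of the full feature dictionary. The sketch below shows one way to select a config and version and to reproduce that pairing on a plain dictionary. The helpers `vctk_name`, `to_supervised`, and `load_vctk` are hypothetical names introduced for illustration, and the example values are toy data; only `tfds.load` and its `split` and `as_supervised` arguments come from the library itself.

```python
SUPERVISED_KEYS = ("text", "speech")  # as listed for this dataset


def vctk_name(config="mic1", version=None):
    """Build a TFDS name such as 'vctk/mic2' or 'vctk/mic2:1.0.1'."""
    name = f"vctk/{config}"
    return f"{name}:{version}" if version else name


def to_supervised(example):
    """Mimic as_supervised=True on a plain feature dict: keep (text, speech)."""
    return tuple(example[key] for key in SUPERVISED_KEYS)


def load_vctk(config="mic1", version=None, as_supervised=True):
    """Load the 'train' split; the first call downloads roughly 11 GiB."""
    import tensorflow_datasets as tfds  # lazy import: heavy optional dependency

    return tfds.load(
        vctk_name(config, version), split="train", as_supervised=as_supervised
    )


# Toy feature dict with made-up values, used only to show the pairing.
example = {"id": "p225_001", "text": "Please call Stella.", "speech": [0, 1, -1]}
print(vctk_name("mic2", "1.0.1"))
print(to_supervised(example))
```

Calling `load_vctk("mic2")` would then return a `tf.data.Dataset` of `(text, speech)` tuples, assuming `tensorflow_datasets` is installed and the download succeeds.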
Figure (tfds.show_examples): Not supported.
Citation:
@misc{yamagishi2019vctk,
author={Yamagishi, Junichi and Veaux, Christophe and MacDonald, Kirsten},
title={ {CSTR VCTK Corpus}: English Multi-speaker Corpus for {CSTR} Voice Cloning Toolkit (version 0.92)},
publisher={University of Edinburgh. The Centre for Speech Technology Research (CSTR)},
year=2019,
doi={10.7488/ds/2645},
}
vctk/mic1 (default config)
Config description: Audio recorded using an omni-directional microphone (DPA 4035). Contains very low-frequency noise.
This is the same audio released in previous versions of VCTK: https://doi.org/10.7488/ds/1994
Dataset size:
39.87 GiB
Splits:
Split | Examples |
---|---|
'train' | 44,455 |
vctk/mic2
Config description: Audio recorded using a small diaphragm condenser microphone with very wide bandwidth (Sennheiser MKH 800).
Two speakers, p280 and p315, had technical issues with their audio recordings made using the MKH 800.
Dataset size:
38.86 GiB
Splits:
Split | Examples |
---|---|
'train' | 43,873 |
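Both configs ship only a 'train' split, so a held-out set has to be carved out by the user. One option is the TFDS split-slicing syntax, sketched below with a hypothetical helper `holdout_splits`; the 90/10 ratio is an arbitrary choice for illustration. The example counts are copied from the split tables for the two configs.

```python
TRAIN_EXAMPLES = {"mic1": 44_455, "mic2": 43_873}  # from the split tables


def holdout_splits(percent_train=90):
    """Return TFDS slice strings that partition 'train' into two parts."""
    if not 0 < percent_train < 100:
        raise ValueError("percent_train must be strictly between 0 and 100")
    return f"train[:{percent_train}%]", f"train[{percent_train}%:]"


train_split, val_split = holdout_splits(90)
print(train_split, val_split)

# How many fewer examples mic2 has than mic1.
print(TRAIN_EXAMPLES["mic1"] - TRAIN_EXAMPLES["mic2"])
```

The two strings can be passed as `split=[train_split, val_split]` to `tfds.load`. Percentage slices carry no guarantee of balance across speakers, so a speaker-disjoint evaluation set would need filtering on the `speaker` feature instead.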