- Description:
A large-scale dataset for speaker identification. The data is collected from over 1,251 speakers, with over 150k samples in total. This release contains the audio part of the VoxCeleb1.1 dataset.
Additional Documentation: Explore on Papers With Code
Homepage: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html
Source code: tfds.audio.Voxceleb
Versions:
1.2.1 (default): Add youtube_id field
Download size: 4.68 MiB
Dataset size: 107.98 GiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
manual_dir should contain the file vox_dev_wav.zip. The instructions for downloading this file are found at http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1.html (this dataset requires registration).
Auto-cached (documentation): No
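The manual download step above can be sketched in Python as follows. This is a minimal sketch, not the catalog's own code: the helper names `missing_manual_files` and `load_voxceleb` are hypothetical, and the dataset name `"voxceleb"` is assumed from the `tfds.audio.Voxceleb` builder. The archive name and default directory come from the instructions above.

```python
from pathlib import Path

# Archive name and default location are taken from the instructions above.
REQUIRED_ARCHIVE = "vox_dev_wav.zip"
DEFAULT_MANUAL_DIR = Path("~/tensorflow_datasets/downloads/manual/").expanduser()


def missing_manual_files(manual_dir):
    """Return the required archive names that are absent from manual_dir."""
    manual_dir = Path(manual_dir).expanduser()
    return [
        name for name in (REQUIRED_ARCHIVE,)
        if not (manual_dir / name).is_file()
    ]


def load_voxceleb(manual_dir=DEFAULT_MANUAL_DIR, split="train"):
    """Hypothetical helper: check manual_dir, then build and load one split."""
    missing = missing_manual_files(manual_dir)
    if missing:
        raise FileNotFoundError(
            f"{manual_dir} is missing {missing}; download it manually first."
        )

    # Imported lazily so the file check above works without TFDS installed.
    import tensorflow_datasets as tfds

    # Point the builder at the directory holding the manually downloaded zip.
    config = tfds.download.DownloadConfig(manual_dir=str(manual_dir))
    return tfds.load(
        "voxceleb",  # assumed registered name of tfds.audio.Voxceleb
        split=split,
        download_and_prepare_kwargs={"download_config": config},
    )
```

Calling `load_voxceleb()` before the archive is in place raises `FileNotFoundError` from the file check, without starting the TFDS build.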
Splits:
Split | Examples |
---|---|
'test' | 7,972 |
'train' | 134,000 |
'validation' | 6,670 |
- Feature structure:
FeaturesDict({
'audio': Audio(shape=(None,), dtype=int64),
'label': ClassLabel(shape=(), dtype=int64, num_classes=1252),
'youtube_id': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
audio | Audio | (None,) | int64 | |
label | ClassLabel | | int64 | |
youtube_id | Text | | string | |
Supervised keys (see the as_supervised doc): ('audio', 'label')
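As an illustration of the supervised keys, the sketch below reads `(audio, label)` pairs with `as_supervised=True`. It assumes the dataset has already been prepared and is registered under the name `"voxceleb"`. The helpers `pcm_to_float` and `iter_supervised` are hypothetical, and the scaling assumes the int64 samples hold 16-bit PCM values, which this page does not state.

```python
import numpy as np

# Assumption: samples are 16-bit PCM stored as int64, so dividing by 2**15
# maps them into the range [-1.0, 1.0).
PCM16_SCALE = 32768.0


def pcm_to_float(samples):
    """Convert integer PCM samples to float32 values."""
    return np.asarray(samples, dtype=np.float32) / PCM16_SCALE


def iter_supervised(split="validation", limit=3):
    """Hypothetical helper: yield (float_audio, label) for a few examples."""
    # Imported lazily so pcm_to_float can be used without TFDS installed.
    import tensorflow_datasets as tfds

    # as_supervised=True yields tuples in the order ('audio', 'label').
    ds = tfds.load("voxceleb", split=split, as_supervised=True)
    for audio, label in tfds.as_numpy(ds.take(limit)):
        yield pcm_to_float(audio), int(label)
```

Without `as_supervised=True`, each example is a dictionary with the `audio`, `label`, and `youtube_id` keys listed in the feature structure.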
Figure (tfds.show_examples): Not supported.
- Citation:
@InProceedings{Nagrani17,
author = "Nagrani, A. and Chung, J.~S. and Zisserman, A.",
title = "VoxCeleb: a large-scale speaker identification dataset",
booktitle = "INTERSPEECH",
year = "2017",
}