TFDS artık Kruvasan 🥐 formatını destekliyor! Daha fazlasını öğrenmek için belgeleri okuyun.

Bu sayfa, Cloud Translation API ile çevrilmiştir.

çok dilli_librispeech

Referanslar:

cila

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/polish')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	520
`'train'`	25043
`'train.1h'`	238
`'train.9h'`	2173
`'validation'`	512

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Almanca

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/german')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	3394
`'train'`	469942
`'train.1h'`	241
`'train.9h'`	2194
`'validation'`	3469

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Flemenkçe

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/dutch')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	3075
`'train'`	374287
`'train.1h'`	234
`'train.9h'`	2153
`'validation'`	3095

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Fransızca

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/french')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	2426
`'train'`	258213
`'train.1h'`	241
`'train.9h'`	2167
`'validation'`	2416

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

İspanyol

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/spanish')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	2385
`'train'`	220701
`'train.1h'`	233
`'train.9h'`	2110
`'validation'`	2408

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

İtalyan

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/italian')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	1262
`'train'`	59623
`'train.1h'`	240
`'train.9h'`	2173
`'validation'`	1248

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

Portekizce

Bu veri kümesini TFDS'ye yüklemek için aşağıdaki komutu kullanın:

ds = tfds.load('huggingface:multilingual_librispeech/portuguese')

Tanım :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

Lisans : Bilinen lisans yok
Sürüm : 2.1.0
Bölünmeler :

Bölmek	Örnekler
`'test'`	871
`'train'`	37533
`'train.1h'`	236
`'train.9h'`	2116
`'validation'`	826

Özellikler :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}