يدعم TFDS الآن تنسيق الكرواسون 🥐 ! اقرأ الوثائق لمعرفة المزيد.

تمت ترجمة هذه الصفحة بواسطة Cloud Translation API‏.

multilingual_librispeech

مراجع:

بولندي

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/polish')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	520
`'train'`	25043
`'train.1h'`	238
`'train.9h'`	2173
`'validation'`	512

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ألمانية

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/german')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	3394
`'train'`	469942
`'train.1h'`	241
`'train.9h'`	2194
`'validation'`	3469

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

هولندي

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/dutch')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	3075
`'train'`	374287
`'train.1h'`	234
`'train.9h'`	2153
`'validation'`	3095

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

فرنسي

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/french')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	2426
`'train'`	258213
`'train.1h'`	241
`'train.9h'`	2167
`'validation'`	2416

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

الأسبانية

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/spanish')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	2385
`'train'`	220701
`'train.1h'`	233
`'train.9h'`	2110
`'validation'`	2408

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ايطالي

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/italian')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	1262
`'train'`	59623
`'train.1h'`	240
`'train.9h'`	2173
`'validation'`	1248

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

البرتغالية

استخدم الأمر التالي لتحميل مجموعة البيانات هذه في TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/portuguese')

وصف :

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

الترخيص : لا يوجد ترخيص معروف
الإصدار : 2.1.0
الإنشقاقات :

ينقسم	أمثلة
`'test'`	871
`'train'`	37533
`'train.1h'`	236
`'train.9h'`	2116
`'validation'`	826

سمات :

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}