multilingual_librispeech

References:

polish

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/polish')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	520
`'train'`	25043
`'train.1h'`	238
`'train.9h'`	2173
`'validation'`	512
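
The split names in the table can be passed to `tfds.load` through its `split` argument. The following is a minimal sketch, assuming `tensorflow_datasets` and the Hugging Face `datasets` package are installed and that the split names are accepted verbatim. The helper names `mls_dataset_name` and `load_mls_split` are hypothetical and not part of TFDS.

```python
# Split names as listed in the table above.
MLS_SPLITS = ("test", "train", "train.1h", "train.9h", "validation")


def mls_dataset_name(config: str) -> str:
    """Build the TFDS name for one config (hypothetical helper)."""
    return f"huggingface:multilingual_librispeech/{config}"


def load_mls_split(config: str, split: str):
    """Load a single split of one config.

    This downloads data on first use, so it is only defined here, not called.
    """
    if split not in MLS_SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    # Imported lazily so the helpers above work without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(mls_dataset_name(config), split=split)
```

For example, `load_mls_split('polish', 'train.9h')` would request the `'train.9h'` split (2173 examples in the table above) instead of the full dictionary of splits.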

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
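
To make the feature spec easier to scan, the sketch below parses it with the standard `json` module and reduces each field to a short type label. It also shows how the `sampling_rate` of 16000 relates a sample count to a clip length. The function names `summarize_features` and `duration_seconds` are illustrative assumptions, not part of TFDS or Hugging Face `datasets`.

```python
import json

# Feature spec copied from the listing above, one field per line.
FEATURES_JSON = """
{
    "file": {"dtype": "string", "id": null, "_type": "Value"},
    "audio": {"sampling_rate": 16000, "mono": true, "id": null, "_type": "Audio"},
    "text": {"dtype": "string", "id": null, "_type": "Value"},
    "speaker_id": {"dtype": "int64", "id": null, "_type": "Value"},
    "chapter_id": {"dtype": "int64", "id": null, "_type": "Value"},
    "id": {"dtype": "string", "id": null, "_type": "Value"}
}
"""


def summarize_features(spec_json: str) -> dict:
    """Map each feature name to a short type label."""
    spec = json.loads(spec_json)
    summary = {}
    for name, info in spec.items():
        if info["_type"] == "Audio":
            # Audio entries carry a sampling rate instead of a dtype.
            summary[name] = f"Audio@{info['sampling_rate']}Hz"
        else:
            summary[name] = info["dtype"]
    return summary


def duration_seconds(num_samples: int, sampling_rate: int = 16000) -> float:
    """Clip length in seconds implied by a sample count."""
    return num_samples / sampling_rate


for name, label in summarize_features(FEATURES_JSON).items():
    print(f"{name}: {label}")
```

At 16000 samples per second, a mono clip of 32000 samples lasts `duration_seconds(32000)`, that is, 2 seconds.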

german

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/german')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	3394
`'train'`	469942
`'train.1h'`	241
`'train.9h'`	2194
`'validation'`	3469

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

dutch

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/dutch')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	3075
`'train'`	374287
`'train.1h'`	234
`'train.9h'`	2153
`'validation'`	3095

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

french

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/french')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	2426
`'train'`	258213
`'train.1h'`	241
`'train.9h'`	2167
`'validation'`	2416

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

spanish

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/spanish')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	2385
`'train'`	220701
`'train.1h'`	233
`'train.9h'`	2110
`'validation'`	2408

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

italian

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/italian')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	1262
`'train'`	59623
`'train.1h'`	240
`'train.9h'`	2173
`'validation'`	1248

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

portuguese

Use the following command to load this dataset in TFDS:

ds = tfds.load('huggingface:multilingual_librispeech/portuguese')

Description:

Multilingual LibriSpeech (MLS) dataset is a large multilingual corpus suitable for speech research. The dataset is derived from read audiobooks from LibriVox and consists of 8 languages - English, German, Dutch, Spanish, French, Italian, Portuguese, Polish.

License: No known license
Version: 2.1.0
Splits:

Split	Examples
`'test'`	871
`'train'`	37533
`'train.1h'`	236
`'train.9h'`	2116
`'validation'`	826

Features:

{
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "audio": {
        "sampling_rate": 16000,
        "mono": true,
        "id": null,
        "_type": "Audio"
    },
    "text": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "speaker_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "chapter_id": {
        "dtype": "int64",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}
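
The per-config split tables on this page can be compared directly. The sketch below copies the `'train'` and `'test'` counts for the seven configs listed here into a dictionary and derives totals and each config's share of the training examples. The names `SPLIT_COUNTS`, `total_examples`, and `train_share` are illustrative assumptions, and only the counts come from the tables.

```python
# 'train' and 'test' example counts copied from the split tables on this page.
SPLIT_COUNTS = {
    "polish": {"train": 25043, "test": 520},
    "german": {"train": 469942, "test": 3394},
    "dutch": {"train": 374287, "test": 3075},
    "french": {"train": 258213, "test": 2426},
    "spanish": {"train": 220701, "test": 2385},
    "italian": {"train": 59623, "test": 1262},
    "portuguese": {"train": 37533, "test": 871},
}


def total_examples(split: str) -> int:
    """Sum one split's example count over all listed configs."""
    return sum(counts[split] for counts in SPLIT_COUNTS.values())


def train_share(config: str) -> float:
    """Fraction of all listed training examples that belong to one config."""
    return SPLIT_COUNTS[config]["train"] / total_examples("train")


# Print configs from largest to smallest training set.
for config in sorted(SPLIT_COUNTS, key=train_share, reverse=True):
    print(f"{config}: {train_share(config):.1%} of listed train examples")
```

The counts are heavily skewed: `german` has the largest training split and `polish` the smallest, which is worth keeping in mind when mixing configs in one training run.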