TFDS اکنون از فرمت Croissant 🥐 پشتیبانی می کند! برای دانستن بیشتر مستندات را بخوانید.

این صفحه به‌وسیله ‏Cloud Translation API‏ ترجمه شده است.

covost2

مراجع:

en_de

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_de')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_tr

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_tr')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_fa

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_fa')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_sv-SE

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_sv-SE')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_mn

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_mn')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_zh-CN

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_zh-CN')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_cy

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_cy')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ca

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_ca')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_sl

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_sl')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_et

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_et')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_id

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_id')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ar

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_ar')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ta

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_ta')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_lv

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_lv')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ja

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/en_ja')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	15531
`'train'`	289430
`'validation'`	15531

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fr_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/fr_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	14760
`'train'`	207374
`'validation'`	14760

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

de_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/de_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	13511
`'train'`	127834
`'validation'`	13511

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

es_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/es_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	13221
`'train'`	79015
`'validation'`	13221

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ca_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/ca_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	12730
`'train'`	95854
`'validation'`	12730

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

it_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/it_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	8951
`'train'`	31698
`'validation'`	8940

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ru_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/ru_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	6300
`'train'`	12112
`'validation'`	6110

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

zh-CN_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/zh-CN_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	4898
`'train'`	7085
`'validation'`	4843

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

pt_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/pt_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	4023
`'train'`	9158
`'validation'`	3318

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fa_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/fa_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	3445
`'train'`	53949
`'validation'`	3445

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

et_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/et_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1571
`'train'`	1782
`'validation'`	1576

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

mn_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/mn_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1759
`'train'`	2067
`'validation'`	1761

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

nl_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/nl_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1699
`'train'`	7108
`'validation'`	1699

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

tr_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/tr_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1629
`'train'`	3966
`'validation'`	1624

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ar_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/ar_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1695
`'train'`	2283
`'validation'`	1758

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

sv-SE_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/sv-SE_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1595
`'train'`	2160
`'validation'`	1349

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

lv_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/lv_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	1629
`'train'`	2337
`'validation'`	1125

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

sl_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/sl_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	360
`'train'`	1843
`'validation'`	509

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ta_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/ta_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	786
`'train'`	1358
`'validation'`	384

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ja_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/ja_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	684
`'train'`	1119
`'validation'`	635

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

id_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/id_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	844
`'train'`	1243
`'validation'`	792

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cy_en

برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:

ds = tfds.load('huggingface:covost2/cy_en')

توضیحات :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

مجوز : مجوز شناخته شده ای وجود ندارد
نسخه : 1.0.0
تقسیم ها :

تقسیم کنید	نمونه ها
`'test'`	690
`'train'`	1241
`'validation'`	690

ویژگی ها :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}