TFDS obsługuje teraz format Croissant 🥐 ! Przeczytaj dokumentację , aby dowiedzieć się więcej.

Ta strona została przetłumaczona przez Cloud Translation API.

covost2

Referencje:

en_de

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_de')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_tr

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_tr')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_fa

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_fa')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_sv-SE

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_sv-SE')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

pl_mn

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_mn')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_zh-CN

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_zh-CN')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_cy

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_cy')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ca

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_ca')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_sl

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_sl')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_et

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_et')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_id

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_id')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ar

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_ar')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ta

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_ta')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_lv

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_lv')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

en_ja

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/en_ja')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	15531
`'train'`	289430
`'validation'`	15531

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fr_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/fr_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	14760
`'train'`	207374
`'validation'`	14760

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

de_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/de_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	13511
`'train'`	127834
`'validation'`	13511

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

es_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/es_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	13221
`'train'`	79015
`'validation'`	13221

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ca_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/ca_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	12730
`'train'`	95854
`'validation'`	12730

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

to_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/it_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	8951
`'train'`	31698
`'validation'`	8940

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ru_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/ru_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	6300
`'train'`	12112
`'validation'`	6110

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

zh-CN_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/zh-CN_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	4898
`'train'`	7085
`'validation'`	4843

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

pt_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/pt_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	4023
`'train'`	9158
`'validation'`	3318

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

fa_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/fa_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	3445
`'train'`	53949
`'validation'`	3445

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

et_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/et_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1571
`'train'`	1782
`'validation'`	1576

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

mn_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/mn_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1759
`'train'`	2067
`'validation'`	1761

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

nl_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/nl_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1699
`'train'`	7108
`'validation'`	1699

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

tr_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/tr_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1629
`'train'`	3966
`'validation'`	1624

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ar_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/ar_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1695
`'train'`	2283
`'validation'`	1758

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

sv-SE_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/sv-SE_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1595
`'train'`	2160
`'validation'`	1349

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

lv_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/lv_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	1629
`'train'`	2337
`'validation'`	1125

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

sl_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/sl_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	360
`'train'`	1843
`'validation'`	509

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ta_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/ta_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	786
`'train'`	1358
`'validation'`	384

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

ja_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/ja_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	684
`'train'`	1119
`'validation'`	635

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

id_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/id_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	844
`'train'`	1243
`'validation'`	792

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}

cy_en

Użyj następującego polecenia, aby załadować ten zestaw danych do TFDS:

ds = tfds.load('huggingface:covost2/cy_en')

Opis :

CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.

Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:


python
import torchaudio

def map_to_array(batch):
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

dataset = dataset.map(map_to_array, remove_columns=["file"])

Licencja : Brak znanej licencji
Wersja : 1.0.0
Podziały :

Podział	Przykłady
`'test'`	690
`'train'`	1241
`'validation'`	690

Cechy :

{
    "client_id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "file": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "sentence": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "translation": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    }
}