References:
en_de
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_de')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
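All five features are declared as strings. When examples are read back through TFDS as NumPy values (for instance with `tfds.as_numpy`), string features typically arrive as Python `bytes`, so text fields such as `sentence` and `translation` usually need decoding first. The helper below is a minimal sketch under that assumption; `decode_example` and `COVOST2_FIELDS` are hypothetical names, not part of TFDS.

```python
from typing import Any, Dict

# The five string features listed in the feature spec above.
COVOST2_FIELDS = ("client_id", "file", "sentence", "translation", "id")


def decode_example(example: Dict[str, Any]) -> Dict[str, str]:
    """Return a copy of `example` with every CoVoST 2 field as `str`.

    Assumes each value is either `bytes` (as tf.string values become
    after NumPy conversion) or already a `str`.
    """
    decoded: Dict[str, str] = {}
    for field in COVOST2_FIELDS:
        value = example[field]
        # Decode UTF-8 bytes; pass anything else through as a string.
        decoded[field] = value.decode("utf-8") if isinstance(value, bytes) else str(value)
    return decoded
```

Decoding as UTF-8 keeps non-ASCII text intact, which matters for targets such as `ja`, `zh-CN` or `ar`.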
en_tr
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_tr')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
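The split tables give absolute counts only. As a quick sanity check on proportions, the sketch below turns one table's counts into fractions of the total, using the numbers reported for `en_tr` above; `split_shares` is a hypothetical helper, not a TFDS function.

```python
from typing import Dict


def split_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each split's fraction of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Counts copied from the en_tr split table above.
en_tr_counts = {"test": 15531, "train": 289430, "validation": 15531}
shares = split_shares(en_tr_counts)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
# test: 4.8%
# train: 90.3%
# validation: 4.8%
```

For the English-source configurations, roughly nine examples in ten are in `train`, with `validation` and `test` the same size.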
en_fa
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_fa')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
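`torchaudio.load` returns a waveform shaped `(channels, frames)`, so `map_to_array` stores a 2-D array in `batch["speech"]`. Many speech models expect a single 1-D waveform instead. A minimal sketch, assuming NumPy input and that averaging channels is an acceptable down-mix (for an already-mono clip this only drops the channel axis); `to_mono` is a hypothetical helper.

```python
import numpy as np


def to_mono(speech: np.ndarray) -> np.ndarray:
    """Collapse a (channels, frames) waveform into a 1-D float32 array."""
    if speech.ndim == 1:
        # Already a single channel; just make sure the dtype is float32.
        return speech.astype(np.float32, copy=False)
    if speech.ndim != 2:
        raise ValueError(f"expected (channels, frames), got shape {speech.shape}")
    # Average across the channel axis; a mono clip simply loses that axis.
    return speech.mean(axis=0).astype(np.float32, copy=False)
```

One option is to call it inside `map_to_array`, on the result of `speech_array.numpy()`, before assigning `batch["speech"]`.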
en_sv-SE
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_sv-SE')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_mn
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_mn')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_zh-CN
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_zh-CN')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_cy
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_cy')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ca
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_ca')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_sl
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_sl')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_et
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_et')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_id
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_id')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ar
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_ar')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ta
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_ta')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_lv
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_lv')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ja
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/en_ja')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fr_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/fr_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 14760 |
'train' | 207374 |
'validation' | 14760 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
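Every configuration name passed to `tfds.load` has the form `<source>_<target>`. The `fr_en` entry above is the first X→English configuration in this listing; the entries before it translate from English into another language. Some language codes contain a hyphen (`zh-CN`, `sv-SE`) but none contains an underscore, so splitting on the single underscore recovers both languages. The sketch below does that and rebuilds the `tfds.load` string; `parse_config`, `is_to_english` and `tfds_name` are hypothetical helpers, not TFDS APIs.

```python
from typing import Tuple


def parse_config(name: str) -> Tuple[str, str]:
    """Split a CoVoST 2 config name such as 'zh-CN_en' into (source, target)."""
    source, sep, target = name.partition("_")
    if not sep or not source or not target or "_" in target:
        raise ValueError(f"not a <source>_<target> config name: {name!r}")
    return source, target


def is_to_english(name: str) -> bool:
    """True for X-to-English configs such as 'fr_en', False for 'en_de' style ones."""
    return parse_config(name)[1] == "en"


def tfds_name(source: str, target: str) -> str:
    """Rebuild the dataset string used with tfds.load in this catalog."""
    return f"huggingface:covost2/{source}_{target}"
```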
de_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/de_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 13511 |
'train' | 127834 |
'validation' | 13511 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
es_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/es_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 13221 |
'train' | 79015 |
'validation' | 13221 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ca_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/ca_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 12730 |
'train' | 95854 |
'validation' | 12730 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
it_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/it_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 8951 |
'train' | 31698 |
'validation' | 8940 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ru_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/ru_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 6300 |
'train' | 12112 |
'validation' | 6110 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
zh-CN_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/zh-CN_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 4898 |
'train' | 7085 |
'validation' | 4843 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pt_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/pt_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
```python
import torchaudio

def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the rate is discarded here
    speech_array, _ = torchaudio.load(batch["file"])
    # Store the decoded waveform as a float32 NumPy array
    batch["speech"] = speech_array.numpy()
    return batch

# `dataset` is assumed to be the already-loaded Hugging Face dataset
dataset = dataset.map(map_to_array, remove_columns=["file"])
```
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 4023 |
'train' | 9158 |
'validation' | 3318 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fa_en
Use the following command to load this dataset in TFDS:
```python
ds = tfds.load('huggingface:covost2/fa_en')
```
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert, the audio
file to a float32 array, please make use of the `.map()` function as follows:
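```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```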
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 3445 |
'train' | 53949 |
'validation' | 3445 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
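The `map_to_array` snippet above throws away the sample rate that `torchaudio.load` returns as its second value. If downstream code needs it (for example, to compute clip durations), one option is a variant that stores it alongside the waveform. The sketch below takes the loader as a parameter so it can be exercised without torchaudio; the factory name `make_map_to_array` and the `sampling_rate` column are hypothetical and not part of the dataset's features.

```python
from typing import Any, Callable, Dict, Tuple

# A loader behaves like torchaudio.load: path -> (waveform, sample_rate),
# where the waveform object provides a .numpy() method.
Loader = Callable[[str], Tuple[Any, int]]


def make_map_to_array(load: Loader) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Build a map_to_array variant that also keeps the sample rate."""

    def map_to_array(batch: Dict[str, Any]) -> Dict[str, Any]:
        speech_array, sample_rate = load(batch["file"])
        batch["speech"] = speech_array.numpy()
        batch["sampling_rate"] = sample_rate  # hypothetical extra column
        return batch

    return map_to_array
```

With torchaudio installed, it would be used as `dataset.map(make_map_to_array(torchaudio.load), remove_columns=["file"])`, again assuming `dataset` is a Hugging Face `datasets.Dataset`.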
et_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/et_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
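```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```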
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1571 |
'train' | 1782 |
'validation' | 1576 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
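Every config in this part of the catalog is loaded through a name of the form `huggingface:covost2/<source>_<target>`, with English (`en`) as the target. A small sketch that builds those names from the source-language codes listed on this page (the helper `tfds_name` is hypothetical):

```python
# Source-language codes of the X -> English configs that appear in this part of the catalog.
SOURCES = ["pt", "fa", "et", "mn", "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy"]


def tfds_name(source: str, target: str = "en") -> str:
    """Return the dataset name to pass to tfds.load for one CoVoST 2 config."""
    return f"huggingface:covost2/{source}_{target}"


names = [tfds_name(src) for src in SOURCES]
print(names[2])  # huggingface:covost2/et_en
```

Each string can then be passed to `tfds.load(...)` exactly as in the load commands on this page.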
mn_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/mn_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
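```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```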
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1759 |
'train' | 2067 |
'validation' | 1761 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
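The description contrasts the stored .mp3 files with a float32 array. As an illustration of what such an array typically holds, the sketch below scales signed 16-bit PCM samples into the range [-1.0, 1.0) by dividing by 32768. That scaling is a common convention assumed here for illustration only; it is not a description of how torchaudio decodes mp3 internally.

```python
from array import array


def pcm16_to_float32(samples: array) -> array:
    """Scale signed 16-bit PCM samples (typecode 'h') to float32 values (typecode 'f')."""
    if samples.typecode != "h":
        raise TypeError("expected signed 16-bit samples (array typecode 'h')")
    # -32768 maps to -1.0 and 32767 to just under 1.0.
    return array("f", (s / 32768.0 for s in samples))


pcm = array("h", [0, 16384, -32768, 32767])
waveform = pcm16_to_float32(pcm)
print(list(waveform[:3]))  # [0.0, 0.5, -1.0]
```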
nl_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/nl_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
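```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```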
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1699 |
'train' | 7108 |
'validation' | 1699 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
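Each example also carries a `client_id`, which in Common Voice identifies the contributor who recorded the clip. A stdlib sketch that counts examples per `client_id`, for instance to see how many distinct speakers a split contains; the helper name is hypothetical and the records are invented and trimmed to the one field used:

```python
from collections import Counter
from typing import Iterable, Mapping


def clips_per_speaker(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count how many examples each client_id contributed."""
    return Counter(record["client_id"] for record in records)


# Invented records; real examples also have file, sentence, translation and id.
records = [{"client_id": "a"}, {"client_id": "b"}, {"client_id": "a"}]
counts = clips_per_speaker(records)
print(len(counts), counts.most_common(1))  # 2 [('a', 2)]
```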
tr_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/tr_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
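```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```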
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1629 |
'train' | 3966 |
'validation' | 1624 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
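In these X-to-English configs, `sentence` holds the transcript in the source language and `translation` its English translation, so the text side of a split can be exported as a parallel corpus. A minimal sketch using the standard `csv` module to write tab-separated (id, sentence, translation) rows; the function name and the record are hypothetical:

```python
import csv
import io
from typing import Iterable, Mapping


def to_parallel_tsv(records: Iterable[Mapping[str, str]]) -> str:
    """Serialize id, sentence and translation fields as TSV with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "sentence", "translation"])
    for record in records:
        writer.writerow([record["id"], record["sentence"], record["translation"]])
    return buffer.getvalue()


# Hypothetical record for illustration.
rows = [{"id": "clip1", "sentence": "Merhaba dünya", "translation": "Hello world"}]
print(to_parallel_tsv(rows), end="")
```

Using the `csv` writer rather than joining strings by hand means any field that happens to contain a tab or newline is quoted instead of corrupting the row structure.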
ar_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ar_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
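```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```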
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1695 |
'train' | 2283 |
'validation' | 1758 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
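Per example, the net effect of `dataset.map(map_to_array, remove_columns=["file"])` is that a `speech` field is added and the `file` field disappears. The plain-Python sketch below reproduces only that net effect, so the data flow can be followed without torchaudio or the `datasets` library; it is not how `datasets` implements `map`, and `map_and_drop` and `fake_load` (a stand-in for `torchaudio.load`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


def map_and_drop(records: List[Record], fn: Callable[[Record], Record], drop: Sequence[str]) -> List[Record]:
    """Apply fn to a copy of each record, then remove the columns named in drop."""
    result = []
    for record in records:
        updated = fn(dict(record))  # copy so the input records stay unchanged
        for column in drop:
            updated.pop(column, None)
        result.append(updated)
    return result


def fake_load(path: str) -> List[float]:
    # Hypothetical stand-in for torchaudio.load: returns a short silent waveform.
    return [0.0, 0.0, 0.0]


def map_to_array(batch: Record) -> Record:
    batch["speech"] = fake_load(batch["file"])
    return batch


rows = [{"id": "clip1", "file": "clip1.mp3", "sentence": "src text", "translation": "tgt text"}]
mapped = map_and_drop(rows, map_to_array, drop=["file"])
print(sorted(mapped[0]))  # ['id', 'sentence', 'speech', 'translation']
```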
sv-SE_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/sv-SE_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
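```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```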
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1595 |
'train' | 2160 |
'validation' | 1349 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
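This config name shows that a source-language code can carry a region subtag (`sv-SE`). Because the subtag uses a hyphen while source and target are joined by an underscore, splitting on the last underscore recovers both parts. A sketch (the helper `parse_config` is hypothetical):

```python
from typing import Tuple


def parse_config(config: str) -> Tuple[str, str]:
    """Split a CoVoST 2 config name such as "sv-SE_en" into (source, target)."""
    source, sep, target = config.rpartition("_")
    if not sep or not source or not target:
        raise ValueError(f"expected '<source>_<target>', got {config!r}")
    return source, target


print(parse_config("sv-SE_en"))  # ('sv-SE', 'en')
```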
lv_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/lv_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
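```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```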
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1629 |
'train' | 2337 |
'validation' | 1125 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sl_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/sl_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
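```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```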
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 360 |
'train' | 1843 |
'validation' | 509 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ta_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ta_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
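```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```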
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 786 |
'train' | 1358 |
'validation' | 384 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ja_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ja_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
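```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```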
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 684 |
'train' | 1119 |
'validation' | 635 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
id_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/id_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
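```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```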
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 844 |
'train' | 1243 |
'validation' | 792 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cy_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/cy_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is built from Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that, to limit the storage required to prepare this dataset, the audio is stored in .mp3 format and is not converted to a float32 array. To convert the audio file to a float32 array, use the `.map()` function as follows:
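```python
import torchaudio

# Assumes `dataset` is a Hugging Face `datasets.Dataset` with a "file" column of mp3 paths.
def map_to_array(batch):
    # torchaudio.load returns (waveform, sample_rate); the sample rate is discarded here.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Decode every example, then drop the "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
```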
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 690 |
'train' | 1241 |
'validation' | 690 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
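The split tables in this part of the catalog vary widely in size. A short sketch that collects the counts (copied from the tables above, in their test/train/validation order) and computes each config's total and train share, which makes the configs easier to compare side by side; the helper names are hypothetical:

```python
from typing import Dict, Tuple

Counts = Tuple[int, int, int]

# (test, train, validation) example counts, copied from the split tables on this page.
SPLITS: Dict[str, Counts] = {
    "pt_en": (4023, 9158, 3318),
    "fa_en": (3445, 53949, 3445),
    "et_en": (1571, 1782, 1576),
    "mn_en": (1759, 2067, 1761),
    "nl_en": (1699, 7108, 1699),
    "tr_en": (1629, 3966, 1624),
    "ar_en": (1695, 2283, 1758),
    "sv-SE_en": (1595, 2160, 1349),
    "lv_en": (1629, 2337, 1125),
    "sl_en": (360, 1843, 509),
    "ta_en": (786, 1358, 384),
    "ja_en": (684, 1119, 635),
    "id_en": (844, 1243, 792),
    "cy_en": (690, 1241, 690),
}


def total(counts: Counts) -> int:
    """Total number of examples across the three splits."""
    return sum(counts)


def train_share(counts: Counts) -> float:
    """Fraction of a config's examples that are in the train split."""
    test, train, validation = counts
    return train / (test + train + validation)


# Print configs from largest to smallest total.
for config, counts in sorted(SPLITS.items(), key=lambda item: total(item[1]), reverse=True):
    print(f"{config:>9} total={total(counts):>6} train={train_share(counts):.1%}")
```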