References:
en_de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_de')
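Every config on this page follows the same naming pattern, `huggingface:covost2/<source>_<target>`, with English on exactly one side of the pair. The helper below is a minimal sketch of that pattern; the function name `covost2_tfds_name` is hypothetical and not part of TFDS.

```python
def covost2_tfds_name(source_lang: str, target_lang: str) -> str:
    """Build a TFDS name such as 'huggingface:covost2/en_de'.

    Hypothetical helper: it only formats the string and does not check
    that the language pair is actually published.
    """
    # Per the dataset description, every pair translates into or out of
    # English, so exactly one side must be "en".
    if (source_lang == "en") == (target_lang == "en"):
        raise ValueError("exactly one of the two languages must be 'en'")
    return f"huggingface:covost2/{source_lang}_{target_lang}"


name = covost2_tfds_name("en", "de")
print(name)
```

The resulting string can then be passed to `tfds.load` in place of the literal name.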
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
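To make the effect of the `.map()` call in the description concrete, here is a plain-Python sketch of what mapping with `remove_columns=["file"]` does to each record. It is only an illustration: `fake_load` is a stand-in for `torchaudio.load`, `apply_map` is a hypothetical helper that mimics the behaviour on a list of dicts, and the sample record is made up.

```python
def fake_load(path):
    """Stand-in for torchaudio.load: returns (samples, sample_rate)."""
    return [0.0, 0.25, -0.25], 16000


def map_to_array(batch, load=fake_load):
    # Same shape as the function in the description, with the loader
    # injected so the sketch runs without audio files.
    speech_array, _ = load(batch["file"])
    batch["speech"] = speech_array
    return batch


def apply_map(records, fn, remove_columns=()):
    """Hypothetical helper: apply fn to each record, then drop columns."""
    out = []
    for record in records:
        mapped = fn(dict(record))  # copy so the input record stays untouched
        for column in remove_columns:
            mapped.pop(column, None)
        out.append(mapped)
    return out


# Made-up record with the five string fields listed under Features.
records = [{
    "client_id": "c1",
    "file": "clip.mp3",
    "sentence": "Hello.",
    "translation": "Hallo.",
    "id": "clip",
}]
converted = apply_map(records, map_to_array, remove_columns=["file"])
print(sorted(converted[0]))
```

After the call, each record carries a `speech` entry and no longer has the `file` path.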
en_tr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_tr')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
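The feature specification above lists five fields, all of dtype `string`. A minimal sketch of checking a decoded example against that schema follows; `validate_example` and the sample record are hypothetical and not part of TFDS or the dataset tooling.

```python
# Field names taken from the feature specification above.
EXPECTED_FIELDS = ("client_id", "file", "sentence", "translation", "id")


def validate_example(example: dict) -> list:
    """Return a list of schema problems (empty when the example conforms)."""
    problems = []
    for field in EXPECTED_FIELDS:
        if field not in example:
            problems.append(f"missing field: {field}")
        elif not isinstance(example[field], str):
            problems.append(f"{field} is not a string")
    return problems


# Made-up record for illustration only.
example = {
    "client_id": "c1",
    "file": "clip.mp3",
    "sentence": "Hello.",
    "translation": "Merhaba.",
    "id": "clip",
}
print(validate_example(example))
```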
en_fa
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_fa')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_sv-SE
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_sv-SE')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_mn
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_mn')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_zh-CN
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_zh-CN')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_cy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_cy')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ca
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_ca')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_sl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_sl')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_et
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_et')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_id
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_id')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ar
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_ar')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ta
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_ta')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_lv
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_lv')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
en_ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/en_ja')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 15531 |
'train' | 289430 |
'validation' | 15531 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fr_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/fr_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 14760 |
'train' | 207374 |
'validation' | 14760 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
de_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/de_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 13511 |
'train' | 127834 |
'validation' | 13511 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
es_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/es_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 13221 |
'train' | 79015 |
'validation' | 13221 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ca_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ca_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 12730 |
'train' | 95854 |
'validation' | 12730 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
it_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/it_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 8951 |
'train' | 31698 |
'validation' | 8940 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
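The split table for `it_en` gives raw example counts. As a small worked example, the sketch below turns those counts into shares of the whole config; `split_shares` is a hypothetical helper, and the numbers are copied from the table above.

```python
# Example counts copied from the it_en split table above.
IT_EN_SPLITS = {"test": 8951, "train": 31698, "validation": 8940}


def split_shares(counts: dict) -> dict:
    """Return each split's fraction of the total number of examples."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


shares = split_shares(IT_EN_SPLITS)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

For low-resource pairs like this one, the test and validation splits make up a noticeably larger share than they do for the English-source configs.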
ru_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ru_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 6300 |
'train' | 12112 |
'validation' | 6110 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
zh-CN_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/zh-CN_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 4898 |
'train' | 7085 |
'validation' | 4843 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
pt_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/pt_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 4023 |
'train' | 9158 |
'validation' | 3318 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
fa_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/fa_en')
- Description:
CoVoST 2, a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
python
import torchaudio
def map_to_array(batch):
    # Decode the .mp3 at batch["file"]; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 3445 |
'train' | 53949 |
'validation' | 3445 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
et_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/et_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1571 |
'train' | 1782 |
'validation' | 1576 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
mn_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/mn_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1759 |
'train' | 2067 |
'validation' | 1761 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
nl_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/nl_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1699 |
'train' | 7108 |
'validation' | 1699 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
tr_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/tr_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1629 |
'train' | 3966 |
'validation' | 1624 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ar_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ar_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1695 |
'train' | 2283 |
'validation' | 1758 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sv-SE_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/sv-SE_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1595 |
'train' | 2160 |
'validation' | 1349 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
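The configuration names on this page appear to follow a `<source>_<target>` pattern, with the source sometimes carrying a region suffix, as in `sv-SE_en`. Assuming that pattern holds, the small helpers below (hypothetical, not part of TFDS) split a configuration name and build the string passed to `tfds.load`:

```python
def split_config(name):
    """Split a config name such as 'sv-SE_en' into (source, target).

    Assumes the last underscore separates source from target.
    """
    source, _, target = name.rpartition("_")
    if not source or not target:
        raise ValueError(f"unexpected config name: {name!r}")
    return source, target


def tfds_name(source, target):
    # Mirrors the 'huggingface:covost2/<config>' strings used on this page.
    return f"huggingface:covost2/{source}_{target}"


source, target = split_config("sv-SE_en")
name = tfds_name(source, target)
```

Splitting on the last underscore keeps a hyphenated source code such as `sv-SE` intact.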
lv_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/lv_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 1629 |
'train' | 2337 |
'validation' | 1125 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
sl_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/sl_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 360 |
'train' | 1843 |
'validation' | 509 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ta_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ta_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 786 |
'train' | 1358 |
'validation' | 384 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
ja_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/ja_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 684 |
'train' | 1119 |
'validation' | 635 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
id_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/id_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 844 |
'train' | 1243 |
'validation' | 792 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
cy_en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:covost2/cy_en')
- Description:
CoVoST 2 is a large-scale multilingual speech translation corpus covering translations from 21 languages into English and from English into 15 languages. The dataset is created using Mozilla’s open-source Common Voice database of crowdsourced voice recordings.
Note that in order to limit the required storage for preparing this dataset, the audio
is stored in the .mp3 format and is not converted to a float32 array. To convert the audio
file to a float32 array, use the `.map()` function as follows:
import torchaudio

def map_to_array(batch):
    # Decode the .mp3 file; the returned sample rate is discarded.
    speech_array, _ = torchaudio.load(batch["file"])
    batch["speech"] = speech_array.numpy()
    return batch

# Apply the conversion to every example and drop the original "file" column.
dataset = dataset.map(map_to_array, remove_columns=["file"])
- License: No known license
- Version: 1.0.0
- Splits:
Split | Examples |
---|---|
'test' | 690 |
'train' | 1241 |
'validation' | 690 |
- Features:
{
"client_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"file": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"sentence": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}