References:
ar-cs
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-cs')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 52128 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"cs"
],
"id": null,
"_type": "Translation"
}
}
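Every config name in this listing places the two language codes in alphabetical order (for example `ar-cs`, never `cs-ar`). A minimal sketch of a helper that builds the TFDS name for a pair under that assumption follows. `tfds_name` and `load_pair` are hypothetical names, not part of TFDS, and `load_pair` imports `tensorflow_datasets` lazily so the name builder can be used without it installed.

```python
DATASET = "huggingface:news_commentary"


def tfds_name(lang_a: str, lang_b: str) -> str:
    """Build the TFDS name for a language pair (hypothetical helper).

    Assumes the config name lists the two codes in alphabetical order,
    which holds for every config shown in this listing.
    """
    if lang_a == lang_b:
        raise ValueError("a language pair needs two different codes")
    first, second = sorted((lang_a, lang_b))
    return f"{DATASET}/{first}-{second}"


def load_pair(lang_a: str, lang_b: str, split: str = "train"):
    """Load one pair with tfds.load; requires tensorflow_datasets."""
    import tensorflow_datasets as tfds  # imported lazily on purpose

    return tfds.load(tfds_name(lang_a, lang_b), split=split)
```

With this, `tfds_name("cs", "ar")` yields the same string used in the load command above, regardless of the order in which the codes are passed.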
ar-de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-de')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 68916 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"de"
],
"id": null,
"_type": "Translation"
}
}
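The feature spec above describes each example as a string `id` plus a `translation` entry keyed by the two language codes. A minimal sketch of pulling a source/target pair out of one example follows, assuming the example arrives as a plain Python mapping shaped like that spec. How TFDS actually encodes the `Translation` feature (for instance as byte strings or tensors) is not stated here, so the sample record and the `to_pair` helper are illustrative only.

```python
from typing import Mapping, Tuple


def to_pair(example: Mapping, src: str, tgt: str) -> Tuple[str, str]:
    """Return (source_text, target_text) from one example (hypothetical helper)."""
    translation = example["translation"]
    missing = [code for code in (src, tgt) if code not in translation]
    if missing:
        raise KeyError(f"example has no text for: {', '.join(missing)}")
    return translation[src], translation[tgt]


# Made-up record that follows the feature spec; not real corpus data.
sample = {"id": "0", "translation": {"ar": "<Arabic text>", "de": "<German text>"}}
print(to_pair(sample, "ar", "de"))
```

Because both texts live under one `translation` mapping, either language can serve as the source side by swapping the `src` and `tgt` arguments.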
cs-de
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-de')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 172706 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"de"
],
"id": null,
"_type": "Translation"
}
}
ar-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-en')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 83187 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"en"
],
"id": null,
"_type": "Translation"
}
}
cs-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-en')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 177278 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"en"
],
"id": null,
"_type": "Translation"
}
}
de-en
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-en')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 223153 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"en"
],
"id": null,
"_type": "Translation"
}
}
ar-es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-es')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 78074 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"es"
],
"id": null,
"_type": "Translation"
}
}
cs-es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-es')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 170489 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"es"
],
"id": null,
"_type": "Translation"
}
}
de-es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-es')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 209839 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"es"
],
"id": null,
"_type": "Translation"
}
}
en-es
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-es')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 238872 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"es"
],
"id": null,
"_type": "Translation"
}
}
ar-fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-fr')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 69157 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"fr"
],
"id": null,
"_type": "Translation"
}
}
cs-fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-fr')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 148578 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"fr"
],
"id": null,
"_type": "Translation"
}
}
de-fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-fr')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 185442 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"fr"
],
"id": null,
"_type": "Translation"
}
}
en-fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-fr')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 209479 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"fr"
],
"id": null,
"_type": "Translation"
}
}
es-fr
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-fr')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 195241 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"fr"
],
"id": null,
"_type": "Translation"
}
}
ar-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-it')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 17227 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"it"
],
"id": null,
"_type": "Translation"
}
}
cs-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-it')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 30547 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"it"
],
"id": null,
"_type": "Translation"
}
}
de-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-it')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 38961 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"it"
],
"id": null,
"_type": "Translation"
}
}
en-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-it')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 40009 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"it"
],
"id": null,
"_type": "Translation"
}
}
es-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-it')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 41497 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"it"
],
"id": null,
"_type": "Translation"
}
}
fr-it
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/fr-it')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 38485 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fr",
"it"
],
"id": null,
"_type": "Translation"
}
}
ar-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-ja')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 569 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"ja"
],
"id": null,
"_type": "Translation"
}
}
cs-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-ja')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 622 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"ja"
],
"id": null,
"_type": "Translation"
}
}
de-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-ja')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 582 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"ja"
],
"id": null,
"_type": "Translation"
}
}
en-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-ja')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 637 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"ja"
],
"id": null,
"_type": "Translation"
}
}
es-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-ja')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 602 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"ja"
],
"id": null,
"_type": "Translation"
}
}
fr-ja
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/fr-ja')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 519 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fr",
"ja"
],
"id": null,
"_type": "Translation"
}
}
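The train split sizes in this listing differ by orders of magnitude: the Japanese pairs above each have fewer than 1,000 examples, while pairs such as `de-en` have over 200,000. A small sketch of flagging low-resource configs from those counts follows. The 1,000-example threshold and the `small_configs` helper are arbitrary illustrative choices, not part of the dataset or of TFDS.

```python
# Train split sizes copied from the entries in this listing.
TRAIN_EXAMPLES = {
    "de-en": 223153,
    "en-es": 238872,
    "ar-it": 17227,
    "ar-ja": 569,
    "cs-ja": 622,
    "de-ja": 582,
    "en-ja": 637,
    "es-ja": 602,
    "fr-ja": 519,
}


def small_configs(counts, threshold=1000):
    """Return config names with fewer than `threshold` train examples, smallest first."""
    return sorted(
        (name for name, n in counts.items() if n < threshold),
        key=counts.get,
    )


print(small_configs(TRAIN_EXAMPLES))
```

A check like this can help decide which pairs are large enough to train on directly and which are better kept for evaluation or combined with other data.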
ar-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 9047 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"nl"
],
"id": null,
"_type": "Translation"
}
}
cs-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 17358 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"nl"
],
"id": null,
"_type": "Translation"
}
}
de-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 21439 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"nl"
],
"id": null,
"_type": "Translation"
}
}
en-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 19399 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"nl"
],
"id": null,
"_type": "Translation"
}
}
es-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 21012 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"nl"
],
"id": null,
"_type": "Translation"
}
}
fr-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/fr-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 20898 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fr",
"nl"
],
"id": null,
"_type": "Translation"
}
}
it-nl
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/it-nl')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 15428 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"it",
"nl"
],
"id": null,
"_type": "Translation"
}
}
ar-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 11433 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"pt"
],
"id": null,
"_type": "Translation"
}
}
cs-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 18356 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"pt"
],
"id": null,
"_type": "Translation"
}
}
de-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 21884 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"pt"
],
"id": null,
"_type": "Translation"
}
}
en-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 25929 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"pt"
],
"id": null,
"_type": "Translation"
}
}
es-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 25551 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"pt"
],
"id": null,
"_type": "Translation"
}
}
fr-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/fr-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 25642 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fr",
"pt"
],
"id": null,
"_type": "Translation"
}
}
it-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/it-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 11407 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"it",
"pt"
],
"id": null,
"_type": "Translation"
}
}
nl-pt
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/nl-pt')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 10598 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"nl",
"pt"
],
"id": null,
"_type": "Translation"
}
}
ar-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 84455 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"ru"
],
"id": null,
"_type": "Translation"
}
}
cs-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 161133 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"ru"
],
"id": null,
"_type": "Translation"
}
}
de-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 175905 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"ru"
],
"id": null,
"_type": "Translation"
}
}
en-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 190104 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"ru"
],
"id": null,
"_type": "Translation"
}
}
es-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 180217 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"ru"
],
"id": null,
"_type": "Translation"
}
}
fr-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/fr-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 160740 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fr",
"ru"
],
"id": null,
"_type": "Translation"
}
}
it-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/it-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 27267 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"it",
"ru"
],
"id": null,
"_type": "Translation"
}
}
ja-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ja-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 586 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ja",
"ru"
],
"id": null,
"_type": "Translation"
}
}
nl-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/nl-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 19112 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"nl",
"ru"
],
"id": null,
"_type": "Translation"
}
}
pt-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/pt-ru')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 18458 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"pt",
"ru"
],
"id": null,
"_type": "Translation"
}
}
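The Russian-paired configs listed above differ in size by more than two orders of magnitude (ja-ru has 586 training examples, en-ru has 190104), so it can help to filter by size before choosing a pair. A small sketch using the counts shown in the split tables above; the threshold and the function name `configs_with_at_least` are illustrative assumptions:

```python
# Train-split sizes copied from the split tables above.
TRAIN_SIZES = {
    "en-ru": 190104,
    "es-ru": 180217,
    "fr-ru": 160740,
    "it-ru": 27267,
    "ja-ru": 586,
    "nl-ru": 19112,
    "pt-ru": 18458,
}


def configs_with_at_least(min_examples, sizes=TRAIN_SIZES):
    """Return config names with at least min_examples, largest first."""
    kept = [name for name, count in sizes.items() if count >= min_examples]
    return sorted(kept, key=lambda name: sizes[name], reverse=True)


print(configs_with_at_least(100_000))  # ['en-ru', 'es-ru', 'fr-ru']
```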
ar-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ar-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 66021 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"zh"
],
"id": null,
"_type": "Translation"
}
}
cs-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/cs-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 45424 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"cs",
"zh"
],
"id": null,
"_type": "Translation"
}
}
de-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/de-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 59020 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"zh"
],
"id": null,
"_type": "Translation"
}
}
en-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/en-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 69206 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"zh"
],
"id": null,
"_type": "Translation"
}
}
es-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/es-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 65424 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"zh"
],
"id": null,
"_type": "Translation"
}
}
fr-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/fr-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 59060 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"fr",
"zh"
],
"id": null,
"_type": "Translation"
}
}
it-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/it-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 14652 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"it",
"zh"
],
"id": null,
"_type": "Translation"
}
}
ja-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ja-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 570 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ja",
"zh"
],
"id": null,
"_type": "Translation"
}
}
nl-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/nl-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 8433 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"nl",
"zh"
],
"id": null,
"_type": "Translation"
}
}
pt-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/pt-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 10873 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"pt",
"zh"
],
"id": null,
"_type": "Translation"
}
}
ru-zh
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:news_commentary/ru-zh')
- Description:
A parallel corpus of News Commentaries provided by WMT for training SMT. The source is taken from CASMACAT: http://www.casmacat.eu/corpus/news-commentary.html
12 languages, 63 bitexts
total number of files: 61,928
total number of tokens: 49.66M
total number of sentence fragments: 1.93M
- License: No known license
- Version: 11.0.0
- Splits:
Split | Examples |
---|---|
'train' | 47687 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ru",
"zh"
],
"id": null,
"_type": "Translation"
}
}
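Every config name listed here puts its two language codes in alphabetical order (for example en-zh and ru-zh, never zh-en). Assuming that pattern holds for all configs, a hypothetical helper such as `tfds_name` below can build the string passed to `tfds.load` from two codes given in either order. It does not check that the pair actually exists in the corpus:

```python
def tfds_name(lang_a: str, lang_b: str) -> str:
    """Build the TFDS name for a language pair, codes in alphabetical order."""
    first, second = sorted([lang_a.lower(), lang_b.lower()])
    if first == second:
        raise ValueError("a language pair needs two different codes")
    return f"huggingface:news_commentary/{first}-{second}"


print(tfds_name("zh", "en"))  # huggingface:news_commentary/en-zh

# With tensorflow_datasets installed, the result can be passed on as in the
# load commands above:
#   ds = tfds.load(tfds_name("zh", "en"))
```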