References:
bs-eo
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:open_subtitles/bs-eo')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 10989 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"bs": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"eo": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"bs": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"eo": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"bs",
"eo"
],
"id": null,
"_type": "Translation"
}
}
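A record in this configuration follows the feature structure above: a string `id`, a nested `meta` dictionary, and a `translation` entry with one text per language. The sketch below is only an illustration, assuming records are available as plain Python dictionaries keyed by language code. The helper name `extract_pair` and the sample record are hypothetical and do not come from the dataset.

```python
from typing import Any, Dict, Tuple


def extract_pair(record: Dict[str, Any], src: str, tgt: str) -> Tuple[str, str]:
    """Return the (source, target) sentence pair from one record.

    Assumes record["translation"] maps language codes to strings.
    """
    translation = record["translation"]
    return translation[src], translation[tgt]


# Hypothetical record shaped like the feature structure above (values invented).
sample = {
    "id": "0",
    "meta": {
        "year": 2000,
        "imdbId": 1,
        "subtitleId": {"bs": 10, "eo": 20},
        "sentenceIds": {"bs": [1], "eo": [1, 2]},
    },
    "translation": {"bs": "Zdravo.", "eo": "Saluton."},
}

source, target = extract_pair(sample, "bs", "eo")
print(source, "->", target)
```

The `sentenceIds` lists may differ in length between the two languages, since the schema declares them as variable-length sequences (`"length": -1`).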
fr-hy
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:open_subtitles/fr-hy')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 668 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"fr": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"hy": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"fr": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"hy": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"fr",
"hy"
],
"id": null,
"_type": "Translation"
}
}
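The nested feature description above can be hard to scan. As a rough sketch, the snippet below walks a copy of that description and lists each typed field under a dotted path. The names `FEATURES` and `iter_leaves` are illustrative assumptions, and the walker handles only the three `_type` values that appear on this page (`Value`, `Sequence`, `Translation`).

```python
from typing import Any, Dict, Iterator, Tuple

# Shared building blocks, copied from the fr-hy feature description above.
UINT32 = {"dtype": "uint32", "id": None, "_type": "Value"}
UINT32_SEQ = {"feature": UINT32, "length": -1, "id": None, "_type": "Sequence"}

FEATURES: Dict[str, Any] = {
    "id": {"dtype": "string", "id": None, "_type": "Value"},
    "meta": {
        "year": UINT32,
        "imdbId": UINT32,
        "subtitleId": {"fr": UINT32, "hy": UINT32},
        "sentenceIds": {"fr": UINT32_SEQ, "hy": UINT32_SEQ},
    },
    "translation": {"languages": ["fr", "hy"], "id": None, "_type": "Translation"},
}


def iter_leaves(spec: Dict[str, Any], prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_path, description) for every typed feature in the spec."""
    for name, node in spec.items():
        path = f"{prefix}.{name}" if prefix else name
        kind = node.get("_type")
        if kind == "Value":
            yield path, node["dtype"]
        elif kind == "Sequence":
            yield path, f"sequence of {node['feature']['dtype']}"
        elif kind == "Translation":
            yield path, "translation(" + ", ".join(node["languages"]) + ")"
        else:
            # No "_type" key: a plain nested dictionary of features.
            yield from iter_leaves(node, path)


for path, description in iter_leaves(FEATURES):
    print(f"{path}: {description}")
```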
da-ru
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:open_subtitles/da-ru')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 7543012 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"da": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"ru": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"da": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ru": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"da",
"ru"
],
"id": null,
"_type": "Translation"
}
}
en-hi
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:open_subtitles/en-hi')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 93016 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"en": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"hi": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"en": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"hi": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"en",
"hi"
],
"id": null,
"_type": "Translation"
}
}
bn-is
Use the following command to load this dataset in TFDS:
ds = tfds.load('huggingface:open_subtitles/bn-is')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: If you use the OpenSubtitle corpus: Please, add a link to http://www.opensubtitles.org/ to your website and to your reports and publications produced with the data!
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 38272 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"bn": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"is": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"bn": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"is": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"bn",
"is"
],
"id": null,
"_type": "Translation"
}
}
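The five configurations on this page differ widely in size. The sketch below simply collects the `'train'` example counts listed above and builds the corresponding TFDS name for each configuration. The names `TRAIN_EXAMPLES` and `tfds_name` are hypothetical helpers for illustration; nothing is downloaded.

```python
# 'train' split sizes as listed in the sections above.
TRAIN_EXAMPLES = {
    "bs-eo": 10989,
    "fr-hy": 668,
    "da-ru": 7543012,
    "en-hi": 93016,
    "bn-is": 38272,
}


def tfds_name(config: str) -> str:
    """Build the name string used in the tfds.load commands on this page."""
    return f"huggingface:open_subtitles/{config}"


# List configurations from largest to smallest.
by_size = sorted(TRAIN_EXAMPLES.items(), key=lambda item: item[1], reverse=True)
for config, count in by_size:
    print(f"{tfds_name(config):<36} {count:>10,d}")

total = sum(TRAIN_EXAMPLES.values())
print(f"{'total':<36} {total:>10,d}")
```

Sorting by size makes it easy to start experiments on a small pair such as `fr-hy` before moving to `da-ru`, which holds most of the examples listed here.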