References:
bs-eo
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:open_subtitles/bs-eo')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: if you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ on your website and in any reports and publications produced with the data.
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 10989 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"bs": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"eo": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"bs": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"eo": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"bs",
"eo"
],
"id": null,
"_type": "Translation"
}
}
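Each record pairs two aligned subtitle sentences under `translation`, with provenance (year, IMDb ID, subtitle and sentence IDs) under `meta`. Below is a minimal sketch of reading one pair. It assumes records arrive as nested Python dicts shaped like the schema above. `EXAMPLE_RECORD`, `sentence_pair` and `_as_text` are hypothetical names, and every value in the record is a placeholder, not a real row. Strings that pass through a NumPy conversion may arrive as UTF-8 `bytes`, so the helper decodes them.

```python
from typing import Any, Mapping, Tuple, Union

# Hypothetical record, hand-built to mirror the feature schema above.
# All values are placeholders, not real rows from the dataset.
EXAMPLE_RECORD = {
    "id": "0",
    "meta": {
        "year": 2000,
        "imdbId": 1,
        "subtitleId": {"bs": 10, "eo": 20},
        "sentenceIds": {"bs": [1, 2], "eo": [1]},
    },
    "translation": {"bs": "Zdravo.", "eo": "Saluton."},
}


def _as_text(value: Union[str, bytes]) -> str:
    """Return a str, decoding UTF-8 bytes (NumPy-converted strings may be bytes)."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def sentence_pair(record: Mapping[str, Any], src: str = "bs", tgt: str = "eo") -> Tuple[str, str]:
    """Pull the aligned (source, target) sentence pair out of the `translation` dict."""
    translation = record["translation"]
    return _as_text(translation[src]), _as_text(translation[tgt])


print(sentence_pair(EXAMPLE_RECORD))  # → ('Zdravo.', 'Saluton.')
```

The `src` and `tgt` parameters let the same helper serve the other language pairs on this page.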
fr-hy
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:open_subtitles/fr-hy')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: if you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ on your website and in any reports and publications produced with the data.
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 668 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"fr": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"hy": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"fr": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"hy": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"fr",
"hy"
],
"id": null,
"_type": "Translation"
}
}
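The `uint32` fields and the `Sequence` entries above can be checked mechanically. A `"length": -1` entry marks a variable-length list. Below is a minimal sketch of such a check for one fr-hy record. It assumes the record is a nested dict holding plain Python values. A TFDS pipeline may return `bytes` for strings, and this check would flag those for `id`. `schema_problems` is a hypothetical helper, and the sample record holds placeholder values.

```python
import numbers
from typing import Any, List, Mapping, Sequence

UINT32_MAX = 2**32 - 1


def schema_problems(record: Mapping[str, Any], langs: Sequence[str] = ("fr", "hy")) -> List[str]:
    """Return mismatches between one record and the feature schema above (empty list = OK)."""
    problems: List[str] = []

    def check_uint32(name: str, value: Any) -> None:
        # numbers.Integral also accepts NumPy integer scalars.
        if not isinstance(value, numbers.Integral) or not 0 <= value <= UINT32_MAX:
            problems.append(f"{name} is not a uint32: {value!r}")

    if not isinstance(record.get("id"), str):
        problems.append("id is not a string")
    meta = record.get("meta", {})
    check_uint32("meta.year", meta.get("year"))
    check_uint32("meta.imdbId", meta.get("imdbId"))
    for lang in langs:
        check_uint32(f"meta.subtitleId.{lang}", meta.get("subtitleId", {}).get(lang))
        ids = meta.get("sentenceIds", {}).get(lang)
        if ids is None:
            problems.append(f"meta.sentenceIds.{lang} is missing")
        else:
            # "length": -1 in the schema: the list may have any length.
            for i, sentence_id in enumerate(ids):
                check_uint32(f"meta.sentenceIds.{lang}[{i}]", sentence_id)
        if lang not in record.get("translation", {}):
            problems.append(f"translation is missing '{lang}'")
    return problems


# Hypothetical placeholder record that satisfies the schema.
record = {
    "id": "0",
    "meta": {"year": 2005, "imdbId": 2, "subtitleId": {"fr": 30, "hy": 40},
             "sentenceIds": {"fr": [1], "hy": [1, 2]}},
    "translation": {"fr": "Bonjour.", "hy": "Բարև։"},
}
print(schema_problems(record))  # → []
```

The function collects every mismatch instead of stopping at the first one, so a single pass over a record reports all of its problems.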
da-ru
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:open_subtitles/da-ru')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: if you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ on your website and in any reports and publications produced with the data.
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 7543012 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"da": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"ru": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"da": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ru": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"da",
"ru"
],
"id": null,
"_type": "Translation"
}
}
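The da-ru split has 7,543,012 training examples. At that size, collecting every pair in a list before writing it out wastes memory. Below is a minimal sketch that instead streams pairs to a tab-separated file one record at a time. It assumes each record exposes the `translation` dict described above. `write_tsv` and `_clean` are hypothetical helpers, and the two sample records are placeholders.

```python
import io
from typing import Any, Iterable, Mapping, TextIO, Union


def _clean(text: Union[str, bytes]) -> str:
    """Decode bytes if needed and collapse whitespace runs (tabs, newlines) to single spaces."""
    if isinstance(text, bytes):
        text = text.decode("utf-8")
    return " ".join(text.split())


def write_tsv(records: Iterable[Mapping[str, Any]], out: TextIO,
              src: str = "da", tgt: str = "ru") -> int:
    """Write one `src<TAB>tgt` line per record to `out`; return the number of lines written.

    Records are consumed one at a time, so memory use does not grow with the split size.
    """
    count = 0
    for record in records:
        pair = record["translation"]
        out.write(f"{_clean(pair[src])}\t{_clean(pair[tgt])}\n")
        count += 1
    return count


# Hypothetical placeholder records; the second contains a stray tab.
records = (
    {"translation": {"da": "Hej.", "ru": "Привет."}},
    {"translation": {"da": "Tak\tskal du have.", "ru": "Спасибо."}},
)
buffer = io.StringIO()
print(write_tsv(records, buffer))  # → 2
print(buffer.getvalue())
```

If TFDS yields the same nested structure, passing `tfds.as_numpy(ds['train'])` as `records` with an open file as `out` should work, and `_clean` would decode the `bytes` that NumPy conversion produces. Treat that structure as an assumption rather than a verified property of the loader.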
en-hi
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:open_subtitles/en-hi')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: if you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ on your website and in any reports and publications produced with the data.
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 93016 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"en": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"hi": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"en": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"hi": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"en",
"hi"
],
"id": null,
"_type": "Translation"
}
}
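Every record carries `meta.imdbId`, so pairs can be grouped by film. For example, you can check how much of the en-hi split comes from a handful of movies. Below is a minimal sketch using `collections.Counter`. `pairs_per_movie` is a hypothetical helper, and the records are placeholders that fill in only the field it reads.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def pairs_per_movie(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Count aligned sentence pairs per film, keyed by `meta.imdbId` (cast to int)."""
    return Counter(int(record["meta"]["imdbId"]) for record in records)


# Hypothetical placeholder records: two pairs from one film, one from another.
records = [
    {"meta": {"imdbId": 101}},
    {"meta": {"imdbId": 101}},
    {"meta": {"imdbId": 202}},
]
print(pairs_per_movie(records).most_common())  # → [(101, 2), (202, 1)]
```

The `int()` cast means NumPy integer scalars and plain ints end up under the same key.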
bn-is
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:open_subtitles/bn-is')
- Description:
This is a new collection of translated movie subtitles from http://www.opensubtitles.org/.
Important: if you use the OpenSubtitles corpus, please add a link to http://www.opensubtitles.org/ on your website and in any reports and publications produced with the data.
This is a slightly cleaner version of the subtitle collection using improved sentence alignment and better language checking.
62 languages, 1,782 bitexts
total number of files: 3,735,070
total number of tokens: 22.10G
total number of sentence fragments: 3.35G
- License: No known license
- Version: 2018.0.0
- Splits:
Split | Examples |
---|---|
'train' | 38272 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"meta": {
"year": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"imdbId": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"subtitleId": {
"bn": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"is": {
"dtype": "uint32",
"id": null,
"_type": "Value"
}
},
"sentenceIds": {
"bn": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"is": {
"feature": {
"dtype": "uint32",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
},
"translation": {
"languages": [
"bn",
"is"
],
"id": null,
"_type": "Translation"
}
}
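Like the other configurations on this page, bn-is lists only a `train` split. One way to carve out a held-out set is to bucket records by a hash of `meta.imdbId`, so that no film contributes sentences to both sides. Below is a minimal sketch. `split_by_movie` and its 10% default are illustrative choices, not part of the dataset, and the records are placeholders.

```python
import zlib
from typing import Any, Iterable, List, Mapping, Tuple

Record = Mapping[str, Any]


def split_by_movie(records: Iterable[Record], heldout_percent: int = 10
                   ) -> Tuple[List[Record], List[Record]]:
    """Send all records of a film to the same side, based on a CRC32 hash of `meta.imdbId`."""
    train: List[Record] = []
    heldout: List[Record] = []
    for record in records:
        key = str(int(record["meta"]["imdbId"])).encode("ascii")
        bucket = zlib.crc32(key) % 100  # stable bucket in [0, 99]
        (heldout if bucket < heldout_percent else train).append(record)
    return train, heldout


# Hypothetical placeholder records: two films, three pairs each.
records = [{"meta": {"imdbId": movie}} for movie in (11, 11, 11, 22, 22, 22)]
train, heldout = split_by_movie(records)
print(len(train), len(heldout))
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, so a `hash()`-based split could change between runs.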