References:
ar-ko
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:qed_amara/ar-ko')
- Description:
The QCRI Educational Domain Corpus (formerly QCRI AMARA Corpus) is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform.
Developed by: Qatar Computing Research Institute, Arabic Language Technologies Group
The QED Corpus is made public for RESEARCH purpose only.
The corpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Copyright Qatar Computing Research Institute. All rights reserved.
225 languages, 9,291 bitexts
total number of files: 271,558
total number of tokens: 371.76M
total number of sentence fragments: 30.93M
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'train' | 592589 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"ar",
"ko"
],
"id": null,
"_type": "Translation"
}
}
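As a minimal sketch of how a record matching the schema above might be consumed, the helper below pulls the source and target strings out of the `translation` field. The function name `split_pair` and the sample record are hypothetical, and the sketch assumes records arrive as plain Python dictionaries keyed by language code; when loaded through TFDS the values may instead be tensors or byte strings that need decoding first.

```python
from typing import Dict, Tuple


def split_pair(record: Dict, src: str = "ar", tgt: str = "ko") -> Tuple[str, str]:
    """Return (source, target) text from a record shaped like the schema above.

    Assumes record["translation"] maps each language code to its text.
    """
    translation = record["translation"]
    return translation[src], translation[tgt]


# Hypothetical record for illustration only; the strings are placeholders,
# not text taken from the corpus.
example = {
    "id": "0",
    "translation": {
        "ar": "<Arabic subtitle line>",
        "ko": "<Korean subtitle line>",
    },
}

source_text, target_text = split_pair(example)
print(source_text, "|", target_text)
```

The same helper should work for the other configurations by passing their language codes, for example `split_pair(record, "de", "fr")`.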
de-fr
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:qed_amara/de-fr')
- Description:
The QCRI Educational Domain Corpus (formerly QCRI AMARA Corpus) is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform.
Developed by: Qatar Computing Research Institute, Arabic Language Technologies Group
The QED Corpus is made public for RESEARCH purpose only.
The corpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Copyright Qatar Computing Research Institute. All rights reserved.
225 languages, 9,291 bitexts
total number of files: 271,558
total number of tokens: 371.76M
total number of sentence fragments: 30.93M
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'train' | 407224 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"de",
"fr"
],
"id": null,
"_type": "Translation"
}
}
es-it
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:qed_amara/es-it')
- Description:
The QCRI Educational Domain Corpus (formerly QCRI AMARA Corpus) is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform.
Developed by: Qatar Computing Research Institute, Arabic Language Technologies Group
The QED Corpus is made public for RESEARCH purpose only.
The corpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Copyright Qatar Computing Research Institute. All rights reserved.
225 languages, 9,291 bitexts
total number of files: 271,558
total number of tokens: 371.76M
total number of sentence fragments: 30.93M
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'train' | 447369 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"es",
"it"
],
"id": null,
"_type": "Translation"
}
}
en-ja
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:qed_amara/en-ja')
- Description:
The QCRI Educational Domain Corpus (formerly QCRI AMARA Corpus) is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform.
Developed by: Qatar Computing Research Institute, Arabic Language Technologies Group
The QED Corpus is made public for RESEARCH purpose only.
The corpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Copyright Qatar Computing Research Institute. All rights reserved.
225 languages, 9,291 bitexts
total number of files: 271,558
total number of tokens: 371.76M
total number of sentence fragments: 30.93M
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'train' | 497531 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"en",
"ja"
],
"id": null,
"_type": "Translation"
}
}
he-nl
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:qed_amara/he-nl')
- Description:
The QCRI Educational Domain Corpus (formerly QCRI AMARA Corpus) is an open multilingual collection of subtitles for educational videos and lectures collaboratively transcribed and translated over the AMARA web-based platform.
Developed by: Qatar Computing Research Institute, Arabic Language Technologies Group
The QED Corpus is made public for RESEARCH purpose only.
The corpus is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Copyright Qatar Computing Research Institute. All rights reserved.
225 languages, 9,291 bitexts
total number of files: 271,558
total number of tokens: 371.76M
total number of sentence fragments: 30.93M
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'train' | 273165 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"he",
"nl"
],
"id": null,
"_type": "Translation"
}
}
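The five configurations listed on this page all follow the same `huggingface:qed_amara/<pair>` naming pattern, so the name passed to `tfds.load` can be built programmatically. The sketch below is illustrative only: the dictionary `TRAIN_EXAMPLES` and the helpers `tfds_name` and `largest_config` are hypothetical names, and the counts are copied from the split tables above rather than queried from the dataset.

```python
# Example counts for the 'train' split, copied from the tables on this page.
TRAIN_EXAMPLES = {
    "ar-ko": 592589,
    "de-fr": 407224,
    "es-it": 447369,
    "en-ja": 497531,
    "he-nl": 273165,
}


def tfds_name(config: str) -> str:
    """Build the TFDS dataset name for a configuration listed on this page."""
    if config not in TRAIN_EXAMPLES:
        raise ValueError(f"unknown configuration: {config!r}")
    return f"huggingface:qed_amara/{config}"


def largest_config() -> str:
    """Return the listed configuration with the most training examples."""
    return max(TRAIN_EXAMPLES, key=TRAIN_EXAMPLES.get)


# Print the configurations from largest to smallest 'train' split.
for config, count in sorted(TRAIN_EXAMPLES.items(), key=lambda kv: -kv[1]):
    print(f"{tfds_name(config):32s} {count:>8,d}")
```

The string returned by `tfds_name` is what would be passed as the first argument to `tfds.load`; rejecting unknown pairs early avoids a slower failure at download time.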