مراجع:
alt-موازی
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-parallel')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیمات :
تقسیم کنید | نمونه ها |
---|---|
'test' | 1019 |
'train' | 18094 |
'validation' | 1004 |
- ویژگی ها :
{
"SNT.URLID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"SNT.URLID.SNTID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"translation": {
"languages": [
"bg",
"en",
"en_tok",
"fil",
"hi",
"id",
"ja",
"khm",
"lo",
"ms",
"my",
"th",
"vi",
"zh"
],
"id": null,
"_type": "Translation"
}
}
alt-en
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-en')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیم ها :
تقسیم کنید | نمونه ها |
---|---|
'test' | 1017 |
'train' | 17889 |
'validation' | 988 |
- ویژگی ها :
{
"SNT.URLID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"SNT.URLID.SNTID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"status": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"value": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
alt-jp
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-jp')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیمات :
تقسیم کنید | نمونه ها |
---|---|
'test' | 931 |
'train' | 17202 |
'validation' | 953 |
- ویژگی ها :
{
"SNT.URLID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"SNT.URLID.SNTID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"status": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"value": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"word_alignment": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"jp_tokenized": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"en_tokenized": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
alt-my
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-my')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیمات :
تقسیم کنید | نمونه ها |
---|---|
'test' | 1018 |
'train' | 18088 |
'validation' | 1000 |
- ویژگی ها :
{
"SNT.URLID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"SNT.URLID.SNTID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"value": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
alt-km
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-km')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیمات :
تقسیم کنید | نمونه ها |
---|---|
'test' | 1018 |
'train' | 18088 |
'validation' | 1000 |
- ویژگی ها :
{
"SNT.URLID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"SNT.URLID.SNTID": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"url": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"km_pos_tag": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"km_tokenized": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}
alt-my-transliteration
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-my-transliteration')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیم ها :
تقسیم کنید | نمونه ها |
---|---|
'train' | 84022 |
- ویژگی ها :
{
"en": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"my": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
alt-my-west-transliteration
برای بارگذاری این مجموعه داده در TFDS از دستور زیر استفاده کنید:
ds = tfds.load('huggingface:alt/alt-my-west-transliteration')
- توضیحات :
The ALT project aims to advance the state-of-the-art Asian natural language processing (NLP) techniques through the open collaboration for developing and using ALT. It was first conducted by NICT and UCSY as described in Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch and Eiichiro Sumita (2016). Then, it was developed under ASEAN IVO as described in this Web page. The process of building ALT began with sampling about 20,000 sentences from English Wikinews, and then these sentences were translated into the other languages. ALT now has 13 languages: Bengali, English, Filipino, Hindi, Bahasa Indonesia, Japanese, Khmer, Lao, Malay, Myanmar (Burmese), Thai, Vietnamese, Chinese (Simplified Chinese).
- مجوز : مجوز شناخته شده ای وجود ندارد
- نسخه : 1.0.0
- تقسیم ها :
تقسیم کنید | نمونه ها |
---|---|
'train' | 107121 |
- ویژگی ها :
{
"en": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"my": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}