তথ্যসূত্র:
tlcv1.0
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:tlc/tlcv1.0')
- বর্ণনা :
Thai Literature Corpora (TLC): Corpora of machine-ingestible Thai classical literature texts.
Release: 6/25/19
It consists of two datasets:
## TLC set
It is texts from [Vajirayana Digital Library](https://vajirayana.org/), stored by chapters and stanzas (non-tokenized).
tlc v.2.0 (6/17/19 : a total of 34 documents, 292,270 lines, 31,790,734 characters)
tlc v.1.0 (6/11/19 : a total of 25 documents, 113,981 lines, 28,775,761 characters)
## TNHC set
It is texts from Thai National Historical Corpus, stored by lines (manually tokenized).
tnhc v.1.0 (6/25/19 : a total of 47 documents, 756,478 lines, 13,361,142 characters)
- লাইসেন্স : কোনো পরিচিত লাইসেন্স নেই
- সংস্করণ : 1.0.0
- বিভাজন :
বিভক্ত | উদাহরণ |
---|---|
'train' | 1 |
- বৈশিষ্ট্য :
{
"ch_num": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"text": {
"feature": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
tlcv2.0
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:tlc/tlcv2.0')
- বর্ণনা :
Thai Literature Corpora (TLC): Corpora of machine-ingestible Thai classical literature texts.
Release: 6/25/19
It consists of two datasets:
## TLC set
It is texts from [Vajirayana Digital Library](https://vajirayana.org/), stored by chapters and stanzas (non-tokenized).
tlc v.2.0 (6/17/19 : a total of 34 documents, 292,270 lines, 31,790,734 characters)
tlc v.1.0 (6/11/19 : a total of 25 documents, 113,981 lines, 28,775,761 characters)
## TNHC set
It is texts from Thai National Historical Corpus, stored by lines (manually tokenized).
tnhc v.1.0 (6/25/19 : a total of 47 documents, 756,478 lines, 13,361,142 characters)
- লাইসেন্স : কোনো পরিচিত লাইসেন্স নেই
- সংস্করণ : 2.0.0
- বিভাজন :
বিভক্ত | উদাহরণ |
---|---|
'train' | 1 |
- বৈশিষ্ট্য :
{
"ch_num": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"title": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"text": {
"feature": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
tnhcv1.0
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:tlc/tnhcv1.0')
- বর্ণনা :
Thai Literature Corpora (TLC): Corpora of machine-ingestible Thai classical literature texts.
Release: 6/25/19
It consists of two datasets:
## TLC set
It is texts from [Vajirayana Digital Library](https://vajirayana.org/), stored by chapters and stanzas (non-tokenized).
tlc v.2.0 (6/17/19 : a total of 34 documents, 292,270 lines, 31,790,734 characters)
tlc v.1.0 (6/11/19 : a total of 25 documents, 113,981 lines, 28,775,761 characters)
## TNHC set
It is texts from Thai National Historical Corpus, stored by lines (manually tokenized).
tnhc v.1.0 (6/25/19 : a total of 47 documents, 756,478 lines, 13,361,142 characters)
- লাইসেন্স : কোনো পরিচিত লাইসেন্স নেই
- সংস্করণ : 1.0.0
- বিভাজন :
বিভক্ত | উদাহরণ |
---|---|
'train' | 152 |
- বৈশিষ্ট্য :
{
"text": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}