তথ্যসূত্র:
মিশ্রিত
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:c3/mixed')
- বর্ণনা :
Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C^3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations.
We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and data augmentation based on translated relevant datasets for English on model performance. We expect C^3 to present great challenges to existing systems as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C^3 can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text.
- লাইসেন্স : কোনো পরিচিত লাইসেন্স নেই
- সংস্করণ : 1.0.0
- বিভাজন :
বিভক্ত | উদাহরণ |
---|---|
'test' | 1045 |
'train' | 3138 |
'validation' | 1046 |
- বৈশিষ্ট্য :
{
"documents": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"document_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"questions": {
"feature": {
"question": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"answer": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"choice": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
ডায়ালগ
TFDS এ এই ডেটাসেট লোড করতে নিম্নলিখিত কমান্ডটি ব্যবহার করুন:
ds = tfds.load('huggingface:c3/dialog')
- বর্ণনা :
Machine reading comprehension tasks require a machine reader to answer questions relevant to the given document. In this paper, we present the first free-form multiple-Choice Chinese machine reading Comprehension dataset (C^3), containing 13,369 documents (dialogues or more formally written mixed-genre texts) and their associated 19,577 multiple-choice free-form questions collected from Chinese-as-a-second-language examinations.
We present a comprehensive analysis of the prior knowledge (i.e., linguistic, domain-specific, and general world knowledge) needed for these real-world problems. We implement rule-based and popular neural methods and find that there is still a significant performance gap between the best performing model (68.5%) and human readers (96.0%), especially on problems that require prior knowledge. We further study the effects of distractor plausibility and data augmentation based on translated relevant datasets for English on model performance. We expect C^3 to present great challenges to existing systems as answering 86.8% of questions requires both knowledge within and beyond the accompanying document, and we hope that C^3 can serve as a platform to study how to leverage various kinds of prior knowledge to better understand a given written or orally oriented text.
- লাইসেন্স : কোনো পরিচিত লাইসেন্স নেই
- সংস্করণ : 1.0.0
- বিভাজন :
বিভক্ত | উদাহরণ |
---|---|
'test' | 1627 |
'train' | 4885 |
'validation' | 1628 |
- বৈশিষ্ট্য :
{
"documents": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"document_id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"questions": {
"feature": {
"question": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"answer": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"choice": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}