参考文献:
次のコマンドを使用して、このデータセットを TFDS にロードします。
ds = tfds.load('huggingface:trec')
- 説明:
The Text REtrieval Conference (TREC) Question Classification dataset contains 5500 labeled questions in training set and another 500 for test set. The dataset has 6 labels, 47 level-2 labels. Average length of each sentence is 10, vocabulary size of 8700.
Data are collected from four sources: 4,500 English questions published by USC (Hovy et al., 2001), about 500 manually constructed questions for a few rare classes, 894 TREC 8 and TREC 9 questions, and also 500 questions from TREC 10 which serves as the test set.
- ライセンス: 既知のライセンスはありません
- バージョン: 1.1.0
- 分割:
スプリット | 例 |
---|---|
'test' | 500 |
'train' | 5452 |
- 特徴:
{
"label-coarse": {
"num_classes": 6,
"names": [
"DESC",
"ENTY",
"ABBR",
"HUM",
"NUM",
"LOC"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"label-fine": {
"num_classes": 47,
"names": [
"manner",
"cremat",
"animal",
"exp",
"ind",
"gr",
"title",
"def",
"date",
"reason",
"event",
"state",
"desc",
"count",
"other",
"letter",
"religion",
"food",
"country",
"color",
"termeq",
"city",
"body",
"dismed",
"mount",
"money",
"product",
"period",
"substance",
"sport",
"plant",
"techmeth",
"volsize",
"instru",
"abb",
"speed",
"word",
"lang",
"perc",
"code",
"dist",
"temp",
"symbol",
"ord",
"veh",
"weight",
"currency"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"text": {
"dtype": "string",
"id": null,
"_type": "Value"
}
}