References:
germeval_14
Use the following command to load this dataset in TFDS:
import tensorflow_datasets as tfds
ds = tfds.load('huggingface:germeval_14/germeval_14')
- Description:
The GermEval 2014 NER Shared Task builds on a new dataset with German Named Entity annotation with the following properties:
  - The data was sampled from German Wikipedia and News Corpora as a collection of citations.
  - The dataset covers over 31,000 sentences corresponding to over 590,000 tokens.
  - The NER annotation uses the NoSta-D guidelines, which extend the Tübingen Treebank guidelines, using four main NER categories with sub-structure, and annotating embeddings among NEs such as [ORG FC Kickers [LOC Darmstadt]].
- License: No known license
- Version: 2.0.0
- Splits:
Split | Examples |
---|---|
'test' | 5100 |
'train' | 24000 |
'validation' | 2200 |
- Features:
{
"id": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"source": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"tokens": {
"feature": {
"dtype": "string",
"id": null,
"_type": "Value"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"ner_tags": {
"feature": {
"num_classes": 25,
"names": [
"O",
"B-LOC",
"I-LOC",
"B-LOCderiv",
"I-LOCderiv",
"B-LOCpart",
"I-LOCpart",
"B-ORG",
"I-ORG",
"B-ORGderiv",
"I-ORGderiv",
"B-ORGpart",
"I-ORGpart",
"B-OTH",
"I-OTH",
"B-OTHderiv",
"I-OTHderiv",
"B-OTHpart",
"I-OTHpart",
"B-PER",
"I-PER",
"B-PERderiv",
"I-PERderiv",
"B-PERpart",
"I-PERpart"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
},
"nested_ner_tags": {
"feature": {
"num_classes": 25,
"names": [
"O",
"B-LOC",
"I-LOC",
"B-LOCderiv",
"I-LOCderiv",
"B-LOCpart",
"I-LOCpart",
"B-ORG",
"I-ORG",
"B-ORGderiv",
"I-ORGderiv",
"B-ORGpart",
"I-ORGpart",
"B-OTH",
"I-OTH",
"B-OTHderiv",
"I-OTHderiv",
"B-OTHpart",
"I-OTHpart",
"B-PER",
"I-PER",
"B-PERderiv",
"I-PERderiv",
"B-PERpart",
"I-PERpart"
],
"names_file": null,
"id": null,
"_type": "ClassLabel"
},
"length": -1,
"id": null,
"_type": "Sequence"
}
}
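
The `ner_tags` and `nested_ner_tags` features store integer class ids rather than label strings. The sketch below shows one way to map those ids back to the 25 label names listed above and to group the resulting BIO labels into entity spans. The label order is rebuilt from the feature spec; the helper names `decode_tags` and `extract_spans`, the example sentence, and its tag ids are hypothetical and chosen only to mirror the `[ORG FC Kickers [LOC Darmstadt]]` example from the description, not taken from the dataset itself.

```python
from typing import List, Tuple

# Rebuild the label list in class-id order, matching the ClassLabel spec above:
# "O", then B-/I- pairs for each base type and its "deriv"/"part" variants.
BASE_TYPES = ["LOC", "ORG", "OTH", "PER"]
SUFFIXES = ["", "deriv", "part"]
NER_LABELS: List[str] = ["O"] + [
    f"{prefix}-{base}{suffix}"
    for base in BASE_TYPES
    for suffix in SUFFIXES
    for prefix in ("B", "I")
]

Span = Tuple[str, int, int, str]  # (entity type, start, end exclusive, text)


def decode_tags(tag_ids: List[int]) -> List[str]:
    """Map integer class ids to their label names."""
    return [NER_LABELS[i] for i in tag_ids]


def extract_spans(tokens: List[str], labels: List[str]) -> List[Span]:
    """Group BIO labels into entity spans (lenient: a stray I- tag opens a span)."""
    spans: List[Span] = []
    start, current = None, None
    # A trailing "O" sentinel flushes a span that runs to the end of the sentence.
    for i, label in enumerate(list(labels) + ["O"]):
        prefix, _, etype = label.partition("-")
        continues = prefix == "I" and etype == current
        if current is not None and not continues:
            spans.append((current, start, i, " ".join(tokens[start:i])))
            start, current = None, None
        if prefix == "B" or (prefix == "I" and current is None):
            start, current = i, etype
    return spans


# Hypothetical example: the outer layer marks the organisation, the nested
# layer marks the location embedded inside it.
tokens = ["FC", "Kickers", "Darmstadt", "gewinnt"]
outer_ids = [7, 8, 8, 0]   # B-ORG, I-ORG, I-ORG, O
nested_ids = [0, 0, 1, 0]  # O, O, B-LOC, O

outer_spans = extract_spans(tokens, decode_tags(outer_ids))
nested_spans = extract_spans(tokens, decode_tags(nested_ids))
print(outer_spans)   # [('ORG', 0, 3, 'FC Kickers Darmstadt')]
print(nested_spans)  # [('LOC', 2, 3, 'Darmstadt')]
```

Keeping the two tag layers as separate sequences means each one can be decoded with the same flat BIO logic; the nesting is recovered afterwards by checking which inner spans fall inside an outer span.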