- Description:
The shared task of CoNLL-2002 concerns language-independent named entity recognition. The types of named entities include: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups. The participants of the shared task were offered training and test data for at least two languages. Information sources other than the training data might have been used in this shared task.
Homepage: https://aclanthology.org/W02-2024/
Source code: tfds.text.conll2002.Conll2002
Versions:
1.0.0 (default): Initial release.
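As a minimal sketch of how the config and version above combine into a dataset name, the helper below builds strings in the usual TFDS `name/config:version` form. The function names `tfds_name` and `load_split` are hypothetical, chosen for illustration; only `tfds.load` is a TFDS call, and it is imported lazily so the name helper runs without TFDS installed.

```python
from typing import Optional


def tfds_name(config: str, version: Optional[str] = None) -> str:
    """Build a dataset name such as 'conll2002/es:1.0.0' (hypothetical helper)."""
    if config not in ("es", "nl"):
        raise ValueError(f"unknown conll2002 config: {config!r}")
    name = f"conll2002/{config}"
    return f"{name}:{version}" if version else name


def load_split(config: str = "es", split: str = "train"):
    """Load one split of the chosen config (hypothetical wrapper around tfds.load)."""
    # Imported here so tfds_name() stays usable without TFDS installed.
    import tensorflow_datasets as tfds  # pip install tensorflow-datasets
    return tfds.load(tfds_name(config, "1.0.0"), split=split)


print(tfds_name("es"))           # conll2002/es
print(tfds_name("nl", "1.0.0"))  # conll2002/nl:1.0.0
```

Calling `load_split("nl", "dev")` would then request the Dutch development split; omitting the version lets TFDS pick the default one.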
Auto-cached (documentation): Yes
Supervised keys (See as_supervised doc): None
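Because no supervised keys are defined, requesting `(input, label)` tuples through `as_supervised` is not available for this dataset. One possible workaround, sketched here on a plain dictionary, is to pick the pair yourself. The function name `to_supervised` and the sample values are hypothetical; treating `tokens` as input and `ner` as label is an assumption about the intended task.

```python
from typing import Dict, List, Tuple


def to_supervised(example: Dict[str, List]) -> Tuple[List, List]:
    """Turn one feature dict into an (input, label) pair: tokens in, NER ids out."""
    return example["tokens"], example["ner"]


# Made-up example that only mirrors the dataset's feature keys.
example = {"tokens": ["Madrid", "gana"], "pos": [0, 1], "ner": [5, 0]}
tokens, ner = to_supervised(example)
print(tokens, ner)

# With a tf.data pipeline the same idea would be:
#   ds = ds.map(lambda ex: (ex["tokens"], ex["ner"]))
```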
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{tjong-kim-sang-2002-introduction,
title = "Introduction to the {C}o{NLL}-2002 Shared Task: Language-Independent Named Entity Recognition",
author = "Tjong Kim Sang, Erik F.",
booktitle = "{COLING}-02: The 6th Conference on Natural Language Learning 2002 ({C}o{NLL}-2002)",
year = "2002",
url = "https://aclanthology.org/W02-2024",
}
conll2002/es (default config)
Download size: 3.95 MiB
Dataset size: 3.52 MiB
Splits:
Split | Examples |
---|---|
'dev' | 1,916 |
'test' | 1,518 |
'train' | 8,324 |
- Feature structure:
FeaturesDict({
'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=60)),
'tokens': Sequence(Text(shape=(), dtype=string)),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
ner | Sequence(ClassLabel) | (None,) | int64 | |
pos | Sequence(ClassLabel) | (None,) | int64 | |
tokens | Sequence(Text) | (None,) | string | |
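The `ner` feature stores one integer class id per token. The sketch below shows one way to turn such ids back into entity spans, assuming a BIO tagging scheme over the four entity types named in the description. The label order in `NER_NAMES` is an assumption for illustration, not taken from this page; the real order can be read from the dataset info, for example via `info.features["ner"].feature.names`.

```python
from typing import List, Sequence, Tuple

# ASSUMPTION: illustrative label order; check the dataset info for the real one.
NER_NAMES = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
             "B-LOC", "I-LOC", "B-MISC", "I-MISC"]


def bio_spans(tokens: Sequence[str], ner_ids: Sequence[int],
              names: Sequence[str] = NER_NAMES) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []
    for token, idx in zip(tokens, ner_ids):
        tag = names[idx]
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_type)
        if starts_new:
            # A B- tag, or an I- tag of a different type, opens a new span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open span
        else:
            # "O" closes any open span.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Made-up sentence and ids, only to exercise the function.
print(bio_spans(["Juan", "Pérez", "vive", "en", "Madrid"], [1, 2, 0, 0, 5]))
# [('PER', 'Juan Pérez'), ('LOC', 'Madrid')]
```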
conll2002/nl
Download size: 3.47 MiB
Dataset size: 3.55 MiB
Splits:
Split | Examples |
---|---|
'dev' | 2,896 |
'test' | 5,196 |
'train' | 15,807 |
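For a quick comparison of the two configs, the split counts listed on this page can be totalled and turned into shares. This is a small arithmetic sketch; the `SPLITS` dictionary simply copies the example counts above, and `split_shares` is a hypothetical helper name.

```python
# Example counts copied from the split tables for each config.
SPLITS = {
    "es": {"train": 8324, "dev": 1916, "test": 1518},
    "nl": {"train": 15807, "dev": 2896, "test": 5196},
}


def split_shares(counts):
    """Return each split's fraction of the config's total examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for config, counts in SPLITS.items():
    total = sum(counts.values())
    summary = ", ".join(
        f"{name} {share:.1%}" for name, share in split_shares(counts).items())
    print(f"{config}: {total} examples ({summary})")
```

The totals come to 11,758 examples for `es` and 23,899 for `nl`.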
- Feature structure:
FeaturesDict({
'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=12)),
'tokens': Sequence(Text(shape=(), dtype=string)),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
ner | Sequence(ClassLabel) | (None,) | int64 | |
pos | Sequence(ClassLabel) | (None,) | int64 | |
tokens | Sequence(Text) | (None,) | string | |
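The `(None,)` shape means every example has its own sequence length, so batching several examples requires padding them to a common length. The sketch below does this in plain Python; `pad_batch` is a hypothetical helper, and padding with `0` is an assumption that makes the returned mask necessary, since `0` may also be a valid class id. In a `tf.data` pipeline, `Dataset.padded_batch` serves a similar purpose.

```python
from typing import List, Sequence, Tuple


def pad_batch(sequences: Sequence[Sequence[int]],
              pad_value: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad id sequences to the longest length and return (padded, mask)."""
    max_len = max((len(s) for s in sequences), default=0)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    # mask is 1 for real tokens and 0 for padding positions
    mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, mask


# Made-up NER id sequences of different lengths.
padded, mask = pad_batch([[1, 2, 0], [5]])
print(padded)  # [[1, 2, 0], [5, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```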