Description:
MLQA (Multilingual Question Answering Dataset) is a benchmark dataset for evaluating multilingual question answering performance. The dataset covers 7 languages: Arabic, German, Spanish, English, Hindi, Vietnamese, and Chinese.
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/facebookresearch/MLQA
Source code: tfds.datasets.mlqa.Builder
Versions:
1.0.0 (default): No release notes.
Download size: 72.21 MiB
Auto-cached (documentation): Yes
Feature structure:
FeaturesDict({
    'answers': Sequence({
        'answer_start': int32,
        'text': Text(shape=(), dtype=string),
    }),
    'context': Text(shape=(), dtype=string),
    'id': string,
    'question': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
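To illustrate how these fields relate, the sketch below recovers an answer span from `context` using `answer_start`. It assumes, as in SQuAD-style datasets, that `answer_start` is a character offset into `context`; this page does not state that explicitly. The record is invented for illustration and is not real MLQA data, and the helper names are hypothetical. Examples loaded through TFDS hold byte strings, so they would need decoding to `str` before a check like this.

```python
def answer_span(context: str, answer_start: int, answer_text: str) -> str:
    """Return the slice of `context` that the answer annotation points at."""
    return context[answer_start:answer_start + len(answer_text)]


def answer_is_aligned(example: dict) -> bool:
    """Check that every annotated answer matches the text at its offset."""
    answers = example["answers"]
    return all(
        answer_span(example["context"], start, text) == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


# Hypothetical record shaped like the FeaturesDict above (not real MLQA data).
# A Sequence of a dict is represented as a dict of parallel lists.
example = {
    "id": "demo-0",
    "title": "Demo",
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
    "answers": {"answer_start": [0], "text": ["Paris"]},
}

print(answer_is_aligned(example))
```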
Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
answers | Sequence | | | |
answers/answer_start | Tensor | | int32 | |
answers/text | Text | | string | |
context | Text | | string | |
id | Tensor | | string | |
question | Text | | string | |
title | Text | | string | |
Supervised keys (See as_supervised doc): None
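Since no supervised keys are defined, each example comes back as a feature dictionary rather than an `(input, label)` pair, and `as_supervised=True` should not be expected to work. A minimal loading sketch follows, assuming `tensorflow_datasets` is installed and the data can be downloaded. `mlqa_config_name` and `load_mlqa` are hypothetical helpers written for this example, not part of TFDS.

```python
MLQA_LANGUAGES = ("ar", "de", "en", "es", "hi", "vi", "zh")


def mlqa_config_name(lang: str) -> str:
    """Build the TFDS name for one language config, e.g. 'mlqa/de'."""
    if lang not in MLQA_LANGUAGES:
        raise ValueError(f"unknown MLQA language: {lang!r}")
    return f"mlqa/{lang}"


def load_mlqa(lang: str, split: str = "validation"):
    """Load one split of one config as a tf.data.Dataset of feature dicts."""
    # Imported lazily so the name helper works without TFDS installed.
    import tensorflow_datasets as tfds

    return tfds.load(mlqa_config_name(lang), split=split)


# Usage (needs tensorflow_datasets; the first call downloads the data):
#   ds = load_mlqa("de", split="test")
#   for ex in ds.take(1):
#       print(ex["question"].numpy().decode("utf-8"))
```

Only the `test` and `validation` splits exist for each config, so a `split="train"` request would fail.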
Figure (tfds.show_examples): Not supported.
Citation:
@article{lewis2019mlqa,
  title={MLQA: Evaluating Cross-lingual Extractive Question Answering},
  author={Lewis, Patrick and O{\u{g}}uz, Barlas and Rinott, Ruty and Riedel, Sebastian and Schwenk, Holger},
  journal={arXiv preprint arXiv:1910.07475},
  year={2019}
}
mlqa/ar (default config)
Config description: MLQA 'ar' dev and test splits.
Dataset size: 9.28 MiB
Splits:
Split | Examples |
---|---|
'test' | 5,335 |
'validation' | 517 |
mlqa/de
Config description: MLQA 'de' dev and test splits.
Dataset size: 5.06 MiB
Splits:
Split | Examples |
---|---|
'test' | 4,517 |
'validation' | 512 |
mlqa/en
Config description: MLQA 'en' dev and test splits.
Dataset size: 15.72 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,590 |
'validation' | 1,148 |
mlqa/es
Config description: MLQA 'es' dev and test splits.
Dataset size: 5.09 MiB
Splits:
Split | Examples |
---|---|
'test' | 5,253 |
'validation' | 500 |
mlqa/hi
Config description: MLQA 'hi' dev and test splits.
Dataset size: 12.83 MiB
Splits:
Split | Examples |
---|---|
'test' | 4,918 |
'validation' | 507 |
mlqa/vi
Config description: MLQA 'vi' dev and test splits.
Dataset size: 8.77 MiB
Splits:
Split | Examples |
---|---|
'test' | 5,495 |
'validation' | 511 |
mlqa/zh
Config description: MLQA 'zh' dev and test splits.
Dataset size: 5.13 MiB
Splits:
Split | Examples |
---|---|
'test' | 5,137 |
'validation' | 504 |
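For quick reference across all seven configs, the split sizes listed on this page can be collected into a dictionary and totalled. This is only a convenience sketch: the numbers are copied from the split tables on this page, and the names `SPLIT_SIZES` and `total_examples` are assumptions introduced here.

```python
# Example counts per config and split, copied from the tables on this page.
SPLIT_SIZES = {
    "ar": {"test": 5335, "validation": 517},
    "de": {"test": 4517, "validation": 512},
    "en": {"test": 11590, "validation": 1148},
    "es": {"test": 5253, "validation": 500},
    "hi": {"test": 4918, "validation": 507},
    "vi": {"test": 5495, "validation": 511},
    "zh": {"test": 5137, "validation": 504},
}


def total_examples(split: str) -> int:
    """Sum the number of examples in `split` over all language configs."""
    return sum(sizes[split] for sizes in SPLIT_SIZES.values())


print(total_examples("test"), total_examples("validation"))
```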
- Examples (tfds.as_dataframe):