- Description:
The CFQ dataset (and its splits) for measuring compositional generalization.
See https://arxiv.org/abs/1912.09713.pdf for background.
A note about the validation set: because it has the same distribution as the test set, and we are interested in measuring a model's compositional generalization with respect to an unknown test distribution, we suggest that any tuning be done on a subset of the train set only (see Section 5.1 of the paper).
Example usage:
import tensorflow_datasets as tfds
data = tfds.load('cfq/mcd1')
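Following the note on the validation set, one way to tune without touching it is to carve a held-out slice out of the train split with the TFDS split-slicing syntax. The sketch below is a minimal illustration under stated assumptions, not part of the dataset's own tooling: the helper names `holdout_specs` and `load_fit_and_dev` and the 10% holdout are made up for this example.

```python
def holdout_specs(holdout_percent=10):
    """Return TFDS split strings that partition 'train' into fit/dev slices."""
    if not 0 < holdout_percent < 100:
        raise ValueError("holdout_percent must be strictly between 0 and 100")
    cut = 100 - holdout_percent
    return f"train[:{cut}%]", f"train[{cut}%:]"


def load_fit_and_dev(config="cfq/mcd1", holdout_percent=10):
    """Load two disjoint slices of the train split (downloads data on first use)."""
    import tensorflow_datasets as tfds  # imported lazily; requires TFDS to be installed

    fit_spec, dev_spec = holdout_specs(holdout_percent)
    # Passing a list of split strings returns one dataset per entry.
    fit_ds, dev_ds = tfds.load(config, split=[fit_spec, dev_spec])
    return fit_ds, dev_ds
```

Calling `load_fit_and_dev()` leaves the `validation` and `test` splits untouched, which matches the suggestion to tune on train data only.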
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/google-research/google-research/tree/master/cfq
Source code:
tfds.text.CFQ
Versions:
1.2.0 (default): No release notes.
Download size:
255.20 MiB
Auto-cached (documentation): Yes
Feature structure:
FeaturesDict({
'query': Text(shape=(), dtype=string),
'question': Text(shape=(), dtype=string),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
| FeaturesDict | | | |
query | Text | | string | |
question | Text | | string | |
Supervised keys (See as_supervised doc): ('question', 'query')
Figure (tfds.show_examples): Not supported.
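With `as_supervised=True`, TFDS yields `(question, query)` tuples according to the supervised keys above, instead of feature dictionaries. As a small sketch of the same mapping applied by hand, the hypothetical helper `to_supervised` below converts one feature dictionary into that pair. The sample strings are placeholders, not real CFQ records.

```python
SUPERVISED_KEYS = ("question", "query")  # as listed under "Supervised keys"


def to_supervised(example):
    """Map a {'query': ..., 'question': ...} dict to an (input, target) pair."""

    def as_text(value):
        # TFDS string features usually arrive as bytes once converted to NumPy.
        return value.decode("utf-8") if isinstance(value, bytes) else value

    input_key, target_key = SUPERVISED_KEYS
    return as_text(example[input_key]), as_text(example[target_key])


# Placeholder values for illustration only.
pair = to_supervised({"question": b"placeholder question", "query": b"placeholder query"})
print(pair)
```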
Citation:
@inproceedings{Keysers2020,
title={Measuring Compositional Generalization: A Comprehensive Method on
Realistic Data},
author={Daniel Keysers and Nathanael Sch\"{a}rli and Nathan Scales and
Hylke Buisman and Daniel Furrer and Sergii Kashubin and
Nikola Momchev and Danila Sinopalnikov and Lukasz Stafiniak and
Tibor Tihon and Dmitry Tsarkov and Xiao Wang and Marc van Zee and
Olivier Bousquet},
booktitle={ICLR},
year={2020},
url={https://arxiv.org/abs/1912.09713.pdf},
}
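The config names listed below follow a regular pattern: three `mcd` configs, five named splits, and `cd<value>_r<n>` configs for eight values with `r1` through `r9`. The sketch below enumerates them, which can be handy for looping over configs. The helper name `cfq_config_names` is hypothetical, and the list simply mirrors the names on this page.

```python
def cfq_config_names():
    """Enumerate the CFQ config names as they appear in this listing."""
    names = [f"mcd{i}" for i in (1, 2, 3)]
    names += [
        "question_complexity_split",
        "question_pattern_split",
        "query_complexity_split",
        "query_pattern_split",
        "random_split",
    ]
    # Values kept as strings so that "0" and "1" are not rendered as "0.0" / "1.0".
    cd_values = ["0", "0.1", "0.2", "0.3", "0.4", "0.5", "0.6", "1"]
    names += [f"cd{value}_r{run}" for value in cd_values for run in range(1, 10)]
    return [f"cfq/{name}" for name in names]


configs = cfq_config_names()
print(len(configs))
```

Each returned name can be passed to `tfds.load` in place of `'cfq/mcd1'`.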
cfq/mcd1 (default config)
Dataset size:
49.75 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/mcd2
Dataset size:
51.39 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/mcd3
Dataset size:
50.22 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/question_complexity_split
Dataset size:
52.81 MiB
Splits:
Split | Examples |
---|---|
'test' | 10,340 |
'train' | 98,999 |
'validation' | 10,339 |
cfq/question_pattern_split
Dataset size:
52.81 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,909 |
'train' | 95,654 |
'validation' | 12,115 |
cfq/query_complexity_split
Dataset size:
52.81 MiB
Splits:
Split | Examples |
---|---|
'test' | 9,512 |
'train' | 100,654 |
'validation' | 9,512 |
cfq/query_pattern_split
Dataset size:
52.81 MiB
Splits:
Split | Examples |
---|---|
'test' | 12,589 |
'train' | 94,600 |
'validation' | 12,489 |
cfq/random_split
Dataset size:
52.81 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,967 |
'train' | 95,744 |
'validation' | 11,967 |
cfq/cd0_r1
Dataset size:
49.72 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r2
Dataset size:
49.45 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r3
Dataset size:
49.76 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r4
Dataset size:
50.04 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r5
Dataset size:
49.36 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r6
Dataset size:
49.36 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r7
Dataset size:
49.93 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r8
Dataset size:
48.58 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0_r9
Dataset size:
49.23 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r1
Dataset size:
51.54 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r2
Dataset size:
48.18 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r3
Dataset size:
49.23 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r4
Dataset size:
49.76 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r5
Dataset size:
49.10 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r6
Dataset size:
49.10 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r7
Dataset size:
48.02 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r8
Dataset size:
48.13 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.1_r9
Dataset size:
48.44 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r1
Dataset size:
50.90 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r2
Dataset size:
49.12 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r3
Dataset size:
50.68 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r4
Dataset size:
52.61 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r5
Dataset size:
50.63 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r6
Dataset size:
50.63 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r7
Dataset size:
52.39 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r8
Dataset size:
50.53 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.2_r9
Dataset size:
50.64 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r1
Dataset size:
52.08 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r2
Dataset size:
52.11 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r3
Dataset size:
51.25 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r4
Dataset size:
49.94 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r5
Dataset size:
49.71 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r6
Dataset size:
49.71 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r7
Dataset size:
50.86 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r8
Dataset size:
49.43 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.3_r9
Dataset size:
50.32 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r1
Dataset size:
48.52 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r2
Dataset size:
47.40 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r3
Dataset size:
49.18 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r4
Dataset size:
47.86 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r5
Dataset size:
49.01 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r6
Dataset size:
49.01 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r7
Dataset size:
51.70 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r8
Dataset size:
50.94 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.4_r9
Dataset size:
51.98 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r1
Dataset size:
51.00 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r2
Dataset size:
49.91 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r3
Dataset size:
51.26 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r4
Dataset size:
51.27 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r5
Dataset size:
51.87 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r6
Dataset size:
51.87 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r7
Dataset size:
48.52 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r8
Dataset size:
48.62 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.5_r9
Dataset size:
48.48 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r1
Dataset size:
49.32 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r2
Dataset size:
49.36 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r3
Dataset size:
49.85 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r4
Dataset size:
50.07 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r5
Dataset size:
49.86 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r6
Dataset size:
49.86 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r7
Dataset size:
49.42 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r8
Dataset size:
48.52 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd0.6_r9
Dataset size:
49.54 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r1
Dataset size:
49.75 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r2
Dataset size:
49.07 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r3
Dataset size:
50.22 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r4
Dataset size:
49.12 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r5
Dataset size:
48.80 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r6
Dataset size:
48.80 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r7
Dataset size:
50.69 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r8
Dataset size:
51.39 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |
cfq/cd1_r9
Dataset size:
49.94 MiB
Splits:
Split | Examples |
---|---|
'test' | 11,968 |
'train' | 95,743 |
'validation' | 11,968 |