- Description:
The Google RefExp dataset is a collection of text descriptions of objects in images which builds on the publicly available MS-COCO dataset. Whereas the image captions in MS-COCO apply to the entire image, this dataset focuses on text descriptions that allow one to uniquely identify a single object or region within an image. See more details in this paper: Generation and Comprehension of Unambiguous Object Descriptions.
Additional Documentation: Explore on Papers With Code
Source code:
tfds.vision_language.gref.Gref
Versions:
1.0.0 (default): Initial release.
Download size:
Unknown size
Dataset size:
4.60 GiB
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Follow the instructions at https://github.com/mjhucla/Google_Refexp_toolbox to download and pre-process the data into a format aligned with COCO. The directory contains two files and one folder:
google_refexp_train_201511_coco_aligned_catg.json
google_refexp_val_201511_coco_aligned_catg.json
coco_train2014/
The coco_train2014 folder contains all of the COCO 2014 training images.
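The manual layout described above can be checked before attempting to build the dataset. The following is a minimal sketch, not part of the TFDS API: `missing_manual_entries` and `load_gref` are hypothetical helper names, and the registered dataset name `"gref"` is an assumption inferred from the source class `tfds.vision_language.gref.Gref`.

```python
from pathlib import Path

# Entries named in the manual download instructions.
EXPECTED_ENTRIES = (
    "google_refexp_train_201511_coco_aligned_catg.json",
    "google_refexp_val_201511_coco_aligned_catg.json",
    "coco_train2014",
)


def missing_manual_entries(manual_dir):
    """Return the expected files/folders that are absent from manual_dir."""
    root = Path(manual_dir).expanduser()
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


def load_gref(manual_dir, split="train"):
    """Hypothetical helper: verify the manual directory, then load via TFDS.

    Assumes the dataset is registered under the name "gref".
    """
    missing = missing_manual_entries(manual_dir)
    if missing:
        raise FileNotFoundError(f"Missing in {manual_dir}: {missing}")

    # Imported lazily so the directory check works without TFDS installed.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    return tfds.load(
        "gref",
        split=split,
        download_and_prepare_kwargs={"download_config": config},
    )
```

Calling `missing_manual_entries("~/tensorflow_datasets/downloads/manual/")` returns an empty list when all three entries are in place.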
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 24,698 |
'validation' | 4,650 |
- Feature structure:
FeaturesDict({
'image': Image(shape=(None, None, 3), dtype=uint8),
'image/id': int64,
'objects': Sequence({
'area': int64,
'bbox': BBoxFeature(shape=(4,), dtype=float32),
'id': int64,
'label': int64,
'label_name': ClassLabel(shape=(), dtype=int64, num_classes=80),
'refexp': Sequence({
'raw': Text(shape=(), dtype=string),
'referent': Text(shape=(), dtype=string),
'refexp_id': int64,
'tokens': Sequence(Text(shape=(), dtype=string)),
}),
}),
})
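The `bbox` feature holds four floats per object. As a minimal sketch, assuming the usual TFDS `BBoxFeature` convention of coordinates normalized to [0, 1] in the order (ymin, xmin, ymax, xmax), the conversion to pixel coordinates could look like this. The function name `bbox_to_pixels` is hypothetical.

```python
def bbox_to_pixels(bbox, height, width):
    """Convert a normalized (ymin, xmin, ymax, xmax) box to pixel corners.

    Assumes coordinates are normalized to [0, 1] relative to the image size.
    Returns (left, top, right, bottom) as integers.
    """
    ymin, xmin, ymax, xmax = bbox
    return (
        round(xmin * width),
        round(ymin * height),
        round(xmax * width),
        round(ymax * height),
    )
```

The image height and width would come from the decoded `image` tensor, whose shape is (height, width, 3).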
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
image | Image | (None, None, 3) | uint8 | |
image/id | Tensor | | int64 | |
objects | Sequence | | | |
objects/area | Tensor | | int64 | |
objects/bbox | BBoxFeature | (4,) | float32 | |
objects/id | Tensor | | int64 | |
objects/label | Tensor | | int64 | |
objects/label_name | ClassLabel | | int64 | |
objects/refexp | Sequence | | | |
objects/refexp/raw | Text | | string | |
objects/refexp/referent | Text | | string | |
objects/refexp/refexp_id | Tensor | | int64 | |
objects/refexp/tokens | Sequence(Text) | (None,) | string | |
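Because each image holds several objects and each object holds several referring expressions, it is often convenient to flatten an example into one record per expression. The sketch below assumes the example has already been converted to plain Python containers, with `objects` stored as a dict of parallel lists (one entry per object) and the nested `refexp` fields as lists of lists. That layout and the name `iter_referring_expressions` are assumptions for illustration, not a documented TFDS output format.

```python
def iter_referring_expressions(example):
    """Yield one flat record per (object, referring expression) pair.

    Assumes example["objects"] is a dict of parallel lists and that
    example["objects"]["refexp"] fields are lists of lists (per object).
    """
    objects = example["objects"]
    refexp = objects["refexp"]
    for i, object_id in enumerate(objects["id"]):
        for refexp_id, raw in zip(refexp["refexp_id"][i], refexp["raw"][i]):
            yield {
                "image_id": example["image/id"],
                "object_id": object_id,
                "refexp_id": refexp_id,
                "raw": raw,
            }


# Hand-written toy example (not real dataset content).
toy_example = {
    "image/id": 7,
    "objects": {
        "id": [10, 11],
        "refexp": {
            "refexp_id": [[100, 101], [102]],
            "raw": [["the dog on the left", "left dog"], ["red cup"]],
        },
    },
}
records = list(iter_referring_expressions(toy_example))
```

Here `records` contains three entries: two for object 10 and one for object 11.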
Supervised keys (See as_supervised doc): None
- Citation:
@inproceedings{mao2016generation,
title={Generation and Comprehension of Unambiguous Object Descriptions},
author={Mao, Junhua and Huang, Jonathan and Toshev, Alexander and Camburu, Oana and Yuille, Alan and Murphy, Kevin},
booktitle={CVPR},
year={2016}
}