- Description:
Criteo Uplift Modeling Dataset
This dataset is released along with the paper “A Large Scale Benchmark for Uplift Modeling” by Eustache Diemert, Artem Betlei, Christophe Renaudin (Criteo AI Lab) and Massih-Reza Amini (LIG, Grenoble INP).
This work was published in: AdKDD 2018 Workshop, in conjunction with KDD 2018.
Data description
This dataset is constructed by assembling data resulting from several incrementality tests, a particular randomized trial procedure where a random part of the population is prevented from being targeted by advertising. It consists of 25M rows, each one representing a user with 12 features (f0 to f11), a treatment indicator and 2 labels (visits and conversions).
Fields
Here is a detailed description of the fields (they are comma-separated in the file):
- f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, f10, f11: feature values (dense, float)
- treatment: treatment group (1 = treated, 0 = control)
- conversion: whether a conversion occurred for this user (binary, label)
- visit: whether a visit occurred for this user (binary, label)
- exposure: treatment effect, whether the user has been effectively exposed (binary)
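As a minimal sketch of reading the fields listed above, the snippet below parses comma-separated rows with Python's standard csv module. It assumes the file starts with a header row carrying the field names shown above. The helper name parse_row is hypothetical, and the two sample rows are invented for illustration only.

```python
import csv
import io

FEATURE_NAMES = [f"f{i}" for i in range(12)]  # f0 ... f11
BINARY_NAMES = ["treatment", "conversion", "visit", "exposure"]


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values."""
    parsed = {name: float(row[name]) for name in FEATURE_NAMES}
    parsed.update({name: int(row[name]) for name in BINARY_NAMES})
    return parsed


# Invented sample data; real values come from the downloaded CSV file.
header = ",".join(FEATURE_NAMES + BINARY_NAMES)
sample = header + "\n" + "\n".join([
    ",".join(["0.5"] * 12 + ["1", "0", "1", "1"]),
    ",".join(["1.5"] * 12 + ["0", "0", "0", "0"]),
])

rows = [parse_row(r) for r in csv.DictReader(io.StringIO(sample))]
print(rows[0]["f0"], rows[0]["treatment"], rows[0]["visit"])
```

For the full file, streaming rows from an open file handle instead of building a list keeps memory use flat.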
Key figures
- Format: CSV
- Size: 459MB (compressed)
- Rows: 25,309,483
- Average Visit Rate: .04132
- Average Conversion Rate: .00229
- Treatment Ratio: .846
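The rates above can plausibly be read as plain means of the corresponding binary columns over all rows; that reading is an assumption, not something this page states. Under it, the sketch below computes the three figures for a list of records. The function name summary_rates and the toy records are invented for illustration.

```python
def summary_rates(records):
    """Return (visit rate, conversion rate, treatment ratio) as fractions."""
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")
    visit_rate = sum(r["visit"] for r in records) / n
    conversion_rate = sum(r["conversion"] for r in records) / n
    treatment_ratio = sum(r["treatment"] for r in records) / n
    return visit_rate, conversion_rate, treatment_ratio


# Invented toy records, not taken from the dataset.
toy = [
    {"treatment": 1, "visit": 1, "conversion": 0},
    {"treatment": 1, "visit": 0, "conversion": 0},
    {"treatment": 1, "visit": 1, "conversion": 1},
    {"treatment": 0, "visit": 0, "conversion": 0},
]
print(summary_rates(toy))
```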
Tasks
The dataset was collected and prepared with uplift prediction in mind as the main task. Additionally we can foresee related usages such as but not limited to:
- benchmark for causal inference
- uplift modeling
- interactions between features and treatment
- heterogeneity of treatment
- benchmark for observational causality methods
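Because treatment is assigned at random in an incrementality test, a simple baseline for the average uplift is the difference between the mean outcome of the treated group and that of the control group. The sketch below shows this generic estimator; it is not presented as the method of the paper, and the function names and toy records are invented for illustration.

```python
def mean_outcome(records, group, outcome):
    """Mean of `outcome` over records whose treatment flag equals `group`."""
    values = [r[outcome] for r in records if r["treatment"] == group]
    if not values:
        raise ValueError(f"no records with treatment == {group}")
    return sum(values) / len(values)


def average_uplift(records, outcome="visit"):
    """Difference in mean outcome: treated (1) minus control (0)."""
    return mean_outcome(records, 1, outcome) - mean_outcome(records, 0, outcome)


# Invented toy records: 4 treated users, 4 control users.
toy_records = [
    {"treatment": 1, "visit": 1},
    {"treatment": 1, "visit": 1},
    {"treatment": 1, "visit": 0},
    {"treatment": 1, "visit": 0},
    {"treatment": 0, "visit": 1},
    {"treatment": 0, "visit": 0},
    {"treatment": 0, "visit": 0},
    {"treatment": 0, "visit": 0},
]
print(average_uplift(toy_records))
```

Uplift models go further by predicting this difference per user from the features, which is where interactions between features and treatment come in.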
Homepage: https://ailab.criteo.com/criteo-uplift-prediction-dataset/
Source code:
tfds.recommendation.criteo.Criteo
Versions:
- 1.0.0: Initial release.
- 1.0.1 (default): Fixed parsing of fields conversion, visit and exposure.
Download size: 297.00 MiB
Dataset size: 3.55 GiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'train' | 13,979,592 |
- Feature structure:
FeaturesDict({
'conversion': bool,
'exposure': bool,
'f0': float32,
'f1': float32,
'f10': float32,
'f11': float32,
'f2': float32,
'f3': float32,
'f4': float32,
'f5': float32,
'f6': float32,
'f7': float32,
'f8': float32,
'f9': float32,
'treatment': int64,
'visit': bool,
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
conversion | Tensor | | bool | |
exposure | Tensor | | bool | |
f0 | Tensor | | float32 | |
f1 | Tensor | | float32 | |
f10 | Tensor | | float32 | |
f11 | Tensor | | float32 | |
f2 | Tensor | | float32 | |
f3 | Tensor | | float32 | |
f4 | Tensor | | float32 | |
f5 | Tensor | | float32 | |
f6 | Tensor | | float32 | |
f7 | Tensor | | float32 | |
f8 | Tensor | | float32 | |
f9 | Tensor | | float32 | |
treatment | Tensor | | int64 | |
visit | Tensor | | bool | |
Supervised keys (see as_supervised doc): ({'exposure': 'exposure', 'f0': 'f0', 'f1': 'f1', 'f10': 'f10', 'f11': 'f11', 'f2': 'f2', 'f3': 'f3', 'f4': 'f4', 'f5': 'f5', 'f6': 'f6', 'f7': 'f7', 'f8': 'f8', 'f9': 'f9', 'treatment': 'treatment'}, 'visit')
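The supervised keys describe a (features, label) pair: a dictionary with exposure, f0 to f11 and treatment as inputs, and visit as the label; conversion is left out. The sketch below mimics that mapping on a plain dictionary so it runs without TensorFlow. The helper names are hypothetical, and load_supervised assumes the dataset is registered under the name "criteo", which this page does not state explicitly.

```python
SUPERVISED_INPUTS = ["exposure"] + [f"f{i}" for i in range(12)] + ["treatment"]
LABEL_KEY = "visit"


def to_supervised(example):
    """Mimic the supervised-keys mapping on a plain dict example."""
    features = {key: example[key] for key in SUPERVISED_INPUTS}
    return features, example[LABEL_KEY]


def load_supervised():
    """Load (features, label) pairs through TFDS; needs tensorflow_datasets."""
    import tensorflow_datasets as tfds  # imported lazily; heavy dependency

    # Assumption: the dataset is registered under the name "criteo".
    return tfds.load("criteo", split="train", as_supervised=True)


# Invented example with the same keys as the feature structure.
example = {f"f{i}": 0.0 for i in range(12)}
example.update(
    {"treatment": 1, "exposure": True, "visit": False, "conversion": False}
)
features, label = to_supervised(example)
print(sorted(features), label)
```

To predict conversion instead of visit, one option is to load without as_supervised and build the pair by hand from the full feature dictionary.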
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{Diemert2018,
author = {Diemert, Eustache and Betlei, Artem and Renaudin, Christophe and Amini, Massih-Reza},
title={A Large Scale Benchmark for Uplift Modeling},
publisher = {ACM},
booktitle = {Proceedings of the AdKDD and TargetAd Workshop, KDD, London, United Kingdom, August 20, 2018},
year = {2018}
}