- Description:
The Istella datasets are three large-scale Learning-to-Rank datasets released by Istella. Each dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels.
The dataset contains three versions:
main ("Istella LETOR"): Containing 10,454,629 query-document pairs.
s ("Istella-S LETOR"): Containing 3,408,630 query-document pairs.
x ("Istella-X LETOR"): Containing 26,791,447 query-document pairs.
You can specify whether to use the main, s or x version of the dataset as follows:
ds = tfds.load("istella/main")
ds = tfds.load("istella/s")
ds = tfds.load("istella/x")
If only istella is specified, the istella/main option is selected by default:
# This is the same as `tfds.load("istella/main")`
ds = tfds.load("istella")
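As a rough sketch of how the config names and splits fit together, the helper below expands a bare `istella` to `istella/main` and checks the requested split against the split tables listed further down this page. The names `SPLITS`, `resolve_name`, and `load_split` are hypothetical and not part of TFDS; the resolution logic only mirrors the default described above, not TFDS internals. `tfds.load` is imported and called only inside `load_split`, so the name checks run without downloading anything.

```python
# Splits per config, copied from the split tables on this page.
SPLITS = {
    "main": ("train", "test"),
    "s": ("train", "vali", "test"),
    "x": ("train", "vali", "test"),
}


def resolve_name(name="istella"):
    """Expand a bare "istella" to "istella/main" (hypothetical helper)."""
    builder, _, config = name.partition("/")
    if builder != "istella":
        raise ValueError(f"not an Istella dataset name: {name!r}")
    config = config or "main"  # the default config described above
    if config not in SPLITS:
        raise ValueError(f"unknown config {config!r}; expected one of {sorted(SPLITS)}")
    return f"istella/{config}"


def load_split(name="istella", split="train"):
    """Validate the config and split, then defer to tfds.load."""
    full_name = resolve_name(name)
    config = full_name.split("/")[1]
    if split not in SPLITS[config]:
        raise ValueError(f"{full_name} has no {split!r} split; available: {SPLITS[config]}")
    # Imported lazily so the checks above work without TFDS installed.
    import tensorflow_datasets as tfds
    return tfds.load(full_name, split=split)
```

For instance, `load_split("istella/s", split="vali")` would load the validation split of the s version, while `load_split("istella", split="vali")` raises a `ValueError` because the main config lists only 'train' and 'test'.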
Source code:
tfds.ranking.istella.Istella
Versions:
1.0.0: Initial release.
1.0.1: Fix serialization to support float64.
1.1.0: Bundle features into a single 'float_features' feature.
1.2.0 (default): Add query and document identifiers.
Auto-cached (documentation): No
Feature structure:
FeaturesDict({
'doc_id': Tensor(shape=(None,), dtype=int64),
'float_features': Tensor(shape=(None, 220), dtype=float64),
'label': Tensor(shape=(None,), dtype=float64),
'query_id': Text(shape=(), dtype=string),
})
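To make the shapes concrete, here is a minimal NumPy sketch of a single example with the keys, shapes, and dtypes listed above. Each example holds a variable-length list of documents, which is why the first dimension is `None`. The values are random placeholders rather than real Istella data. The helper names, the label range of 0 to 4, and the use of a bytes value for `query_id` are assumptions made only for illustration.

```python
import numpy as np

NUM_FEATURES = 220  # second dimension of 'float_features' in the structure above


def make_synthetic_example(num_docs, seed=0):
    """Build a fake example with the documented keys, shapes, and dtypes."""
    rng = np.random.default_rng(seed)
    return {
        "doc_id": np.arange(num_docs, dtype=np.int64),
        # rng.random returns float64 values in [0, 1).
        "float_features": rng.random((num_docs, NUM_FEATURES)),
        # Assumed label range, chosen only for this illustration.
        "label": rng.integers(0, 5, size=num_docs).astype(np.float64),
        # Assumed stand-in for the scalar string feature.
        "query_id": b"q0",
    }


def check_example(example):
    """Check shapes and dtypes against the FeaturesDict; return the document count."""
    num_docs = example["label"].shape[0]
    assert example["label"].dtype == np.float64
    assert example["doc_id"].shape == (num_docs,)
    assert example["doc_id"].dtype == np.int64
    assert example["float_features"].shape == (num_docs, NUM_FEATURES)
    assert example["float_features"].dtype == np.float64
    return num_docs


example = make_synthetic_example(num_docs=7)
print(check_example(example))
```

The per-document arrays (`doc_id`, `float_features`, `label`) share their first dimension, so row `i` of `float_features` belongs to `doc_id[i]` and `label[i]`.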
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
doc_id | Tensor | (None,) | int64 | |
float_features | Tensor | (None, 220) | float64 | |
label | Tensor | (None,) | float64 | |
query_id | Text | | string | |
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@article{10.1145/2987380,
author = {Dato, Domenico and Lucchese, Claudio and Nardini, Franco Maria and Orlando, Salvatore and Perego, Raffaele and Tonellotto, Nicola and Venturini, Rossano},
title = {Fast Ranking with Additive Ensembles of Oblivious and Non-Oblivious Regression Trees},
year = {2016},
publisher = {ACM},
address = {New York, NY, USA},
volume = {35},
number = {2},
issn = {1046-8188},
url = {https://doi.org/10.1145/2987380},
doi = {10.1145/2987380},
journal = {ACM Transactions on Information Systems},
articleno = {15},
numpages = {31},
}
istella/main (default config)
Download size:
1.20 GiB
Dataset size:
1.12 GiB
Splits:
Split | Examples |
---|---|
'test' | 9,799 |
'train' | 23,219 |
istella/s
Download size:
450.26 MiB
Dataset size:
421.88 MiB
Splits:
Split | Examples |
---|---|
'test' | 6,562 |
'train' | 19,245 |
'vali' | 7,211 |
istella/x
Download size:
4.42 GiB
Dataset size:
2.46 GiB
Splits:
Split | Examples |
---|---|
'test' | 2,000 |
'train' | 6,000 |
'vali' | 2,000 |
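The split tables count examples, while the description counts query-document pairs. A quick calculation relates the two, under two assumptions not stated on this page: that each example corresponds to one query with its list of documents (as the `(None,)` shapes in the feature structure suggest), and that the pair counts cover all splits of a config. The names below are illustrative only.

```python
# Counts copied from this page: total query-document pairs per config
# and number of examples per split.
PAIRS = {"main": 10_454_629, "s": 3_408_630, "x": 26_791_447}
EXAMPLES = {
    "main": {"train": 23_219, "test": 9_799},
    "s": {"train": 19_245, "vali": 7_211, "test": 6_562},
    "x": {"train": 6_000, "vali": 2_000, "test": 2_000},
}


def mean_docs_per_example(config):
    """Average number of documents per example, assuming one example per query."""
    return PAIRS[config] / sum(EXAMPLES[config].values())


for config in PAIRS:
    mean = mean_docs_per_example(config)
    print(f"istella/{config}: {mean:.1f} documents per example on average")
```

Under those assumptions, this works out to roughly 317, 103, and 2,679 documents per example for main, s, and x respectively, which gives a sense of how long the per-example document lists are in each version.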