- Description:
Pre-trained Global Vectors for Word Representation (GloVe) embeddings for approximate nearest neighbor search. This dataset consists of two splits:
- 'database': 1,183,514 data points, each with the features 'embedding' (100 floats), 'index' (int64), and 'neighbors' (an empty list).
- 'test': 10,000 data points, each with the features 'embedding' (100 floats), 'index' (int64), and 'neighbors' (a list of the 'index' and 'distance' of the nearest neighbors in the database).
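To make the role of the 'neighbors' feature concrete, here is a minimal sketch of the exhaustive search that approximate methods are measured against. It assumes that "angular" in the dataset name refers to cosine distance (1 minus cosine similarity); the catalog entry does not state the metric. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for the 100-dimensional embeddings.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity (assumed metric; not stated in the catalog)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def brute_force_neighbors(query, database, k):
    """Score every database vector against the query and keep the k closest."""
    scored = [
        {"index": i, "distance": cosine_distance(query, vec)}
        for i, vec in enumerate(database)
    ]
    scored.sort(key=lambda n: n["distance"])
    return scored[:k]

# Toy 3-dimensional stand-ins for the 100-dimensional GloVe embeddings.
toy_database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [-1.0, 0.0, 0.0],
]
toy_query = [1.0, 0.1, 0.0]
top2 = brute_force_neighbors(toy_query, toy_database, k=2)
print([n["index"] for n in top2])
```

Each returned entry mirrors the 'index' and 'distance' pair stored per neighbor in the 'test' split. At the full database size a vectorized or indexed implementation would be needed; this loop is only illustrative.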
Homepage: https://nlp.stanford.edu/projects/glove/
Source code: tfds.nearest_neighbors.glove_100_angular.Glove100Angular
Versions:
1.0.0 (default): Initial release.
Download size: 462.93 MiB
Dataset size: 567.90 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'database' | 1,183,514 |
'test' | 10,000 |
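A minimal loading sketch for the two splits follows. The registered dataset name "glove100_angular" is an assumption inferred from the builder class name Glove100Angular, and the helper name load_glove_splits is hypothetical. The first call to tfds.load downloads and prepares the data, so the import and the load are kept inside the function.

```python
# Split sizes as listed in the table above.
EXPECTED_EXAMPLES = {"database": 1_183_514, "test": 10_000}

def load_glove_splits(name="glove100_angular"):
    """Load both splits and check their sizes.

    The default name is an assumption derived from the class name
    Glove100Angular; adjust it if your TFDS version registers it differently.
    """
    # Imported here so the module can be read without TFDS installed.
    import tensorflow_datasets as tfds

    # Passing a list of split names returns a list of datasets in that order.
    (database, test), info = tfds.load(
        name, split=["database", "test"], with_info=True
    )
    for split_name, expected in EXPECTED_EXAMPLES.items():
        actual = info.splits[split_name].num_examples
        if actual != expected:
            raise ValueError(
                f"{split_name}: expected {expected} examples, found {actual}"
            )
    return database, test
```

Calling `database, test = load_glove_splits()` would then give one dataset per split, each yielding dictionaries with the 'embedding', 'index', and 'neighbors' features.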
- Feature structure:
FeaturesDict({
'embedding': Tensor(shape=(100,), dtype=float32),
'index': Scalar(shape=(), dtype=int64, description=Index within the split.),
'neighbors': Sequence({
'distance': Scalar(shape=(), dtype=float32, description=Neighbor distance.),
'index': Scalar(shape=(), dtype=int64, description=Neighbor index.),
}),
})
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
embedding | Tensor | (100,) | float32 | |
index | Scalar | | int64 | Index within the split. |
neighbors | Sequence | | | The computed neighbors, which are only available for the test split. |
neighbors/distance | Scalar | | float32 | Neighbor distance. |
neighbors/index | Scalar | | int64 | Neighbor index. |
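The stored neighbors are typically used as ground truth when scoring an approximate search. The sketch below computes recall at k for one test example. It assumes the example has been converted to plain Python values, that the neighbors Sequence arrives as a dictionary of parallel lists (the usual TFDS layout for a Sequence of dictionaries), and that the neighbors are ordered by increasing distance, which the catalog entry does not state. The function name and the toy record are hypothetical.

```python
def recall_at_k(predicted_indices, example, k):
    """Fraction of the stored top-k neighbors recovered by a prediction."""
    # Assumes neighbors are sorted nearest first (not confirmed by the catalog).
    true_top_k = set(example["neighbors"]["index"][:k])
    if not true_top_k:
        raise ValueError("example has no stored neighbors (database split?)")
    hits = sum(1 for idx in predicted_indices[:k] if idx in true_top_k)
    return hits / len(true_top_k)

# Hand-made record that mimics the feature structure of a 'test' example.
toy_example = {
    "embedding": [0.0] * 100,
    "index": 0,
    "neighbors": {
        "index": [7, 3, 9, 1],
        "distance": [0.1, 0.2, 0.3, 0.4],
    },
}
predicted = [3, 7, 5, 9]  # hypothetical output of an approximate index
print(recall_at_k(predicted, toy_example, k=2))
print(recall_at_k(predicted, toy_example, k=4))
```

Examples from the 'database' split carry an empty neighbors list, so the function raises an error rather than reporting a misleading score.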
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
- Citation:
@inproceedings{pennington2014glove,
author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
title = {GloVe: Global Vectors for Word Representation},
year = {2014},
pages = {1532--1543},
url = {http://www.aclweb.org/anthology/D14-1162},
}