glove100_angular

  • Description:

Pre-trained Global Vectors for Word Representation (GloVe) embeddings for approximate nearest neighbor search. This dataset consists of two splits:

  1. 'database': 1,183,514 data points, each with the features 'embedding' (100 floats), 'index' (int64), and 'neighbors' (an empty list).
  2. 'test': 10,000 data points, each with the features 'embedding' (100 floats), 'index' (int64), and 'neighbors' (a list of the 'index' and 'distance' of the nearest neighbors in the database).
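The relationship between the two splits can be illustrated with an exact, brute-force search: each 'test' embedding's 'neighbors' are the closest 'database' entries under an angular metric. This is a minimal sketch, not the procedure used to build the dataset. It assumes the metric is cosine distance, 1 - cos(θ); this entry does not state the exact formula behind the stored 'distance' values. The helpers `normalize` and `angular_knn` are hypothetical, not part of TFDS.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def angular_knn(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Exact k nearest neighbors of `query` in `database`.

    Distance is assumed to be cosine distance (1 - cosine similarity).
    Returns (database index, distance) pairs sorted by increasing distance.
    """
    q = normalize(query)
    scored = []
    for i, vec in enumerate(database):
        cos_sim = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((1.0 - cos_sim, i))
    # nsmallest keeps the k (distance, index) pairs with the smallest distance;
    # equal distances are ordered by the smaller database index.
    best = heapq.nsmallest(k, scored)
    return [(i, dist) for dist, i in best]


# Toy 2-D "database"; real GloVe embeddings have 100 dimensions.
database = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
print(angular_knn([2.0, 0.1], database, k=2))
```

The pure-Python loop only shows the logic. For the full 1,183,514-row database, a vectorized implementation (for example, a single matrix product over pre-normalized embeddings) would be far more practical.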
  • Splits:
Split       Examples
'database'  1,183,514
'test'      10,000
  • Feature structure:
FeaturesDict({
    'embedding': Tensor(shape=(100,), dtype=float32),
    'index': Scalar(shape=(), dtype=int64, description=Index within the split.),
    'neighbors': Sequence({
        'distance': Scalar(shape=(), dtype=float32, description=Neighbor distance.),
        'index': Scalar(shape=(), dtype=int64, description=Neighbor index.),
    }),
})
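The ground-truth 'neighbors' in the 'test' split are typically used to score an approximate search by recall@k. The sketch below is a hedged illustration under stated assumptions. It assumes an example arrives as a plain dict that mirrors the FeaturesDict above, with the 'neighbors' Sequence given as a dict of parallel lists ({'distance': [...], 'index': [...]}). That is how TFDS usually presents a Sequence of dict features, but it is worth checking against real data. The function name `recall_at_k` and the sample values are hypothetical.

```python
from typing import Any, Dict, Sequence


def recall_at_k(retrieved: Sequence[int], example: Dict[str, Any], k: int) -> float:
    """Fraction of the true k nearest neighbors found among the first k retrieved indices.

    `example["neighbors"]` is assumed to be a dict of parallel sequences
    with keys "distance" and "index", as in the feature structure above.
    """
    nb = example["neighbors"]
    # Sort by distance rather than assuming the stored order is nearest-first.
    pairs = sorted(zip(nb["distance"], nb["index"]))
    true_indices = [int(idx) for _, idx in pairs[:k]]
    if not true_indices:
        # 'database' examples carry an empty 'neighbors' list.
        raise ValueError("example has no ground-truth neighbors")
    hits = set(true_indices) & {int(i) for i in list(retrieved)[:k]}
    return len(hits) / len(true_indices)


# Hypothetical 'test' example with three ground-truth neighbors.
example = {
    "embedding": [0.0] * 100,
    "index": 0,
    "neighbors": {"distance": [0.3, 0.1, 0.2], "index": [7, 3, 5]},
}
print(recall_at_k([3, 9, 5], example, k=2))
```

With the real dataset, examples of this shape could be read by iterating `tfds.as_numpy(tfds.load("glove100_angular", split="test"))`. That download is large and is not part of this sketch.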
  • Feature documentation:
Feature              Class         Shape   Dtype    Description
                     FeaturesDict
embedding            Tensor        (100,)  float32
index                Scalar                int64    Index within the split.
neighbors            Sequence                       The computed neighbors, which are only available for the test split.
neighbors/distance   Scalar                float32  Neighbor distance.
neighbors/index      Scalar                int64    Neighbor index.
  • Citation:
@inproceedings{pennington2014glove,
  author = {Jeffrey Pennington and Richard Socher and Christopher D. Manning},
  booktitle = {Empirical Methods in Natural Language Processing (EMNLP)},
  title = {GloVe: Global Vectors for Word Representation},
  year = {2014},
  pages = {1532--1543},
  url = {http://www.aclweb.org/anthology/D14-1162},
}