Word embeddings
Word embeddings map words to dense, relatively low-dimensional vectors, in contrast to sparse, high-dimensional one-hot encodings.
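To make the dimension gap concrete, here is a minimal sketch contrasting a one-hot vector with a dense embedding lookup; the vocabulary, dimension, and random embedding matrix are made up for illustration, not taken from any of the resources below.

```python
import numpy as np

vocab = ["cat", "dog", "mat", "sat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

V = len(vocab)   # vocabulary size (4 here; typically 10^5-10^6 in practice)
d = 3            # embedding dimension (typically 50-300)

# One-hot: a V-dimensional vector with a single 1.
one_hot = np.zeros(V)
one_hot[word_to_id["cat"]] = 1.0

# Embedding: a row lookup in a dense V x d matrix (learned in practice,
# random here just to show the shapes).
E = np.random.randn(V, d)
cat_vec = E[word_to_id["cat"]]   # shape (d,): dense and low-dimensional

print(one_hot)   # [1. 0. 0. 0.]
print(cat_vec)   # e.g. [ 0.12 -0.87  0.44]
```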
Blog posts:
- Off the Convex Path: Semantic word embeddings (1, 2, 3) (Arora et al, 2016)
- Sebastian Ruder: introduction and recent trends
- Omer Levy’s blog
Stanford resources:
- Lecture by Chris Manning for CS 276 (slides)
- Talk by Pramod Viswanath: Geometries of word embeddings (slides)
Software
Pre-trained word embeddings
- word2vec (Google Code, GitHub) (Mikolov et al, 2013; Levy & Goldberg, 2014)
- GloVe: Global vectors (Pennington et al, 2014)
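A brief usage sketch for the pre-trained word2vec vectors via the third-party gensim library (not part of the original releases). The file name is the conventional one from the Google download page; adjust the path to wherever your copy lives.

```python
from gensim.models import KeyedVectors

path = "GoogleNews-vectors-negative300.bin"  # assumed local path to the download
kv = KeyedVectors.load_word2vec_format(path, binary=True)

print(kv["king"].shape)                  # (300,)
print(kv.most_similar("king", topn=3))   # nearest neighbours by cosine similarity
```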
Literature
Influential papers
- Mikolov, Sutskever, Chen, Corrado, Dean, 2013: Distributed representations of words and phrases and their compositionality (pdf, arxiv)
- Pennington, Socher, Manning, 2014: GloVe: Global Vectors for Word Representation (pdf)
- Simple but effective method: a weighted least-squares factorization of the log co-occurrence matrix (sketched below)
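To make the factorization view concrete, here is a toy sketch of the GloVe weighted least-squares objective in NumPy; the co-occurrence counts, learning rate, and iteration count are illustrative assumptions, not the authors' code or settings.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[10., 2., 0.],   # toy word-word co-occurrence counts
              [ 2., 8., 1.],
              [ 0., 1., 5.]])
V, d, lr = X.shape[0], 2, 0.05

W  = 0.1 * rng.standard_normal((V, d))   # word vectors
Wt = 0.1 * rng.standard_normal((V, d))   # context vectors
b  = np.zeros(V)                         # word biases
bt = np.zeros(V)                         # context biases

def f(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: down-weights rare pairs, caps frequent ones."""
    return min((x / x_max) ** alpha, 1.0)

for _ in range(500):
    for i in range(V):
        for j in range(V):
            if X[i, j] == 0:             # only non-zero counts enter the loss
                continue
            err = W[i] @ Wt[j] + b[i] + bt[j] - np.log(X[i, j])
            g = f(X[i, j]) * err
            grad_wi, grad_wtj = g * Wt[j], g * W[i]
            W[i]  -= lr * grad_wi
            Wt[j] -= lr * grad_wtj
            b[i]  -= lr * g
            bt[j] -= lr * g

# The final word representations are commonly taken as W + Wt.
print(W + Wt)
```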
Probabilistic embeddings (probability distributions, not points):
- Vilnis & McCallum, 2014: Word representations via Gaussian embedding (arxiv,
GitHub )
- Points replaced by Gaussian distributions, with variance capturing word specificity
- Containment of constant-density ellipsoids models entailment
- Athiwaratkun & Wilson, 2017: Multimodal word distributions (pdf, arxiv)
Theory