AAAI Publications, Thirty-Second AAAI Conference on Artificial Intelligence

Spectral Word Embedding with Negative Sampling
Behrouz Haji Soleimani, Stan Matwin

Last modified: 2018-04-27

Abstract


In this work, we investigate word embedding algorithms in the context of natural language processing. In particular, we examine the notion of "negative examples", the unobserved or insignificant word-context co-occurrences, in spectral methods. We provide a new formulation of the word embedding problem through a new, intuitive objective function that naturally justifies the use of negative examples. Our algorithm learns not only from the important word-context co-occurrences, but also from the abundance of unobserved or insignificant co-occurrences, which improves the distribution of words in the latent embedded space. We analyze the algorithm theoretically and derive an optimal solution to the problem using spectral analysis. We trained various word embedding algorithms on Wikipedia articles comprising 2.1 billion tokens and show that negative sampling can boost the quality of spectral methods. Our algorithm matches the quality of the state of the art while being much faster and more efficient.
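The abstract does not spell out the objective function, but the description (negative sampling folded into a spectral factorization solved by SVD) is in the spirit of shifted-PPMI matrix factorization. The sketch below is a minimal illustration under that assumption: observed word-context pairs carry PMI-style scores shifted by the log of the number of negative samples, unobserved or insignificant pairs contribute zeros (the "negative examples"), and a truncated SVD gives the optimal low-rank factorization. The function name `spectral_embedding`, the `log(k_neg)` shift, and the square-root singular-value weighting are illustrative choices, not the authors' exact formulation.

```python
# Illustrative sketch of a spectral word embedding with a
# negative-sampling-style shift (shifted PPMI + truncated SVD).
# This is an assumption-laden example, not the paper's exact algorithm.
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def spectral_embedding(cooc, k_neg=5, dim=100):
    """Embed words from a dense (V x V) word-context co-occurrence
    count matrix `cooc`. `k_neg` plays the role of the number of
    negative samples; `dim` is the embedding size (dim < V required
    by the truncated SVD)."""
    cooc = np.asarray(cooc, dtype=np.float64)
    total = cooc.sum()
    word = cooc.sum(axis=1, keepdims=True)   # row (word) marginals
    ctx = cooc.sum(axis=0, keepdims=True)    # column (context) marginals
    # PMI(w, c) = log( #(w,c) * total / (#(w) * #(c)) )
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(cooc * total / (word @ ctx))
    # Shift by log(k_neg) and clip at zero: unobserved or insignificant
    # pairs become exact zeros, which is how negative examples pull
    # unrelated words apart in the embedded space.
    sppmi = np.maximum(pmi - np.log(k_neg), 0.0)
    sppmi[~np.isfinite(sppmi)] = 0.0
    # Truncated SVD gives the optimal rank-`dim` factorization
    # of the shifted matrix (Eckart-Young).
    u, s, _ = svds(csr_matrix(sppmi), k=dim)
    return u * np.sqrt(s)  # split singular values symmetrically

# Toy usage: random counts standing in for a real co-occurrence matrix.
rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 50))
vectors = spectral_embedding(counts, k_neg=5, dim=10)
print(vectors.shape)  # (50, 10)
```

The square-root weighting of singular values is one common convention for splitting the factorization symmetrically between word and context vectors; other weightings (e.g., using `u` alone) are equally defensible.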

Keywords


Word Embedding; Natural Language Processing; Unsupervised Learning; Matrix Factorization; Spectral Algorithms; Singular Value Decomposition
