On What Latent Semantic Analysis (LSA/LSI) Does and Doesn’t Do

Thomas Landauer

Latent Semantic Analysis (LSA) is at once a remarkably simple and remarkably effective model of language. Its foundation is the following extreme simplification: the meaning of a passage is assumed to be the sum of the meanings of the words it contains (with, of course, a specially restricted sense of "meaning" relative to all that has been said about meaning in philosophy, linguistics, and literature). This simplification allows observed natural language, for example a large corpus of ordinary text, to be treated as a set of simultaneous linear equations that can be solved for the average meaning of the words, and consequently for the meaning of any passage. The solution technique used by LSA is Singular Value Decomposition (SVD) followed by empirically optimal dimension reduction. The dimension reduction made possible by SVD has the property of inducing continuous-valued similarity relations between every word and every other word, including the more than 98 percent of pairs that never co-occur in a typical training corpus.
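The pipeline described above can be illustrated with a minimal sketch in Python using NumPy. Everything specific here is an assumption made for illustration and not part of the original abstract: the four-passage toy corpus, the choice of k = 2 retained dimensions, the use of raw counts with no term weighting, and the helper names (word_similarity, raw_similarity, passage_vector). It builds a word-by-passage count matrix, takes its SVD, keeps the k largest singular values, and compares words by cosine in the reduced space. A passage is represented as the sum of its word vectors, following the additive simplification stated above.

```python
import numpy as np

# Hypothetical toy corpus (an assumption, not data from the source).
# "doctor" and "physician" never appear in the same passage,
# and neither do "car" and "automobile".
passages = [
    "doctor nurse hospital",
    "physician nurse hospital",
    "car engine wheel",
    "automobile engine wheel",
]

vocab = sorted({w for p in passages for w in p.split()})
index = {w: i for i, w in enumerate(vocab)}

# Word-by-passage count matrix: A[i, j] = occurrences of word i in passage j.
# (Term weighting, commonly applied before the SVD, is omitted in this sketch.)
A = np.zeros((len(vocab), len(passages)))
for j, p in enumerate(passages):
    for w in p.split():
        A[index[w], j] += 1

# Singular Value Decomposition followed by dimension reduction.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # assumed for this toy example; real systems choose k empirically
word_vecs = U[:, :k] * s[:k]  # one k-dimensional vector per word


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def word_similarity(a, b):
    """Similarity of two words in the reduced (latent) space."""
    return cosine(word_vecs[index[a]], word_vecs[index[b]])


def raw_similarity(a, b):
    """Similarity of two words from raw co-occurrence counts only."""
    return cosine(A[index[a]], A[index[b]])


def passage_vector(text):
    """Passage meaning as the sum of its word vectors (the additive assumption)."""
    return np.sum([word_vecs[index[w]] for w in text.split()], axis=0)


if __name__ == "__main__":
    print("raw    doctor~physician:", raw_similarity("doctor", "physician"))
    print("latent doctor~physician:", word_similarity("doctor", "physician"))
    print("latent doctor~car:      ", word_similarity("doctor", "car"))
    print("passage similarity:     ",
          cosine(passage_vector("doctor hospital"),
                 passage_vector("physician nurse")))
```

In this toy corpus the raw count vectors for "doctor" and "physician" are orthogonal because the two words never share a passage, yet the reduced space places them close together because they occur with the same neighbours. This is the induced-similarity property on a very small scale; it says nothing about how the effect behaves on a realistic corpus.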


This page is copyrighted by AAAI. All rights reserved.