Sarah Zelikovitz and Finella Marquez, City University of New York
This paper evaluates background knowledge as a means of improving the accuracy of text classification with Latent Semantic Indexing (LSI). The singular value decomposition at the core of LSI can be performed on a combination of training data and background knowledge. Intuitively, the closer the background knowledge is to the classification task, the more helpful it will be in creating a reduced space that is effective for classification. Using a variety of data sets, we evaluate sets of background knowledge both by how close they are to the training data and by how much they improve classification.
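To make the idea concrete, the following is a minimal sketch of performing the singular value decomposition on training data combined with background knowledge, and then classifying a new document in the reduced space. It is an illustration under stated assumptions, not the authors' implementation: the function names (`term_doc_matrix`, `lsi_classify`), the toy documents, the use of raw term counts, the choice of `k`, and the nearest-neighbor cosine rule are all hypothetical choices made for this example.

```python
import numpy as np

def term_doc_matrix(docs, vocab):
    """Raw term-count matrix: one row per term, one column per document."""
    index = {term: i for i, term in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.lower().split():
            if tok in index:
                X[index[tok], j] += 1.0
    return X

def lsi_classify(train_docs, train_labels, background_docs, query, k=2):
    """Label `query` with the class of the most similar training document,
    where similarity is cosine in a rank-k space built from the SVD of
    training AND background documents together (assumed setup)."""
    all_docs = list(train_docs) + list(background_docs)
    vocab = sorted({tok for d in all_docs for tok in d.lower().split()})
    X = term_doc_matrix(all_docs, vocab)

    # Truncated SVD: keep the k largest singular directions in term space.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    U_k = U[:, :min(k, len(s))]

    # Project training documents and the query into the reduced space.
    # Background documents shape the space but are never used as neighbors.
    train_vecs = (U_k.T @ X[:, :len(train_docs)]).T
    q_vec = U_k.T @ term_doc_matrix([query], vocab)[:, 0]

    norms = np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(q_vec)
    sims = (train_vecs @ q_vec) / np.where(norms == 0, 1.0, norms)
    return train_labels[int(np.argmax(sims))]

# Toy data (hypothetical): two labeled documents plus unlabeled background text.
TRAIN_DOCS = ["cat dog pet", "stock market trade"]
TRAIN_LABELS = ["animals", "finance"]
BACKGROUND_DOCS = [
    "kitten cat pet",
    "kitten puppy dog",
    "shares stock market",
    "shares bond trade",
    "bond market",
]

print(lsi_classify(TRAIN_DOCS, TRAIN_LABELS, BACKGROUND_DOCS, "kitten puppy"))
print(lsi_classify(TRAIN_DOCS, TRAIN_LABELS, BACKGROUND_DOCS, "shares bond"))
```

In this toy setup neither query shares a single term with the training documents. The queries become comparable to the training data only because the background documents co-occur with both, which is the intuition behind preferring background knowledge that is close to the classification task.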