Improving Text Classification Using EM with Background Text

Sarah Zelikovitz, College of Staten Island, City University of New York; and Haym Hirsh, Rutgers University

For many text classification tasks, sets of background text are readily available from the Web and other online sources. We show that such background text can greatly improve text classification performance when it is treated as unlabeled data and iteratively labeled using existing EM-based techniques. Although results are most pronounced when the background text falls into categories that mirror those present in the training and test data, we show improved classification accuracy, especially when training data is limited, even though the use of background text violates many of the assumptions underlying the original approach.
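The iterative labeling idea can be illustrated with a minimal sketch. The abstract does not say which classifier the EM procedure wraps, so this example assumes a multinomial naive Bayes model with Laplace smoothing. The function names (`train_nb`, `posterior`, `em_with_background`) and the toy documents are hypothetical, chosen only for illustration, and are not taken from the paper.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab):
    """M-step: estimate class priors and word probabilities from soft labels.
    docs is a list of token lists; weights is a parallel list of {class: prob}."""
    total = sum(sum(w.values()) for w in weights)  # equals the number of documents
    prior, word_prob = {}, {}
    for c in classes:
        class_mass = sum(w[c] for w in weights)
        prior[c] = (class_mass + 1.0) / (total + len(classes))  # Laplace smoothing
        counts = Counter()
        for doc, w in zip(docs, weights):
            for tok in doc:
                counts[tok] += w[c]  # fractional counts from soft labels
        denom = sum(counts.values()) + len(vocab)
        word_prob[c] = {t: (counts[t] + 1.0) / denom for t in vocab}
    return prior, word_prob

def posterior(doc, prior, word_prob, classes):
    """E-step for one document: P(class | doc) under the current model."""
    log_scores = {}
    for c in classes:
        s = math.log(prior[c])
        for tok in doc:
            if tok in word_prob[c]:  # tokens outside the vocabulary are ignored
                s += math.log(word_prob[c][tok])
        log_scores[c] = s
    top = max(log_scores.values())
    scaled = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(scaled.values())
    return {c: v / z for c, v in scaled.items()}

def em_with_background(labeled, background, classes, iterations=5):
    """labeled: list of (tokens, class); background: list of token lists."""
    vocab = {t for doc, _ in labeled for t in doc}
    vocab |= {t for doc in background for t in doc}
    lab_docs = [doc for doc, _ in labeled]
    lab_weights = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    # Start from a model trained on the labeled documents only.
    prior, word_prob = train_nb(lab_docs, lab_weights, classes, vocab)
    for _ in range(iterations):
        # E-step: probabilistically label the background text.
        bg_weights = [posterior(d, prior, word_prob, classes) for d in background]
        # M-step: retrain on labeled plus softly labeled background text.
        prior, word_prob = train_nb(lab_docs + background,
                                    lab_weights + bg_weights, classes, vocab)
    return prior, word_prob

# Toy data, invented for illustration only.
classes = ["sports", "politics"]
labeled = [(["goal", "match"], "sports"), (["election", "vote"], "politics")]
background = [["match", "team", "score"], ["team", "goal", "league"],
              ["vote", "senate", "bill"], ["election", "senate", "campaign"]]
prior, word_prob = em_with_background(labeled, background, classes)
post = posterior(["team", "league"], prior, word_prob, classes)
```

In this toy setup, neither "team" nor "league" appears in the labeled documents, so a model trained on the labeled data alone would give the test document equal probability for both classes. The background text links those words to "goal" and "match", which is what lets the EM-trained model lean toward "sports".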

