Text Classification Using Stochastic Keyword Generation

Cong Li, Ji-Rong Wen, and Hang Li

This paper considers improving the performance of text classification, when summaries of the texts, as well as the texts themselves, are available during learning. Summaries can be more accurately classified than texts, so the question is how to effectively use the summaries in learning. This paper proposes a new method for addressing the problem, using a technique referred to as ’stochastic keyword generation' (SKG). In the proposed method, the SKG model is trained using the texts and their associated summaries. In classification, a text is first mapped, with SKG, into a vector of probability values, each of which corresponds to a keyword. Text classification is then conducted on the mapped vector. This method has been applied to email classification for an automated help desk. Experimental results indicate that the proposed method based on SKG significantly outperforms other methods.


This page is copyrighted by AAAI. All rights reserved. Your use of this site constitutes acceptance of all of AAAI's terms and conditions and privacy policy.