Victoria Bobicev, Marina Sokolova
Classification of texts that may contain complex, specialized terminology requires learning methods that do not rely on extensive feature engineering. In this work we use prediction by partial matching (PPM), a text compression method that captures text features by building a language model adapted to a particular text. We show that the method achieves high text classification accuracy and can be used as an alternative to state-of-the-art learning algorithms.
Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery
Submitted: Apr 11, 2008
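
As an illustration of the idea summarized in the abstract, the following is a minimal sketch of compression-based classification, not the authors' implementation. It assumes a simplified character-level PPM variant (escape method C, no symbol exclusion, a fixed maximum order of 3, and a uniform fallback over an assumed 256-symbol alphabet). One model is trained per class, and a text is assigned to the class whose model would encode it in the fewest bits. The names `PPMModel`, `classify`, and the toy training strings are hypothetical.

```python
import math
from collections import defaultdict


class PPMModel:
    """Simplified character-level PPM (escape method C, no exclusions)."""

    def __init__(self, max_order=3, alphabet_size=256):
        self.max_order = max_order          # assumed maximum context length
        self.alphabet_size = alphabet_size  # assumed size for the uniform fallback
        # counts[context][symbol] = how often symbol followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        # Update every context of length 0..max_order that precedes each symbol.
        for i, ch in enumerate(text):
            for k in range(self.max_order + 1):
                if i - k < 0:
                    break
                self.counts[text[i - k:i]][ch] += 1

    def prob(self, context, ch):
        # Start at the longest available context and escape to shorter ones.
        p_escape = 1.0
        for k in range(min(self.max_order, len(context)), -1, -1):
            ctx = context[len(context) - k:]
            seen = self.counts.get(ctx)
            if not seen:
                continue  # context never observed: fall through at no cost
            total = sum(seen.values())
            distinct = len(seen)
            if ch in seen:
                return p_escape * seen[ch] / (total + distinct)
            # Method C: escape probability is distinct / (total + distinct).
            p_escape *= distinct / (total + distinct)
        # Symbol unseen in every context: uniform over the assumed alphabet.
        return p_escape / self.alphabet_size

    def bits(self, text):
        # Code length of the text under this model, in bits.
        total = 0.0
        for i, ch in enumerate(text):
            context = text[max(0, i - self.max_order):i]
            total -= math.log2(self.prob(context, ch))
        return total


def classify(text, models):
    # Pick the class whose model compresses the text best (fewest bits).
    return min(models, key=lambda label: models[label].bits(text))


# Toy, made-up training data for two hypothetical classes.
models = {"medical": PPMModel(), "sports": PPMModel()}
models["medical"].train(
    "the patient was given a dose of insulin and the blood glucose level dropped "
)
models["sports"].train(
    "the team scored a late goal and won the football match in extra time "
)

label_a = classify("insulin dose for the patient", models)
label_b = classify("football match won by the team", models)
```

In this sketch the models are static during scoring. An adaptive compressor would also update its counts while encoding the text, which is omitted here for brevity.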