An Automatic Classification of Book Texts to User-Defined Tags

Sharon Givon, Theresa Wilson

We describe work on automatically assigning labels to books, using user-defined tags as the label set. Using supervised learning, and exploring both binary and multiclass classification, we train and test classifiers on several feature sets, varying the size of each set and the part-of-speech classes and named entities it includes. Results indicate that a binary classifier, trained and tested on a feature space consisting of a limited selection of parts of speech together with all frequent named entities, achieves a classification precision of 81%, significantly outperforming a baseline that assigns the 10 most popular tags to every book.
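The comparison above rests on two ingredients: a popularity baseline that ignores each book's text and gives every book the same top-10 tags, and a precision score over the tags assigned. The Python sketch below illustrates both under stated assumptions. "Popularity" is taken to mean the number of training books carrying a tag, with ties broken alphabetically. Precision is micro-averaged over all assigned (book, tag) pairs. The abstract confirms neither definition, and the function names (`top_k_tags`, `popularity_baseline`, `micro_precision`) and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def top_k_tags(train_tags: Dict[str, Set[str]], k: int = 10) -> List[str]:
    """Return the k tags carried by the most training books.

    Assumption: popularity = number of books with the tag; ties broken alphabetically.
    """
    counts = Counter(tag for tags in train_tags.values() for tag in tags)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:k]]


def popularity_baseline(book_ids: List[str], popular: List[str]) -> Dict[str, Set[str]]:
    """Assign the same popular tags to every book, ignoring the book's text."""
    return {book: set(popular) for book in book_ids}


def micro_precision(predicted: Dict[str, Set[str]], gold: Dict[str, Set[str]]) -> float:
    """Fraction of all predicted (book, tag) pairs that appear in the gold tags.

    Assumption: micro-averaged precision; returns 0.0 when nothing is assigned.
    """
    assigned = sum(len(tags) for tags in predicted.values())
    correct = sum(len(tags & gold.get(book, set())) for book, tags in predicted.items())
    return correct / assigned if assigned else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: book id -> set of user-defined tags.
    train = {"b1": {"fantasy", "magic"}, "b2": {"fantasy", "dragons"}, "b3": {"history"}}
    gold = {"t1": {"fantasy"}, "t2": {"history", "war"}}

    popular = top_k_tags(train, k=2)  # ["fantasy", "dragons"]
    predicted = popularity_baseline(list(gold), popular)
    print(micro_precision(predicted, gold))  # 1 correct out of 4 assigned -> 0.25
```

Because the baseline gives every book an identical tag list, its precision depends only on how often the globally popular tags happen to be correct. A text-based classifier has to beat that to show it is actually using the book's content.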

Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing

Submitted: Feb 15, 2008

This page is copyrighted by AAAI. All rights reserved.