Sharon Givon, Theresa Wilson
We describe work on automatically assigning labels to books, using user-defined tags as the label set. Using supervised learning and exploring both binary and multiclass classification, we train and test classifiers on several feature sets, focusing on feature-set size, part-of-speech classes, and named entities. Results indicate that a binary classifier, trained and tested on a feature space consisting of a limited selection of parts of speech plus all frequent named entities, achieves a classification precision of 81%, significantly outperforming a baseline that assigns the ten most popular tags to every book.
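To make the baseline comparison concrete, the sketch below implements a most-popular-tags baseline and a precision measure over (book, tag) pairs. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the helper names (`top_k_tags`, `micro_precision`, `popularity_baseline`) are hypothetical, and precision is assumed to be micro-averaged (correct assigned tags divided by all assigned tags). The paper's exact protocol may differ.

```python
from collections import Counter

def top_k_tags(training_tags, k=10):
    """Return the k tags attached to the most training books.

    Hypothetical helper: each book is a list of user tags; a tag is
    counted at most once per book.
    """
    counts = Counter(tag for tags in training_tags for tag in set(tags))
    return [tag for tag, _ in counts.most_common(k)]

def micro_precision(predicted, gold):
    """Assumed metric: fraction of all predicted (book, tag) pairs that are correct."""
    correct = sum(len(set(p) & set(g)) for p, g in zip(predicted, gold))
    total = sum(len(set(p)) for p in predicted)
    return correct / total if total else 0.0

def popularity_baseline(training_tags, test_gold, k=10):
    """Assign the same k most popular training tags to every test book."""
    popular = top_k_tags(training_tags, k)
    predicted = [popular for _ in test_gold]
    return micro_precision(predicted, test_gold)

if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    training = [["fiction", "fantasy"], ["fiction", "history"],
                ["fiction", "fantasy", "magic"]]
    test_gold = [["fiction", "magic"], ["history"]]
    print(popularity_baseline(training, test_gold, k=2))  # 0.25
```

Because the baseline ignores each book's content, its precision is bounded by how often the globally popular tags happen to apply; a per-tag binary classifier can instead decide tag by tag from the book's own features.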
Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing
Submitted: Feb 15, 2008