Stephan Raaijmakers, Wessel Kraaij
We present a shallow linguistic approach to subjectivity classification. Using multinomial kernel machines, we demonstrate that a data representation based on counting character n-grams improves on results previously attained on the MPQA corpus with word-based n-grams and syntactic information. We compare two types of string-based representations: key substring groups and character n-grams. We find that word-spanning character n-grams, which cross word boundaries, significantly reduce the bias of a classifier and boost its accuracy.
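To make the distinction between word-internal and word-spanning character n-grams concrete, here is a minimal sketch that produces the count-based (multinomial) representation. It assumes lowercasing and collapsing of whitespace as preprocessing. The function name `char_ngrams` and the `word_spanning` flag are hypothetical illustrations, not the authors' implementation, and the kernel machine itself is not shown.

```python
from collections import Counter


def char_ngrams(text: str, n: int, word_spanning: bool = True) -> Counter:
    """Count character n-grams of length n in text.

    Assumption (illustrative only): text is lowercased and runs of
    whitespace are collapsed to a single space before counting.
    """
    text = " ".join(text.lower().split())
    if word_spanning:
        # Slide over the whole string, so n-grams may cross word boundaries
        # (e.g. "t b" in "not bad" links a negator to the following word).
        units = [text]
    else:
        # Restrict n-grams to the inside of individual words.
        units = text.split(" ")
    counts = Counter()
    for unit in units:
        for i in range(len(unit) - n + 1):
            counts[unit[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    print(char_ngrams("not bad", 3))
    print(char_ngrams("not bad", 3, word_spanning=False))
```

For the phrase "not bad", the word-spanning variant yields the trigrams "not", "ot ", "t b", " ba" and "bad", while the word-internal variant yields only "not" and "bad". The spanning trigrams encode the adjacency of the two words, which is one plausible reason such features help with subjective expressions.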
Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery
Submitted: Feb 15, 2008