Kamal Nigam and Matthew Hurst
This paper describes an automated system for detecting polar expressions about a topic of interest. The approach has two elementary components: a shallow NLP system for extracting polar language and a machine-learning-based topic classifier. The components are combined through a simple but accurate collocation assumption: if a topical sentence contains polar language, the system predicts that the polar language reflects the topic and not some other subject matter. We evaluate the system, its components, and the collocation assumption on a corpus of online consumer messages. Building on these components, we discuss how to measure the overall sentiment about a particular topic as expressed in online messages written by many different authors. We propose using the fundamentals of Bayesian statistics to form an aggregate authorial opinion metric. This metric would propagate the uncertainties introduced by the polarity and topic modules, allowing statistically valid comparisons of opinion across multiple topics.
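The collocation assumption can be illustrated with a minimal Python sketch. It is not the system evaluated in this paper: the keyword sets and the functions `is_topical`, `polarity`, and `topical_polarity` are hypothetical stand-ins for the machine learning topic classifier and the shallow NLP polar language extractor, chosen only to show how the two outputs are combined at the sentence level.

```python
from typing import Callable, Optional

# Hypothetical toy lexicons; the real components are a trained topic
# classifier and a shallow NLP polar-language extractor.
TOPIC_KEYWORDS = {"camera", "lens"}
POSITIVE_WORDS = {"great", "love", "excellent"}
NEGATIVE_WORDS = {"terrible", "hate", "broken"}


def _tokens(sentence: str) -> set:
    """Lowercase the words of a sentence and strip edge punctuation."""
    return {w.strip(".,!?;:").lower() for w in sentence.split()}


def is_topical(sentence: str) -> bool:
    """Stand-in topic classifier: true if any topic keyword appears."""
    return bool(_tokens(sentence) & TOPIC_KEYWORDS)


def polarity(sentence: str) -> Optional[str]:
    """Stand-in polar extractor: 'positive', 'negative', or None."""
    toks = _tokens(sentence)
    pos = bool(toks & POSITIVE_WORDS)
    neg = bool(toks & NEGATIVE_WORDS)
    if pos == neg:
        return None  # no polar language, or mixed polar language
    return "positive" if pos else "negative"


def topical_polarity(
    sentence: str,
    topic_fn: Callable[[str], bool] = is_topical,
    polar_fn: Callable[[str], Optional[str]] = polarity,
) -> Optional[str]:
    """Collocation assumption: polar language found in a topical
    sentence is attributed to the topic; non-topical sentences
    contribute nothing."""
    if not topic_fn(sentence):
        return None
    return polar_fn(sentence)


if __name__ == "__main__":
    for s in ["I love this camera.", "I love this weather."]:
        print(s, "->", topical_polarity(s))
```

Passing the two components as function arguments keeps the combination rule independent of how topic and polarity are actually detected, which mirrors the separation of components described above.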