AAAI Publications, The Twenty-Ninth International Flairs Conference

Necessity of Feature Selection when Augmenting Tweet Sentiment Feature Spaces with Emoticons
Joseph D. Prusa, Taghi M. Khoshgoftaar, Amri Napolitano

Last modified: 2016-03-30


Tweet sentiment classification seeks to identify the emotional polarity of a tweet. One potential way to enhance classification performance is to include emoticons as features. Emoticons are textual representations of facial expressions that convey various emotions. They are formed from combinations of letters, punctuation marks, and symbols, and they appear frequently in tweets. While emoticons have been used as features for sentiment classification, the importance of including them has not been directly measured. In this work, we seek to determine whether adding emoticon features improves classifier performance. We also investigate how high dimensionality affects the benefit of adding emoticon features. We conducted experiments testing the impact of emoticon features, both with and without feature selection. Classifiers were trained using four different learners with emoticons, unigrams, or both as features. Feature selection was performed using five filter-based feature rankers and four feature subset sizes. Our results showed that, without feature selection, the choice of feature set (emoticons, unigrams, or both) had no significant impact on performance; however, with any of the tested feature selection techniques, augmenting unigram features with emoticon features yielded significantly better performance than unigrams alone. Additionally, we investigate how adding emoticons changes the top features selected by the rankers.
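
The abstract does not name the five rankers or describe the preprocessing, so the following is only a minimal Python sketch of the general setup, under stated assumptions: each tweet is turned into binary unigram features, emoticon features, or both; every feature is scored by a filter-based ranker; and a top-k subset is kept. Chi-squared is used here as a common stand-in filter ranker, not necessarily one of the rankers used in the paper. The emoticon lexicon, the whitespace tokenization, and the names `extract_features`, `chi_squared`, and `rank_features` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical emoticon lexicon for illustration only; a real study would
# use a larger, curated list.
EMOTICONS = {":)", ":-)", ":(", ":-(", ":D", ";)", ":P", ":'("}


def extract_features(tweet, feature_set="both"):
    """Return the set of binary features present in a tweet.

    feature_set is "unigram", "emoticon", or "both". Tokens are split on
    whitespace (a simplification); emoticons are matched exactly against the
    hypothetical lexicon, and other tokens are lowercased and stripped of
    punctuation other than apostrophes.
    """
    features = set()
    for tok in tweet.split():
        if tok in EMOTICONS:
            if feature_set in ("emoticon", "both"):
                features.add("EMO=" + tok)
        elif feature_set in ("unigram", "both"):
            word = re.sub(r"[^\w']", "", tok.lower())
            if word:
                features.add("UNI=" + word)
    return features


def chi_squared(a, b, pos, neg):
    """Chi-squared statistic for a binary feature vs. a binary label.

    a: positive tweets containing the feature; b: negative tweets containing
    it; pos/neg: total numbers of positive/negative tweets.
    """
    n = pos + neg
    denom = pos * neg * (a + b) * (n - a - b)
    if denom == 0:
        return 0.0  # feature present in all or no tweets: no information
    return n * (a * (neg - b) - b * (pos - a)) ** 2 / denom


def rank_features(tweets, labels, k, feature_set="both"):
    """Score every feature with the chi-squared filter and keep the top k."""
    pos_counts, neg_counts = Counter(), Counter()
    for tweet, label in zip(tweets, labels):
        target = pos_counts if label == 1 else neg_counts
        # Each feature counts at most once per tweet (binary presence).
        target.update(extract_features(tweet, feature_set))
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    scores = {
        f: chi_squared(pos_counts[f], neg_counts[f], pos, neg)
        for f in set(pos_counts) | set(neg_counts)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k]
```

On a contrived toy set such as `["I love this :)", "great day :)", "I hate this :(", "awful day :("]` with labels `[1, 1, 0, 0]`, `rank_features(tweets, labels, k=2)` returns `['EMO=:(', 'EMO=:)']`, because each emoticon separates the two classes perfectly while every unigram appears in at most one tweet per class. Comparing the output for `feature_set="unigram"` against `"both"` illustrates, in miniature, how adding emoticons can change which features a ranker places at the top.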
