AAAI Publications, The Twenty-Sixth International FLAIRS Conference

Using an Automatically Generated Dictionary and a Classifier to Identify a Person's Profession in Tweets
Abe Cezar Hall, Fernando Gomez

Last modified: 2013-05-19


This paper presents algorithms for classifying pre-tagged person entities in tweets into one of eight profession categories. It describes a semi-supervised classifier that takes into account the local context surrounding the entity in the tweet, hashtag information, and topic signature scores. It also describes a method that uses data from the Web to dynamically create a reference file, called a person dictionary, containing person/profession relationships, together with an algorithm that uses the dictionary to assign a person to one of the eight profession categories. Results show that classifications made with the automatically generated person dictionary compare favorably to those made with a manually compiled dictionary. Results also show that classifications made using either the dictionary or the classifier alone are moderately successful, and that a hybrid method using both offers a significant improvement.
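
The abstract does not say how the dictionary and the classifier are combined in the hybrid method. As a rough illustration only, one plausible arrangement is to consult the person dictionary first and fall back to the classifier when the person is not listed. In the sketch below, the function names, the name normalization, the profession labels, and the toy classifier are all hypothetical stand-ins and are not taken from the paper.

```python
from typing import Callable, Dict, Optional

# Hypothetical type: maps a normalized person name to a profession label.
PersonDictionary = Dict[str, str]
# Hypothetical type: a classifier taking (person name, tweet text).
Classifier = Callable[[str, str], str]


def normalize(name: str) -> str:
    """Lowercase the name and collapse runs of whitespace (an assumed scheme)."""
    return " ".join(name.lower().split())


def lookup_profession(name: str, dictionary: PersonDictionary) -> Optional[str]:
    """Return the dictionary's label for this person, or None if not listed."""
    return dictionary.get(normalize(name))


def hybrid_classify(
    name: str,
    tweet: str,
    dictionary: PersonDictionary,
    classifier: Classifier,
) -> str:
    """Assumed hybrid rule: prefer the dictionary, else use the classifier."""
    label = lookup_profession(name, dictionary)
    if label is not None:
        return label
    return classifier(name, tweet)


def toy_classifier(name: str, tweet: str) -> str:
    """Placeholder for the context-based classifier: a single hashtag cue."""
    return "athlete" if "#nba" in tweet.lower() else "unknown"


# Made-up example entry; the labels are illustrative, not the paper's categories.
person_dictionary: PersonDictionary = {"jane doe": "politician"}
```

With these stand-ins, a person found in the dictionary gets the stored label regardless of the tweet text, while an unlisted person is labeled from the tweet alone. The semi-supervised classifier described in the abstract would take the place of `toy_classifier`.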


Named Entity Recognition; Microblogs; Tweets; Classifier
