Analyzing Microtext
Papers from the 2013 AAAI Spring Symposium
Eduard Hovy, Vita Markman, Craig Martell, and David Uthus, Program Cochairs
Technical Report SS-13-01
114 pp., $30.00
ISBN 978-1-57735-598-4
Microtext consists of short snippets of text found in many modes of communication: microblogs (such as Twitter and Plurk), short message streams (SMS), chat (such as instant messaging and internet relay chat), and transcribed conversations (such as FBI hostage negotiations). Microtext is often informal and brief, with varied grammar, frequent misspellings (both accidental and purposeful), and use of abbreviations, acronyms, and emoticons. More conversational forms of microtext, such as multiparticipant chat, also contain entangled conversation threads. These characteristics make microtext difficult to analyze and understand, and often cause traditional NLP techniques to fail.
Research on microtext is becoming increasingly necessary given the rapid growth of online microtext. Yet very few suitable tools have been developed for analyzing it, and there are few sufficiently large, publicly available data sets (such as the Twitter corpus). Most current NLP tools are designed for grammatical, properly spelled, and properly punctuated text. In reality, however, a vast portion of online data does not conform to the canons of standard grammar and spelling.
This technical report includes papers by researchers from the different communities with an interest in analyzing microtext: artificial intelligence, machine learning, computational linguistics, information retrieval, linguistics, human-computer interaction, education, and the social sciences.