AAAI Publications, 2012 AAAI Spring Symposium Series

Tracking Epidemics with Natural Language Processing and Crowdsourcing
Robert Munro, Lucky Gunasekara, Stephanie Nevins, Lalith Polepeddi, Evan Rosen

Last modified: 2012-03-23

Abstract


The first indication of a new outbreak often appears in unstructured data (natural language), reported openly in traditional or social media as a new 'flu-like' or 'malaria-like' illness weeks or months before the new pathogen is eventually isolated. We present a system for tracking these early signals globally, using natural language processing and crowdsourcing. By comparison, search-log-based approaches, while innovative and inexpensive, often provide a trailing signal that follows open reports in plain language. Concentrating on discovering outbreak-related reports in big open data, we show how crowdsourced workers can create near-real-time training data for adaptive active-learning models, addressing the lack of broad-coverage training data for tracking epidemics. This approach is well suited to an outbreak information-flow context, where sudden bursts of information about new diseases and locations must be manually processed quickly and at short notice.
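
The abstract does not specify how the active-learning models choose which reports to send to crowdsourced workers. As a rough illustration only, the sketch below assumes uncertainty sampling over a toy bag-of-words logistic model: the model is trained on the labels gathered so far, the report it is least sure about is sent for labeling, and the new label joins the training data. All names here (`train`, `most_uncertain`, `active_learning_round`, `ask_crowd`) are hypothetical and are not taken from the authors' system.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def predict(weights, text):
    """Probability that a report is outbreak-related under the toy model."""
    score = sum(weights.get(tok, 0.0) for tok in tokenize(text))
    return 1.0 / (1.0 + math.exp(-score))


def train(labeled, epochs=50, lr=0.5):
    """Fit bag-of-words logistic weights with plain stochastic gradient steps."""
    weights = {}
    for _ in range(epochs):
        for text, label in labeled:
            error = label - predict(weights, text)
            for tok in tokenize(text):
                weights[tok] = weights.get(tok, 0.0) + lr * error
    return weights


def most_uncertain(weights, pool):
    """Pick the unlabeled report whose predicted probability is closest to 0.5."""
    return min(pool, key=lambda text: abs(predict(weights, text) - 0.5))


def active_learning_round(labeled, pool, ask_crowd):
    """One round: retrain, query the crowd on the least certain report, update.

    `ask_crowd` is a hypothetical callable standing in for a crowdsourcing
    task; it takes the report text and returns 1 (outbreak-related) or 0.
    """
    weights = train(labeled)
    query = most_uncertain(weights, pool)
    new_labeled = labeled + [(query, ask_crowd(query))]
    new_pool = [text for text in pool if text != query]
    return new_labeled, new_pool, query
```

In this sketch, a report containing only unseen vocabulary (for instance, a previously unreported illness or location) scores exactly 0.5 and is queried first, which loosely mirrors the situation described above where bursts of information about new diseases and locations need human attention.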
