Chris Pal, Gideon Mann, Richard Minerich
Geographic indexing is a powerful and effective way to organize information on the web, but the use of standardized location tags is not widespread. Therefore, there is considerable interest in using machine learning approaches to automatically obtain semantic associations involving geographic locations from processing unstructured natural language text. While it is often impractical or expensive to obtain training labels, there are often ways to obtain noisy labels. We present a novel discriminative approach using a hidden variable model suitable for learning with noisy labels and apply it to extracting location relationships from natural language. We examine the problem of associating events with locations, where simple keyword matching produces a small number of positive examples within many false positives. Compared to a state-of-the-art baseline, our method doubles the precision of extracting semantic information while maintaining the same recall.
Subjects: 12. Machine Learning and Discovery; 13. Natural Language Processing
Submitted: May 15, 2007