Automatic Discovery of Linguistic Patterns for Information Extraction

Sanda M. Harabagiu, Mihai Surdeanu, and Paul Morarescu, Southern Methodist University, USA

Information Extraction (IE) systems typically rely on extraction patterns that encode domain-specific knowledge. When matched against natural language texts, these patterns recognize information relevant to the extraction task with high accuracy. Adapting an IE system to a new extraction scenario entails devising a new collection of extraction patterns, a time-consuming and expensive process. To overcome this obstacle, we have implemented in CICERO, our IE system, a pattern acquisition mechanism that combines lexico-semantic knowledge available from WordNet with syntactic information collected from training corpora. Because the knowledge encoded in WordNet is open-domain, our approach is portable across multiple extraction domains.

