Hilda Hardy, Vika Kanchakouskaya, Tomek Strzalkowski
Extracting events from documents quickly and accurately is an important goal for many tasks that require language understanding, such as question answering. We present a data-driven method for discovering events and their attributes in a corpus. We further demonstrate that a carefully chosen set of textual features, when used to train well-known learning algorithms, can approach or exceed the accuracy of hand-crafted patterns for event classification while requiring far less time and expertise. The features can be gathered using lightweight text processing tools. Overall classification accuracy reaches 59.76% for a set of 11 event types.
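
As a rough illustration of the general idea of classifying event types from lightweight textual features, the sketch below trains a multinomial Naive Bayes classifier on word unigrams. It is not the method or feature set used in the paper: the choice of Naive Bayes, the unigram features, the event labels, the toy sentences, and all names (`extract_features`, `NaiveBayesEventClassifier`, `TOY_DATA`) are assumptions made only for this example.

```python
import math
from collections import Counter, defaultdict

PUNCT = ".,;:!?\"'()"

def extract_features(sentence):
    """Lowercased word unigrams (an assumed stand-in for the paper's features)."""
    tokens = (tok.strip(PUNCT).lower() for tok in sentence.split())
    return [tok for tok in tokens if tok]

class NaiveBayesEventClassifier:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical helper)."""

    def __init__(self):
        self.class_counts = Counter()               # training sentences per label
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.vocab = set()

    def train(self, examples):
        for sentence, label in examples:
            self.class_counts[label] += 1
            for feat in extract_features(sentence):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)

    def classify(self, sentence):
        feats = extract_features(sentence)
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each feature.
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in feats:
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy data with two made-up event types; not drawn from the paper's corpus.
TOY_DATA = [
    ("Rebels attacked the convoy near the border", "attack"),
    ("Gunmen attacked a police station", "attack"),
    ("The company acquired its rival for two billion dollars", "acquisition"),
    ("The firm bought a startup last year", "acquisition"),
]

clf = NaiveBayesEventClassifier()
clf.train(TOY_DATA)
prediction = clf.classify("Militants attacked the village")
```

Tokenizing on whitespace and stripping punctuation keeps the feature extraction lightweight; a real system would likely add richer features and evaluate on held-out data.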
Subjects: 13. Natural Language Processing; 12. Machine Learning and Discovery
Submitted: May 17, 2006