This paper focuses on the linguistic aspect of noun phrase coreference, investigating the knowledge sources that can potentially improve a learning-based coreference resolution system. Whereas traditional, knowledge-lean coreference resolvers rely almost exclusively on morpho-syntactic cues, we show how to induce, from labeled and unlabeled corpora, features that encode semantic knowledge. Experiments on the ACE data sets indicate that adding these new semantic features to a coreference system that employs a fairly standard feature set significantly improves its performance.
Subjects: 13. Natural Language Processing; 13.1 Discourse
Submitted: Oct 16, 2006