Scaling Up Word Sense Disambiguation via Parallel Texts

Yee Seng Chan, Hwee Tou Ng

A critical problem faced by current supervised word sense disambiguation (WSD) systems is the lack of manually annotated training data. Tackling this data acquisition bottleneck is crucial to building high-accuracy, wide-coverage WSD systems. In this paper, we show that the approach of automatically gathering training examples from parallel texts scales to a large set of nouns. We evaluated on the nouns of the SENSEVAL-2 English all-words task, using fine-grained sense scoring. Our evaluation shows that training on examples gathered from 680 MB of parallel texts achieves accuracy comparable to that of the best system in the SENSEVAL-2 English all-words task, and significantly outperforms the baseline of always choosing sense 1 of WordNet.

Content Area: 14. Natural Language Processing & Speech Recognition

Subjects: 13. Natural Language Processing

Submitted: May 10, 2005

This page is copyrighted by AAAI. All rights reserved.