Yee Seng Chan, Hwee Tou Ng
A critical problem faced by current supervised WSD systems is the lack of manually annotated training data. Tackling this data acquisition bottleneck is crucial, in order to build high accuracy and wide-coverage WSD systems. In this paper, we show that the approach of automatically gathering training examples from parallel texts is scalable to a large set of nouns. We conducted evaluation on the nouns of SENSEVAL-2 English all-words task, using fine-grained sense scoring. Our evaluation shows that training on examples gathered from 680MB of parallel texts achieves accuracy comparable to the best system of SENSEVAL-2 English all-words task, and significantly outperforms the baseline of always choosing sense 1 of WordNet.
Content Area: 14. Natural Language Processing & Speech Recognition
Subjects: 13. Natural Language Processing
Submitted: May 10, 2005