Shiqi Zhao, Ting Liu, Xincheng Yuan, Sheng Li, Yu Zhang
Lexical paraphrasing aims to acquire word-level paraphrases. It is critical for many Natural Language Processing (NLP) applications, such as Question Answering (QA), Information Extraction (IE), and Machine Translation (MT). Since the meaning and usage of a word vary across contexts, different paraphrases should be acquired for different contexts. However, most existing research focuses on constructing paraphrase corpora, which impose few contextual constraints on how the paraphrases are applied. This paper presents a method that automatically acquires context-specific lexical paraphrases: the paraphrases obtained for a word depend on the specific sentence in which the word occurs. The method has two stages, candidate paraphrase extraction and paraphrase validation, both of which rely mainly on web mining. Evaluations are conducted on a news title corpus, and the presented method is compared with a paraphrasing method that exploits a Chinese thesaurus of synonyms, Tongyi Cilin (Extended) (CilinE for short). Results show that the F-measure of our method (0.4852) is significantly higher than that of the CilinE-based method (0.1127). In addition, over 85% of the correct paraphrases derived by our method cannot be found in CilinE, which suggests that our method is effective at acquiring out-of-thesaurus paraphrases.
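For readers unfamiliar with the metric, the F-measure reported above can be computed as in the following minimal sketch. It assumes the balanced F-measure (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function name `f_measure` and the toy English word sets are hypothetical illustrations, not data or code from the paper.

```python
def f_measure(predicted, gold, beta=1.0):
    """Return (precision, recall, F) for a set of predicted paraphrases
    scored against a set of gold (human-judged correct) paraphrases.

    beta=1.0 gives the balanced F-measure; this is an assumption about
    the variant used, not something the abstract confirms.
    """
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)  # paraphrases that are both predicted and correct
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical toy example: paraphrases proposed for one word in one sentence.
predicted = {"buy", "purchase", "get", "sell"}
gold = {"buy", "purchase", "acquire"}
p, r, f = f_measure(predicted, gold)
print(f"precision={p:.4f} recall={r:.4f} F={f:.4f}")
```

A single F value such as 0.4852 summarizes both how many of the proposed paraphrases are correct (precision) and how many of the correct paraphrases are found (recall), which is why it is a convenient basis for comparing the two methods.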
Subjects: 13. Natural Language Processing
Submitted: Oct 2, 2006