Joäo Cordeiro, Gaël Dias, and Pavel Brazdil
Paraphrase detection can be seen as the task of aligning sentences that convey the same information but yet are written in different forms. Such resources are important to automatically learn text-to-text rewriting rules. In this paper, we present a new metric for unsupervised detection of paraphrases and apply it in the context of clustering of paraphrases. An exhaustive evaluation is conducted over a set of standard paraphrase corpora and real-world web news stories (WNS) corpora. The results are promising as they outperform state-of-the-art measures developed for similar tasks.
Subjects: 13. Natural Language Processing; 1.10 Information Retrieval
Submitted: Feb 11, 2007