Wen-tau Yih, Joshua Goodman, Lucy Vanderwende, Hisami Suzuki
We show that a simple procedure based on maximizing the number of informative content words can produce some of the best reported results for multi-document summarization. We first assign a score to each term in the document cluster, using only frequency and position information, and then find the set of sentences in the cluster that maximizes the sum of these scores, subject to length constraints. Our overall results are the best reported on the DUC-2004 summarization task for the ROUGE-1 score. On MSE-2005 they are the best as well, though not statistically significantly different from those of the top system in that evaluation. Our system is also substantially simpler than the previous best system.
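The two-step procedure (score terms, then pick the highest-scoring sentence set under a length limit) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following choices are illustrative assumptions: the position weight 1/(1 + sentence index), the tiny hypothetical stopword list, counting each distinct content word only once per summary, measuring length in words, and the exhaustive search over sentence subsets. The search is exponential in the number of sentences, so it is only practical for toy inputs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "was", "for"}


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [t for t in tokens if t]


def content_words(sentence):
    """Distinct non-stopword tokens of a sentence."""
    return {w for w in tokenize(sentence) if w not in STOPWORDS}


def term_scores(documents):
    """Score terms from frequency and position only.

    documents: list of documents, each a list of sentences.
    Assumed scheme: every occurrence of a content word adds 1 / (1 + pos),
    where pos is the index of its sentence within its document.
    """
    scores = defaultdict(float)
    for doc in documents:
        for pos, sentence in enumerate(doc):
            for w in tokenize(sentence):
                if w not in STOPWORDS:
                    scores[w] += 1.0 / (1 + pos)
    return dict(scores)


def summarize(documents, max_words):
    """Exhaustively find the sentence subset with the highest total score
    of distinct covered terms, subject to a word-count limit."""
    scores = term_scores(documents)
    sentences = [s for doc in documents for s in doc]
    best, best_score = (), 0.0
    for k in range(1, len(sentences) + 1):
        for combo in combinations(range(len(sentences)), k):
            length = sum(len(tokenize(sentences[i])) for i in combo)
            if length > max_words:
                continue  # violates the length constraint
            # Each term counts once, so redundant sentences add nothing.
            covered = set().union(*(content_words(sentences[i]) for i in combo))
            total = sum(scores[w] for w in covered)
            if total > best_score:
                best, best_score = combo, total
    return [sentences[i] for i in best], best_score
```

On a two-document toy cluster such as `[["Storm hits coast.", "Residents evacuate coast towns."], ["Storm damages homes on coast.", "Officials praise the rescue teams."]]` with `max_words=10`, the sketch prefers the second document's two sentences (total 8.5) over pairing the two storm-and-coast lead sentences, because terms already covered earn no extra credit.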
Subjects: 13. Natural Language Processing; 13.1 Discourse
Submitted: Oct 16, 2006