Vibhu Mittal and Mark Kantrowitz, Just Research; Jade Goldstein and Jaime Carbonell, Carnegie Mellon University
Human-quality text summarization systems are difficult to design, and even more difficult to evaluate, in part because documents can differ along several dimensions, such as length, writing style, and lexical usage. Nevertheless, certain cues can often help guide the selection of sentences for inclusion in a summary. This paper presents our analysis of news-article summaries generated by sentence selection. Sentences are ranked for potential inclusion in the summary using a weighted combination of statistical and linguistic features. We analyze the relative effectiveness of several candidate linguistic features, which were derived from an analysis of news-wire summaries. To evaluate these features, we use a modified version of precision-recall curves, with a baseline derived from a theoretical analysis of text-span overlap under random selection. We illustrate our discussion with empirical results showing that evaluation results must be interpreted in the context of both corpus characteristics and compression ratios.
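The ranking and evaluation pipeline described in the abstract can be illustrated with a minimal sketch. The sketch makes several assumptions that do not come from the paper. First, each sentence carries a dictionary of numeric feature values; the names `position` and `title_overlap` and their weights are hypothetical. Second, the score is a plain linear combination of those values. Third, the top-ranked sentences are kept according to a compression ratio. Precision and recall are computed at the sentence level against a reference set of sentence indices. The random baseline shown here is only the simple expectation for choosing k of n sentences uniformly at random, where r of them belong to the reference summary. Under that assumption the expected overlap is k·r/n. This is not necessarily the text-span overlap analysis the paper itself uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Sentence:
    index: int                  # position of the sentence in the document
    features: Dict[str, float]  # hypothetical feature name -> feature value


def score(sentence: Sentence, weights: Dict[str, float]) -> float:
    """Weighted linear combination of features; a missing feature counts as 0."""
    return sum(w * sentence.features.get(name, 0.0) for name, w in weights.items())


def select(sentences: List[Sentence], weights: Dict[str, float], ratio: float) -> Set[int]:
    """Rank sentences by score and keep the top `ratio` fraction (at least one)."""
    k = max(1, round(ratio * len(sentences)))
    ranked = sorted(sentences, key=lambda s: score(s, weights), reverse=True)
    return {s.index for s in ranked[:k]}


def precision_recall(selected: Set[int], reference: Set[int]) -> Tuple[float, float]:
    """Sentence-level precision and recall of a selection against a reference summary."""
    overlap = len(selected & reference)
    precision = overlap / len(selected) if selected else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


def random_baseline(n: int, k: int, r: int) -> Tuple[float, float]:
    """Expected precision/recall when k of n sentences are picked uniformly at random
    and r of the n sentences are in the reference summary (hypergeometric mean k*r/n)."""
    expected_overlap = k * r / n
    return expected_overlap / k, expected_overlap / r


if __name__ == "__main__":
    doc = [
        Sentence(0, {"position": 1.0, "title_overlap": 0.5}),
        Sentence(1, {"position": 0.8, "title_overlap": 0.0}),
        Sentence(2, {"position": 0.6, "title_overlap": 0.9}),
        Sentence(3, {"position": 0.4, "title_overlap": 0.1}),
        Sentence(4, {"position": 0.2, "title_overlap": 0.2}),
    ]
    weights = {"position": 0.6, "title_overlap": 0.4}  # hypothetical weights
    chosen = select(doc, weights, ratio=0.4)           # keeps 2 of 5 sentences
    print(sorted(chosen))                              # [0, 2]
    print(precision_recall(chosen, {0, 1}))            # (0.5, 0.5)
    print(random_baseline(n=5, k=2, r=2))              # (0.4, 0.4)
```

In this toy example, the system scores 0.5 precision against a random-selection expectation of 0.4. Comparisons like this one only hold at a fixed compression ratio and on a fixed document set, which is the point the abstract makes about reporting results with their corpus characteristics and compression ratios.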