Inderjeet Mani, Eric Bloedorn, and Barbara Gates
In this paper we investigate two classes of techniques for determining what is salient in a text, as a means of deciding whether that information should be included in a summary. We introduce three methods based on text cohesion, which models text in terms of relations between words or referring expressions, to help determine how tightly connected the text is. We also describe a method based on text coherence, which models text in terms of macro-level relations between clauses or sentences, to help determine the overall argumentative structure of the text. The paper compares the salience scores produced by the cohesion and coherence methods with each other and with human judgments. The results show that while the coherence method beats the cohesion methods in accuracy of determining clause salience, the best cohesion method can reach 76% of the accuracy level of the coherence method. Further, two of the cohesion methods each yield significant positive correlations with the human salience judgments. We also compare the types of discourse-related text structure discovered by the cohesion and coherence methods.