M. A. Hearst
Two full-length texts are unlikely to discuss all, or almost all, of the same subtopics or subpoints. Even if the documents contain many of the same terms, the ways in which those terms are grouped to form subtopical discussions may still be quite different. A solution is to create a description of a document that lists all of its subtopical discussions as well as its main topics. An index that indicates this structure is an abstract representation of the document, and we can think of this index as a case in the Case-Based Reasoning (CBR) sense. This paper proposes the use of cases to represent the high-level structure of full-length documents for the purpose of information retrieval. The cases are to be used both for assessing document similarity and for helping the user construct viable queries. A case can be transformed in various ways to make it more similar to the descriptions of other documents; these transformations include generalizing, substituting, and emphasizing subtopic descriptions. An advantage of this approach is that the cases that represent a document can be generated automatically.
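
As an illustration only, the idea of a case and its three transformations can be sketched in code. The abstract does not specify a concrete representation, so everything here is an assumption: the names `Subtopic`, `Case`, `generalize`, `substitute`, `emphasize`, and `similarity` are hypothetical, subtopics are modeled as plain term sets with a weight, and Jaccard overlap stands in for whatever similarity measure is actually used.

```python
from dataclasses import dataclass, replace
from typing import Dict, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Subtopic:
    terms: FrozenSet[str]   # assumed: a subtopic is described by a set of terms
    weight: float = 1.0     # assumed: emphasis is modeled as a numeric weight


@dataclass(frozen=True)
class Case:
    main_topics: FrozenSet[str]
    subtopics: Tuple[Subtopic, ...]


def _with_subtopic(case: Case, index: int, new: Subtopic) -> Case:
    """Return a copy of the case with one subtopic description swapped out."""
    subs = list(case.subtopics)
    subs[index] = new
    return replace(case, subtopics=tuple(subs))


def generalize(case: Case, index: int, broader: Dict[str, str]) -> Case:
    """Map terms of one subtopic to broader terms via a hypothetical lookup table."""
    old = case.subtopics[index]
    terms = frozenset(broader.get(t, t) for t in old.terms)
    return _with_subtopic(case, index, replace(old, terms=terms))


def substitute(case: Case, index: int, terms: Iterable[str]) -> Case:
    """Replace one subtopic description with a different term set."""
    old = case.subtopics[index]
    return _with_subtopic(case, index, replace(old, terms=frozenset(terms)))


def emphasize(case: Case, index: int, factor: float) -> Case:
    """Scale the weight of one subtopic so it counts more (or less) in matching."""
    old = case.subtopics[index]
    return _with_subtopic(case, index, replace(old, weight=old.weight * factor))


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(query: Case, doc: Case) -> float:
    """Weighted mean, over the query's subtopics, of the best Jaccard match in doc.

    Main topics are ignored here to keep the sketch short.
    """
    total = sum(s.weight for s in query.subtopics)
    if total == 0 or not doc.subtopics:
        return 0.0
    score = sum(
        s.weight * max(jaccard(s.terms, d.terms) for d in doc.subtopics)
        for s in query.subtopics
    )
    return score / total
```

In this sketch each transformation returns a new case rather than modifying the original, so a user's query case can be adjusted step by step and compared against stored document cases after each change.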