Marti A. Hearst, David R. Karger, and Jan O. Pedersen
An important information access problem arises when the user is confronted with a very large number of documents that have been retrieved in response to a query. In this paper we explore the use of a technique called Scatter/Gather for navigating large collections of retrieved documents. Scatter/Gather clusters the documents into semantically coherent groups on the fly and presents descriptive summaries of the groups to the user. These groups can be used in several ways: to identify useful subsets of documents to be perused with other tools, to eliminate subsets whose contents appear nonrelevant, or to select promising document subsets for reclustering into more refined groups. This paper describes the Scatter/Gather algorithm and illustrates its application to retrieval results via two examples.
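The scatter, summarize, and gather cycle described above can be illustrated with a minimal sketch. It is not the clustering algorithm used in this paper: as a stand-in it assumes a simple k-means-style procedure over bag-of-words counts with cosine similarity, seeded with the first k documents so that the result is deterministic. The function names (vectorize, cosine, scatter, summarize, gather) and their parameters are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def scatter(docs, k, iterations=5):
    """Cluster docs into at most k groups; returns lists of indices into docs.

    Assumption: a k-means-style stand-in, seeded with the first k documents.
    """
    vecs = [vectorize(d) for d in docs]
    centroids = [Counter(v) for v in vecs[:k]]
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        # Assign each document to its most similar centroid.
        for i, v in enumerate(vecs):
            best = max(range(len(centroids)),
                       key=lambda c: cosine(v, centroids[c]))
            groups[best].append(i)
        # Recompute each centroid as the summed term counts of its members;
        # an empty group keeps its previous centroid.
        centroids = [
            sum((vecs[i] for i in g), Counter()) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return [g for g in groups if g]


def summarize(docs, group, n_terms=3):
    """Describe a group by its most frequent terms."""
    total = sum((vectorize(docs[i]) for i in group), Counter())
    return [term for term, _ in total.most_common(n_terms)]


def gather(docs, groups, chosen):
    """Merge the documents of the chosen groups into a new, smaller collection."""
    return [docs[i] for c in chosen for i in groups[c]]
```

In use, a caller would invoke scatter on the retrieved documents, show the summarize output for each group, and pass the groups the user selects to gather; the gathered subset can then be given to scatter again to obtain more refined groups.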