Nicholas Adelman and Marin Simina
In recent years, Latent Semantic Indexing (LSI) has been recognized as an effective tool for information retrieval in text documents. The level of “granularity” in LSI (i.e., whether LSI is performed on documents, paragraphs, sentences, phrases, etc.) is a limiting factor, because LSI comparisons can only be made at the chosen level of granularity. Here we argue that, as long as a record of the document structure is maintained, the level of granularity may be arbitrarily fine while still allowing comparison at any coarser granularity. We show that the reduced-dimension vector for any section of a document is a function of the vectors of its constituent subsections. Using this result, we illustrate how LSI can be used to compare documents at multiple structural levels. One possible application, automated plagiarism detection, is discussed as an example of how this multilevel comparison may be used to improve query time in fine-granularity LSI applications.
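The claim that a section's reduced-dimension vector is a function of its subsections' vectors can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: raw (unweighted, unnormalized) term counts, and the standard LSI fold-in projection, under which the function reduces to a plain sum because the projection is linear. The toy matrix and the names `fold_in`, `U_k`, and `s_k` are hypothetical; the paper's own derivation may involve weighting or normalization that this sketch omits.

```python
import numpy as np

def fold_in(term_counts, U_k, s_k):
    """Project a raw term-count vector into the k-dimensional LSI space
    using the standard fold-in formula: Sigma_k^{-1} U_k^T d."""
    return (U_k.T @ term_counts) / s_k

# Toy term-by-sentence count matrix: 5 terms x 4 sentences (made-up data).
A = np.array([
    [1, 0, 2, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 2],
    [2, 0, 0, 1],
], dtype=float)

k = 2  # number of retained dimensions (arbitrary for this sketch)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]

# Reduced vectors at the finest granularity (one per sentence).
sentence_vecs = np.array(
    [fold_in(A[:, j], U_k, s_k) for j in range(A.shape[1])]
)

# Assume a "paragraph" consists of sentences 0 and 1, so its raw term
# counts are the sum of the two sentences' counts.
paragraph_counts = A[:, 0] + A[:, 1]

# Route 1: project the paragraph's term counts directly.
paragraph_direct = fold_in(paragraph_counts, U_k, s_k)

# Route 2: combine the stored sentence vectors, no re-projection needed.
paragraph_from_parts = sentence_vecs[0] + sentence_vecs[1]

print(np.allclose(paragraph_direct, paragraph_from_parts))
```

Under these assumptions, only the finest-granularity vectors and a record of the document structure need to be stored; vectors for coarser units can be assembled on demand and compared with the usual cosine measure.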