Semantic Indexing of Document Bases

Roberto Basili and Maria Teresa Pazienza

Browsing and navigating a document base can be significantly improved by easy access to its textual sources. Many efficient indexing and search techniques have been proposed in the literature. Word vectors are commonly used to approximate the notion of document content and to support matching algorithms during retrieval. Efficiency criteria push for linear, non-recursive representations. In such approaches, the text of a document is never processed for its linguistic information content. The gap between the implicit content of (a set of) texts and the rich structured formats (i.e. networks) able to support intelligent browsing is well known. This paper describes the overall architecture of a language-oriented methodology of document processing for content-driven retrieval. Lexical acquisition modules are integrated with indexing and browsing modules in order to support significant semantic coverage and to guarantee portability across different domains. Experience in developing different IR systems (based on linguistic processing of document content) is used to demonstrate the feasibility and strength of the methodology.


This page is copyrighted by AAAI. All rights reserved.