A Learning-Based Term-Weighting Approach for Information Retrieval

Guang Can Liu, Yong Yu, Xing Zhu

One of the core components in information retrieval (IR) is the document term-weighting scheme. In this paper, we propose a novel learning-based term-weighting approach to improve the retrieval performance of the vector space model in homogeneous collections. We first introduce a simple learning system to weight the index terms of documents. Then, we derive a formal computational approach based on results from matrix computation and statistical inference. Our experiments on 8 collections show that our approach outperforms classic TF.IDF weighting by about 20% to 45%.
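For orientation, the classic TF.IDF baseline in a vector space model can be sketched as below. This is only an illustrative sketch, not the learning-based approach proposed in the paper: the abstract does not state which TF.IDF variant was used, so the form tf * log(N / df), the cosine ranking function, the helper names `tfidf_weights` and `cosine`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Weight each term of each document by tf * log(N / df).

    docs is a list of token lists. This particular TF.IDF variant is an
    assumption; the abstract does not specify one.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy collection (hypothetical data, for illustration only).
docs = [
    "term weighting for retrieval".split(),
    "vector space retrieval model".split(),
    "matrix computation and inference".split(),
]
weights = tfidf_weights(docs)
```

In this sketch, a term that occurs in every document receives weight zero, and documents that share no terms have cosine similarity zero. A learned weighting scheme would replace the values produced by `tfidf_weights` while leaving the ranking step unchanged.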

Content Area: 19. Semantic Web, Information Retrieval, and Extraction

Subjects: 1.10 Information Retrieval; 12. Machine Learning and Discovery

Submitted: May 4, 2005

This page is copyrighted by AAAI. All rights reserved.