AAAI Publications, Twenty-Sixth AAAI Conference on Artificial Intelligence

Font Size: 
Random Projection with Filtering for Nearly Duplicate Search
Yue Lin, Rong Jin, Deng Cai, Xiaofei He

Last modified: 2012-07-14

Abstract


High dimensional nearest neighbor search is a fundamental problem and has found applications in many domains. Although many hashing based approaches have been proposed for approximate nearest neighbor search in high dimensional space, one main drawback is that they often return many false positives that need to be filtered out by a post procedure. We propose a novel method to address this limitation in this paper. The key idea is to introduce a filtering procedure within the search algorithm, based on the compressed sensing theory, that effectively removes the false positive answers. We first obtain a sparse representation for each data point by the landmark based approach, after which we solve the nearly duplicate search that the difference between the query and its nearest neighbors forms a sparse vector living in a small ℓp ball, where p ≤ 1. Our empirical study on real-world datasets demonstrates the effectiveness of the proposed approach compared to the
state-of-the-art hashing methods.

Full Text: PDF