Font Size:

Walking on Minimax Paths for k-NN Search

Last modified: 2013-06-30

#### Abstract

Link-based dissimilarity measures, such as shortest path or Euclidean commute time distance, base their distance on paths between nodes of a weighted graph. These measures are known to be better suited to data manifold with nonconvex-shaped clusters, compared to Euclidean distance, so that

*k*-nearest neighbor (NN) search is improved in such metric spaces. In this paper we present a new link-based dissimilarity measure based on*minimax paths*between nodes. Two main benefits of minimax path-based dissimilarity measure are: (1) only a subset of paths is considered to make it scalable, while Euclidean commute time distance considers all possible paths; (2) it better captures nonconvex-shaped cluster structure, compared to shortest path distance. We define the total cost assigned to a path between nodes as*L*norm of intermediate costs of edges involving the path, showing that minimax path emerges from our_{p}*L*framework. We also define_{p}norm over paths*minimax distance*as the intermediate cost of the longest edge on the minimax path, then present a greedy algorithm to compute*k*smallest minimax distances between a query and*N*data points in*O(log N + k log k)*time. Numerical experiments demonstrate that our*minimax k-NN algorithm*reduce the search time by several orders of magnitude, compared to existing methods, while the quality of*k*-NN search is significantly improved over Euclidean distance.
Full Text:
PDF