Mo Chen, Jianzhuang Liu, Xiaoou Tang
In this paper, we present a general data clustering algorithm based on an asymmetric pairwise measure: the hitting time of a Markov random walk on a directed graph. Unlike traditional graph-based clustering methods, we do not explicitly calculate pairwise similarities between points. Instead, we form the transition matrix of a Markov random walk on a directed graph directly from the data. Our algorithm constructs probabilistic dependence relations between local sample pairs by studying the local distributions of the data. These dependence relations are asymmetric, and they are a more general measure of pairwise relations than the similarity measures used in traditional undirected graph-based methods because they take into account both the local density and the geometry of the data. The probabilistic relations naturally yield the transition matrix of a Markov random walk. From this random walk viewpoint, we compute the expected hitting time for all sample pairs, which captures global information about the structure of the underlying directed graph. We propose a clustering algorithm based on this asymmetric measure, called K-destinations, for partitioning the nodes of the directed graph into disjoint sets. By using both the local distribution information of the data and the global structure information of the directed graph, our method is able to overcome some limitations of traditional pairwise-similarity-based methods. Experimental results are provided to validate the effectiveness of the proposed approach.
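To make the hitting-time measure concrete, the following is a minimal sketch of how expected hitting times can be computed from a given transition matrix by solving one linear system per destination node. It is not the paper's implementation: the function name `expected_hitting_times` and the 3-node example matrix are hypothetical, the construction of the transition matrix from data and the K-destinations step are not shown, and the chain is assumed to be irreducible so that every linear system has a unique solution.

```python
import numpy as np

def expected_hitting_times(P):
    """Return H where H[i, j] is the expected number of steps for a
    random walk started at node i to first reach node j (H[j, j] = 0).

    Assumes P is a row-stochastic transition matrix of an irreducible chain.
    """
    n = P.shape[0]
    H = np.zeros((n, n))
    for j in range(n):
        # All nodes except the destination j.
        keep = [i for i in range(n) if i != j]
        # Transitions among non-destination nodes only.
        Q = P[np.ix_(keep, keep)]
        # For i != j: h_i = 1 + sum_k Q[i, k] * h_k, i.e. (I - Q) h = 1.
        h = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
        H[keep, j] = h
    return H

# Hypothetical example: a directed 3-cycle 0 -> 1 -> 2 -> 0 with self-loops.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.0, 0.5, 0.5],
    [0.5, 0.0, 0.5],
])
H = expected_hitting_times(P)
```

In this toy graph the walk reaches node 1 from node 0 in 2 expected steps, but needs 4 expected steps to return from node 1 to node 0, which illustrates why hitting time is an asymmetric pairwise measure on a directed graph.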
Submitted: Apr 14, 2008