Distributed PageRank Computation: An Improved Theoretical Study
PageRank is a classic measure that effectively evaluates the node importance in large graphs, and has been applied in numerous applications ranging from data mining, Web algorithms, recommendation systems, load balancing, search, and identifying connectivity structures. Computing PageRank for large graphs is challenging and this has motivated the studies of distributed algorithms to compute PageRank. Previously, little works have been spent on the distributed PageRank algorithms with provably desired complexity and accuracy. Given a graph with n nodes and if we model the distributed computation model as the well-known congested clique model, the state-of-the-art algorithm takes O(√logn) communication rounds to approximate the PageRank value of each node in G, with a probability at least 1−1/n. In this paper, we present improved distributed algorithms for computing PageRank. Particularly, our algorithm performs O(log log√n) rounds (a significant improvement compared with O(√logn) rounds) to approximate the PageRank values with a probability at least 1−1/n. Moreover, under a reasonable assumption, our algorithm also reduces the edge bandwidth (i.e., the maximum communication message size that can be exchanged through an edge during a communication round) by a O(logn) factor compared with the state-of-the-art algorithm. Finally, we show that our algorithm can be adapted to efficiently compute another variant of PageRank, i.e., the batch one-hop Personalized PageRanks, in O(log logn) communication rounds.