Results 1 - 5 of 5
Site-Based Partitioning and Repartitioning Techniques for Parallel PageRank Computation
Cited by 1 (1 self)
Abstract. The PageRank algorithm is an important component in effective web search. At the core of this algorithm are repeated sparse matrix-vector multiplications where the involved web matrices grow in parallel with the growth of the web and are stored in a distributed manner due to space limitations. Hence, the PageRank computation, which is frequently repeated, must be performed in parallel with high efficiency and low preprocessing overhead while considering the initial distributed nature of the web matrices. Our contributions in this work are twofold. We first investigate the application of state-of-the-art sparse matrix partitioning models in order to attain high efficiency in parallel PageRank computations with a particular focus on reducing the preprocessing overhead they introduce. For this purpose, we evaluate two different compression schemes on the web matrix using the site information inherently available in links. Second, we consider the more realistic scenario of starting with initially distributed data and extend our algorithms to cover the repartitioning of such data for efficient PageRank computation. We report performance results using our parallelization of a state-of-the-art PageRank algorithm on two different PC clusters with 40 and 64 processors. Experiments show that the proposed techniques achieve considerably high speedups while incurring a preprocessing overhead of several iterations (for some instances even less than a single iteration) of the underlying sequential PageRank algorithm.
Index Terms: PageRank, sparse matrix-vector multiplication, web search, parallelization, sparse matrix partitioning, graph partitioning, hypergraph partitioning, repartitioning.
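The repeated sparse matrix-vector multiplication named in this abstract can be illustrated with a minimal sequential sketch. It is not the paper's parallel or partitioned implementation. The function name `pagerank_power`, the adjacency-list input, the damping factor of 0.85, and the uniform treatment of pages without out-links are all assumptions made for illustration.

```python
def pagerank_power(out_links, n, damping=0.85, tol=1e-10, max_iter=1000):
    """Sequential power iteration (illustrative sketch).

    out_links maps a page index to the list of pages it links to.
    All parameter defaults are assumptions, not values from the paper.
    """
    x = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(x[i] for i in range(n) if not out_links.get(i))
        base = (1.0 - damping) / n + damping * dangling / n
        y = [base] * n
        # Sparse matrix-vector product: each page pushes rank along its links.
        for i, targets in out_links.items():
            if targets:
                share = damping * x[i] / len(targets)
                for j in targets:
                    y[j] += share
        diff = sum(abs(a - b) for a, b in zip(x, y))
        x = y
        if diff < tol:
            break
    return x


# Hypothetical three-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
ranks = pagerank_power({0: [1, 2], 1: [2], 2: [0]}, 3)
```

In a distributed setting, the inner loop over links is the part whose communication cost depends on how pages are assigned to processors, which is what a partitioning model tries to reduce.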
Rational Extrapolation for the PageRank Vector
Article electronically published on February 7, 2008
Abstract. An important problem in web search is to determine the importance of each page. From the mathematical point of view, this problem consists in finding the nonnegative left eigenvector of a matrix corresponding to its dominant eigenvalue 1. Since this matrix is neither stochastic nor irreducible, the power method has convergence problems. So, the matrix is replaced by a convex combination, depending on a parameter c, with a rank one matrix. Its left principal eigenvector now depends on c, and it is the PageRank vector we are looking for. However, when c is close to 1, the problem is ill-conditioned, and the power method converges slowly. So, the idea developed in this paper consists in computing the PageRank vector for several values of c, and then to extrapolate them, by a conveniently chosen rational function, at a point near 1. The choice of this extrapolating function is based on the mathematical expression of the PageRank vector as a function of c. Numerical experiments end the paper.
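The dependence on c described in this abstract can be written out as a short derivation. The notation is an assumption and does not come from the abstract: \tilde{P} denotes the link matrix after it has been made row-stochastic, e is the vector of all ones, and v is a probability vector. The paper's actual extrapolating function is not reproduced here.

```latex
\[
  P_c = c\,\tilde{P} + (1-c)\, e v^{T}, \qquad 0 < c < 1,
\]
\[
  r_c^{T} P_c = r_c^{T}, \quad r_c^{T} e = 1
  \;\Longrightarrow\;
  r_c^{T}\,(I - c\,\tilde{P}) = (1-c)\, v^{T}
  \;\Longrightarrow\;
  r_c^{T} = (1-c)\, v^{T} (I - c\,\tilde{P})^{-1}.
\]
```

Under these assumptions each component of r_c is a rational function of c, which is consistent with choosing a rational rather than a polynomial extrapolant.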
An Inner-Outer Iteration for Computing PageRank
Abstract. We present a new iterative scheme for PageRank computation. The algorithm is applied to the linear system formulation of the problem, using inner-outer stationary iterations. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Our convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not sensitive to the choice of the parameters involved. The same idea can be used as a preconditioning technique for nonstationary schemes. Numerical examples featuring matrices of dimensions exceeding 100,000,000 in sequential and parallel environments demonstrate the merits of our technique. Our code is available online for viewing and testing, along with several large-scale examples.
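An inner-outer stationary iteration of the kind this abstract describes might look like the sketch below. It is a reading of the idea, not the authors' published code. It assumes the linear system x = alpha*P*x + (1 - alpha)*v with a column-stochastic P and uniform v. The outer step solves (I - beta*P) x = (alpha - beta)*P*x_k + (1 - alpha)*v approximately, and the inner step is a plain Richardson iteration. The function name and all parameter values are hypothetical.

```python
def inner_outer_pagerank(out_links, n, alpha=0.85, beta=0.5,
                         tol=1e-9, inner_tol=1e-2, max_outer=10000):
    """Inner-outer iteration sketch; every default here is an assumption."""
    v = [1.0 / n] * n

    def apply_p(x):
        # y = P x, with rank of pages lacking out-links spread uniformly.
        dangling = sum(x[i] for i in range(n) if not out_links.get(i))
        y = [dangling / n] * n
        for i, targets in out_links.items():
            if targets:
                share = x[i] / len(targets)
                for j in targets:
                    y[j] += share
        return y

    x = v[:]
    px = apply_p(x)
    for _ in range(max_outer):
        # Outer stopping test: residual of x = alpha*P*x + (1 - alpha)*v.
        res = sum(abs(alpha * p + (1 - alpha) * vi - xi)
                  for p, vi, xi in zip(px, v, x))
        if res < tol:
            break
        # Right-hand side of the outer system, fixed during the inner loop.
        f = [(alpha - beta) * p + (1 - alpha) * vi for p, vi in zip(px, v)]
        while True:
            # Inner Richardson step for (I - beta*P) x = f.
            x = [fi + beta * p for fi, p in zip(f, px)]
            px = apply_p(x)
            inner_res = sum(abs(fi + beta * p - xi)
                            for fi, p, xi in zip(f, px, x))
            if inner_res < inner_tol:  # crude tolerance, as in the abstract
                break
    return x


# Hypothetical three-page web: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0.
ranks = inner_outer_pagerank({0: [1, 2], 1: [2], 2: [0]}, 3)
```

With a single inner step the update reduces to an ordinary power step, so the scheme adds only the vector f as extra storage in this sketch.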
PageRank: Splitting Homogeneous Singular Linear Systems of Index One
Abstract. The PageRank algorithm is used today within web information retrieval to provide a content-neutral ranking metric over web pages. It employs power method iterations to solve for the steady-state vector of a DTMC. The defining one-step probability transition matrix of this DTMC is derived from the hyperlink structure of the web and a model of web surfing behaviour which accounts for user bookmarks and memorised URLs. In this paper we look to provide a more accessible, more broadly applicable explanation than has been given in the literature of how to make PageRank calculation more tractable through removal of the dangling-page matrix. This allows web pages without outgoing links to be removed before we employ power method iterations. It also allows decomposition of the problem according to irreducible subcomponents of the original transition matrix. Our explanation also covers a PageRank extension to accommodate TrustRank. In setting out our alternative explanation, we introduce and apply a general linear algebraic theorem which allows us to map homogeneous singular linear systems of index one to inhomogeneous nonsingular linear systems with a shared solution vector. As an aside, we show in this paper that irreducibility is not required for PageRank to be well-defined.