## Fast Parallel PageRank: A Linear System Approach (2004)

Citations: 23 (2 self)

### BibTeX

@TECHREPORT{Gleich04fastparallel,

author = {David Gleich},

title = {Fast Parallel PageRank: A Linear System Approach},

institution = {},

year = {2004}

}

### Abstract

In this paper we investigate the convergence of stationary iterative and Krylov subspace methods for the PageRank linear system, including how convergence depends on teleportation. We demonstrate that linear system iterations converge faster than the simple power method and are less sensitive to changes in teleportation. To perform this study, we developed a framework for parallel PageRank computing. We describe the details of the parallel implementation and provide experimental results obtained on a 70-node Beowulf cluster.
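To make the contrast in the abstract concrete, here is a minimal pure-Python sketch that computes PageRank in two ways: with the power method, and by running a stationary iteration on the linear system $(I - cP^T)y = v$ followed by normalization. It is an illustration under stated assumptions, not the authors' parallel PETSc implementation. The function names, the toy graph, the uniform teleportation vector, the choice of Gauss-Seidel as the stationary method, and the stopping rule are all hypothetical. The graph is assumed to have no self-loops and no duplicate links.

```python
def build_in_links(out_links):
    """Invert an adjacency list: in_links[i] lists the pages linking to i."""
    in_links = [[] for _ in out_links]
    for j, targets in enumerate(out_links):
        for i in targets:
            in_links[i].append(j)
    return in_links


def pagerank_power(out_links, c=0.85, tol=1e-12, max_iter=10_000):
    """Power method: x <- c*P^T x + (c*dangling_mass + 1 - c) * v."""
    n = len(out_links)
    v = 1.0 / n  # uniform teleportation (an assumption of this sketch)
    in_links = build_in_links(out_links)
    x = [v] * n
    for k in range(1, max_iter + 1):
        # Rank held by pages without out-links is redistributed through v.
        dangling = sum(x[j] for j in range(n) if not out_links[j])
        new = [
            c * sum(x[j] / len(out_links[j]) for j in in_links[i])
            + (c * dangling + 1.0 - c) * v
            for i in range(n)
        ]
        diff = sum(abs(a - b) for a, b in zip(new, x))
        x = new
        if diff < tol:
            return x, k
    return x, max_iter


def pagerank_gauss_seidel(out_links, c=0.85, tol=1e-12, max_iter=10_000):
    """Gauss-Seidel sweeps on (I - c*P^T) y = v, then x = y / sum(y)."""
    n = len(out_links)
    v = 1.0 / n
    in_links = build_in_links(out_links)
    y = [v] * n
    for k in range(1, max_iter + 1):
        diff = 0.0
        for i in range(n):
            # The diagonal of I - c*P^T is 1 because there are no self-loops.
            new_i = v + c * sum(y[j] / len(out_links[j]) for j in in_links[i])
            diff += abs(new_i - y[i])
            y[i] = new_i  # updated in place, so later rows see the new value
        if diff < tol:
            break
    total = sum(y)
    return [yi / total for yi in y], k


# Toy graph: page 4 is dangling (no out-links), page 3 has no in-links.
toy_graph = [[1, 2], [2], [0], [0, 2], []]
x_power, iters_power = pagerank_power(toy_graph)
x_linear, iters_linear = pagerank_gauss_seidel(toy_graph)
```

Both routines should return the same vector up to the tolerance. The linear system uses the unmodified matrix $P$, and the dangling-page correction is absorbed by the final normalization. Any other solver for nonsymmetric systems could replace the Gauss-Seidel sweep in this formulation.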

### Citations

3249 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context: ...eRank, Eigenvalues, Linear Systems, Parallel Computing 1. INTRODUCTION The PageRank algorithm, a method for computing the relative rank of web pages based on the Web link structure, was introduced in [25, 8] and has been widely used since then. PageRank computations are a key component of modern Web search ranking systems. For a general review of PageRank computing see [22, 7]. Until recently, the PageRa...

2136 | The PageRank citation ranking: Bringing order to the web, Tech. rep., Stanford Digital Library Technologies Project
- Page, Brin, et al.
- 1998
Citation Context: ...eRank, Eigenvalues, Linear Systems, Parallel Computing 1. INTRODUCTION The PageRank algorithm, a method for computing the relative rank of web pages based on the Web link structure, was introduced in [25, 8] and has been widely used since then. PageRank computations are a key component of modern Web search ranking systems. For a general review of PageRank computing see [22, 7]. Until recently, the PageRa...

705 | The EigenTrust algorithm for reputation management in P2P networks
- Kamvar, Schlosser, et al.
- 2003
Citation Context: ...94089 pberkhin@yahooinc.com PageRank is also becoming a useful tool applied in many Web search technologies and beyond, for example, spam detection [13], crawler configuration [10], or trust networks [20]. In this setting many PageRanks corresponding to different modifications – such as graphs with a different level of granularity (HostRank) or different link weight assignments (internal, external, et...

494 | Iterative Solution Methods
- Axelsson
- 1994
Citation Context: ... considered for smaller size problems (or subproblems), but it does create additional fill-in. It is also notoriously hard to parallelize. In this paper we concentrate on the use of iterative methods [12, 3, 6]. There are two main requirements for the iterative linear solver: i) it should work with nonsymmetric matrices and ii) it should be easily parallelizable. Thus, from the stationary methods we use Jac...

491 | Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Edition
- Berry, Demmel, et al.
- 1994
Citation Context: ... considered for smaller size problems (or subproblems), but it does create additional fill-in. It is also notoriously hard to parallelize. In this paper we concentrate on the use of iterative methods [12, 3, 6]. There are two main requirements for the iterative linear solver: i) it should work with nonsymmetric matrices and ii) it should be easily parallelizable. Thus, from the stationary methods we use Jac...

415 | Topic-sensitive PageRank
- Haveliwala
- 2002
Citation Context: ...lobal importance score for each page on the web. These scores were recomputed for each new Web graph crawl. Recently, significant attention has been given to topic-specific and personalized PageRanks [14, 16]. In both cases one has to compute multiple PageRanks corresponding to various teleportation vectors for different topics or user preferences. ∗ Work performed while at Yahoo!. Leonid Zhukov Yahoo! 7...

295 | Scaling personalized web search
- Jeh, Widom
- 2003
Citation Context: ...lobal importance score for each page on the web. These scores were recomputed for each new Web graph crawl. Recently, significant attention has been given to topic-specific and personalized PageRanks [14, 16]. In both cases one has to compute multiple PageRanks corresponding to various teleportation vectors for different topics or user preferences. ∗ Work performed while at Yahoo!. Leonid Zhukov Yahoo! 7...

290 | Efficient crawling through url ordering
- Cho, Garcia-Molina, et al.
- 1998
Citation Context: ...First Ave Sunnyvale, CA 94089 pberkhin@yahooinc.com PageRank is also becoming a useful tool applied in many Web search technologies and beyond, for example, spam detection [13], crawler configuration [10], or trust networks [20]. In this setting many PageRanks corresponding to different modifications – such as graphs with a different level of granularity (HostRank) or different link weight assignments...

288 | Combating web spam with trustrank
- Gyöngyi, Garcia-Molina, et al.
- 2004
Citation Context: ...om Pavel Berkhin Yahoo! 701 First Ave Sunnyvale, CA 94089 pberkhin@yahooinc.com PageRank is also becoming a useful tool applied in many Web search technologies and beyond, for example, spam detection [13], crawler configuration [10], or trust networks [20]. In this setting many PageRanks corresponding to different modifications – such as graphs with a different level of granularity (HostRank) or diffe...

156 | PETSc users manual
- Balay, Gropp, et al.
- 2001
Citation Context: ...ystem Eq. (7). These methods are based on certain minimization procedures and only use the matrix through matrix-vector multiplication. Detailed descriptions of the algorithms are available in [6] and [4]. In this study we have chosen several Krylov methods satisfying our criteria: Generalized Minimum Residual (GMRES), Biconjugate Gradient (BiCG), Quasi-Minimal Residual (QMR), Conjugate Gradient S...

153 | Efficient management of parallelism in object oriented numerical software libraries
- Balay, Gropp, et al.
- 1997
Citation Context: ... chassis was connected to a gigabit switch and the seven chassis were all connected to one switch. The parallel PageRank codes use the Portable, Extensible Toolkit for Scientific Computation (PETSc) [4, 5] to implement basic linear algebra operations and basic iterative procedures on parallel sparse matrices. In particular, PETSc contains parallel implementations of many linear solvers, including GMRES...

144 | Deeper Inside PageRank
- Langville, Meyer
Citation Context: ...tructure, was introduced in [25, 8] and has been widely used since then. PageRank computations are a key component of modern Web search ranking systems. For a general review of PageRank computing see [22, 7]. Until recently, the PageRank vector was primarily used to calculate a global importance score for each page on the web. These scores were recomputed for each new Web graph crawl. Recently, significa...

133 | Extrapolation methods for accelerating pagerank computations
- Kamvar, Haveliwala, et al.
- 2003
Citation Context: ...ethods to accelerate and parallelize these computations are important. Various methods to accelerate the simple power iteration process have already been developed, including an extrapolation method [19], a block-structure method [18], and an adaptive method [17]. Traditionally, PageRank has been computed as the principal eigenvector of a Markov chain probability transition matrix. In this paper we c...

129 | Exploiting the block structure of the web for computing PageRank
- Kamvar, Haveliwala, et al.
- 2003
Citation Context: ...elize these computations are important. Various methods to accelerate the simple power iteration process have already been developed, including an extrapolation method [19], a block-structure method [18], and an adaptive method [17]. Traditionally, PageRank has been computed as the principal eigenvector of a Markov chain probability transition matrix. In this paper we consider the PageRank linear sys...

94 | Ranking the Web frontier
- Eiron, McCurley, et al.
- 2004
Citation Context: ... pages without out-links, called dangling nodes. Dangling pages present a problem for the mathematical PageRank formulation. A review of various approaches dealing with dangling pages can be found in [11]. One way to overcome this difficulty is to slightly change the transition matrix $P$ to a truly row-stochastic matrix $P' = P + d \cdot v^T$ (2), where $d_i = \delta_{\deg(i),0}$ is the dangling page indicator, and ...

87 | SuperLU_DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems
- Li, Demmel
Citation Context: ...GB; yahoo-r3: 60M, 850M, 10.4 GB; db: 70M, 1B, 12.3 GB; av: 1.4B, 6.6B, 80 GB. Table 2: Basic statistics for the data sets used in the experiments. ... matrix size and computational resources. Sparse LU factorization [23] can still be considered for smaller size problems (or subproblems), but it does create additional fill-in. It is also notoriously hard to parallelize. In this paper we concentrate on the use of itera...

84 | Parallel dynamic graph partitioning for adaptive unstructured meshes
- Walshaw, Cross, et al.
- 1997
Citation Context: ...to balance work and minimize communication between the processors. This classical graph partitioning problem is NP-hard, and approximate graph partitioning schemes such as ParMeTiS [21] and Pjostle [27] do not work well on such large power-law data. Thus, we restricted ourselves to a simplified heuristic method to balance the work load between processors. By default, PETSc balances the number of mat...

70 | The second eigenvalue of the google matrix
- Haveliwala, Kamvar
- 2003
Citation Context: ...ar class of problems arising from PageRank computations on parallel architectures. It is well known that the random teleportation used in PageRank strongly affects the convergence of power iterations [15]. It has also been shown that high teleportation can help spam pages to accumulate PageRank [13], but a reduction in teleportation typically hampers the convergence of standard power methods. In this ...

66 | A survey on pagerank computing
- Berkhin
- 2005
Citation Context: ...tructure, was introduced in [25, 8] and has been widely used since then. PageRank computations are a key component of modern Web search ranking systems. For a general review of PageRank computing see [22, 7]. Until recently, the PageRank vector was primarily used to calculate a global importance score for each page on the web. These scores were recomputed for each new Web graph crawl. Recently, significa...

55 | Pagerank computation and the structure of the web: Experiments and algorithms
- Arasu, Novak, et al.
- 2002
Citation Context: ...ise personalized PageRanks. For this linear system, we study the performance of advanced iterative methods in a parallel environment. Casting PageRank as a linear system was suggested by Arasu et al. [2] where Jacobi, Gauss-Seidel, and Successive Over-Relaxation iterative methods were considered. Numerical solutions for various Markov chain problems are also investigated in [26]. 2.2 Iterative Method...

48 | Adaptive methods for the computation of pagerank
- Kamvar, Haveliwala, et al.
- 2003
Citation Context: ...important. Various methods to accelerate the simple power iteration process have already been developed, including an extrapolation method [19], a block-structure method [18], and an adaptive method [17]. Traditionally, PageRank has been computed as the principal eigenvector of a Markov chain probability transition matrix. In this paper we consider the PageRank linear system formulation and its itera...

38 | A coarse-grain parallel formulation of multilevel k-way graph partitioning algorithm
- Karypis, Kumar
- 1997
Citation Context: ...rmute A in order to balance work and minimize communication between the processors. This classical graph partitioning problem is NP-hard, and approximate graph partitioning schemes such as ParMeTiS [21] and Pjostle [27] do not work well on such large power-law data. Thus, we restricted ourselves to a simplified heuristic method to balance the work load between processors. By default, PETSc balances ...

35 | Efficient PageRank approximation via graph aggregation
- Broder, Lempel, et al.
- 2004
Citation Context: ... [Figure 6: Convergence on the full “av” Web graph; horizontal axis: Time (sec).] ... nodes and 6.6 billion edges. Other experiments with this data have been done by Broder et al. [9]. Our main results on this graph are presented in Figure 6 and Table 3, which show the performance of simple power iterations and the BiCGSTAB method. For the linear system we show the normalized res...

16 | Numerical methods for computing stationary distributions of finite irreducible markov chains
- Stewart
- 1999
Citation Context: ...gested by Arasu et al. [2] where Jacobi, Gauss-Seidel, and Successive Over-Relaxation iterative methods were considered. Numerical solutions for various Markov chain problems are also investigated in [26]. 2.2 Iterative Methods The PageRank linear system matrix $A = I - cP^T$ is very large, sparse and non-symmetric. Solution of the linear system Eq. (7) by a direct method is not feasible due to the...

3 | Parallel PageRank Computation on a Gigabit PC Cluster
- Manaskasemsak, Rungsawang
- 2004
Citation Context: ...he PageRank equation. To that end, our goal was to keep the entire Web graph in memory on a distributed memory parallel computer while computing the PageRank vector. An alternate approach explored in [24] is to store a piece of the Web-graph on separate hard disks for each processor and iterate through these files as necessary. Our parallel computer was a Beowulf cluster of RLX blades connected in a s...