## A fast two-stage algorithm for computing pagerank and its extensions (2003)

Citations: 36 (0 self)

### BibTeX

@TECHREPORT{Lee03afast,
  author = {Chris Pan-Chi Lee},
  title = {A fast two-stage algorithm for computing pagerank and its extensions},
  institution = {},
  year = {2003}
}

### Abstract

We present a fast two-stage algorithm for computing the PageRank vector [16]. The algorithm exploits the following observation: the homogeneous discrete-time Markov chain associated with PageRank is lumpable, with the lumpable subset of nodes being the dangling nodes [13]. Time to convergence is only a fraction of what is required by the standard algorithm employed by Google [16]; on data of 451,237 webpages, convergence was achieved in 20% of the time. Our algorithm also replaces a common practice that is, in general, incorrect: ignoring the dangling nodes until the last stages of computation [16] does not necessarily accelerate convergence. In comparison, our algorithm is provably correct, generally applicable, and achieves the desired speedup. The paper ends with a discussion of possible extensions that generalize the divide-and-conquer theme. We describe two variations that incorporate a multi-stage algorithm. In the first variation, the ordinary PageRank vector is computed. In the second, the algorithm computes a generalized version of PageRank in which webpages are divided into several classes, each with its own personalization vector. The latter represents a major modeling extension and introduces greater flexibility and a potentially more refined model of web traffic.
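The two-stage idea in the abstract can be sketched numerically. The sketch below is an illustration under simplifying assumptions, not the paper's implementation: it assumes dangling rows of the Google matrix have already been patched with the teleport vector (so all dangling rows are identical and the chain is lumpable), and the function name, toy web graph, and tolerances are invented for the example.

```python
import numpy as np

def two_stage_pagerank(G, dangling):
    """Sketch of the two-stage idea: every dangling row of the Google
    matrix G is identical, so the dangling states can be lumped into a
    single super-node. Stage 1 runs the power method on the much smaller
    lumped chain; Stage 2 unfolds the super-node to recover individual
    dangling-page probabilities via one application of pi = pi @ G."""
    n = G.shape[0]
    nd = [i for i in range(n) if i not in dangling]   # non-dangling states
    d = sorted(dangling)                              # dangling states
    w = G[d[0]]                                       # row shared by all dangling states

    # Lumped (len(nd)+1)-state chain: non-dangling states kept as
    # singletons, all dangling states merged into the last state.
    m = len(nd) + 1
    L = np.zeros((m, m))
    for a, i in enumerate(nd):
        L[a, :m - 1] = G[i, nd]
        L[a, m - 1] = G[i, d].sum()
    L[m - 1, :m - 1] = w[nd]
    L[m - 1, m - 1] = w[d].sum()

    # Stage 1: power method on the lumped chain.
    pi = np.full(m, 1.0 / m)
    for _ in range(1000):
        new = pi @ L
        if np.abs(new - pi).sum() < 1e-12:
            break
        pi = new

    # Stage 2: unfold the super-node. At stationarity,
    # pi_d = sum_i pi_i * G[i, d] for each dangling page d.
    full = np.zeros(n)
    for a, i in enumerate(nd):
        full[i] = pi[a]
    mass = pi[m - 1]                                  # total dangling probability
    for dd in d:
        full[dd] = sum(full[i] * G[i, dd] for i in nd) + mass * w[dd]
    return full

# Toy 5-page web: pages 3 and 4 are dangling (rows patched with the uniform vector).
c, n = 0.85, 5
P = np.array([[0.0, 1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.0, 0.5],
              [0.2, 0.2, 0.2, 0.2, 0.2],
              [0.2, 0.2, 0.2, 0.2, 0.2]])
G = c * P + (1 - c) / n               # dense Google matrix (toy sizes only)
pi = two_stage_pagerank(G, {3, 4})
```

Because the lumping is exact, the result agrees with running the power method directly on the full matrix G; the saving comes from iterating on a chain with one state per non-dangling page plus a single super-node.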

### Citations

4985 | Matrix analysis
- Horn, Johnson
- 1985

Citation Context: ...h the eigenvalue 1 or the null vector 0. PROOF. We have stated earlier that P^(2) is positive, rank-two, and has rows that sum to 1. The Perron-Frobenius Theorem (cf. pp. 27-32 of [2], pp. 508-511 of [8]) establishes the first claim. Next, suppose P^(2) does not have a third distinct eigenvalue. The algebraic multiplicity of 0 is necessarily (M − 1). The Jordan canonical form of P^(2) establishes the...

2358 | The PageRank citation ranking: Bringing order to the web. http://www-db.stanford.edu/~backrub/pageranksub.ps
- Page, Brin, et al.
- 1998

Citation Context: ...A Fast Two-Stage Algorithm for Computing PageRank. ABSTRACT. Chris Pan-Chi Lee, Stanford University, cpclee@stanford.edu. In this paper we present a fast two-stage algorithm for computing the PageRank [16] vector. Our algorithm exploits the observation that the homogeneous discrete-time Markov chain associated with PageRank is lumpable [13]; the lumpable subset of nodes are precisely the dangling nodes...

2069 | Matrix Computations
- Golub, Loan
- 1983

Citation Context: ...matrix, and each row sums to 1. Specifically, the Markov chain associated with P is irreducible and aperiodic. The Perron-Frobenius Theorem and the Power Method (cf. pp. 27-32 of [2], pp. 330-332 of [5]) guarantee that for such a matrix a unique limiting distribution π^T = lim_{k→∞} π_0^T P^k exists regardless of the initial distribution. The PageRank vector is defined to be this limiting distribution...
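The power-method limit quoted in this excerpt, π^T = lim_{k→∞} π_0^T P^k, is easy to demonstrate numerically. A minimal sketch, assuming a small positive (hence irreducible and aperiodic) row-stochastic matrix; the function name, example matrix, and tolerance are illustrative, not from the paper:

```python
import numpy as np

def power_method(P, tol=1e-10, max_iter=1000):
    """Power method for the limiting distribution pi^T = lim_k pi_0^T P^k.

    P must be row-stochastic, irreducible, and aperiodic (as the Google
    matrix is, by the positivity argument in the excerpt); the limit then
    exists and is independent of the starting distribution pi_0."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)               # arbitrary starting distribution
    for _ in range(max_iter):
        new = pi @ P
        if np.abs(new - pi).sum() < tol:   # L1 convergence test
            return new
        pi = new
    return pi

# A small positive row-stochastic matrix:
P = np.array([[0.10, 0.60, 0.30],
              [0.40, 0.20, 0.40],
              [0.50, 0.25, 0.25]])
pi = power_method(P)
```

The returned vector satisfies pi ≈ pi @ P, i.e. it is the stationary (PageRank-style) distribution of the chain.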

828 | Finite Markov Chains
- Kemeny, Snell
- 1960

Citation Context: ...present a fast two-stage algorithm for computing the PageRank [16] vector. Our algorithm exploits the observation that the homogeneous discrete-time Markov chain associated with PageRank is lumpable [13]; the lumpable subset of nodes are precisely the dangling nodes. As a result the algorithm can converge in a fraction of the time compared to the standard PageRank algorithm [16]. On data of 451,237 w...

317 | Scaling personalized web search
- Jeh, Widom
- 2003

Citation Context: ...s constantly updated, added, or removed, the PageRank vector needs to be re-computed continuously to maintain timeliness and relevance of the search results. In the context of personalized web search [9], a number of PageRank vectors need to be computed to...

142 | Extrapolation methods for accelerating PageRank computations
- Kamvar, Haveliwala, et al.
- 2003

Citation Context: ...many of these focus on numerical linear algebra techniques. A Gauss-Seidel algorithm is discussed in [1] where the most recent component values of the PageRank vector are used in the computation. In [12], one periodically subtracts away approximations of the sub-dominant eigenvectors to accelerate convergence. It is noted in [11] that when sorted by URL, the Google matrix has a block structure; hence,...

138 | Exploiting the block structure of the Web for computing PageRank
- Kamvar, Haveliwala, et al.
- 2003

Citation Context: ...r, yet the computation poses a numerically daunting challenge [15]. With billions of webpages already in existence, computing the PageRank vector is a very time-consuming procedure. It is reported in [11] that the computation of a PageRank vector over 290 million webpages requires as much as 3 hours; the computing time for a realistically large subset of the entire web would take days. Furthermore,...

103 | WebBase: A repository of Web pages
- Hirai, Raghavan, et al.

Citation Context: ...catenate the results from Step 2 and Step 5 to get π(k) for all k ∈ S. This is the limiting distribution of P, or the PageRank vector. According to [11], a 2001 crawl by Stanford's WebBase project [7] contains 290 million pages in total; only 70 million are non-dangling. We'll now formalize Steps 1, 2, 4, and 5. We'll give specific numerical algorithms for an efficient implementation of these steps...

77 | The second eigenvalue of the Google matrix
- Haveliwala, Kamvar
- 2003

Citation Context: ...sparse matrix (P̃) plus a dense vector (u). Multiplication is done separately to those components and added together subsequently. A typical value for c is between 0.85 and 0.95. It is shown in [6] that c controls the convergence rate of the PageRank algorithm. The positivity ensures a direct positive-probability path between any two pages, and hence the irreducible and aperiodic properties...

77 | Stochastic complementation, uncoupling Markov chains, and the theory of nearly reducible systems
- Meyer
- 1989

Citation Context: ...ormance gains we bring to the forefront a powerful technique for state-space reduction. This technique of lumping is distinctively different from the better-known technique of state aggregation (cf. [3], [14], [17]), which we also make use of in this paper. Thus, we have a two-stage algorithm where during each stage a different state-space reduction method is used; the reduction is aggressive, the overall...

59 | PageRank computation and the structure of the web: Experiments and algorithms
- Arasu, Novak, et al.
- 2001

Citation Context: ...verall amount of computing time. A number of papers discuss accelerating PageRank computation, and many of these focus on numerical linear algebra techniques. A Gauss-Seidel algorithm is discussed in [1] where the most recent component values of the PageRank vector are used in the computation. In [12], one periodically subtracts away approximations of the sub-dominant eigenvectors to accelerate conver...

51 | Adaptive methods for the computation of PageRank
- Kamvar, Haveliwala, et al.

Citation Context: ...x has a block structure; hence, a PageRank vector can be computed separately for each block, and the results are pasted together to yield a good starting iterate for the entire matrix. It is noted in [10] that components of the PageRank vector converge at different rates, and hence by not re-computing components that have converged, performance gains are realized. This paper contributes to this growing...

32 | Iterative aggregation/disaggregation techniques for nearly uncoupled Markov chains
- Cao, Stewart
- 1985

Citation Context: ...performance gains we bring to the forefront a powerful technique for state-space reduction. This technique of lumping is distinctively different from the better-known technique of state aggregation (cf. [3], [14], [17]), which we also make use of in this paper. Thus, we have a two-stage algorithm where during each stage a different state-space reduction method is used; the reduction is aggressive, the ov...

19 | Aggregation of Variables in Dynamic Systems. Econometrica
- Simon, Ando
- 1961

Citation Context: ...blocks (super nodes). The block-level transitions yield another Markov chain, the transition probabilities of which can be very easily calculated. Unlike conventional state aggregation (cf. [3], [14], [17]), lumping doesn't require prior knowledge or computation of aggregation weights. Lumping is thus very effective in reducing the size of the state-space. Definition 1. Suppose M ∈ R^{n×n} is the transiti...
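Definition 1 is cut off by the page extraction; informally, a partition lumps a Markov chain (in the Kemeny-Snell sense cited as [13]) when every state in a block has the same total transition probability into each block. A quick numerical check of that condition; the function name and the example matrix are illustrative, not from the paper:

```python
import numpy as np

def is_lumpable(M, partition):
    """Check ordinary (Kemeny-Snell) lumpability of transition matrix M
    with respect to a partition of its states: within each block, every
    state must have identical total transition probability into each block."""
    for block in partition:
        # Row-sums into each block, for every state of this block.
        sums = np.array([[M[i, B].sum() for B in partition] for i in block])
        if not np.allclose(sums, sums[0]):
            return False
    return True

# The PageRank situation in miniature: states 1 and 2 play the role of
# dangling nodes. Their rows are identical, so merging them is a valid lumping.
M = np.array([[0.20, 0.30, 0.50],
              [0.10, 0.45, 0.45],
              [0.10, 0.45, 0.45]])
print(is_lumpable(M, [[0], [1, 2]]))   # True: identical rows can be merged
print(is_lumpable(M, [[0, 1], [2]]))   # False: rows 0 and 1 disagree
```

This is why, as the abstract notes, no aggregation weights are needed: identical rows satisfy the lumpability condition exactly, not just approximately.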

15 | The world's largest matrix computation
- Moler
- 2002

Citation Context: ...nt amount of interest in the research community. The Markov chain interpretation gives an explicit model for web traffic and surfer behavior, yet the computation poses a numerically daunting challenge [15]. With billions of webpages already in existence, computing the PageRank vector is a very time-consuming procedure. It is reported in [11] that the computation of a PageRank vector over 290 million we...

9 | Quasi-Lumpability, Lower Bounding Coupling Matrices, and Nearly Completely Decomposable Markov Chains
- Dayar, Stewart
- 1997

Citation Context: ...al in this paper is to present an algorithm that is a substantial improvement over Algorithm 1. Our approach is based on the observation that the Markov chain associated with P is lumpable (cf. [13], [4]). In general, a Markov chain is lumpable if its transition probabilities satisfy certain properties that allow its states (nodes) to be combined into blocks (super nodes). The block-level transitions...