## Deeper Inside PageRank (2004)

Venue: Internet Mathematics

Citations: 144 (4 self)

### BibTeX

@ARTICLE{Langville04deeperinside,
    author = {Amy N. Langville and Carl D. Meyer},
    title = {Deeper inside pagerank},
    journal = {Internet Mathematics},
    year = {2004},
    volume = {1},
    pages = {2004}
}

### Abstract

This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model; available and recommended solution methods; storage issues; existence, uniqueness, and convergence properties; possible alterations to the basic model; suggested alternatives to the traditional solution methods; sensitivity and conditioning; and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.

### Citations

3262 | The anatomy of a large-scale hypertextual Web search engine. Computer networks and ISDN systems
- Brin, Page
- 1998
Citation Context ... which were later implemented into their search engine Google. Of course, it is impossible to surmise the details of Google’s implementation since the publicly disseminated details of the 1998 papers [25, 26, 27]. Nevertheless, we do know that PageRank remains “the heart of [Google’s] software ... and continues to provide the basis for all of [their] web search tools”, as cited directly from the Google webpag...

2719 | Authoritative sources in a hyperlinked environment
- Kleinberg
- 1999
Citation Context ...e hyperlink structure of the Web to improve search engine results, an innovative idea at the time, as most search engines used only textual content to return relevant documents. He presented his work [74], begun a year earlier at IBM, in January 1998 at the Ninth Annual ACM-SIAM Symposium on Discrete Algorithms held in San Francisco, California. Very nearby, at Stanford University, two Ph.D. candidate...

2146 | The pagerank citation ranking: Bringing order to the Web. Technical report, Stanford Digital Library Technologies Project. url: citeseer.ist.psu.edu
- Page, Brin, et al.
- 1998
Citation Context ...rowing business. In a public presentation at the Seventh International World Wide Web conference (WWW98) in Brisbane, Australia, their paper “The PageRank citation ranking: Bringing order to the Web” [27] made small ripples in the information science community that quickly turned into waves. The connections between the two models are striking (see [78]) and it’s hard to say whether HITS influenced Pag...

1853 | On the evolution of random graphs
- Erdős, Rényi
- 1960
Citation Context ...wer law exponents. Recent work by Barabasi et al. [10, 11, 52] has uncovered the scale-free structure of the Web. This new discovery disputed earlier claims about the random network nature of the Web [45] and the small-world nature of the Web [114]. This model, called the scale-free model, describes well the various power law distributions that have been witnessed for node indegree, outdegree and PageR...

1253 | On power-law relationships of the internet topology
- Faloutsos, Faloutsos, et al.
- 1999
Citation Context ...s model, called the scale-free model, describes well the various power law distributions that have been witnessed for node indegree, outdegree and PageRank as well as the average degree of separation [10, 49, 96]. The scale-free structure of the Web explains the emergence of hubs and a new node’s increasing struggle to gain importance as time marches on. We view the use of the scale-free structure to improve ...

1090 | The Algebraic Eigenvalue Problem
- Wilkinson
- 1965
Citation Context ...o create a probability vector, the effect is minimal. The ill-conditioning of the linear system does not imply that the corresponding eigensystem is ill-conditioned, a fact documented by Wilkinson [106] (with respect to the inverse iteration method). To answer the questions about how changes in $P$ affect $\pi^T$, what we need to examine is eigenvector sensitivity, not linear system sensitivity. A crude st...

922 | The google file system
- Ghemawat, Gobioff, et al.
- 2003
Citation Context ...e issues nontrivial. In this section, we provide a brief discussion of more detailed storage issues for implementation. The 1998 paper by Brin and Page [26] and more recent papers by Google engineers [13, 56] provide detailed discussions of the many storage schemes used by the Google search engine for all parts of its information retrieval system. The excellent survey paper by Arasu et al. [6] also provid...

783 | Denumerable Markov Chains
- Kemeny, Snell, et al.
- 1966
Citation Context ...0]. Calculating the group inverse for a web-sized matrix is not a practical option. Similar analyses use mean first passage times, the fundamental matrix, or an LU factorization to update $\pi^T$ exactly [35, 55, 73, 106]. Yet these are also expensive means of obtaining $\tilde{\pi}^T$ and remain computationally impractical. These classical Markov chain updating methods are also considered static, in that they only accommodate u...

584 | Introduction to the Numerical Solution of Markov Chains
- Stewart
- 1995
Citation Context ... primitive matrix will converge to this stationary vector. Further, the convergence rate of the power method is determined by the magnitude of the subdominant eigenvalue of the transition rate matrix [100]. 3.1 The Markov model of the Web We begin by showing how Brin and Page, the founders of the PageRank model, force the transition probability matrix, which is built from the hyperlink structure of the...

491 | Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods
- Barrett, Berry, et al.
- 1993
Citation Context ...is an adjacency matrix, $(x^T) \mathbin{.*} (\operatorname{diag}(D^{-1}))\,G$ now requires an additional nnz(P) additions for a total savings of nnz(P) − n multiplications. In addition, for large matrices compact storage schemes [12], such as compressed row storage or compressed column storage, are often used. Of course, each compressed format, while saving some storage, requires a bit more overhead for matrix operations. Rather ...

418 | Topic-sensitive pagerank
- Haveliwala
- 2002
Citation Context ...rgence measure. This raises several interesting issues: How does one measure the difference between two orderings? How does one determine when an ordering has converged satisfactorily? Several papers [44, 47, 48, 58, 60, 86] have provided a variety of answers to the question of comparing rank orderings, using such measures as Kendall’s Tau, rank aggregation, and set overlap. 5.1.2 Acceleration Techniques for the PageRank...

404 | Improved algorithms for topic Distillation in a hyperlinked environment
- Bharat, Henzinger
- 1998
Citation Context ...l scoring vector, whereas HITS must compute two eigenvector calculations at query time. Numerous modifications and improvements to both HITS and PageRank and hybrids between the two have been created [4, 16, 17, 24, 30, 32, 37, 40, 41, 42, 43, 50, 51, 85, 99, 116]. Several groups have suggested incorporating text information into the link analysis [16, 38, 60, 67, 103]. Two other novel methods have been introduced, one based on entropy concepts [72] and anothe...

358 | Matrix Analysis and Applied Linear Algebra
- Meyer
- 2001
Citation Context ...e Markov chain with a primitive transition matrix is called an aperiodic chain. Frobenius discovered a simple test for primitivity: the matrix $A \ge 0$ is primitive if and only if $A^m > 0$ for some $m > 0$ [81]. This test is useful in determining whether the power method applied to a matrix will converge. ... assume that, starting from any node (webpage), it is equally likely to follow any of the outgoing lin...

354 | Linked: The New Science of Networks
- Barabási
- 2002
Citation Context ...lves small bowties, and so on. The fractal nature of the web appears with respect to many of its properties including inlink, outlink, and PageRank power law exponents. Recent work by Barabasi et al. [10, 11, 52] has uncovered the scale-free structure of the Web. This new discovery disputed earlier claims about the random network nature of the Web [45] and the small-world nature of the Web [114]. This model, c...

330 | Rank aggregation methods for the web
- Dwork, Kumar, et al.
- 2001
Citation Context ...rgence measure. This raises several interesting issues: How does one measure the difference between two orderings? How does one determine when an ordering has converged satisfactorily? Several papers [44, 47, 48, 58, 60, 86] have provided a variety of answers to the question of comparing rank orderings, using such measures as Kendall’s Tau, rank aggregation, and set overlap. 5.1.2 Acceleration Techniques for the PageRank...

295 | Scaling Personalized Web Search
- Jeh, Widom
Citation Context ...s spam prevention abilities, creating personalized PageRanking systems. Personalization is a hot area since some predict personalized engines as the future of search. See the Stanford research papers [41, 60, 62, 67, 103]. While the concept of personalization (producing a $\pi^T$ for each user’s $v^T$ vector) sounds wonderful in theory, doing this in practice is computationally impossible. (Recall that it takes Google days ...

259 | Nonnegative matrices in the mathematical sciences
- Berman, Plemmons
- 1979
Citation Context ...ting properties of the coefficient matrix in this equation. Properties of $(I - \alpha\bar{P})$: 1. $(I - \alpha\bar{P})$ is an M-matrix.² Proof: Straightforward from the definition of M-matrix given by Berman and Plemmons [13] or Meyer [81]. □ ² Consider the real matrix $A$ that has $a_{ij} \le 0$ for all $i \ne j$ and $a_{ii} \ge 0$ for all $i$. $A$ can be expressed as $A = sI - B$, where $s > 0$ and $B \ge 0$. When $s \ge \rho(B)$, the spectral radius of $B$, ...

224 | Random walks on finite groups and rapidly mixing Markov chains. Seminar on probability
- Aldous
- 1983
Citation Context ...versity of Toronto. All three groups have computed bounds on the difference between the old PageRank vector $\pi^T$ and the new, updated PageRank vector $\tilde{\pi}^T$. Using Aldous’ notion of variational distance [5], Ng et al. [94] arrive at $\|\pi^T - \tilde{\pi}^T\|_1 \le \frac{2}{1-\alpha} \sum_{i \in U} \pi_i$, where $U$ is the set of all pages that have been updated. Bianchini et al. [19], using concepts of energy flow, and Borodin et al. [80] improv...

222 | SALSA: The stochastic approach for link-structure analysis
- Lempel, Moran
- 2001
Citation Context ...ost likely to change and most frequently changing pages on the Web [48]. A fourth group of researchers recently joined the stability discussion. Lempel and Moran, the inventors of the SALSA algorithm [75], have added a further distinction to the definition of stability. In [76], they note that stability of an algorithm, which concerns volatility of the scores assigned to pages, has been well-studied. ...

220 | Scale-free characteristics of random networks: The topology of the World Wide Web, Physica A 281
- Barabasi, Albert, et al.
- 2000
Citation Context ...t on the rankings. Removing a random portion of the graph amounts to removing a very large proportion of non-authoritative pages compared to authoritative pages, due to the Web’s scale-free structure [11]. (A more detailed description of the scale-free structure of the Web comes in section 9.) A better indication of PageRank’s stability (or any ranking algorithm’s stability) is its sensitivity to care...

219 | Graph structure in the web
- Broder, Kumar, et al.
- 2000
Citation Context ...n be done in parallel requiring O(n(PCC)) time, where n(PCC) is the size of the largest connected component. This is theoretically promising, however, the bowtie structure discovered by Broder et al. [28] shows that the largest connected component for a web graph is composed of nearly 30% of the nodes, so the savings are not overwhelming. 7 Sensitivity, Stability, and Condition Numbers Section 6 discu...

217 | The evolution of the web and implications for an incremental crawler
- Cho, Garcia-Molina
- 2000
Citation Context ... a brief introduction to the updating problem. Here we present a more thorough analysis. We begin by emphasizing the need for updating the PageRank vector frequently. A study by Cho and Garcia-Molina [31] in 2000 reported that 40% of all webpages in their dataset changed within a week, and 23% of the .com pages changed daily. In a much more extensive and recent study, the results of Fetterly et al. [4...

193 | Comparing top k lists
- Fagin, Kumar, et al.
- 2003
Citation Context ...rgence measure. This raises several interesting issues: How does one measure the difference between two orderings? How does one determine when an ordering has converged satisfactorily? Several papers [44, 47, 48, 58, 60, 86] have provided a variety of answers to the question of comparing rank orderings, using such measures as Kendall’s Tau, rank aggregation, and set overlap. 5.1.2 Acceleration Techniques for the PageRank...

186 | The missing link: A probabilistic model of document content and hypertext connectivity
- Cohn
Citation Context ...ank and hybrids between the two have been created [4, 16, 17, 24, 30, 32, 37, 40, 41, 42, 43, 50, 51, 85, 99, 116]. Several groups have suggested incorporating text information into the link analysis [16, 38, 60, 67, 103]. Two other novel methods have been introduced, one based on entropy concepts [72] and another using flow [111]. A final related algorithm is the SALSA method of Lempel and Moran [81], which uses a bi...

177 | A large-scale study of the evolution of web pages
- Fetterly, Manasse, et al.
- 2003
Citation Context ...tions, namely perturbations of the hubs or high PageRank pages. In fact, this paints a much more realistic picture as these are the most likely to change and most frequently changing pages on the Web [48]. A fourth group of researchers recently joined the stability discussion. Lempel and Moran, the inventors of the SALSA algorithm [75], have added a further distinction to the definition of stability. ...

174 | Representing Web Graphs
- Raghavan, Molina
- 2003
Citation Context ...r’s recent book [85] gives one possible implementation of the power method applied to an adjacency list, along with sample Matlab code. When the adjacency list does not fit in main memory, references [92, 94] suggest methods for compressing the data. References [26, 53] take the other approach and suggest I/O-efficient implementations of PageRank. Since the PageRank vector itself is large and completely d...

161 | The webgraph framework I: Compression techniques
- Boldi, Vigna
Citation Context ..., to impressively compress the information in a standard web graph. These techniques are freely available in the graph compression tool WebGraph, which is produced by Paolo Boldi and Sebastiano Vigna [22, 23]. The final storage issue we discuss concerns dangling nodes. The pages of the web can be classified as either dangling nodes or nondangling nodes. Recall that dangling nodes are webpages that contain...

150 | The intelligent surfer: Probabilistic combination of link and content information
- Richardson, Domingos
Citation Context ...s spam prevention abilities, creating personalized PageRanking systems. Personalization is a hot area since some predict personalized engines as the future of search. See the Stanford research papers [36, 55, 57, 62, 95]. While the concept of personalization (producing a $\pi^T$ for each user’s $v^T$ vector) sounds wonderful in theory, doing this in practice is computationally impossible. (Recall that it takes Google days ...

133 | Extrapolation methods for accelerating pagerank computations
- Kamvar, Haveliwala, et al.
- 2003

132 | Efficient Computation of PageRank
- Haveliwala
- 1999
Citation Context ...r method applied to an adjacency list, along with sample Matlab code. When the adjacency list does not fit in main memory, references [100, 102] suggest methods for compressing the data. References [31, 58] take the other approach and suggest I/O-efficient implementations of PageRank. Since the PageRank vector itself is large and completely dense, containing over 4.3 billion pages, and must be consulted...

129 | Exploiting the block structure of the web for computing PageRank
- Kamvar, Haveliwala, et al.
- 2003
Citation Context ...d, their extension to Aitken extrapolation, known as quadratic extrapolation, reduces PageRank computation time by 50-300% with minimal overhead. The same group of Stanford researchers, Kamvar et al. [70] has produced one more contribution to the acceleration of PageRank. This method straddles the classes above because it uses aggregation to reduce both the number of iterations and the work per iterat...

127 | Learning to probabilistically identify authoritative documents
- Cohn, Chang
- 2000
Citation Context ...l scoring vector, whereas HITS must compute two eigenvector calculations at query time. Numerous modifications and improvements to both HITS and PageRank and hybrids between the two have been created [4, 16, 17, 24, 30, 32, 37, 40, 41, 42, 43, 50, 51, 85, 99, 116]. Several groups have suggested incorporating text information into the link analysis [16, 38, 60, 67, 103]. Two other novel methods have been introduced, one based on entropy concepts [72] and anothe...

116 | The connectivity server: Fast access to linkage information on the web
- Bharat, Broder, et al.
- 1998
Citation Context ...n cached in main memory, thus speeding query processing. Because of their potential and promise, we briefly discuss two methods for compressing the information in an adjacency list, the gap technique [15] and the reference encoding technique [101, 102]. The gap method exploits the locality of hyperlinked pages. The source and destination pages for a hyperlink are often close to each other lexicographi...

106 | Stable algorithms for link analysis
- Ng, Zheng, et al.
- 2001
Citation Context ...ranking pages are updated can significantly affect the global PageRank. The experiments done by the Berkeley group involve removing a random 30% of their dataset and recomputing the importance vector [87]. (The Toronto group conducted similar experiments on much smaller datasets [74].) Their findings show that PageRank is stable under such perturbation. However, we contest that these results may be mi...

97 | Challenges in web search engines
- Henzinger, Motwani, et al.
Citation Context ...o the algorithmic challenge of using changes in data streams to locate interesting trends, a challenge identified by Monika Henzinger in her 2003 paper, “Algorithmic Challenges in Web Search Engines” [63]. 9.4 Structure on many levels A final prediction for future research is the exploitation of the Web’s structure in all aspects of information retrieval. The Web has structure on many different levels...

95 | Using pagerank to characterize web structure. COCOON, August 2002.
- Pandurangan, Raghavan, et al.
- 2009
Citation Context ... exact PageRankings well. However, a paper by Prabhakar Raghavan et al. disputes this claim noting that “there is very little correlation on the web graph between a node’s in-degree and its PageRank” [88]. Intuitively, this makes sense. PageRank’s thesis is that it is not the quantity of inlinks to a page that counts, but rather, the quality of inlinks. While approximations to PageRank have not proved...

84 | Adaptive on-line page importance computation
- Abiteboul, Preda, et al.
- 2003
Citation Context ...f section 5.1.2 can be used in conjunction with our method. While our updating solution can be applied to any Markov chain, other updating techniques tailored completely to the PageRank problem exist [3, 19, 70, 113]. These techniques often use the crawlers employed by the search engine to adaptively update PageRank approximately, without requiring storage of the transition matrix. Although the dynamic nature of ...

74 | Inside PageRank
- Bianchini, Gori, et al.
Citation Context ...entists, dropped the restriction to the power method. In their short paper, Arasu et al. [7] provide one small experiment with the Gauss-Seidel method applied to the PageRank problem. Bianchini et al. [17] suggest using the Jacobi method to compute the PageRank vector. Despite this progress, these are just beginnings. If the holy grail of real-time personalized search is ever to be realized, then drast...

70 | The second eigenvalue of the google matrix
- Haveliwala, Kamvar
- 2003
Citation Context ...rates on Google’s version of the full web. The asymptotic rate of convergence of the PageRank power method is governed by the subdominant eigenvalue of the transition matrix $\bar{P}$. Kamvar and Haveliwala [61] have proven that, regardless of the value of the personalization vector $v^T$ in $E = ev^T$, this subdominant eigenvalue is equal to the scaling factor $\alpha$ for a reducible hyperlink matrix $P$ and strictly ...

68 | An analytical comparison of approaches to personalizing pagerank
- Haveliwala, Kamvar, et al.
- 2003
Citation Context ...s spam prevention abilities, creating personalized PageRanking systems. Personalization is a hot area since some predict personalized engines as the future of search. See the Stanford research papers [41, 60, 62, 67, 103]. While the concept of personalization (producing a $\pi^T$ for each user’s $v^T$ vector) sounds wonderful in theory, doing this in practice is computationally impossible. (Recall that it takes Google days ...

68 | A survey of eigenvector methods for web information retrieval
- Langville, Meyer
- 2005
Citation Context ...Rank citation ranking: Bringing order to the Web” [27] made small ripples in the information science community that quickly turned into waves. The connections between the two models are striking (see [78]) and it’s hard to say whether HITS influenced PageRank, or vice versa, or whether both developed independently. Nevertheless, since that eventful year, PageRank has emerged as the dominant link analy...

62 | Link analysis, eigenvectors and stability
- Ng, Zheng, et al.
- 2001
Citation Context ...to. All three groups have computed bounds on the difference between the old PageRank vector $\pi^T$ and the new, updated PageRank vector $\tilde{\pi}^T$. Using Aldous’ notion of variational distance [5], Ng et al. [86] arrive at $\|\pi^T - \tilde{\pi}^T\|_1 \le \frac{2}{1-\alpha} \sum_{i \in U} \pi_i$, where $U$ is the set of all pages that have been updated. Bianchini et al. [17], using concepts of energy flow, and Borodin et al. [74] improve upon this b...

55 | Pagerank computation and the structure of the web: Experiments and algorithms
- Arasu, Novak, et al.
- 2002
Citation Context ... law structure to speed ranking computations. Yet another group of researchers from Stanford, joined by IBM scientists, dropped the restriction to the power method. In their short paper, Arasu et al. [7] provide one small experiment with the Gauss-Seidel method applied to the PageRank problem. Bianchini et al. [19] suggest using the Jacobi method to compute the PageRank vector. Despite this progress, ...

55 | What can you do with a Web in your pocket
- Brin, Motwani, et al.
- 1998
Citation Context ... which were later implemented into their search engine Google. Of course, it is impossible to surmise the details of Google’s implementation since the publicly disseminated details of the 1998 papers [25, 26, 27]. Nevertheless, we do know that PageRank remains “the heart of [Google’s] software ... and continues to provide the basis for all of [their] web search tools”, as cited directly from the Google webpag...

53 | Searching the workplace web
- Fagin, Kumar, et al.

52 | Numerical computing with Matlab
- Moler
- 2004
Citation Context ...or L. For the tiny 6-node web from section 3, an adjacency list representation of the columns of L is (node: inlinks from): 1: 3; 2: 1, 3; 3: 1; 4: 5, 6; 5: 3, 4; 6: 4, 5. Exercise 2.24 of Cleve Moler’s recent book [93] gives one possible implementation of the power method applied to an adjacency list, along with sample Matlab code. When the adjacency list does not fit in main memory, references [100, 102] suggest...

50 | Query-free news search
- Henzinger, Chang, et al.
- 2003
Citation Context ...am of text” information such as news and TV broadcasts. Such dynamic content creates challenges that need tailored solutions. One example is the query-free news search proposed by Google engineers in [64]. This is related to the algorithmic challenge of using changes in data streams to locate interesting trends, a challenge identified by Monika Henzinger in her 2003 paper, “Algorithmic Challenges in W...

48 | Adaptive methods for the computation of pagerank
- Kamvar, Haveliwala, et al.
- 2003
Citation Context ...l. Reduction in work per iteration. Two methods have been proposed that clearly aim to reduce the work incurred at each iteration of the power method. The first method was proposed by Kamvar et al. [69] and is called adaptive PageRank. This method adaptively reduces the work at each iteration by taking a closer look at elements in the iteration vector. Kamvar et al. noticed that some pages converge ...

44 | Pagerank, hits and a unified framework for link analysis
- Ding, He, et al.
- 2003

36 | Web search for a planet: The Google cluster architecture
- Barroso, Dean, Hölzle
- 2003
Citation Context ...e issues nontrivial. In this section, we provide a brief discussion of more detailed storage issues for implementation. The 1998 paper by Brin and Page [26] and more recent papers by Google engineers [13, 56] provide detailed discussions of the many storage schemes used by the Google search engine for all parts of its information retrieval system. The excellent survey paper by Arasu et al. [6] also provid...
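
One of the citation contexts above lists, for the paper's tiny 6-node web, the pages linking into each node, and mentions a power method applied to such an adjacency list. The following is a minimal Python sketch of that idea, not the authors' or Moler's implementation. The function name `pagerank_power`, the scaling factor `alpha = 0.85`, the uniform personalization vector, and the stopping tolerance are all assumptions made for illustration.

```python
# Inlink adjacency list quoted in the citation context (node -> pages linking to it).
INLINKS = {1: [3], 2: [1, 3], 3: [1], 4: [5, 6], 5: [3, 4], 6: [4, 5]}


def pagerank_power(inlinks, alpha=0.85, tol=1e-10, max_iter=1000):
    """Power method on an inlink list (hypothetical helper; alpha and uniform v assumed)."""
    nodes = sorted(inlinks)
    n = len(nodes)

    # Recover each page's outdegree by counting how often it appears as a source.
    outdeg = {node: 0 for node in nodes}
    for sources in inlinks.values():
        for s in sources:
            outdeg[s] += 1

    v = {node: 1.0 / n for node in nodes}  # uniform personalization vector (assumed)
    x = dict(v)                            # starting vector
    for _ in range(max_iter):
        # Rank held by dangling pages (no outlinks) is redistributed according to v.
        dangling_mass = sum(x[node] for node in nodes if outdeg[node] == 0)
        new = {}
        for node in nodes:
            # Each source page splits its current rank evenly among its outlinks.
            link_flow = sum(x[s] / outdeg[s] for s in inlinks[node])
            new[node] = alpha * (link_flow + dangling_mass * v[node]) + (1 - alpha) * v[node]
        change = sum(abs(new[node] - x[node]) for node in nodes)
        x = new
        if change < tol:  # stop once the 1-norm change is below the tolerance
            break
    return x


ranks = pagerank_power(INLINKS)
```

In this sketch, node 2 has no outlinks, so it is treated as a dangling node whose rank is spread according to the personalization vector; other treatments of dangling nodes are possible.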