## Towards scaling fully personalized PageRank (2004)

Venue: | In Proceedings of the 3rd Workshop on Algorithms and Models for the Web-Graph (WAW |

Citations: | 75 - 2 self |

### BibTeX

@INPROCEEDINGS{Fogaras04towardsscaling,

author = {Dániel Fogaras and Balázs Rácz},

title = {Towards scaling fully personalized PageRank},

booktitle = {In Proceedings of the 3rd Workshop on Algorithms and Models for the Web-Graph (WAW},

year = {2004},

pages = {105--117}

}

### Abstract

Personalized PageRank expresses backlink-based page quality around user-selected pages in a similar way as PageRank expresses quality over the entire Web. Existing personalized PageRank algorithms can, however, serve on-line queries only for a restricted choice of page selection. In this paper we achieve full personalization by a novel algorithm that computes a compact database of simulated random walks; this database can serve arbitrary personal choices of small subsets of web pages. We prove that for a fixed error probability, the size of our database is linear in the number of web pages. We justify our estimation approach by asymptotic worst-case lower bounds; we show that exact personalized PageRank values can only be obtained from a database of quadratic size.

### Citations

3598 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context ...values can only be obtained from a database of quadratic size. 1 Introduction The idea of topic sensitive or personalized ranking appears since the beginning of the success story of Google’s PageRank [5,23] and other hyperlink-based centrality measures [20,4]. Topic sensitivity is either achieved by precomputing modified measures over the entire Web [13] or by ranking the neighborhood of pages containin...

2968 | Authoritative sources in a hyperlinked environment
- Kleinberg
- 1999
Citation Context ...tic size. 1 Introduction The idea of topic sensitive or personalized ranking appears since the beginning of the success story of Google’s PageRank [5,23] and other hyperlink-based centrality measures [20,4]. Topic sensitivity is either achieved by precomputing modified measures over the entire Web [13] or by ranking the neighborhood of pages containing the query word [20]. These methods however work onl...

2374 | The PageRank Citation Ranking: Bringing Order to the Web
- Page, Brin, et al.
- 1999
Citation Context ...values can only be obtained from a database of quadratic size. 1 Introduction The idea of topic sensitive or personalized ranking appears since the beginning of the success story of Google’s PageRank [5,23] and other hyperlink-based centrality measures [20,4]. Topic sensitivity is either achieved by precomputing modified measures over the entire Web [13] or by ranking the neighborhood of pages containin...

633 | Communication complexity
- Kushilevitz, Nisan
- 1997
Citation Context ....e. the output is the y-th bit of the input vector. To compute the proper output they have to communicate, and communication is restricted in the direction A → B. The one-way communication complexity [22] of this function is the required bits of transfer in the worst case for the best protocol. Theorem 5 ([17]). Any protocol that outputs the correct answer to the bit-vector probing problem with probab...

452 | Topic-Sensitive PageRank
- Haveliwala
- 2002
Citation Context ...nning of the success story of Google’s PageRank [5,23] and other hyperlink-based centrality measures [20,4]. Topic sensitivity is either achieved by precomputing modified measures over the entire Web [13] or by ranking the neighborhood of pages containing the query word [20]. These methods however work only for restricted cases or when the entire hyperlink structure fits into the main memory. In this ...

376 | On the resemblance and containment of documents
- Broder
- 1997
Citation Context ...ng to estimate the neighborhood function of web pages. Besides link-mining the paper [8] estimates the size of transitive closure for massive graphs occurring in databases. For text-mining algorithms [6] estimates the resemblance and containment of documents with a sampling technique. Random walks were used before to compute various web statistics, mostly focused on sampling the web (uniformly or acc...

317 | Scaling personalized Web search
- Jeh, Widom
- 2003
Citation Context ...[20]. These methods however work only for restricted cases or when the entire hyperlink structure fits into the main memory. In this paper we address the computational issues of personalized PageRank [13,18]. Just as all hyperlink based ranking methods, PageRank is based on the assumption that the existence of a hyperlink u → v implies that page u votes for the quality of v. Personalized PageRank (PPR) [...

172 | The WebGraph framework I: Compression techniques
- Boldi, Vigna
- 2004
Citation Context ...rted by the source page, and we will count the database scans and total I/O size as the efficiency measure of our algorithms. The assumption is made, since even with the latest compression techniques [3] it does not seem plausible to store the entire web graph in main memory. Under such assumption it is infeasible to generate the random walks one-by-one, as it would require random access to the edge-...

156 | The intelligent surfer: Probabilistic combination of link and content information in PageRank
- Richardson, Domingos
- 2002
Citation Context ...ubset of the individual pages personalization is available by linearity. Furthermore, the algorithm of [19] personalizes PageRank over hosts rather than single web pages. Instead of user preferences, [25] tunes PageRank automatically using the query keywords. To the best of our knowledge randomized algorithms are not very common in the link-mining community. A remarkable exception [24] applies probabi...

139 | Exploiting the Block Structure of the Web for Computing PageRank
- Kamvar, Haveliwala, et al.
- 2003

121 | Size-estimation framework with applications to transitive closure and reachability
- Cohen
- 1994
Citation Context ...gorithms are not very common in the link-mining community. A remarkable exception [24] applies probabilistic counting to estimate the neighborhood function of web pages. Besides link-mining the paper [8] estimates the size of transitive closure for massive graphs occurring in databases. For text-mining algorithms [6] estimates the resemblance and containment of documents with a sampling technique. Ra...

109 | On Near-Uniform URL Sampling
- Henzinger, Heydon, et al.
- 2000
Citation Context ...lance and containment of documents with a sampling technique. Random walks were used before to compute various web statistics, mostly focused on sampling the web (uniformly or according to static PR) [16,26,1,15], but also for calculating page decay [2] and similarity values [11]. The lower bounds of Section 4 show that precise PPR requires significantly larger database than Monte Carlo estimation does. Analo...

101 | ANF: A fast and scalable tool for data mining in massive graphs
- Palmer, Gibbons, et al.
- 2002

77 | Approximating Aggregate Queries about Web Pages via Random Walks
- Bar-Yossef, Berg, et al.
Citation Context ...lance and containment of documents with a sampling technique. Random walks were used before to compute various web statistics, mostly focused on sampling the web (uniformly or according to static PR) [16,26,1,15], but also for calculating page decay [2] and similarity values [11]. The lower bounds of Section 4 show that precise PPR requires significantly larger database than Monte Carlo estimation does. Analo...

77 | Measuring Index Quality Using Random Walks on the Web
- Henzinger, Heydon, et al.
- 1999
Citation Context ...lance and containment of documents with a sampling technique. Random walks were used before to compute various web statistics, mostly focused on sampling the web (uniformly or according to static PR) [16,26,1,15], but also for calculating page decay [2] and similarity values [11]. The lower bounds of Section 4 show that precise PPR requires significantly larger database than Monte Carlo estimation does. Analo...

75 | Finding authorities and hubs from link structures on the world wide web
- Borodin, Roberts, et al.
- 2001
Citation Context ...tic size. 1 Introduction The idea of topic sensitive or personalized ranking appears since the beginning of the success story of Google’s PageRank [5,23] and other hyperlink-based centrality measures [20,4]. Topic sensitivity is either achieved by precomputing modified measures over the entire Web [13] or by ranking the neighborhood of pages containing the query word [20]. These methods however work onl...

74 | An analytical comparison of approaches to personalizing PageRank
- Haveliwala, Kamvar, et al.
- 2003
Citation Context ...at precomputes a compact database. As described in Section 2, the database contains simulated random walks, and PPR is estimated on-line with a limited number of database accesses. Earlier algorithms [14] restricted personalization to a few topics, a subset of popular pages or to hosts; our algorithm on the other hand enables personalization for any small set of pages. Query time is linear in the numb...

55 | Sic Transit Gloria Telae: Towards an Understanding of the Web’s Decay
- Bar-Yossef, Broder, et al.
- 2004
Citation Context ... technique. Random walks were used before to compute various web statistics, mostly focused on sampling the web (uniformly or according to static PR) [16,26,1,15], but also for calculating page decay [2] and similarity values [11]. The lower bounds of Section 4 show that precise PPR requires significantly larger database than Monte Carlo estimation does. Analogous results with similar communication c...

42 | Methods for sampling pages uniformly from the world wide web
- Rusmevichientong, Pennock, et al.
- 2001

29 | I/O-Efficient Techniques for Computing PageRank
- Chen, Gan, et al.
- 2002
Citation Context ...al memory algorithm) and a distributed system with tens to thousands of medium capacity computers. Both algorithms use similar techniques to the respective I/O efficient algorithms computing PageRank [7]. As the task is to generate N independent fingerprints, the single computer solution can be trivially parallelized to make use of a large cluster of machines, too. (Commercial web search engines have...

17 | Computing on data streams. External memory algorithms
- Henzinger, Raghavan, et al.
- 1999
Citation Context ...wer bounds of Section 4 show that precise PPR requires significantly larger database than Monte Carlo estimation does. Analogous results with similar communication complexity arguments were proved in [17] for the space complexity of several data stream graph algorithms. Preliminaries. In this section we briefly introduce notation, and recall definitions and basic facts about PageRank. Let V denote the...

9 | Where to start browsing the web
- Fogaras
- 2003
Citation Context .... The last statement of the introduction will play a central role in our PPV estimations. The theorem provides an alternate probabilistic characterization of individual PageRank scores. Theorem 2 ([18,10]). Suppose that a number L is chosen at random with probability $\Pr\{L = i\} = c(1 - c)^i$ for $i = 0, 1, 2, \ldots$ Consider a random walk starting from some page u and taking L steps. Then PPV(u, v) = Pr{...
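
The random-walk characterization quoted in this context lends itself to a small Monte Carlo sketch. The Python below is only an illustration, not the authors' database algorithm: it assumes the truncated statement concludes that PPV(u, v) is the probability that the walk ends at v, and it assumes a walk simply stops when it reaches a page with no out-links. The names `sample_endpoint`, `estimate_ppv`, the adjacency-dict graph format, and the default `c = 0.15` are hypothetical choices made for the example.

```python
import random
from collections import Counter


def sample_endpoint(graph, u, c, rng):
    """Return the end page of one random walk started at u.

    Before every step the walk stops with probability c, so the number
    of steps L satisfies Pr{L = i} = c * (1 - c) ** i.
    """
    node = u
    while rng.random() >= c:          # continue with probability 1 - c
        out_links = graph.get(node)
        if not out_links:             # assumption: stop at dangling pages
            break
        node = rng.choice(out_links)  # follow a uniformly random out-link
    return node


def estimate_ppv(graph, u, c=0.15, num_walks=10000, seed=0):
    """Estimate PPV(u, .) as the empirical distribution of walk endpoints."""
    rng = random.Random(seed)
    counts = Counter(
        sample_endpoint(graph, u, c, rng) for _ in range(num_walks)
    )
    return {v: n / num_walks for v, n in counts.items()}


if __name__ == "__main__":
    # Toy graph (hypothetical): page -> list of pages it links to.
    toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(estimate_ppv(toy_graph, "a").items()):
        print(page, round(score, 3))
```

On a two-page cycle a → b → a, the exact value under these assumptions is PPV(a, a) = c / (1 − (1 − c)²) = 1 / (2 − c), which gives a quick sanity check for the estimator.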

6 | Rank stability and rank similarity of link-based web ranking algorithms in authority-connected graphs
- Lempel, Moran
Citation Context ...ing to the personalized PageRank scores. However, the order of the low ranked pages will usually not follow the PPR closely. This is not surprising, and actually a deep problem of PageRank itself, as [21] showed that PageRank is unstable around the low ranked pages, in the sense that with little perturbation of the graph a very low ranked page can jump in the ranking order somewhere to the middle. The...

5 | A scalable randomized method to compute link-based similarity rank on the web graph
- Fogaras, Rácz
- 2004
Citation Context ...ere used before to compute various web statistics, mostly focused on sampling the web (uniformly or according to static PR) [16,26,1,15], but also for calculating page decay [2] and similarity values [11]. The lower bounds of Section 4 show that precise PPR requires significantly larger database than Monte Carlo estimation does. Analogous results with similar communication complexity arguments were pr...

3 | Locality, hierarchy, and bidirectionality in the web
- Eiron, McCurley
- 2003
Citation Context ...putable from v, for example by renumbering the vertices according to the partition. 7 It should be enough to have each domain on a single computer, as the majority of the links are intra-domain links [19,9]. 8 Also depending on the actual partition; as a heuristics one should use a partition that distributes the global PageRank uniformly across computers: the expected value of the total InQueue hits of ...