Abstract:
The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme.
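The core idea, precomputing one biased PageRank vector per topic and blending them at query time, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the four-page toy graph, the teleport probability `alpha`, the uniform handling of dangling pages, the function names, and the topic weights (which stand in for however strongly a query or its context matches each topic).

```python
import numpy as np

def biased_pagerank(adj, bias, alpha=0.25, iters=100):
    """Power iteration with a topic-biased teleport distribution.

    adj[i][j] = 1 if page i links to page j; `bias` sums to 1 and
    concentrates teleport probability on pages of one topic.
    `alpha` (teleport probability) is an illustrative choice.
    """
    adj = np.asarray(adj, dtype=float)
    bias = np.asarray(bias, dtype=float)
    n = adj.shape[0]
    out_degree = adj.sum(axis=1)
    # Column-stochastic transition matrix: column i holds the
    # probabilities of moving from page i to each other page.
    M = np.zeros((n, n))
    for i in range(n):
        if out_degree[i] > 0:
            M[:, i] = adj[i] / out_degree[i]
        else:
            M[:, i] = 1.0 / n  # assumption: dangling pages jump uniformly
    rank = np.full(n, 1.0 / n)
    for _ in range(iters):
        rank = (1 - alpha) * (M @ rank) + alpha * bias
    return rank

def query_time_scores(topic_vectors, topic_weights):
    """Blend precomputed topic vectors with normalized topic weights."""
    w = np.asarray(topic_weights, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(topic_vectors)

# Hypothetical 4-page graph: 0 -> 1, 0 -> 2, 1 -> 2, 2 -> 0, 3 -> 2.
adj = [[0, 1, 1, 0],
       [0, 0, 1, 0],
       [1, 0, 0, 0],
       [0, 0, 1, 0]]
bias_a = np.array([1.0, 0.0, 0.0, 0.0])  # topic A teleports to page 0
bias_b = np.array([0.0, 0.0, 0.0, 1.0])  # topic B teleports to page 3

# Offline step: one biased vector per topic.
topic_vectors = [biased_pagerank(adj, bias_a), biased_pagerank(adj, bias_b)]

# Query-time step: a query judged 70% topic A, 30% topic B (made-up weights).
combined = query_time_scores(topic_vectors, [0.7, 0.3])
print(combined)
```

Because the transition matrix in this sketch does not depend on the bias, the blended vector agrees (up to iteration error) with the PageRank vector computed directly from the blended bias, which is why the expensive per-topic computation can be done ahead of time and only the cheap weighted sum is needed per query.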