Results 1–10 of 106
Topic-Sensitive PageRank
, 2002
Abstract

Cited by 453 (10 self)
In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.
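The biasing idea in this abstract amounts to a standard power iteration whose teleportation vector is concentrated on a topic's representative pages instead of being uniform. The sketch below illustrates that mechanism only; the toy graph, damping factor, and topic set are illustrative assumptions, not the paper's experimental setup:

```python
# Minimal sketch of topic-biased PageRank: identical to ordinary
# PageRank power iteration except that the teleport vector v puts all
# of its mass on the topic's pages. Graph and parameters are toy values.

def biased_pagerank(out_links, topic_pages, damping=0.85, iters=100):
    pages = list(out_links)
    n = len(pages)
    # Teleport mass goes only to the topic's representative pages.
    v = {p: (1.0 / len(topic_pages) if p in topic_pages else 0.0)
         for p in pages}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) * v[p] for p in pages}
        for p in pages:
            outs = out_links[p]
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: redistribute along the teleport vector.
                for q in pages:
                    new[q] += damping * rank[p] * v[q]
        rank = new
    return rank

# Toy graph: page "a" is the topic's sole representative.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = biased_pagerank(links, topic_pages={"a"})
```

Note how page "d", which is neither in the topic set nor linked to, receives no importance at all under this bias, whereas under a uniform teleport vector it would retain the baseline (1 - damping)/n mass.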
Approximate Frequency Counts over Data Streams
 VLDB
, 2002
Abstract

Cited by 342 (1 self)
We present algorithms for computing frequency counts exceeding a userspecified threshold over data streams. Our algorithms are simple and have provably small memory footprints. Although the output is approximate, the error is guaranteed not to exceed a userspecified parameter. Our algorithms can easily be deployed for streams of singleton items like those found in IP network monitoring. We can also handle streams of variable sized sets of items exemplified by a sequence of market basket transactions at a retail store. For such streams, we describe an optimized implementation to compute frequent itemsets in a single pass.
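One of the algorithms this paper presents is Lossy Counting. A rough sketch of its core idea for singleton streams follows; the variable names and the pruning schedule shown are a simplified rendering, not the paper's optimized implementation:

```python
# Sketch of the Lossy Counting idea: process the stream in buckets of
# width ceil(1/epsilon); at each bucket boundary, prune entries whose
# count plus maximum-undercount bound no longer exceeds the bucket number.
# Any item with true frequency >= epsilon * N is guaranteed to survive,
# and reported counts undercount by at most epsilon * N.
import math

def lossy_count(stream, epsilon):
    width = math.ceil(1 / epsilon)
    counts = {}  # item -> (observed count, max undercount delta)
    for i, item in enumerate(stream, start=1):
        bucket = math.ceil(i / width)
        f, delta = counts.get(item, (0, bucket - 1))
        counts[item] = (f + 1, delta)
        if i % width == 0:  # bucket boundary: prune infrequent entries
            counts = {x: (f, d) for x, (f, d) in counts.items()
                      if f + d > bucket}
    return counts

# Toy stream of N = 100 items; with epsilon = 0.1 only the genuinely
# frequent items "a" and "b" survive the final prune.
stream = ["a"] * 50 + ["b"] * 30 + list("cdefghij") * 2 + ["a"] * 4
summary = lossy_count(stream, epsilon=0.1)
```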
Scaling Personalized Web Search
 In Proceedings of the Twelfth International World Wide Web Conference
, 2002
Abstract

Cited by 319 (3 self)
Recent web search techniques augment traditional text matching with a global notion of "importance" based on the linkage structure of the web, such as in Google's PageRank algorithm. For more refined searches, this global notion of importance can be specialized to create personalized views of importance; for example, importance scores can be biased according to a user-specified set of initially interesting pages. Computing and storing all possible personalized views in advance is impractical, as is computing personalized views at query time, since the computation of each view requires an iterative computation over the web graph. We present new graph-theoretical results, and a new technique based on these results, that encode personalized views as partial vectors. Partial vectors are shared across multiple personalized views, and their computation and storage costs scale well with the number of views.
The WebGraph Framework I: Compression Techniques
 In Proc. of the Thirteenth International World Wide Web Conference
, 2003
Abstract

Cited by 174 (30 self)
Studying web graphs is often difficult due to their large size. Recently, several proposals have been published about techniques that make it possible to store a web graph in memory in limited space, exploiting the inner redundancies of the web. The WebGraph framework is a suite of codes, algorithms and tools that aims at making it easy to manipulate large web graphs. This paper presents the compression techniques used in WebGraph, which are centred around referentiation and intervalisation (which in turn are dual to each other).
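The two techniques named in the abstract can be illustrated in miniature: "intervalisation" encodes runs of consecutive successors as (start, length) pairs, and "referentiation" encodes a successor list as the part it shares with a similar reference list plus the leftovers. This is only a toy rendering of the ideas; the actual WebGraph codes (copy blocks, gap encodings, instantaneous codes) are far more elaborate:

```python
# Toy sketches of intervalisation and referentiation over sorted
# successor lists. Function names and the min_len threshold are
# illustrative choices, not WebGraph's actual parameters.

def intervalise(successors, min_len=3):
    """Split a sorted successor list into intervals and residuals."""
    intervals, residuals = [], []
    i = 0
    while i < len(successors):
        j = i
        while j + 1 < len(successors) and successors[j + 1] == successors[j] + 1:
            j += 1
        if j - i + 1 >= min_len:
            intervals.append((successors[i], j - i + 1))  # (start, length)
        else:
            residuals.extend(successors[i:j + 1])
        i = j + 1
    return intervals, residuals

def referentiate(successors, reference):
    """Encode a successor list as (shared-with-reference, extra) parts."""
    ref = set(reference)
    copied = [s for s in successors if s in ref]
    extra = [s for s in successors if s not in ref]
    return copied, extra

# A run of consecutive page ids compresses to a single interval.
intervals, residuals = intervalise([13, 14, 15, 16, 17, 20, 22])
# A list similar to its reference stores mostly "copied" entries.
copied, extra = referentiate([5, 9, 12], reference=[5, 7, 9])
```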
Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search
 IEEE Transactions on Knowledge and Data Engineering
, 2003
Abstract

Cited by 163 (2 self)
Abstract—The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme. Index Terms—Web search, web graph, link analysis, PageRank, search in context, personalized search, ranking algorithm.
The Intelligent Surfer: Probabilistic Combination of Link and Content Information in PageRank
, 2002
"... The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page ..."
Abstract

Cited by 156 (5 self)
The PageRank algorithm, used in the Google search engine, greatly improves the results of Web search by taking into account the link structure of the Web. PageRank assigns to a page a score proportional to the number of times a random surfer would visit that page, if it surfed indefinitely from page to page, following all outlinks from a page with equal probability. We propose to improve PageRank by using a more intelligent surfer, one that is guided by a probabilistic model of the relevance of a page to a query. Efficient execution of our algorithm at query time is made possible by precomputing at crawl time (and thus once for all queries) the necessary terms. Experiments on two large subsets of the Web indicate that our algorithm significantly outperforms PageRank in the (human rated) quality of the pages returned, while remaining efficient enough to be used in today's large search engines.
SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation
 In Proceedings of the 12th International Conference on World Wide Web (WWW’03)
, 2003
Abstract

Cited by 151 (5 self)
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. The final version of this paper will reflect new data labeling one billion pages, rather than the 264 million pages reported on herein. To our knowledge, this is the largest scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
Extrapolation Methods for Accelerating PageRank Computations
 In Proceedings of the Twelfth International World Wide Web Conference
, 2003
Abstract

Cited by 145 (13 self)
We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the "importance" of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, we take advantage of the fact that the first eigenvalue of a Markov matrix is known to be 1 to compute the nonprincipal eigenvectors using successive iterates of the Power Method. Empirically, we show that using Quadratic Extrapolation speeds up PageRank computation by 50–300% on a Web graph of 80 million nodes, with minimal overhead.
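Quadratic Extrapolation itself builds eigenvector estimates from several iterates; as a simpler illustration of the same idea (the paper also discusses Aitken-style extrapolation), here is a sketch that periodically applies componentwise Aitken delta-squared acceleration to the Power Method. The matrix, extrapolation period, and tolerance are toy assumptions:

```python
# Power Method with periodic componentwise Aitken delta-squared
# extrapolation, a simplified cousin of Quadratic Extrapolation.

def step(M, x):
    # One power-method step for a column-stochastic matrix M (list of
    # rows), renormalised so the iterate stays a probability vector.
    n = len(x)
    y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
    s = sum(y)
    return [v / s for v in y]

def power_method_aitken(M, tol=1e-10, period=5, max_iters=200):
    n = len(M)
    x = [1.0 / n] * n
    hist = []
    for k in range(max_iters):
        x_new = step(M, x)
        hist.append(x_new)
        if len(hist) >= 3 and (k + 1) % period == 0:
            # Extrapolate each component from the last three iterates,
            # which converge roughly geometrically in the second
            # eigenvalue; guard against a vanishing denominator.
            x0, x1, x2 = hist[-3], hist[-2], hist[-1]
            acc = []
            for a, b, c in zip(x0, x1, x2):
                d = c - 2 * b + a
                acc.append(c - (c - b) ** 2 / d if abs(d) > 1e-12 else c)
            s = sum(abs(v) for v in acc)
            x_new = [abs(v) / s for v in acc]
        if sum(abs(u - w) for u, w in zip(x_new, x)) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iters

# Toy 3x3 column-stochastic matrix standing in for a web link graph.
M = [[0.1, 0.4, 0.3],
     [0.5, 0.2, 0.3],
     [0.4, 0.4, 0.4]]
x, iters = power_method_aitken(M)
```

The returned vector approximates the principal eigenvector (the stationary distribution), so applying M to it should leave it essentially unchanged.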
Exploiting the Block Structure of the Web for Computing PageRank
, 2003
Abstract

Cited by 140 (5 self)
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2) these local PageRanks are then weighted by the "importance" of the corresponding host, and (3) the standard PageRank algorithm is then run using as its starting vector the weighted concatenation of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
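The 3-stage scheme in this abstract can be sketched on a toy two-host graph. The graph, host assignment, damping factor, and iteration counts below are illustrative assumptions, not the paper's setup:

```python
# Toy sketch of the 3-stage block-structured PageRank:
# (1) local PageRank within each host, (2) PageRank on the host graph
# to weight hosts, (3) standard PageRank seeded with the weighted
# concatenation of the local vectors.

def pagerank(out_links, start=None, damping=0.85, iters=60):
    pages = list(out_links)
    n = len(pages)
    rank = dict(start) if start else {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = out_links[p] or pages  # dangling node: spread uniformly
            share = damping * rank[p] / len(outs)
            for q in outs:
                new[q] += share
        rank = new
    return rank

def block_rank(out_links, host_of):
    hosts = set(host_of.values())
    # Stage 1: local PageRank of each host's intra-host subgraph.
    local = {}
    for h in hosts:
        members = [p for p in out_links if host_of[p] == h]
        sub = {p: [q for q in out_links[p] if host_of[q] == h]
               for p in members}
        local.update(pagerank(sub))  # sums to ~1 within each host
    # Stage 2: PageRank on the host-level graph weights each host.
    host_links = {h: [] for h in hosts}
    for p, outs in out_links.items():
        for q in outs:
            if host_of[q] != host_of[p]:
                host_links[host_of[p]].append(host_of[q])
    host_rank = pagerank(host_links)
    # Stage 3: the weighted concatenation seeds the global run.
    start = {p: local[p] * host_rank[host_of[p]] for p in out_links}
    return pagerank(out_links, start=start)

# Two hosts A and B; a single cross-host link funnels importance to B.
links = {"a1": ["a2", "b1"], "a2": ["a1"], "b1": ["b2"], "b2": ["b1"]}
hosts = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
ranks = block_rank(links, hosts)
```

The speedup claimed in the abstract comes from stage 3 starting near the fixed point, so the global run needs far fewer iterations; this toy version simply runs a fixed iteration budget for clarity.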
Seeing the Whole in Parts: Text Summarization for Web Browsing on Handheld Devices
, 2000
Abstract

Cited by 139 (2 self)
[No abstract available: the indexed copy of this page was defaced. See the published article.]