Results 1-10 of 484
Combating web spam with TrustRank
In VLDB, 2004
Cited by 325 (3 self)
Web spam pages use various techniques to achieve higher-than-deserved rankings in a search engine’s results. While human experts can identify spam, it is too expensive to manually evaluate a large number of pages. Instead, we propose techniques to semi-automatically separate reputable, good pages from spam. We first select a small set of seed pages to be evaluated by an expert. Once we manually identify the reputable seed pages, we use the link structure of the web to discover other pages that are likely to be good. In this paper we discuss possible ways to implement the seed selection and the discovery of good pages. We present results of experiments run on the World Wide Web indexed by AltaVista and evaluate the performance of our techniques. Our results show that we can effectively filter out spam from a significant fraction of the web, based on a good seed set of less than 200 sites.
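The abstract above describes discovering good pages by following links outward from a hand-checked seed set. One plausible way to sketch that idea is a PageRank-style propagation whose restart distribution is concentrated on the seeds. The function name `trust_rank`, the damping value, the iteration count, and the toy graph are all illustrative assumptions, not details taken from the abstract.

```python
def trust_rank(out_links, good_seeds, damping=0.85, iterations=50):
    """Propagate trust from expert-approved seed pages along outgoing links.

    Assumption: trust spreads like a biased PageRank, restarting only at seeds.
    Every link target must also appear as a key of out_links.
    """
    pages = sorted(out_links)
    # Static trust distribution: uniform over the seeds, zero everywhere else.
    seed_score = {p: (1.0 / len(good_seeds) if p in good_seeds else 0.0)
                  for p in pages}
    trust = dict(seed_score)
    for _ in range(iterations):
        nxt = {p: (1.0 - damping) * seed_score[p] for p in pages}
        for page in pages:
            targets = out_links[page]
            if not targets:
                continue  # trust on pages without out-links is dropped in this sketch
            share = damping * trust[page] / len(targets)
            for target in targets:
                nxt[target] += share
        trust = nxt
    return trust


# Hypothetical toy web: one reviewed seed and an isolated two-page spam farm.
graph = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": ["seed"],
    "spam1": ["spam2"],
    "spam2": ["spam1"],
}
scores = trust_rank(graph, good_seeds={"seed"})
```

In this toy graph the spam pages receive no trust because no path leads to them from the seed, while `a` and `b` receive positive scores. Ranking pages by such a score and applying a threshold is one way the separation could be carried out.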
Scaling Personalized Web Search
In Proceedings of the Twelfth International World Wide Web Conference, 2002
Cited by 319 (3 self)
Recent web search techniques augment traditional text matching with a global notion of "importance" based on the linkage structure of the web, such as in Google's PageRank algorithm. For more refined searches, this global notion of importance can be specialized to create personalized views of importance; for example, importance scores can be biased according to a user-specified set of initially interesting pages. Computing and storing all possible personalized views in advance is impractical, as is computing personalized views at query time, since the computation of each view requires an iterative computation over the web graph. We present new graph-theoretical results, and a new technique based on these results, that encode personalized views as partial vectors. Partial vectors are shared across multiple personalized views, and their computation and storage costs scale well with the number of views.
Authority-based keyword search in databases
TODS
Cited by 175 (12 self)
The ObjectRank system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Conceptually, authority originates at the nodes (objects) containing the keywords and flows to objects according to their semantic connections. Each node is ranked according to its authority with respect to the particular
Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search
IEEE Transactions on Knowledge and Data Engineering, 2003
Cited by 163 (2 self)
The original PageRank algorithm for improving the ranking of search-query results computes a single vector, using the link structure of the Web, to capture the relative “importance” of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared. By using linear combinations of these (precomputed) biased PageRank vectors to generate context-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. We describe techniques for efficiently implementing a large-scale search system based on the topic-sensitive PageRank scheme. Index Terms: Web search, web graph, link analysis, PageRank, search in context, personalized search, ranking algorithm.
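The query-time step described above, a linear combination of precomputed topic-biased vectors, can be sketched in a few lines. This is only an illustration: the helper name `combine_topic_scores`, the topic names, the page names, and all numeric values are made up, and the topic weights are assumed to come from some separate query or context classifier.

```python
def combine_topic_scores(topic_vectors, topic_weights):
    """Blend precomputed topic-biased score vectors with query-time weights.

    topic_vectors: {topic: {page: score}}, assumed to be computed offline.
    topic_weights: {topic: weight}, assumed to come from a query/context
    classifier; the weights are normalized here so they sum to 1.
    """
    total = sum(topic_weights.values())
    pages = set()
    for vector in topic_vectors.values():
        pages.update(vector)
    combined = {page: 0.0 for page in pages}
    for topic, weight in topic_weights.items():
        for page, score in topic_vectors[topic].items():
            combined[page] += (weight / total) * score
    return combined


# Hypothetical precomputed vectors for two topics over three pages.
topic_vectors = {
    "sports": {"espn": 0.6, "nature": 0.1, "news": 0.3},
    "science": {"espn": 0.1, "nature": 0.7, "news": 0.2},
}
# Hypothetical classifier output for a mostly science-flavoured query.
topic_weights = {"sports": 0.25, "science": 0.75}

combined = combine_topic_scores(topic_vectors, topic_weights)
best_page = max(combined, key=combined.get)
```

Because the blend is a weighted sum, the expensive iterative computation happens once per topic offline, and only the cheap combination runs per query.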
Deeper inside PageRank
Internet Mathematics, 2004
Cited by 158 (5 self)
This paper serves as a companion or extension to the “Inside PageRank” paper by Bianchini et al. [Bianchini et al. 03]. It is a comprehensive survey of all issues associated with PageRank, covering the basic PageRank model, available and recommended solution methods, storage issues, existence, uniqueness, and convergence properties, possible alterations to the basic model, suggested alternatives to the traditional solution methods, sensitivity and conditioning, and finally the updating problem. We introduce a few new results, provide an extensive reference list, and speculate about exciting areas of future research.
Extrapolation Methods for Accelerating PageRank Computations
In Proceedings of the Twelfth International World Wide Web Conference, 2003
Cited by 145 (13 self)
We present a novel algorithm for the fast computation of PageRank, a hyperlink-based estimate of the "importance" of Web pages. The original PageRank algorithm uses the Power Method to compute successive iterates that converge to the principal eigenvector of the Markov matrix representing the Web link graph. The algorithm presented here, called Quadratic Extrapolation, accelerates the convergence of the Power Method by periodically subtracting off estimates of the nonprincipal eigenvectors from the current iterate of the Power Method. In Quadratic Extrapolation, we take advantage of the fact that the first eigenvalue of a Markov matrix is known to be 1 to compute the nonprincipal eigenvectors using successive iterates of the Power Method. Empirically, we show that using Quadratic Extrapolation speeds up PageRank computation by 50-300% on a Web graph of 80 million nodes, with minimal overhead.
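For orientation, the baseline that the abstract sets out to accelerate, the plain Power Method, might look like the sketch below. It does not implement Quadratic Extrapolation. The function name, the damping factor of 0.85, the uniform redistribution of rank from pages without out-links, and the four-page graph are assumptions made for illustration.

```python
def pagerank_power_method(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Plain Power Method for PageRank; returns the ranks and the L1 residuals.

    Assumptions: uniform teleportation, and rank held by pages without
    out-links is spread uniformly over all pages. Every link target must
    also appear as a key of out_links.
    """
    pages = sorted(out_links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    residuals = []
    for _ in range(max_iter):
        dangling = sum(rank[p] for p in pages if not out_links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {p: base for p in pages}
        for page in pages:
            targets = out_links[page]
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    nxt[target] += share
        # L1 distance between successive iterates, used as the stopping test.
        residual = sum(abs(nxt[p] - rank[p]) for p in pages)
        residuals.append(residual)
        rank = nxt
        if residual < tol:
            break
    return rank, residuals


# Hypothetical four-page graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank, residuals = pagerank_power_method(graph)
```

The recorded residuals make the slow, geometric convergence visible: reaching a tight tolerance can take many iterations, and that cost is what extrapolation schemes such as the one in the abstract aim to reduce.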
Learning to cluster web search results
In Proc. of SIGIR ’04, 2004
Cited by 142 (7 self)
In web search, surfers often face the problem of selecting the information they want most from a potentially huge number of search results. Clustering web search results is one possible solution, but traditional content-based clustering is not sufficient since it ignores many unique features of web pages. The link structure, authority, quality, or trustworthiness of search results can play an even greater role than the actual contents of the web pages in clustering. These properties are reflected by Google's PageRank algorithm, the HITS algorithm, and others. The main goal of this project is to integrate authoritative information such as PageRank and link structure (e.g. in-links and out-links) into the K-Means clustering of web search results. The PageRank, in-links, and out-links can be used to extend the vector representation of web pages, and the PageRank can also be considered in the selection of initial centroids; alternatively, a web page with a higher PageRank can influence the centroid computation to a higher degree. The relevance of the results of this modified K-Means clustering algorithm needs to be compared to that obtained by content-based K-Means clustering, and the effects of different authoritative information also need to be analyzed.
Exploiting the Block Structure of the Web for Computing PageRank
2003
Cited by 140 (5 self)
The web link graph has a nested block structure: the vast majority of hyperlinks link pages on a host to other pages on the same host, and many of those that do not still link pages within the same domain. We show how to exploit this structure to speed up the computation of PageRank by a 3-stage algorithm whereby (1) the local PageRanks of pages for each host are computed independently using the link structure of that host, (2) these local PageRanks are then weighted by the "importance" of the corresponding host, and (3) the standard PageRank algorithm is then run using as its starting vector the weighted concatenation of the local PageRanks. Empirically, this algorithm speeds up the computation of PageRank by a factor of 2 in realistic scenarios. Further, we develop a variant of this algorithm that efficiently computes many different "personalized" PageRanks, and a variant that efficiently recomputes PageRank after node updates.
Automatic identification of user goals in web search
2004
Cited by 120 (2 self)
There has been recent interest in studying the “goal” behind a user’s Web query, so that this goal can be used to improve the quality of a search engine’s results. Previous studies have mainly focused on using manual query-log investigation to identify Web query goals. In this paper we study whether and how we can automate this goal-identification process. We first present our results from a human subject study that strongly indicate the feasibility of automatic query-goal identification. We then propose two types of features for the goal-identification task: user-click behavior and anchor-link distribution. Our experimental evaluation shows that by combining these features we can correctly identify the goals for 90% of the queries studied.
Algorithms for estimating relative importance in networks
In Proceedings of KDD 2003, 2003
Cited by 106 (4 self)
Large and complex graphs representing relationships among sets of entities are an increasingly common focus of interest in data analysis; examples include social networks, Web graphs, telecommunication networks, and biological networks. In interactive analysis of such data a natural query is “which entities are most important in the network relative to a particular individual or set of individuals?” We investigate the problem of answering such queries in this paper, focusing in particular on defining and computing the importance of nodes in a graph relative to one or more root nodes. We define a general framework and a number of different algorithms, building on ideas from social networks, graph theory, Markov models, and Web graph analysis. We experimentally evaluate the different properties of these algorithms on toy graphs and demonstrate how our approach can be used to study relative importance in real-world networks including a network of interactions among September 11th terrorists, a network of collaborative research in biotechnology among companies and universities, and a network of co-authorship relationships among computer science researchers.