Results 1 - 10 of 27
Beyond PageRank: Machine learning for static ranking
 In WWW ’06: Proceedings of the 15th international conference on World Wide Web
, 2006
Cited by 44 (2 self)
Abstract
Since the publication of Brin and Page’s paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are independent of the link structure of the Web. We gain a further boost in accuracy by using data on the frequency at which users visit Web pages. We use RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics. The resulting model achieves a static ranking pairwise accuracy of 67.3% (vs. 56.7% for PageRank or 50% for random).
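Pairwise accuracy, the metric quoted above, can be read as the fraction of page pairs with a known preference that a static ranking orders the same way. The following is a minimal sketch, assuming scores and ground-truth labels are given as dicts; the function name, the toy labels, and the half-credit rule for tied scores are illustrative assumptions, not the paper's exact evaluation protocol.

```python
from itertools import combinations


def pairwise_accuracy(scores, truth):
    """Fraction of preference pairs (from `truth`) that `scores` orders the same way.

    Pairs tied in `truth` carry no preference and are skipped. Pairs tied in
    `scores` get half credit (an assumption), so a constant scorer lands at 0.5,
    the same as a random ordering.
    """
    agree = 0.0
    total = 0
    for a, b in combinations(truth, 2):
        if truth[a] == truth[b]:
            continue
        total += 1
        better, worse = (a, b) if truth[a] > truth[b] else (b, a)
        if scores[better] > scores[worse]:
            agree += 1.0
        elif scores[better] == scores[worse]:
            agree += 0.5
    return agree / total if total else float("nan")


truth = {"a": 3, "b": 2, "c": 1}  # hypothetical human preference labels
print(pairwise_accuracy({"a": 0.9, "b": 0.5, "c": 0.1}, truth))  # 1.0
print(pairwise_accuracy({"a": 0.1, "b": 0.5, "c": 0.9}, truth))  # 0.0
```

A random ordering agrees on about half the pairs, which is why 50% serves as the baseline in the abstract.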
Generalizing PageRank: Damping functions for link-based ranking algorithms
 In Proceedings of ACM SIGIR
Cited by 29 (8 self)
Abstract
This paper introduces a family of link-based ranking algorithms that propagate page importance through links. In these algorithms there is a damping function that decreases with distance, so a direct link implies more endorsement than a link through a long path. PageRank is the most widely known ranking function of this family. The main objective of this paper is to determine whether this family of ranking techniques is of interest per se, and how different choices for the damping function affect rank quality and convergence speed. Even though our results suggest that PageRank can be approximated with other, simpler forms of ranking that may be computed more efficiently, our focus is of a more speculative nature, in that it aims at separating the kernel of PageRank, that is, link-based importance propagation, from the way propagation decays over paths. We focus on three damping functions, having linear, exponential, and hyperbolic decay on the lengths of the paths. The exponential decay corresponds to PageRank, and the other functions are new. Our presentation includes algorithms, analysis, comparisons and experiments that study their behavior under different parameters in real Web graph data. Among other results, we show how to calculate a linear approximation that induces a page ordering that is almost identical to PageRank’s using a fixed small number of iterations; comparisons were performed using Kendall’s τ on large domain datasets.
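The damping-function view can be sketched as computing r = Σ_t damping(t) · x₀Pᵗ, where x₀ is the uniform distribution and P is the row-stochastic link matrix, so a path of length t contributes with weight damping(t). Below is a minimal pure-Python sketch under stated assumptions: exponential weights (1 − α)αᵗ reproduce PageRank with uniform teleportation; the linear weights 2(L − t)/(L(L + 1)), normalized here so they sum to 1, are an illustrative choice that may not match the paper's exact formulation; the hyperbolic family is omitted. Kendall's τ is computed as the simple τ-a variant, which may differ from the variant the authors used. The toy graph and all function names are hypothetical.

```python
from itertools import combinations


def step(x, graph):
    """One walk step x -> xP. Pages without out-links spread mass uniformly (an assumption)."""
    n = len(graph)
    nxt = {u: 0.0 for u in graph}
    leak = 0.0
    for u, outs in graph.items():
        if outs:
            for v in outs:
                nxt[v] += x[u] / len(outs)
        else:
            leak += x[u]
    for u in nxt:
        nxt[u] += leak / n
    return nxt


def damped_rank(graph, damping, max_len):
    """r = sum over t < max_len of damping(t) * x0 P^t, with x0 uniform."""
    n = len(graph)
    x = {u: 1.0 / n for u in graph}
    rank = {u: 0.0 for u in graph}
    for t in range(max_len):
        w = damping(t)
        for u in graph:
            rank[u] += w * x[u]
        x = step(x, graph)
    return rank


def exponential(alpha):
    # Exponential decay: reproduces PageRank with damping factor alpha.
    return lambda t: (1 - alpha) * alpha ** t


def linear(L):
    # Linear decay over paths shorter than L; weights sum to 1.
    return lambda t: 2 * (L - t) / (L * (L + 1)) if t < L else 0.0


def kendall_tau(r1, r2):
    """Kendall's tau-a between two score dicts with the same keys."""
    conc = disc = 0
    for a, b in combinations(r1, 2):
        s = (r1[a] - r1[b]) * (r2[a] - r2[b])
        if s > 0:
            conc += 1
        elif s < 0:
            disc += 1
    n = len(r1)
    return (conc - disc) / (n * (n - 1) / 2)


graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical toy graph
pr = damped_rank(graph, exponential(0.85), 200)
lin = damped_rank(graph, linear(5), 5)
print(kendall_tau(pr, lin))  # 1.0: both order the pages 2, 0, 1, 3
```

On this tiny graph a five-step linear damping already reproduces PageRank's ordering exactly, echoing (at toy scale only) the abstract's claim that a fixed small number of iterations can nearly match PageRank.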
Ranking web sites with real user traffic
 International Conference on Web Search and Web Data Mining
, 2008
Cited by 16 (1 self)
Abstract
We analyze the traffic-weighted Web host graph obtained from a large sample of real Web users over about seven months. A number of interesting structural properties are revealed by this complex dynamic network, some in line with the well-studied Boolean link host graph and others pointing to important differences. We find that while search is directly involved in a surprisingly small fraction of user clicks, it leads to a much larger fraction of all sites visited. The temporal traffic patterns display strong regularities, with a large portion of future requests being statistically predictable by past ones. Given the importance of topological measures such as PageRank in modeling user navigation, as well as their role in ranking sites for Web search, we use the traffic data to validate the PageRank random surfing model. The ranking obtained by the actual frequency with which a site is visited by users differs significantly from that approximated by the uniform surfing/teleportation behavior modeled by PageRank, especially for the most important sites. To interpret this finding, we consider each of the fundamental assumptions underlying PageRank and show how each is violated by actual user behavior.
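The model being validated here is PageRank's idealized surfer, who follows a uniformly chosen out-link with probability c and otherwise teleports to a uniformly chosen page. As a minimal illustration of that model only (not of the paper's traffic analysis), the surfer can be simulated directly and its visit frequencies compared with real ones; the toy host graph, step count, seed, and function name are hypothetical.

```python
import random


def simulate_surfer(graph, c=0.85, steps=200_000, seed=0):
    """Visit frequencies of PageRank's idealized random surfer.

    With probability c the surfer follows a uniformly chosen out-link;
    otherwise (or at a page with no out-links) it teleports to a uniformly
    chosen page. Long-run frequencies approximate PageRank with damping c.
    """
    rng = random.Random(seed)
    nodes = list(graph)
    counts = {u: 0 for u in nodes}
    u = rng.choice(nodes)
    for _ in range(steps):
        counts[u] += 1
        outs = graph[u]
        if outs and rng.random() < c:
            u = rng.choice(outs)
        else:
            u = rng.choice(nodes)
    return {v: k / steps for v, k in counts.items()}


# Hypothetical toy host graph: host -> list of hosts it links to.
graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
print(simulate_surfer(graph))
```

Both uniform choices (which link to follow, where to teleport) are exactly the kind of assumption that, per the abstract, actual user behavior violates.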
Building implicit links from content for forum search
 In SIGIR ’06: Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
, 2006
Cited by 15 (0 self)
Abstract
The objective of Web forums is to create a shared space for open communication and discussion of specific topics and issues. The tremendous amount of information on forum sites is not yet fully utilized. Most links between forum pages are created automatically, which means that link-based ranking algorithms cannot be applied efficiently. In this paper, we propose a novel ranking algorithm that introduces content information into link-based methods as implicit links. The basic idea is derived from the more focused random surfer: the surfer is more likely to jump to a page that is similar to what he is currently reading. In this manner, we can introduce content similarities into the link graph as a personalization bias. Our method, named Fine-grained Rank (FGRank), can be efficiently computed based on an automatically generated topic hierarchy. Unlike topic-sensitive PageRank, our method only needs to compute a single PageRank score for each page. Another contribution of this paper is a very efficient algorithm for automatically generating a topic hierarchy and mapping each page in a large-scale collection onto the computed hierarchy. The experimental results show that the proposed method can improve retrieval performance, and reveal that the content-based link graph is also important, alongside the hyperlink graph.
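The idea of adding content similarity to the link graph can be illustrated with a random surfer that, at each page, splits its next move between explicit hyperlinks and content-based implicit links. This is a hypothetical sketch of the general idea only: FGRank itself is built on an automatically generated topic hierarchy, which is not modeled here, and the mixing weight `lam`, the teleport probability `eps`, the precomputed `sim` dict, and the function name are all assumptions.

```python
def implicit_link_rank(links, sim, lam=0.5, eps=0.15, iters=100):
    """Random-surfer ranking over hyperlinks plus content-similarity 'implicit links'.

    links: page -> list of pages it links to (explicit hyperlinks).
    sim:   page -> {other page: similarity weight} (implicit links; hypothetical input).
    With probability eps the surfer teleports uniformly; otherwise it follows a
    hyperlink with weight lam or a similarity-weighted implicit link with weight
    1 - lam, renormalizing when a page has only one kind of link.
    """
    nodes = list(links)
    n = len(nodes)
    r = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: eps / n for u in nodes}
        for u in nodes:
            mass = (1 - eps) * r[u]
            outs = links[u]
            row = {v: w for v, w in sim.get(u, {}).items() if v != u and w > 0}
            link_w = lam if outs else 0.0
            sim_w = (1 - lam) if row else 0.0
            norm = link_w + sim_w
            if norm == 0:  # no links of either kind: spread mass uniformly
                for v in nodes:
                    nxt[v] += mass / n
                continue
            for v in outs:
                nxt[v] += mass * (link_w / norm) / len(outs)
            total_sim = sum(row.values())
            for v, w in row.items():
                nxt[v] += mass * (sim_w / norm) * w / total_sim
        r = nxt
    return r


# Hypothetical forum: an index page auto-links to three threads and back;
# threads t1 and t2 discuss the same topic, t3 is unrelated.
links = {"index": ["t1", "t2", "t3"], "t1": ["index"], "t2": ["index"], "t3": ["index"]}
sim = {"t1": {"t2": 0.9}, "t2": {"t1": 0.9}}
ranks = implicit_link_rank(links, sim)
print({page: round(score, 3) for page, score in ranks.items()})
```

With hyperlinks alone the three threads are indistinguishable; the implicit links let t1 and t2 endorse each other, so they rise above t3.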
Predicting influential users in online social networks
 In SNA-KDD: Proceedings of the KDD Workshop on Social Network Analysis
, 2010
Graph fibrations, graph isomorphism, and PageRank
 RAIRO Inform. Théor
Cited by 8 (6 self)
Abstract
PageRank is a ranking method that assigns scores to web pages using the limit distribution of a random walk on the web graph. A fibration of graphs is a morphism that is a local isomorphism of in-neighbourhoods, much in the same way a covering projection is a local isomorphism of neighbourhoods. We show that a deep connection relates fibrations and Markov chains with restart, a particular kind of Markov chain that includes the PageRank one as a special case. This fact provides constraints on the values that PageRank can assume. Using our results, we show that a recently defined class of graphs that admit a polynomial-time isomorphism algorithm based on the computation of PageRank is really a subclass of fibration-prime graphs, which possess simple, entirely discrete polynomial-time isomorphism algorithms based on classical techniques for graph isomorphism. We discuss efficiency issues in the implementation of such algorithms for the particular case of web graphs, in which O(n) space occupancy (where n is the number of nodes) may be acceptable, but O(m) is not (where m is the number of arcs).
Distribution of PageRank Mass Among Principle Components of the Web
 In Proc. of the 5th Workshop on Algorithms and Models for the Web-Graph (WAW 2007)
, 2007
Cited by 5 (0 self)
Abstract
We study the PageRank mass of principal components in a bow-tie Web graph, as a function of the damping factor c. Using a singular perturbation approach, we show that the PageRank share of the IN and SCC components remains high even for very large values of the damping factor, in spite of the fact that it drops to zero when c → 1. However, a detailed study of the OUT component reveals the presence of “dead-ends” (small groups of pages linking only to each other) that receive an unfairly high ranking when c is close to one. We argue that this problem can be mitigated by choosing c as small as 1/2.
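The dead-end effect can be reproduced on a toy bow-tie graph. The sketch below is a hypothetical example (graph, node names, tolerance, and function names are all assumptions) that uses plain power iteration with damping factor c, rather than the paper's singular perturbation analysis, and tracks the PageRank share of IN ∪ SCC against a two-page dead-end in OUT as c grows.

```python
def pagerank(graph, c, tol=1e-12, max_iter=100_000):
    """Power iteration for PageRank with damping factor c and uniform teleportation."""
    n = len(graph)
    r = {u: 1.0 / n for u in graph}
    for _ in range(max_iter):
        nxt = {u: (1 - c) / n for u in graph}
        for u, outs in graph.items():
            if outs:
                for v in outs:
                    nxt[v] += c * r[u] / len(outs)
            else:  # dangling page: spread its mass uniformly (assumption)
                for v in graph:
                    nxt[v] += c * r[u] / n
        done = sum(abs(nxt[u] - r[u]) for u in graph) < tol
        r = nxt
        if done:
            break
    return r


# Toy bow-tie (hypothetical): IN = {in}, SCC = {s1, s2, s3}, OUT = {d, e},
# where d and e link only to each other, forming a dead-end.
bowtie = {
    "in": ["s1"],
    "s1": ["s2"],
    "s2": ["s1", "s3", "d"],
    "s3": ["s1"],
    "d": ["e"],
    "e": ["d"],
}


def share(r, group):
    return sum(r[u] for u in group)


for c in (0.5, 0.85, 0.99):
    r = pagerank(bowtie, c)
    print(c, round(share(r, {"in", "s1", "s2", "s3"}), 3), round(share(r, {"d", "e"}), 3))
# 0.5 0.603 0.397
# 0.85 0.386 0.614
# 0.99 0.046 0.954
```

The absolute numbers reflect this six-page toy, where the dead-end is a third of all pages; the point is the trend, as the two-page dead-end absorbs nearly all the mass when c approaches 1.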
TotalRank: Ranking Without Damping
, 2005
Cited by 5 (2 self)
Abstract
PageRank is defined as the stationary state of a Markov chain obtained by perturbing the transition matrix of a web graph with a damping factor α that spreads part of the rank. The choice of α is eminently empirical, but most applications use α = 0.85; nonetheless, the selection of α is critical, and some believe that link farms may use this choice adversarially. Recent results [1] prove that the PageRank of a page is a rational function of α, and that this function can be approximated quite efficiently: this fact can be used to define a new form of ranking, TotalRank, that averages PageRanks over all possible values of α. We show how this rank can be computed efficiently, and provide some preliminary experimental results on its quality and comparisons with PageRank.
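Writing PageRank as r(α) = (1 − α) v Σₖ αᵏ Pᵏ, with v the uniform vector and P the transition matrix, and integrating term by term gives ∫₀¹ (1 − α) αᵏ dα = 1/(k + 1) − 1/(k + 2) = 1/((k + 1)(k + 2)). Averaging over all α therefore amounts to a damping-function ranking with weights 1/((k + 1)(k + 2)). The sketch below simply truncates that series; it is a minimal illustration under these assumptions (uniform v, pages without out-links spreading mass uniformly), not the efficient algorithm the paper describes, and the toy graph, term count, and function names are hypothetical.

```python
def step(x, graph):
    """One walk step x -> xP. Pages without out-links spread mass uniformly (an assumption)."""
    n = len(graph)
    nxt = {u: 0.0 for u in graph}
    leak = 0.0
    for u, outs in graph.items():
        if outs:
            for v in outs:
                nxt[v] += x[u] / len(outs)
        else:
            leak += x[u]
    for u in nxt:
        nxt[u] += leak / n
    return nxt


def total_rank(graph, terms=5000):
    """Truncated TotalRank: sum over k < terms of x0 P^k / ((k+1)(k+2)), x0 uniform."""
    n = len(graph)
    x = {u: 1.0 / n for u in graph}
    rank = {u: 0.0 for u in graph}
    for k in range(terms):
        w = 1.0 / ((k + 1) * (k + 2))  # equals the integral of (1 - a) a^k for a in [0, 1]
        for u in graph:
            rank[u] += w * x[u]
        x = step(x, graph)
    return rank


graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical toy graph
tr = total_rank(graph)
print({u: round(s, 3) for u, s in tr.items()})
print(round(sum(tr.values()), 4))  # 0.9998: the tail weight 1/5001 is dropped
```

The weights telescope, so after K terms they sum to 1 − 1/(K + 1); the tail decays only like 1/K, which is one reason a closed-form or smarter evaluation, as the paper pursues, is preferable to naive truncation.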
A singular perturbation approach for choosing PageRank damping factor
 In arXiv:math.PR/0612079
, 2006
Cited by 5 (0 self)
Abstract
We study the PageRank mass of principal components in a bow-tie Web graph, as a function of the damping factor c. It is known that the Web graph can be divided into three principal components: SCC, IN and OUT. The Giant Strongly Connected Component (SCC) contains a large group of pages all having a hyperlink path to each other. The pages in the IN (OUT) component have a path to (from) the SCC, but not back. Using a singular perturbation approach, we show that the PageRank share of the IN and SCC components remains high even for very large values of the damping factor, in spite of the fact that it drops to zero when c tends to one. However, a detailed study of the OUT component reveals the presence of “dead-ends” (small groups of pages linking only to each other) that receive an unfairly high ranking when c is close to one. We argue that this problem can be mitigated by choosing c as small as 1/2.
Spectral Ranking
, 2009
Cited by 5 (2 self)
Abstract
This note attempts to sketch the history of spectral ranking, a general umbrella name for techniques that apply the theory of linear maps (in particular, eigenvalues and eigenvectors) to matrices that do not represent geometric transformations, but rather some kind of relationship between entities. Although recently made famous by the ample press coverage of Google’s PageRank algorithm, spectral ranking was devised more than fifty years ago, almost exactly in the same terms, and has been studied in psychology and social sciences. I will try to describe it in precise and modern mathematical terms, highlighting along the way the contributions given by previous scholars. Disclaimer: this is a work in progress with no claim of completeness. I have tried to collect evidence of spectral techniques in ranking from a number of sources, providing a unified mathematical framework that should make it possible to understand in a precise way the relationship between contributions. Reports of inaccuracies and missing references are more than welcome.
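At its core, spectral ranking scores entities by the dominant (Perron) eigenvector of a nonnegative relationship matrix A, so that each entity's score is proportional to the scores of the entities it relates to (r ∝ Ar). A minimal power-iteration sketch follows, assuming a small irreducible, aperiodic matrix so that the iteration converges; the matrix (read here as a hypothetical "wins of i over j" table), tolerance, and function name are illustrative assumptions.

```python
def dominant_eigenvector(A, iters=1000, tol=1e-12):
    """Power iteration: returns (eigenvalue, eigenvector normalized to sum 1).

    Assumes A is nonnegative, irreducible and aperiodic (primitive), so the
    Perron eigenvector is positive and the iteration converges to it.
    """
    n = len(A)
    r = [1.0 / n] * n
    lam = 0.0
    for _ in range(iters):
        y = [sum(A[i][j] * r[j] for j in range(n)) for i in range(n)]
        s = sum(y)
        lam = s  # since sum(r) == 1, sum(A r) approaches the eigenvalue as r converges
        y = [v / s for v in y]
        converged = max(abs(a - b) for a, b in zip(y, r)) < tol
        r = y
        if converged:
            break
    return lam, r


# Hypothetical relationship matrix: A[i][j] = number of times entity i beat entity j.
A = [[0, 2, 1], [1, 0, 3], [1, 1, 0]]
lam, scores = dominant_eigenvector(A)
print(round(lam, 4), [round(s, 3) for s in scores])
```

PageRank fits the same mold: it is the dominant left eigenvector of the damped, teleporting transition matrix, which is why the note can treat it as one instance of a much older technique.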