Efficient parallel computation of PageRank
In Proc. 28th ECIR, 2006
Cited by 5 (1 self)
PageRank is inherently massively parallelizable and distributable, as a result of the web’s strict host-based link locality. In this paper we show that the Gauß-Seidel iterative method for solving linear systems can be successfully applied in such a parallel ranking scenario in order to improve convergence. By introducing a two-dimensional web model and adapting PageRank to this environment, we present and evaluate efficient methods to compute the exact rank vector even for large-scale web graphs in only a few minutes and iteration steps, with intrinsic support for incremental web crawling, and without the need for page sorting/reordering or for sharing global information.
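The Gauß-Seidel idea named in this abstract can be illustrated with a minimal single-machine sketch. It is not the paper's parallel, two-dimensional method: it only shows the general technique of solving the PageRank linear system x = d·A·x + (1 − d)/n by sweeping over the nodes and reusing values already updated in the current sweep. The function name, the dict-based graph format, the damping value 0.85, and the assumptions that every node has at least one out-link and no self-loops are all illustrative choices, not taken from the paper.

```python
def pagerank_gauss_seidel(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Gauss-Seidel sweeps for the PageRank linear system.

    out_links maps each node to the list of nodes it links to.
    Assumptions (illustrative): every node appears as a key, every node
    has at least one out-link, and there are no self-loops.
    Returns (rank dict, number of sweeps performed).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    for u in nodes:
        if not out_links[u]:
            raise ValueError(f"node {u!r} has no out-links")

    # Build the reverse adjacency: which nodes point to v.
    in_links = {v: [] for v in nodes}
    for u in nodes:
        for v in out_links[u]:
            in_links[v].append(u)
    out_deg = {u: len(out_links[u]) for u in nodes}

    rank = {v: 1.0 / n for v in nodes}
    sweeps = 0
    for sweeps in range(1, max_iter + 1):
        delta = 0.0
        for v in nodes:
            # rank[u] may already hold the value from THIS sweep; that
            # in-place reuse is what separates Gauss-Seidel from the
            # Jacobi / power-iteration update, which reads an old copy.
            incoming = sum(rank[u] / out_deg[u] for u in in_links[v])
            new_value = (1.0 - damping) / n + damping * incoming
            delta += abs(new_value - rank[v])
            rank[v] = new_value
        if delta < tol:
            break

    # Guard against rounding drift so the scores sum to 1.
    total = sum(rank.values())
    return {v: r / total for v, r in rank.items()}, sweeps
```

Calling `pagerank_gauss_seidel({0: [1, 2], 1: [2], 2: [0]})` returns the rank dictionary together with the number of sweeps used, which makes it easy to compare against a plain power iteration on the same graph.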
On Rank Correlation in Information Retrieval Evaluation
2007
Cited by 4 (0 self)
Some methods for rank correlation in evaluation are examined, and their relative advantages and disadvantages are discussed. In particular, it is suggested that different test statistics should be used to provide information about the experiments beyond that provided by statistical significance testing. Kendall’s τ is often used for testing rank correlation, yet it is poorly suited when the objective of the test differs from what τ was designed for. In particular, attention should be paid to the null hypothesis. Other measures of rank correlation are described. If one test statistic suggests rejecting a hypothesis, other test statistics should be used to support or revise the decision. The paper then applies these general reflections to rank correlation between webpage lists ordered by PageRank. An interpretation of PageRank behaviour is provided on the basis of this discussion of the test statistics for rank correlation.
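For readers unfamiliar with the statistic discussed in this abstract, the following is a small sketch of Kendall's τ in its simplest form (τ-a, which ignores tie corrections). It only illustrates how the coefficient is computed from concordant and discordant pairs; the function name and the list-of-scores input format are assumptions made for the example and do not come from the paper.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two score lists for the same items.

    scores_a[i] and scores_b[i] are the two scores of item i.
    Tied pairs count as neither concordant nor discordant (no tie
    correction, which is the simplifying assumption of tau-a).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists must score the same items")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two items")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both lists order the pair the same way.
        sign = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs
```

Two identical orderings give 1.0 and fully reversed orderings give -1.0; swapping a single adjacent pair in a four-item list leaves five concordant pairs and one discordant pair out of six.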
Hierarchical Link Analysis for Ranking Web Data
Cited by 4 (0 self)
On the Web of Data, entities are often interconnected in a way similar to web documents. Previous works have shown how PageRank can be adapted to achieve entity ranking. In this paper, we propose to exploit locality on the Web of Data by taking a layered approach, similar to hierarchical PageRank approaches. We provide justifications for a two-layer model of the Web of Data, and introduce DING (Dataset Ranking), a novel ranking methodology based on this two-layer model. DING uses links between datasets to compute dataset ranks and combines the resulting values with semantic-dependent entity ranking strategies. We quantify the effectiveness of the approach against other link-based algorithms on large datasets coming from the Sindice search engine. The evaluation, which includes a user study, indicates that the resulting rank is better than that of the other approaches. Also, the resulting algorithm is shown to have desirable computational properties such as parallelisation.
Using a Layered Markov Model for Distributed Web Ranking Computation
In ICDCS, 2005
Cited by 3 (0 self)
The link structure of the Web graph is used in algorithms such as Kleinberg’s HITS and Google’s PageRank to assign authoritative weights to Web pages and thus rank them. Both require a centralized computation of the ranking if used to rank the complete Web graph. In this paper, we propose a new approach based on a Layered Markov Model to distinguish transitions among Web sites and Web documents. Based on this model, we propose two different approaches for computing the ranking of Web documents, a centralized one and a decentralized one. Both produce a well-defined ranking for a given Web graph. We then formally prove that the two approaches are equivalent. This provides a theoretical foundation for decomposing link-based rank computation and makes the computation for a Web-scale graph feasible in a decentralized fashion, such as required for Web search engines having a peer-to-peer architecture. Furthermore, personalized rankings can be produced by adapting the computation at both the local layer and the global layer. Our empirical results show that the ranking generated by our model is qualitatively comparable to or even better than the ranking produced by PageRank.
Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks
2007
Cited by 2 (1 self)
Peer-to-peer (P2P) networks have received great attention for sharing and searching information in large user communities. The open and anonymous nature of P2P networks is one of their main strengths, but it also opens doors to manipulation of the information and of the quality ratings. In our previous work (J. X. Parreira, D. Donato, S. Michel and G. Weikum in VLDB 2006) we presented the JXP algorithm for distributed computation of PageRank scores for information units (Web pages, sites, peers, social groups, etc.) within a link- or endorsement-based graph structure. The algorithm builds on local authority computations and bilateral peer meetings with exchanges of small data structures that are relevant for gradually learning about global properties and eventually converging towards global authority rankings. In the current paper we address the important issue of cheating peers that attempt to distort the global authority values by providing manipulated data during the peer meetings. Our approach to this problem enhances JXP with statistical techniques for detecting suspicious behavior. Our method, coined TrustJXP, is again completely decentralized, and we demonstrate its viability and robustness in experiments with real Web data.
Local Aspects of the Global Ranking of Web Pages
In 6th International Workshop on Innovative Internet Community Systems (I2CS), 2006
Cited by 1 (0 self)
Started in 1998, the search engine Google estimates page importance using several parameters, one of which is PageRank. More precisely, PageRank is a probability distribution over Web pages that depends on the Web graph. Our purpose is to show that PageRank can be decomposed into two terms, internal and external PageRank. These two terms allow a better understanding of what PageRank means inside and outside a site. A first application is a local algorithm to estimate the PageRank inside a site. We also show quantitative results on the possibilities for a site to boost its own PageRank.
A perspective on Web information retrieval
Cited by 1 (0 self)
This paper assumes the usefulness of PageRank and addresses only the question of efficient implementation. A methodology is proposed which aggregates the large page graph into a smaller host graph from which the PageRank distribution on the pages can be reconstructed. The reported experiments show that the time needed to recompute the PageRank is reduced considerably, thus suggesting the possibility of implementing, for example, efficient personalized PageRank algorithms.
Local Approximation of PageRank and Reverse PageRank
Cited by 1 (0 self)
We consider the problem of approximating the PageRank of a target node using only local information provided by a link server. This problem was originally studied by Chen, Gan, and Suel (CIKM 2004), who presented an algorithm for tackling it. We prove that local approximation of PageRank, even to within modest approximation factors, is infeasible in the worst case, as it requires probing the link server for Ω(n) nodes, where n is the size of the graph. The difficulty emanates from nodes of high in-degree and/or from slow convergence of the PageRank random walk. We show that when the graph has bounded in-degree and admits fast PageRank convergence, then local PageRank approximation can be done using a small number of queries. Unfortunately, natural graphs, such as the web graph, abound with high in-degree nodes, making this algorithm (or any other local approximation algorithm) too costly. On the other hand, reverse natural graphs tend to have low in-degree while maintaining fast PageRank convergence. It follows that calculating Reverse PageRank locally is frequently more feasible than computing PageRank locally. We demonstrate that Reverse PageRank is useful for several applications, including computation of hub scores for web pages, finding influencers in social networks, obtaining good seeds for crawling, and measurement of semantic relatedness between concepts in a taxonomy.
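As a small illustration of the term used in this abstract, Reverse PageRank is ordinary PageRank computed on the graph with every link direction flipped. The sketch below shows only that global definition with a plain power iteration; it does not implement the paper's local approximation algorithm or its link-server model. The function names, the dict-based graph format, the damping value, the fixed iteration count, and the uniform redistribution of rank from nodes without out-links are all assumptions made for the example.

```python
def reverse_graph(out_links):
    """Return the graph with every edge u -> v flipped to v -> u."""
    reversed_links = {u: [] for u in out_links}
    for u, targets in out_links.items():
        for v in targets:
            reversed_links[v].append(u)
    return reversed_links


def pagerank(out_links, damping=0.85, iterations=100):
    """Plain power iteration; every node must appear as a key.

    Rank held by nodes without out-links is spread uniformly over all
    nodes (one common convention, assumed here for simplicity).
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - damping) / n + damping * dangling / n
        next_rank = {v: base for v in nodes}
        for u in nodes:
            if out_links[u]:
                share = damping * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    next_rank[v] += share
        rank = next_rank
    return rank


def reverse_pagerank(out_links, damping=0.85, iterations=100):
    """PageRank of the link-reversed graph."""
    return pagerank(reverse_graph(out_links), damping, iterations)
```

On a toy graph where one node links to three others that link nowhere, the linking node gets the lowest forward PageRank but the highest Reverse PageRank, which matches the abstract's remark that Reverse PageRank can serve as a hub score.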
Subgraphrank: PageRank approximation for a subgraph or in a decentralized system
2007
Cited by 1 (0 self)
PageRank, a ranking metric for hypertext web pages, has received increasing interest. As the Web has grown in size, computing PageRank scores on the whole web using centralized approaches faces challenges in scalability. Distributed systems like peer-to-peer (P2P) networks are employed to speed up PageRank. In a P2P system, each peer crawls web fragments independently. Hence the web fragment on one peer is incomplete and may overlap with those of other peers. The challenge is to compute PageRank on each web fragment in a way that reflects the global web graph. Another interesting case is focused crawling, where only pages in a web fragment are of interest. In this research, we study the following problem: given a web fragment and the whole web structure, approximate the global PageRank scores on the subgraph without running PageRank on the whole Web. We refine the PageRank paradigm to take into consideration the links connecting external pages. We describe a weight-assigning approach to convey information about the global graph. We propose an efficient algorithm called SubgraphRank to compute the PageRank scores of a subgraph and design experiments to validate the algorithm. In the P2P case, we will relax the assumption of the global graph in future work.
Simulating Network Influence Algorithms Using
Department of Energy under contract W-7405-ENG-36. By acceptance of this article, the publisher recognizes that the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or to allow others to do so, for U.S. Government purposes. Los Alamos National Laboratory requests that the publisher identify this article as work performed under the

