A survey on PageRank computing
 Internet Mathematics
, 2005
Cited by 66 (0 self)
Abstract. This survey reviews the research related to PageRank computing. Components of a PageRank vector serve as authority weights for web pages independent of their textual content, solely based on the hyperlink structure of the web. PageRank is typically used as a web search ranking component. This defines the importance of the model and the data structures that underlie PageRank processing. Computing even a single PageRank is a difficult computational task. Computing many PageRanks is a much more complex challenge. Recently, significant effort has been invested in building sets of personalized PageRank vectors. PageRank is also used in many diverse applications other than ranking. We are interested in the theoretical foundations of the PageRank formulation, in the acceleration of PageRank computing, in the effects of particular aspects of web graph structure on the optimal organization of computations, and in PageRank stability. We also review alternative models that lead to authority indices similar to PageRank and the role of such indices in applications other than web search. We also discuss link-based search personalization and outline some aspects of PageRank infrastructure, from associated measures of convergence to link preprocessing.
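One property that makes sets of personalized PageRank vectors cheap to combine is linearity: with a fixed link matrix S, the personalized PageRank π(v)^T = (1 - α) v^T (I - αS)^{-1} is linear in the personalization vector v. The minimal sketch below is our own illustration, not code from the survey. It computes personalized PageRank by the power method in plain Python and checks linearity on a hypothetical four-page graph. The function name, α = 0.85, and the uniform jump from dangling pages (which keeps S independent of v) are all assumptions.

```python
def personalized_pagerank(out_links, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power method for pi^T = alpha * pi^T S + (1 - alpha) * v^T.

    out_links maps each node 0..n-1 to its list of successors. Dangling
    nodes (no successors) are assumed to jump uniformly, so S does not
    depend on the personalization vector v.
    """
    n = len(v)
    pi = list(v)
    for _ in range(max_iter):
        new = [(1 - alpha) * v_i for v_i in v]
        dangling_mass = 0.0
        for node in range(n):
            succ = out_links.get(node, [])
            if succ:
                share = alpha * pi[node] / len(succ)
                for t in succ:
                    new[t] += share
            else:
                dangling_mass += alpha * pi[node]
        # Spread the mass of dangling nodes uniformly over all pages.
        new = [x + dangling_mass / n for x in new]
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Hypothetical 4-page graph; page 3 is dangling.
graph = {0: [1, 2], 1: [2], 2: [0, 3]}
pr0 = personalized_pagerank(graph, [1.0, 0.0, 0.0, 0.0])
pr3 = personalized_pagerank(graph, [0.0, 0.0, 0.0, 1.0])
mixed = personalized_pagerank(graph, [0.3, 0.0, 0.0, 0.7])
combined = [0.3 * a + 0.7 * b for a, b in zip(pr0, pr3)]
print(max(abs(a - b) for a, b in zip(mixed, combined)))  # close to zero: PPR is linear in v
```

Because of this linearity, a personalized vector for any mixture of preferences can be assembled from precomputed basis vectors instead of being recomputed from scratch.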
Exploiting RDFS and OWL for Integrating Heterogeneous, Large-Scale, Linked Data Corpora
, 2011
Cited by 10 (9 self)
and I will still each have one apple. But if you have an idea and I have an idea and we exchange these ideas, then each of us will have two ideas.” (George Bernard Shaw)
Acknowledgements: First, thanks to the taxpayers for the pizza and (much needed) cigarettes;...thanks to friends and family;...thanks to the various students and staff of DERI;...thanks to the URQ folk;...thanks to people with whom I have worked closely, including Alex, Antoine, Jeff, Luigi and Piero;...thanks to people with whom I have worked very closely, particularly Andreas and Jürgen;...thanks to John and Stefan for the guidance;...thanks to Jim for the patience and valuable time;...and finally, a big thanks to Axel for everything.
The Web contains a vast amount of information on an abundance of topics, much of which is encoded as structured data indexed by local databases. However, these databases are rarely interconnected and information reuse across sites is limited. Semantic Web standards offer a possible solution in the form of an agreed-upon data model and set of syntaxes, as well as metalanguages for publishing schema-level information, offering a highly interoperable means of publishing and interlinking structured data on the Web. Thanks to the Linked Data community, an unprecedented lode of such data has now been published on the Web (by individuals, academia, communities, corporations and governmental organisations alike) on a medley of often overlapping topics. This new publishing paradigm has opened up a range of new and interesting research topics with respect to how this emergent “Web of Data” can be harnessed and exploited by consumers. Indeed, although Semantic
Efficient parallel computation of PageRank
 In Proc. 28th ECIR
, 2006
Cited by 9 (1 self)
Abstract. PageRank is inherently massively parallelizable and distributable, as a result of the web’s strict host-based link locality. In this paper we show that the Gauß-Seidel iterative method for solving linear systems can be successfully applied in such a parallel ranking scenario in order to improve convergence. By introducing a two-dimensional web model and by adapting PageRank to this environment, we present and evaluate efficient methods to compute the exact rank vector even for large-scale web graphs in only a few minutes and iteration steps, with intrinsic support for incremental web crawling, and without the need for page sorting/reordering or for sharing global information.
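The starting point above is that PageRank can be written as a linear system and solved by Gauß-Seidel instead of the power method. The minimal sequential sketch below assumes a uniform teleportation vector v and a uniform jump from dangling pages. It solves (I - αS^T) x = (1 - α) v one component at a time, and each update reuses the components already refreshed in the current sweep. It does not reproduce the paper's parallel two-dimensional (host/page) scheme. The function name and the example graphs are hypothetical.

```python
def pagerank_gauss_seidel(out_links, n, alpha=0.85, tol=1e-12, max_sweeps=1000):
    """Gauss-Seidel sweeps for (I - alpha * S^T) x = (1 - alpha) * v.

    Assumptions for this sketch: v is uniform (1/n) and dangling pages
    (no out-links) jump uniformly, i.e. their rows of S are 1/n everywhere.
    """
    v = 1.0 / n
    in_w = [{} for _ in range(n)]  # in_w[i][j] = S[j][i] for a link j -> i
    dangling = set()
    for j in range(n):
        succ = out_links.get(j, [])
        if not succ:
            dangling.add(j)
        for i in succ:
            in_w[i][j] = in_w[i].get(j, 0.0) + 1.0 / len(succ)
    x = [v] * n
    d_sum = sum(x[j] for j in dangling)  # running total of dangling mass
    for _ in range(max_sweeps):
        delta = 0.0
        for i in range(n):
            is_d = i in dangling
            s_ii = in_w[i].get(i, 0.0) + (1.0 / n if is_d else 0.0)
            # Off-diagonal terms use the newest available x[j] values.
            acc = (1 - alpha) * v + alpha * (d_sum - (x[i] if is_d else 0.0)) / n
            for j, s_ji in in_w[i].items():
                if j != i:
                    acc += alpha * s_ji * x[j]
            new_xi = acc / (1 - alpha * s_ii)
            if is_d:
                d_sum += new_xi - x[i]
            delta += abs(new_xi - x[i])
            x[i] = new_xi
        if delta < tol:
            break
    return x


# Page 0 links to page 1; page 1 is dangling.
x = pagerank_gauss_seidel({0: [1]}, 2)
print(x)  # x[1] solves the 2x2 system by hand: 0.925 / 1.425
```

Because I - αS^T is a nonsingular M-matrix, Gauß-Seidel converges for it, and the solution sums to one without an explicit normalization step.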
Distributed PageRank computation based on iterative aggregation-disaggregation methods
 Proceedings of the 14th ACM international conference on Information and knowledge management
, 2005
Cited by 8 (0 self)
PageRank has been widely used as a major factor in search engine ranking systems. However, computing PageRank requires global link graph information, which incurs a prohibitive communication cost when accurate results are needed in a distributed setting. In this paper, we propose a distributed PageRank computation algorithm based on the iterative aggregation-disaggregation (IAD) method with Block Jacobi smoothing. The basic idea is divide-and-conquer. We treat each web site as a node to explore the block structure of hyperlinks. Each node computes its local PageRank itself and then updates it through a coordinator at low communication cost. We prove the global convergence of the Block Jacobi method and then analyze the communication overhead and major advantages of our algorithm. Experiments on three real web graphs show that our method converges 5–7 times faster than the traditional power method. We believe our work provides an efficient and practical distributed solution for PageRank on large-scale Web graphs.
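The aggregation and disaggregation steps of a generic IAD scheme can be sketched as follows. This is an illustration, not the paper's distributed algorithm. Pages are grouped into blocks, here one block per web site as the abstract suggests. The page-level matrix P is collapsed into a small site-level matrix A, weighted by the current estimate x, and the site-level stationary vector is then spread back over the pages. The Block Jacobi smoothing step and the node/coordinator message exchange are omitted. All function names and the five-page graph are hypothetical.

```python
def google_matrix(out_links, n, alpha=0.85):
    """Dense row-stochastic Google matrix with uniform teleportation (assumed)."""
    P = []
    for i in range(n):
        succ = out_links.get(i, [])
        row = [(1 - alpha) / n] * n
        for j in succ:
            row[j] += alpha / len(succ)
        if not succ:  # dangling page: uniform jump
            row = [1.0 / n] * n
        P.append(row)
    return P


def stationary(M, iters=500):
    """Left stationary vector of a row-stochastic matrix (power method)."""
    m = len(M)
    s = [1.0 / m] * m
    for _ in range(iters):
        s = [sum(s[i] * M[i][j] for i in range(m)) for j in range(m)]
    return s


def aggregate(P, x, blocks):
    """Site-level matrix A[I][J] = sum_{i in I} (x_i / x(I)) * sum_{j in J} P[i][j]."""
    block_of = {p: b for b, pages in enumerate(blocks) for p in pages}
    A = [[0.0] * len(blocks) for _ in blocks]
    for I, pages in enumerate(blocks):
        mass = sum(x[i] for i in pages)
        for i in pages:
            for j, p_ij in enumerate(P[i]):
                A[I][block_of[j]] += (x[i] / mass) * p_ij
    return A


def disaggregate(xi, x, blocks):
    """Rescale each site's page estimates so that site I carries total mass xi[I]."""
    out = list(x)
    for I, pages in enumerate(blocks):
        mass = sum(x[i] for i in pages)
        for i in pages:
            out[i] = xi[I] * x[i] / mass
    return out


# Hypothetical web: pages 0-1 on site A, pages 2-4 on site B.
links = {0: [1, 2], 1: [0], 2: [3], 3: [4, 0], 4: [2]}
blocks = [[0, 1], [2, 3, 4]]
P = google_matrix(links, 5)
pi = stationary(P)
A = aggregate(P, pi, blocks)
xi = stationary(A)
# With the exact PageRank as weighting, the site-level solution equals the
# per-site sums of PageRank, and disaggregation returns PageRank itself.
print(xi, [sum(pi[i] for i in b) for b in blocks])
```

The usage lines check the consistency property that makes IAD work. If x is already the PageRank vector, an aggregation step followed by a disaggregation step leaves it unchanged.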
Hypergraph partitioning for faster parallel PageRank computation
 Lecture Notes in Computer Science 3670
, 2005
Cited by 7 (1 self)
The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the dominant eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation of PageRank has become a focus of considerable research effort. In this paper, we seek a scalable problem decomposition for parallel PageRank computation, through the use of state-of-the-art hypergraph-based partitioning schemes. These have not been previously applied in this context. We consider both one- and two-dimensional hypergraph decomposition models. Exploiting the recent availability of the Parkway 2.1 parallel hypergraph partitioner, we present empirical results on a gigabit PC cluster for three publicly available web graphs. Our results show that hypergraph-based partitioning substantially reduces communication volume over conventional partitioning schemes (by up to three orders of magnitude), while still maintaining computational load balance. They also show a halving of the per-iteration runtime cost when compared to the most effective alternative approach used to date.
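The quantity that a hypergraph partitioner reduces in a one-dimensional decomposition can be evaluated directly for a given partition. The sketch below assumes a row-wise decomposition of the iteration y = P^T x, where each processor owns some rows of P^T and the matching entries of x. Column j of P^T is nonzero exactly in the rows of page j's successors, so x_j must be sent to every other processor that owns one of those rows. Summing (connected parts - 1) over all columns gives the column-net "connectivity minus one" cost. The code only scores a partition. It does not partition anything and is not Parkway's API. It also ignores the teleportation and dangling-node terms, which we assume are handled by a single global reduction. Names and the graph are hypothetical.

```python
def communication_volume(out_links, owner):
    """Words sent per iteration of y = P^T x under a row-wise partition.

    owner[k] is the processor holding row k of P^T and the entry x_k.
    Column j of P^T is nonzero in the rows of j's successors, so x_j must
    reach every processor that owns a successor of j.
    """
    volume = 0
    for j, succ in out_links.items():
        parts = {owner[i] for i in succ} | {owner[j]}
        volume += len(parts) - 1  # connectivity minus one for net j
    return volume


# Hypothetical 4-page graph and two 2-way partitions of its pages.
links = {0: [1, 2], 1: [0], 2: [3], 3: [2]}
print(communication_volume(links, [0, 0, 1, 1]))  # 1
print(communication_volume(links, [0, 1, 0, 1]))  # 4
```

The first partition keeps densely linked pages together and sends a single vector entry per iteration. The second cuts almost every link and sends four.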
PageRank Computation, with Special Attention to Dangling Nodes
Cited by 7 (0 self)
Abstract. We present a simple algorithm for computing the PageRank (stationary distribution) of the stochastic Google matrix G. The algorithm lumps all dangling nodes into a single node. We express lumping as a similarity transformation of G, and show that the PageRank of the nondangling nodes can be computed separately from that of the dangling nodes. The algorithm applies the power method only to the smaller lumped matrix, but the convergence rate is the same as that of the power method applied to the full matrix G. The efficiency of the algorithm increases as the number of dangling nodes increases. We also extend the expression for PageRank and the algorithm to more general Google matrices that have several different dangling node vectors, when it is required to distinguish among different classes of dangling nodes. We also analyze the effect of the dangling node vector on the PageRank, and show that the PageRank of the dangling nodes depends strongly on that of the nondangling nodes but not vice versa. Finally, we present a Jordan decomposition of the Google matrix for the (theoretical) extreme case when all web pages are dangling nodes.
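The lumping idea can be checked numerically. In the sketch below (plain Python, dense matrices, all names hypothetical), pages are split into k nondangling pages and the dangling pages. Every dangling row of G is the same vector u^T = αw^T + (1 - α)v^T, where w is the dangling-node vector and v the teleportation vector. All dangling pages can therefore be merged into one state, giving the (k+1)×(k+1) matrix [[G11, G12 e], [u1^T, u2^T e]]. The power method runs only on that smaller matrix, and the dangling PageRanks are recovered afterwards as σ^T [G12; u2^T]. This is our reconstruction of the construction the abstract outlines, checked against the power method on the full G, and is not the authors' code. The graph, v and w are assumptions.

```python
def stationary(M, iters=1000):
    """Left stationary vector of a row-stochastic matrix by the power method."""
    m = len(M)
    s = [1.0 / m] * m
    for _ in range(iters):
        s = [sum(s[i] * M[i][j] for i in range(m)) for j in range(m)]
    return s


def google_rows(out_links, n, alpha, v, w):
    """Dense G = alpha * (H + d w^T) + (1 - alpha) * e v^T (w: dangling vector)."""
    G = []
    for i in range(n):
        succ = out_links.get(i, [])
        row = [(1 - alpha) * v[j] for j in range(n)]
        if succ:
            for j in succ:
                row[j] += alpha / len(succ)
        else:
            for j in range(n):
                row[j] += alpha * w[j]
        G.append(row)
    return G


def lumped_pagerank(out_links, n, alpha, v, w):
    """PageRank via one lumped state that stands for all dangling nodes."""
    G = google_rows(out_links, n, alpha, v, w)
    nd = [i for i in range(n) if out_links.get(i)]      # nondangling
    dn = [i for i in range(n) if not out_links.get(i)]  # dangling
    k = len(nd)
    # Every dangling row of G equals u^T = alpha * w^T + (1 - alpha) * v^T.
    u = [alpha * w[j] + (1 - alpha) * v[j] for j in range(n)]
    # Lumped (k+1) x (k+1) matrix [[G11, G12 e], [u1^T, u2^T e]].
    L = [[G[i][c] for c in nd] + [sum(G[i][j] for j in dn)] for i in nd]
    L.append([u[c] for c in nd] + [sum(u[j] for j in dn)])
    sigma = stationary(L)
    pi = [0.0] * n
    for r, i in enumerate(nd):
        pi[i] = sigma[r]
    for j in dn:  # dangling PageRank: column j of sigma^T [G12; u2^T]
        pi[j] = sum(sigma[r] * G[i][j] for r, i in enumerate(nd)) + sigma[k] * u[j]
    return pi


# Hypothetical graph: pages 3 and 4 are dangling.
links = {0: [1, 3], 1: [2, 4], 2: [0, 3]}
n, alpha = 5, 0.85
v = [1.0 / n] * n
w = [0.5, 0.5, 0.0, 0.0, 0.0]  # assumed dangling-node jump distribution
lumped = lumped_pagerank(links, n, alpha, v, w)
full = stationary(google_rows(links, n, alpha, v, w))
print(max(abs(a - b) for a, b in zip(lumped, full)))  # agrees up to rounding error
```

Here the lumped matrix is 4×4 instead of 5×5. The saving grows with the share of dangling pages, which matches the abstract's remark on efficiency.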
Robust and Scalable Linked Data Reasoning Incorporating Provenance and Trust Annotations
, 2011
Cited by 7 (4 self)
In this paper, we leverage annotated logic programs for tracking indicators of provenance and trust during reasoning, specifically focussing on the use case of applying a scalable subset of OWL 2 RL/RDF rules over static corpora of arbitrary Linked Data (Web data). Our annotations encode three facets of information: (i) blacklist: a (possibly manually generated) boolean annotation which indicates that the referent data are known to be harmful and should be ignored during reasoning; (ii) ranking: a numeric value derived by a PageRank-inspired technique, adapted for Linked Data, which determines the centrality of certain data artefacts (such as RDF documents and statements); (iii) authority: a boolean value which uses Linked Data principles to conservatively determine whether or not some terminological information can be trusted. We formalise a logical framework which annotates inferences with the strength of derivation along these dimensions of trust and provenance; we formally demonstrate some desirable properties of the deployment of annotated logic programming in our setting, which guarantees (i) a unique minimal model (least fixpoint); (ii) monotonicity; (iii) finitariness; and (iv) decidability. In so doing, we also give some formal results which reveal strategies for scalable and efficient implementation of various reasoning tasks one might consider. Thereafter, we discuss scalable and distributed implementation strategies for applying our ranking and reasoning methods over a cluster of commodity hardware; throughout, we provide evaluation of our methods over 1 billion Linked Data quadruples crawled from approximately 4 million individual Web documents, empirically demonstrating the scalability of our approach, and how our
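The idea of annotating inferences with a "strength of derivation" can be illustrated for the ranking and blacklist dimensions alone. The sketch below assumes an illustrative semantics that may differ from the paper's exact annotation domain. A derivation is only as strong as its weakest premise (min), a triple keeps its strongest derivation (max), and blacklisted triples are dropped before reasoning. Two RDFS entailment rules, rdfs9 and rdfs11, stand in for the paper's OWL 2 RL/RDF subset, and the authority dimension is not modelled. The fixpoint loop terminates because the set of derivable triples is finite and annotations only ever increase within the finite set of input ranks. All `ex:` names and rank values are hypothetical.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def annotated_closure(facts, blacklist=frozenset()):
    """Least fixpoint of two RDFS rules with rank annotations.

    facts maps (s, p, o) triples to a numeric rank. Assumed semantics
    (illustrative only): a derivation is as strong as its weakest premise
    (min), and a triple keeps its strongest derivation (max).
    """
    ann = {t: r for t, r in facts.items() if t not in blacklist}
    changed = True
    while changed:
        changed = False
        derived = []
        subclass = [(t, r) for t, r in ann.items() if t[1] == SUBCLASS]
        for (a, _, b), r1 in subclass:
            for (s, p, o), r2 in ann.items():
                if o == a and p in (SUBCLASS, TYPE):
                    # rdfs11: s sub a, a sub b -> s sub b
                    # rdfs9:  s type a, a sub b -> s type b
                    derived.append(((s, p, b), min(r1, r2)))
        for t, r in derived:
            if r > ann.get(t, float("-inf")):
                ann[t] = r
                changed = True
    return ann


facts = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"): 0.9,
    ("ex:Mammal", SUBCLASS, "ex:Animal"): 0.6,
    ("ex:tom", TYPE, "ex:Cat"): 0.8,
    ("ex:Mammal", SUBCLASS, "ex:Plant"): 0.1,
}
blacklist = {("ex:Mammal", SUBCLASS, "ex:Plant")}
closure = annotated_closure(facts, blacklist)
print(closure[("ex:tom", TYPE, "ex:Animal")])  # 0.6
```

The inferred type `ex:Animal` inherits the weaker rank 0.6 of the `ex:Mammal` axiom it depends on. The blacklisted axiom contributes no inferences at all.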
Mathematical properties and analysis of Google’s PageRank
 Bol. Soc. Esp. Mat. Apl
, 2006
Cited by 6 (0 self)
To determine the order in which to display web pages, the search engine Google computes the PageRank vector, whose entries are the PageRanks of the web pages. The PageRank vector is the stationary distribution of a stochastic matrix, the Google matrix. The Google matrix in turn is a convex combination of two stochastic matrices: one matrix represents the link structure of the web graph, and a second, rank-one matrix mimics the random behaviour of web surfers and can also be used to combat web spamming. As a consequence, PageRank depends mainly on the link structure of the web graph, but not on the contents of the web pages. We analyze the sensitivity of PageRank to changes in the Google matrix, including addition and deletion of links in the web graph. Due to the proliferation of web pages, the dimension of the Google matrix most likely exceeds ten billion. One of the simplest and most storage-efficient methods for computing PageRank is the power method. We present error bounds for the iterates of the power method and for their residuals.
Keywords: Markov matrix, stochastic matrix, stationary distribution, power method, perturbation bounds
AMS subject classifications: 15A51, 65C40, 65F15, 65F50, 65F10
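The convex-combination structure described above already yields the standard convergence bound for the power method. The notation is ours and assumed: S is the stochastic link matrix, e the all-ones vector, v the teleportation vector and α ∈ (0,1) the combination weight. Both the iterate and PageRank sum to one, so the rank-one term cancels from the error. Also, ‖y^T S‖₁ ≤ ‖y‖₁ for any stochastic S. This is the textbook argument, not necessarily the sharper bounds derived in the paper.

```latex
\begin{align*}
G &= \alpha S + (1-\alpha)\, e v^{T}, \qquad \pi^{T} G = \pi^{T}, \quad \pi^{T} e = 1,\\
x^{(k+1)T} &= x^{(k)T} G, \qquad x^{(0)T} e = 1,\\
x^{(k+1)T} - \pi^{T} &= \alpha\,\bigl(x^{(k)} - \pi\bigr)^{T} S
  + (1-\alpha)\,\underbrace{\bigl(x^{(k)} - \pi\bigr)^{T} e}_{=\,0}\; v^{T},\\
\bigl\|x^{(k)} - \pi\bigr\|_{1} &\le \alpha^{k}\,\bigl\|x^{(0)} - \pi\bigr\|_{1} \le 2\,\alpha^{k}.
\end{align*}
```

With α = 0.85, for example, this bound guarantees an L1 error below 10^{-10} after k = 146 iterations, independent of the dimension of G.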
Asynchronous iterative computations with Web information retrieval structures: The PageRank case
, 2005
"... ..."
Spam-resilient web rankings via influence throttling
 In IPDPS
, 2007
Cited by 3 (1 self)
Web search is one of the most critical applications for managing the massive amount of distributed Web content. Due to the overwhelming reliance on Web search, there is a rise in efforts to manipulate (or spam) Web search engines. In this paper, we develop a spam-resilient ranking model that promotes a source-based view of the Web. One of the most salient features of our spam-resilient ranking algorithm is the concept of influence throttling. We show how to utilize influence throttling to counter Web spam that aims at manipulating link-based ranking systems, especially PageRank-like systems. Through formal analysis and experimental evaluation, we show the effectiveness and robustness of our spam-resilient ranking model in comparison with existing Web ranking algorithms such as PageRank.
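The abstract does not spell out the throttling mechanism. The sketch below assumes one plausible reading that is not taken from the paper. Each source i keeps a fixed share κ_i of its outgoing weight on a self-edge and passes only the remaining 1 - κ_i to the sources it links to, so a throttled source can push only a bounded amount of rank to others. The sketch builds such source-level transition rows, with uniform weights over a source's other out-links (also an assumption). It then ranks sources with a PageRank-style iteration. All names, κ values and the four-source graph are hypothetical.

```python
def throttled_rows(source_links, n, kappa):
    """Source-level transition rows with influence throttling (assumed form)."""
    rows = []
    for i in range(n):
        targets = [j for j in source_links.get(i, []) if j != i]
        if targets:
            row = [0.0] * n
            row[i] = kappa[i]  # throttled share kept on the self-edge
            for j in targets:
                row[j] += (1.0 - kappa[i]) / len(targets)
        else:
            row = [1.0 / n] * n  # no outgoing links: uniform jump
        rows.append(row)
    return rows


def source_rank(rows, alpha=0.85, iters=500):
    """PageRank-style power iteration with uniform teleportation."""
    n = len(rows)
    x = [1.0 / n] * n
    for _ in range(iters):
        x = [(1 - alpha) / n + alpha * sum(x[i] * rows[i][j] for i in range(n))
             for j in range(n)]
    return x


links = {0: [1], 1: [2], 2: [3], 3: [2]}  # hypothetical: source 0 pushes source 1
plain = source_rank(throttled_rows(links, 4, [0.0, 0.0, 0.0, 0.0]))
throttled = source_rank(throttled_rows(links, 4, [0.9, 0.0, 0.0, 0.0]))
print(plain[1], throttled[1])  # source 1 receives less once source 0 is throttled
```

In this simplified form, the throttled source keeps the withheld mass itself, so its own score rises. The sketch only illustrates how throttling caps the rank a source can pass on to the sources it links to.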