Exploiting social networks for Internet search
 In Proceedings of the 5th Workshop on Hot Topics in Networks (HotNets-V), 2006
Cited by 41 (3 self)
Over the last decade, the World Wide Web and Web search engines have fundamentally transformed the way people find and share information. Recently, a new form of publishing and locating information, known as online
Measurement and Analysis of Online Social Networks
 In 7th ACM SIGCOMM Internet Measurement Conference (IMC), 2007
Cited by 11 (3 self)
Recently, online social networking sites have exploded in popularity. Numerous sites are dedicated to finding and maintaining contacts and to locating and sharing different types of content. Online social networks represent a new kind of information network that differs significantly from existing networks like the Web. For example, in the Web, hyperlinks between content form a graph that is used to organize, navigate, and rank information. The properties of the Web graph have been studied extensively, and have led to useful algorithms such as PageRank. In contrast, few links exist between content items in online social networks; instead, links exist between content and users, and between users themselves. However, little is known in the research community about the properties of online social network graphs at scale, the factors that shape their structure, or the ways they can be leveraged in information systems. In this thesis, we use novel measurement techniques to study online social networks at scale, and use the resulting insights to design innovative new information systems. First, we examine the structure and growth patterns of online social networks, focusing on how users are connecting to one another. We conduct the first
RankMass Crawler: A Crawler with High Personalized PageRank Coverage Guarantee
2007
Cited by 10 (1 self)
Crawling algorithms have been the subject of extensive research and optimizations, but some important questions remain open. In particular, given the unbounded number of pages available on the Web, search-engine operators constantly struggle with the following vexing questions: When can I stop downloading the Web? How many pages should I download to cover “most” of the Web? How can I know I am not missing an important part when I stop? In this paper we provide an answer to these questions by developing, in the context of a system that is given a set of trusted pages, a family of crawling algorithms that (1) provide a theoretical guarantee on how much of the “important” part of the Web they will download after crawling a certain number of pages and (2) give a high priority to important pages during a crawl, so that the search engine can index the most important part of the Web first. We prove the correctness of our algorithms by theoretical analysis and evaluate their performance experimentally based on 141 million URLs obtained from the Web. Our experiments demonstrate that even our simple algorithm is effective in downloading important pages early on and provides high “coverage” of the Web with a relatively small number of pages.
Asynchronous distributed power iteration with gossip-based normalization
 Euro-Par 2007, volume 4641 of Lecture Notes in Computer Science, 2007
Cited by 8 (4 self)
The dominant eigenvector of matrices defined by weighted links in overlay networks plays an important role in many peer-to-peer applications. Examples include trust management, importance ranking to support search, and virtual coordinate systems to facilitate managing network proximity. Robust and efficient asynchronous distributed algorithms are known only for the case when the dominant eigenvalue is exactly one. We present a fully distributed algorithm for a more general case: nonnegative square matrices that have an arbitrary dominant eigenvalue. The basic idea is that we apply a gossip-based aggregation protocol coupled with an asynchronous iteration algorithm, where the gossip component controls the iteration component. The norm of the resulting vector is an unknown finite constant by default; however, it can optionally be set to any desired constant using a third gossip control component. Through extensive simulation results on artificially generated overlay networks and real web traces we demonstrate the correctness, the performance and the fault tolerance of the protocol.
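To illustrate why normalization matters when the dominant eigenvalue is not one, here is a minimal centralized sketch of power iteration with rescaling at every step. It is not the distributed protocol from the paper: the global 1-norm computed below merely stands in for the value that a gossip-based aggregation would estimate, and the names `power_iteration`, `mat_vec`, and `target_norm` are assumptions made for this example.

```python
def mat_vec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def power_iteration(A, num_iters=200, target_norm=1.0):
    """Estimate the dominant eigenpair of a nonnegative square matrix.

    Without rescaling, x would grow or shrink geometrically whenever the
    dominant eigenvalue differs from one. Here the rescaling uses a global
    1-norm; in a peer-to-peer setting this aggregate would have to be
    estimated, e.g. by gossip (assumption for illustration).
    """
    x = [1.0] * len(A)
    for _ in range(num_iters):
        y = mat_vec(A, x)
        norm = sum(abs(v) for v in y)
        x = [target_norm * v / norm for v in y]
    # Rayleigh quotient as the eigenvalue estimate for the converged vector.
    Ax = mat_vec(A, x)
    eigenvalue = sum(a * b for a, b in zip(x, Ax)) / sum(v * v for v in x)
    return eigenvalue, x


A = [[2.0, 1.0], [1.0, 3.0]]  # dominant eigenvalue is (5 + sqrt(5)) / 2
eigenvalue, x = power_iteration(A)
```

The `target_norm` parameter mirrors the idea that the norm of the result can be fixed to any desired constant.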
Local Approximation of PageRank and Reverse PageRank
Cited by 5 (0 self)
We consider the problem of approximating the PageRank of a target node using only local information provided by a link server. This problem was originally studied by Chen, Gan, and Suel (CIKM 2004), who presented an algorithm for tackling it. We prove that local approximation of PageRank, even to within modest approximation factors, is infeasible in the worst case, as it requires probing the link server for Ω(n) nodes, where n is the size of the graph. The difficulty emanates from nodes of high indegree and/or from slow convergence of the PageRank random walk. We show that when the graph has bounded indegree and admits fast PageRank convergence, then local PageRank approximation can be done using a small number of queries. Unfortunately, natural graphs, such as the web graph, abound with high-indegree nodes, making this algorithm (or any other local approximation algorithm) too costly. On the other hand, reverse natural graphs tend to have low indegree while maintaining fast PageRank convergence. It follows that calculating Reverse PageRank locally is frequently more feasible than computing PageRank locally. We demonstrate that Reverse PageRank is useful for several applications, including computation of hub scores for web pages, finding influencers in social networks, obtaining good seeds for crawling, and measurement of semantic relatedness between concepts in a taxonomy.
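As a small illustration of what Reverse PageRank means, the sketch below computes ordinary PageRank on a toy graph and then again on the same graph with every edge reversed. It is a global power-iteration computation, not the local approximation algorithm the abstract describes; the toy graph, the function names, and the damping value of 0.85 are assumptions for this example.

```python
def pagerank(out_links, damping=0.85, num_iters=100):
    """Plain power-iteration PageRank on a dict {node: [targets]}.

    Every target must also appear as a key. Dangling nodes spread their
    rank uniformly over all nodes (one common convention).
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(num_iters):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            targets = out_links[u] or nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        rank = new_rank
    return rank


def reverse_graph(out_links):
    """Return the graph with the direction of every edge flipped."""
    reversed_links = {u: [] for u in out_links}
    for u, targets in out_links.items():
        for v in targets:
            reversed_links[v].append(u)
    return reversed_links


# Toy graph: "hub" links out to many pages, "c" is linked to by many pages.
graph = {"hub": ["a", "b", "c"], "a": ["c"], "b": ["c"], "c": []}
pr = pagerank(graph)                   # rewards being linked to
rpr = pagerank(reverse_graph(graph))   # rewards linking out (hub-like nodes)
```

In this toy graph the heavily linked-to page `c` gets the top PageRank, while the page that links out the most, `hub`, gets the top Reverse PageRank, which matches the abstract's use of Reverse PageRank for hub scores.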
Computing Trusted Authority Scores in Peer-to-Peer Web Search Networks
2007
Cited by 2 (1 self)
Peer-to-peer (P2P) networks have received great attention for sharing and searching information in large user communities. The open and anonymous nature of P2P networks is one of their main strengths, but it also opens doors to manipulation of the information and of the quality ratings. In our previous work (J. X. Parreira, D. Donato, S. Michel and G. Weikum in VLDB 2006) we presented the JXP algorithm for distributed computation of PageRank scores for information units (Web pages, sites, peers, social groups, etc.) within a link or endorsement-based graph structure. The algorithm builds on local authority computations and bilateral peer meetings with exchanges of small data structures that are relevant for gradually learning about global properties and eventually converging towards global authority rankings. In the current paper we address the important issue of cheating peers that attempt to distort the global authority values by providing manipulated data during the peer meetings. Our approach to this problem enhances JXP with statistical techniques for detecting suspicious behavior. Our method, coined TrustJXP, is again completely decentralized, and we demonstrate its viability and robustness in experiments with real Web data.
Subgraphrank: PageRank approximation for a subgraph or in a decentralized system
2007
Cited by 1 (0 self)
PageRank, a ranking metric for hypertext web pages, has received increasing interest. As the Web has grown in size, computing PageRank scores on the whole Web using centralized approaches faces scalability challenges. Distributed systems like peer-to-peer (P2P) networks are employed to speed up PageRank. In a P2P system, each peer crawls web fragments independently. Hence the web fragment on one peer is incomplete and may overlap with those of other peers. The challenge is to compute PageRank on each web fragment so that it reflects the global web graph. Another interesting case is the focused crawler, where only pages in a web fragment are of interest. In this research, we study the following problem: given a web fragment and the whole web structure, approximate the global PageRank scores on the subgraph without running PageRank on the whole Web. We refine the PageRank paradigm to take into consideration the links connecting external pages. We describe a weight-assigning approach to convey information about the global graph. We propose an efficient algorithm called SubgraphRank to compute the PageRank scores of a subgraph and design experiments to validate the algorithm. In the P2P case, we will relax the assumption of the global graph in future work.
AN INNER-OUTER ITERATION FOR COMPUTING PAGERANK
We present a new iterative scheme for PageRank computation. The algorithm is applied to the linear system formulation of the problem, using inner-outer stationary iterations. It is simple, can be easily implemented and parallelized, and requires minimal storage overhead. Our convergence analysis shows that the algorithm is effective for a crude inner tolerance and is not sensitive to the choice of the parameters involved. The same idea can be used as a preconditioning technique for nonstationary schemes. Numerical examples featuring matrices of dimensions exceeding 100,000,000 in sequential and parallel environments demonstrate the merits of our technique. Our code is available online for viewing and testing, along with several large-scale examples.
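The following is a minimal sketch of one way an inner-outer stationary scheme for PageRank can be written, not the authors' released code. It assumes the linear system (I - αP)x = (1 - α)v with a column-stochastic P, an outer step that solves (I - βP)x_new = (α - β)Px + (1 - α)v for some 0 < β < α, and a Richardson inner loop stopped at a crude tolerance. The parameter names `beta` and `inner_tol`, their default values, and the toy graph are assumptions for illustration.

```python
def apply_P(out_links, x):
    """Compute P x for the column-stochastic matrix implied by out-links.

    out_links[u] lists the targets of node u; a dangling node is treated
    as linking to every node (one common convention).
    """
    n = len(x)
    y = [0.0] * n
    for u, targets in enumerate(out_links):
        targets = targets or range(n)
        share = x[u] / len(targets)
        for v in targets:
            y[v] += share
    return y


def inner_outer_pagerank(out_links, alpha=0.85, beta=0.5,
                         inner_tol=1e-2, tol=1e-10, max_outer=1000):
    n = len(out_links)
    v = [1.0 / n] * n           # uniform teleportation vector
    x = v[:]
    for _ in range(max_outer):
        Px = apply_P(out_links, x)
        # Outer residual of the original system (I - alpha P) x = (1 - alpha) v.
        residual = sum(abs((1 - alpha) * v[i] + alpha * Px[i] - x[i])
                       for i in range(n))
        if residual < tol:
            break
        # Right-hand side of the outer system (I - beta P) y = f.
        f = [(alpha - beta) * Px[i] + (1 - alpha) * v[i] for i in range(n)]
        # Inner Richardson iteration y <- f + beta P y, loosely converged.
        y = x
        while True:
            Py = apply_P(out_links, y)
            y_next = [f[i] + beta * Py[i] for i in range(n)]
            change = sum(abs(y_next[i] - y[i]) for i in range(n))
            y = y_next
            if change < inner_tol:
                break
        x = y
    return x


links = [[1, 2], [2], [0], [2]]   # toy graph: node u -> links[u]
scores = inner_outer_pagerank(links)
```

Because β is smaller than α, each inner loop contracts faster than a plain power step, which is why a crude `inner_tol` can suffice; once the iterate is close to the solution, the inner loop exits after a single step and the scheme behaves like the power method.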
Asynchronous Privacy-preserving Iterative Computation on Peer-to-peer Networks
Privacy-preserving algorithms allow several participants to compute a global function collaboratively without revealing local information to each other. Examples of applications include trust management, collaborative filtering, and ranking algorithms such as PageRank. Most solutions that can be proven to be privacy preserving theoretically are not appropriate for highly unreliable, large-scale, distributed environments such as peer-to-peer networks because they require either centralized components or a high degree of synchronism among the participants. At the same time, in peer-to-peer networks privacy preservation is becoming a key requirement. Here, we propose an asynchronous privacy-preserving communication layer for an important class of iterative computations in peer-to-peer networks, where each peer periodically computes a linear combination of data stored at its neighbors. Our algorithm tolerates realistic rates of message drop and delay, and node churn, and has a low communication overhead. We perform simulation experiments to compare our algorithm to related work. The problem we use as an example is power iteration (a method used to calculate the dominant eigenvector of a matrix), since eigenvector computation is at the core of several practical applications. We demonstrate that our novel algorithm also converges in the presence of realistic node churn, message drop rates and message delay, even when previous synchronized solutions are able to make almost no progress.