Results 1  10
of
30
Kineograph: taking the pulse of a fastchanging and connected world
 In Proceedings of the 7th ACM european conference on Computer Systems, EuroSys ’12
, 2012
"... Kineograph is a distributed system that takes a stream of incoming data to construct a continuously changing graph, which captures the relationships that exist in the data feed. As a computing platform, Kineograph further supports graphmining algorithms to extract timely insights from the fastchan ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
(Show Context)
Kineograph is a distributed system that takes a stream of incoming data to construct a continuously changing graph, which captures the relationships that exist in the data feed. As a computing platform, Kineograph further supports graphmining algorithms to extract timely insights from the fastchanging graph structure. To accommodate graphmining algorithms that assume a static underlying graph, Kineograph creates a series of consistent snapshots, using a novel and efficient epoch commit protocol. To keep up with continuous updates on the graph, Kineograph includes an incremental graphcomputation engine. We have developed three applications on top of Kineograph to analyze Twitter data: user ranking, approximate shortest paths, and controversial topic detection. For these applications, Kineograph takes a live Twitter data feed and maintains a graph of edges between all users and hashtags. Our evaluation shows that with 40 machines processing 100K tweets per second, Kineograph is able to continuously compute global properties, such as user ranks, with less than 2.5minute timeliness guarantees. This rate of traffic is more than 10 times the reported peak rate of Twitter as of October 2011.
Fast and accurate estimation of shortest paths in large graphs
 In Proceedings of Conference on Information and Knowledge Management (CIKM
, 2010
"... Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large diskresident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual short ..."
Abstract

Cited by 26 (1 self)
 Add to MetaCart
(Show Context)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large diskresident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual shortest paths (i.e. the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketchbased index structure that not only supports estimation of node distances, but also computes corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to nearexact shortestpath approximations in real world graphs. We evaluate our techniques – implemented within a fully functional RDF graph database system – over large realworld social and biological networks of sizes ranging from tens of thousand to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0 % and 1 % on average.
Exploring the design space of social networkbased Sybil defense
 In Proceedings of the 4th International Conference on Communication Systems and Network (COMSNETS’12
, 2012
"... Abstract—Recently, there has been significant research interest in leveraging social networks to defend against Sybil attacks. While much of this work may appear similar at first glance, existing social networkbased Sybil defense schemes can be divided into two categories: Sybil detection and Sybil ..."
Abstract

Cited by 14 (7 self)
 Add to MetaCart
(Show Context)
Abstract—Recently, there has been significant research interest in leveraging social networks to defend against Sybil attacks. While much of this work may appear similar at first glance, existing social networkbased Sybil defense schemes can be divided into two categories: Sybil detection and Sybil tolerance. These two categories of systems both leverage global properties of the underlying social graph, but they rely on different assumptions and provide different guarantees: Sybil detection schemes are applicationindependent and rely only on the graph structure to identify Sybil identities, while Sybil tolerance schemes rely on applicationspecific information and leverage the graph structure and transaction history to bound the leverage an attacker can gain from using multiple identities. In this paper, we take a closer look at the design goals, models, assumptions, guarantees, and limitations of both categories of social networkbased Sybil defense systems. I.
Fast fully dynamic landmarkbased estimation of shortest path distances in very large graphs
 In ACM Conference on Information and Knowledge Management (CIKM
, 2011
"... Computing the shortest path between a pair of vertices in a graph is a fundamental primitive in graph algorithmics. Classical exact methods for this problem do not scale up to contemporary, rapidly evolving social networks with hundreds of millions of users and billions of connections. A number of a ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
(Show Context)
Computing the shortest path between a pair of vertices in a graph is a fundamental primitive in graph algorithmics. Classical exact methods for this problem do not scale up to contemporary, rapidly evolving social networks with hundreds of millions of users and billions of connections. A number of approximate methods have been proposed, including several landmarkbased methods that have been shown to scale up to very large graphs with acceptable accuracy. This paper presents two improvements to existing landmarkbased shortest path estimation methods. The first improvement relates to the use of shortestpath trees (SPTs). Together with appropriate shortcutting heuristics, the use of SPTs allows to achieve higher accuracy with acceptable time and memory overhead. Furthermore, SPTs can be maintained incrementally under edge insertions and deletions, which allows for a fullydynamic algorithm. The second improvement is a new landmark selection strategy that seeks to maximize the coverage of all shortest paths by the selected landmarks. The improved method is evaluated on the DBLP, Orkut, Twitter and Skype social networks.
Canal: Scaling Social NetworkBased Sybil Tolerance Schemes
"... There has been a flurry of research on leveraging social networks to defend against multiple identity, or Sybil, attacks. A series of recent works does not try to explicitly identify Sybil identities and, instead, bounds the impact that Sybil identities can have. We call these approaches Sybil toler ..."
Abstract

Cited by 10 (3 self)
 Add to MetaCart
(Show Context)
There has been a flurry of research on leveraging social networks to defend against multiple identity, or Sybil, attacks. A series of recent works does not try to explicitly identify Sybil identities and, instead, bounds the impact that Sybil identities can have. We call these approaches Sybil tolerance; they have shown to be effective in applications including reputation systems, spam protection, online auctions, and content rating systems. All of these approaches use a social network as a credit network, rendering multiple identities ineffective to an attacker without a commensurate increase in social links to honest users (which are assumed to be hard to obtain). Unfortunately, a hurdle to practical adoption is that Sybil tolerance relies on computationally expensive network analysis, thereby limiting widespread deployment.
Of Hammers and Nails: An Empirical Comparison of Three Paradigms for Processing Large Graphs
"... Many phenomena and artifacts such as road networks, social networks and the web can be modeled as large graphs and analyzed using graph algorithms. However, given the size of the underlying graphs, efficient implementation of basic operations such as connected component analysis, approximate shortes ..."
Abstract

Cited by 9 (0 self)
 Add to MetaCart
(Show Context)
Many phenomena and artifacts such as road networks, social networks and the web can be modeled as large graphs and analyzed using graph algorithms. However, given the size of the underlying graphs, efficient implementation of basic operations such as connected component analysis, approximate shortest paths, and linkbased ranking (e.g.PageRank) becomes challenging. This paper presents an empirical study of computations on such large graphs in three wellstudied platform models, viz., a relational model, a dataparallel model, and a specialpurpose inmemory model. We choose a prototypical member of each platform model and analyze the computational efficiencies and requirements for five basic graph operations used in the analysis of realworld graphs viz., PageRank, SALSA, Strongly Connected Components (SCC), Weakly Connected Components (WCC), and Approximate Shortest Paths (ASP). Further, we characterize each platform in terms of these computations using modelspecific implementations of these algorithms on a large web graph. Our experiments show that there is no single platform that performs best across different classes of operations on large graphs. While relational databases are powerful and flexible tools that support a wide variety of computations, there are computations that benefit from using specialpurpose storage systems and others that can exploit dataparallel platforms. Categories and Subject Descriptors G.2.2 [Discrete mathematics]: Graph Theory—Graph algorithms, path and circuit problems; H.2.4 [Database management]: Systems—Distributed
Efficient Shortest Paths on Massive Social Graphs
"... Abstract—Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social ..."
Abstract

Cited by 7 (3 self)
 Add to MetaCart
(Show Context)
Abstract—Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadthfirstsearch (BFS). In this paper, we study ways to enable scalable graph processing for today’s massive networks. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near) shortest paths between node pairs. After a onetime preprocessing cost, Rigel answers nodedistance queries in 10’s of microseconds, and also produces shortest path results up to 18 times faster than prior shortestpath systems with similar levels of accuracy. I.
ISLABEL: an IndependentSet based Labeling Scheme for PointtoPoint Distance Querying
"... We study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to 1 mil ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
(Show Context)
We study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to 1 million vertices, which is rather small in view of many of the fastgrowing realworld graphs today such as social networks and Web graphs. We propose an efficient index, which is a novel labeling scheme based on the independent set of a graph. We show that our method can handle graphs of size orders of magnitude larger than existing indexes. 1.
Approximate shortest distance computing: A querydependent local landmark scheme
 In ICDE
, 2012
"... Abstract—Shortest distance query between two nodes is a fundamental operation in largescale networks. Most existing methods in the literature take a landmark embedding approach, which selects a set of graph nodes as landmarks and computes the shortest distances from each landmark to all nodes as an ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Shortest distance query between two nodes is a fundamental operation in largescale networks. Most existing methods in the literature take a landmark embedding approach, which selects a set of graph nodes as landmarks and computes the shortest distances from each landmark to all nodes as an embedding. To handle a shortest distance query between two nodes, the precomputed distances from the landmarks to the query nodes are used to compute an approximate shortest distance based on the triangle inequality. In this paper, we analyze the factors that affect the accuracy of the distance estimation in the landmark embedding approach. In particular we find that a globally selected, queryindependent landmark set plus the triangulation based distance estimation introduces a large relative error, especially for nearby query nodes. To address this issue, we propose a querydependent local landmark scheme, which identifies a local landmark close to the specific query nodes and provides a more accurate distance estimation than the traditional global landmark approach. Specifically, a local landmark is defined as the least common ancestor of the two query nodes in the shortest path tree rooted at a global landmark. We propose efficient local landmark indexing and retrieval techniques, which are crucial to achieve low offline indexing complexity and online query complexity. Two optimization techniques on graph compression and graph online search are also proposed, with the goal to further reduce index size and improve query accuracy. Our experimental results on largescale social networks and road networks demonstrate that the local landmark scheme reduces the shortest distance estimation error significantly when compared with global landmark embedding. I.
Scalable similarity estimation in social networks: closeness, node labels, and random edge lengths
 In COSN
, 2013
"... Similarity estimation between nodes based on structural properties of graphs is a basic building block used in the analysis of massive networks for diverse purposes such as link prediction, product recommendations, advertisement, collaborative filtering, and community discovery. While local simila ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
(Show Context)
Similarity estimation between nodes based on structural properties of graphs is a basic building block used in the analysis of massive networks for diverse purposes such as link prediction, product recommendations, advertisement, collaborative filtering, and community discovery. While local similarity measures, based on properties of immediate neighbors, are easy to compute, those relying on global properties have better recall. Unfortunately, this better quality comes with a computational price tag. Aiming for both accuracy and scalability, we make several contributions. First, we define closeness similarity, a natural measure that compares two nodes based on the similarity of their relations to all other nodes. Second, we show how the alldistances sketch (ADS) node labels, which are efficient to compute, can support the estimation of closeness similarity and shortestpath (SP) distances in logarithmic query time. Third, we propose the randomized edge lengths (REL) technique and define the corresponding REL distance, which captures both path length and path multiplicity and therefore improves over the SP distance as a similarity measure. The REL distance can also be the basis of closeness similarity and can be estimated using SP computation or the ADS labels. We demonstrate the effectiveness of our measures and the accuracy of our estimates through experiments on social networks with up to tens of millions of nodes.