## Efficient Shortest Paths on Massive Social Graphs

Citations: 3 (3 self-citations)

### BibTeX

@MISC{Zhao_efficientshortest,

author = {Xiaohan Zhao and Alessandra Sala and Haitao Zheng and Ben Y. Zhao},

title = {Efficient Shortest Paths on Massive Social Graphs},

year = {}

}

### Abstract

Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g., social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first search (BFS). In this paper, we study ways to enable scalable graph processing for today’s massive networks. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and because it parallelizes naturally across compute clusters, it can provide accurate results for graphs of up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near-)shortest paths between node pairs. After a one-time preprocessing cost, Rigel answers node-distance queries in tens of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.
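The abstract's central idea, answering a distance query from stored coordinates instead of traversing the graph, can be sketched in a few lines of Python. This is a minimal sketch that assumes the standard hyperboloid-model distance formula, with each node storing only its spatial coordinates; the `scale` factor, the helper names, and the sample coordinates are hypothetical and are not Rigel's actual parameters or output.

```python
import math


def hyperbolic_distance(x, y, scale=1.0):
    """Distance between two points of the hyperboloid model.

    Each point is stored as its n spatial coordinates; the time-like coordinate
    sqrt(1 + |x|^2) is implied. `scale` stands in for a curvature factor
    (an assumption; 1.0 means unit negative curvature).
    """
    sx = sum(a * a for a in x)
    sy = sum(b * b for b in y)
    dot = sum(a * b for a, b in zip(x, y))
    arg = math.sqrt((1.0 + sx) * (1.0 + sy)) - dot
    # arg >= 1 holds mathematically; clamp to absorb floating-point rounding.
    return scale * math.acosh(max(arg, 1.0))


# Hypothetical output of the one-time embedding step: node -> coordinates.
coords = {
    "alice": [0.0, 0.0],
    "bob": [1.0, 0.0],
    "carol": [0.0, 2.0],
}


def estimate_distance(u, v):
    # Two dictionary lookups plus one closed-form formula: the cost depends on
    # the embedding dimension, not on the number of nodes or edges in the graph.
    return hyperbolic_distance(coords[u], coords[v])
```

Once the coordinates exist, a query never touches the graph itself, which is why its cost stays flat as the graph grows while a BFS query does not.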

### Citations

1371 | A simplex method for function minimization
- Nelder, Mead
- 1965
Citation Context ...r a graph node, we randomly select 16 out of the l (l = 100) landmarks. Since we know the actual distances in the graph between the new node and its 16 selected landmarks, we apply the Simplex method [32] to compute an optimal coordinate to minimize the deviation in distances between the node and its landmarks in the coordinate space and their actual distances. Optimizing Local Paths. It has been show...
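The landmark-fitting step described in this context can be sketched as follows. This is a minimal sketch, assuming a unit-curvature hyperboloid distance and a sum-of-squared-errors objective; `place_node`, the 3-dimensional space, and the synthetic landmarks are hypothetical. It implements a plain Nelder-Mead downhill simplex [32] by hand rather than calling a library optimizer, and it is not Rigel's code.

```python
import math
import random


def hyperbolic_distance(x, y):
    # Hyperboloid-model distance; assumes unit curvature (no scale factor).
    sx = sum(a * a for a in x)
    sy = sum(b * b for b in y)
    dot = sum(a * b for a, b in zip(x, y))
    return math.acosh(max(math.sqrt((1.0 + sx) * (1.0 + sy)) - dot, 1.0))


def nelder_mead(f, x0, step=0.5, max_iter=2000, tol=1e-12):
    """Minimize f with the Nelder-Mead downhill simplex method."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each axis.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if f(worst) - f(best) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line from the centroid through the worst vertex.
            return [c + t * (w - c) for c, w in zip(centroid, worst)]

        reflected = along(-1.0)
        if f(reflected) < f(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best vertex.
                simplex = [best] + [
                    [b + 0.5 * (q - b) for b, q in zip(best, p)] for p in simplex[1:]
                ]
    return min(simplex, key=f)


def place_node(landmark_coords, graph_dists, k=16, dim=3, seed=0):
    """Fit a node's coordinate to its true graph distances from k random landmarks."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(landmark_coords)), k)

    def error(x):
        return sum(
            (hyperbolic_distance(x, landmark_coords[i]) - graph_dists[i]) ** 2
            for i in chosen
        )

    return nelder_mead(error, [0.0] * dim), error


# Synthetic example: 100 landmarks, distances generated from a known point.
gen = random.Random(1)
landmarks = [[gen.uniform(-2.0, 2.0) for _ in range(3)] for _ in range(100)]
true_point = [0.7, -0.4, 1.1]
dists = [hyperbolic_distance(true_point, lm) for lm in landmarks]
coord, err = place_node(landmarks, dists)
```

In a real graph the landmark distances would be integer hop counts measured by BFS, so an exact fit usually does not exist and the optimizer only finds the least-bad coordinate.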

526 | Predicting Internet network distance with coordinates-based approaches
- Ng, Zhang
Citation Context .... The most recent and well-known use of embedding techniques was in the context of network coordinate systems used to estimate Internet latencies without performing exhaustive end-to-end measurements [8], [9], [10]. We summarize prior experiences in embedding in geometric spaces from both measurement and theoretical studies. Euclidean embedding was first used on simple graphs [11], and was widely use...

468 | Vivaldi: A Decentralized Network Coordinate System
- Dabek, Cox, et al.
Citation Context ... recent and well-known use of embedding techniques was in the context of network coordinate systems used to estimate Internet latencies without performing exhaustive end-to-end measurements [8], [9], [10]. We summarize prior experiences in embedding in geometric spaces from both measurement and theoretical studies. Euclidean embedding was first used on simple graphs [11], and was widely used to predic...
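Network coordinate systems such as Vivaldi estimate latencies by letting each host adjust its own coordinate after every measurement, as if hosts were connected by springs whose rest lengths are the measured RTTs. The sketch below shows a simplified, constant-timestep variant of that update in Euclidean 2-D space; the adaptive timestep and error weighting of the full algorithm are omitted, and the function name and RTT values are hypothetical.

```python
import math
import random


def vivaldi_update(xi, xj, rtt, delta=0.25, rng=random):
    """One constant-timestep Vivaldi step for node i after measuring rtt to node j.

    Node i moves along the unit vector from xj toward xi by delta * error, where
    error = rtt - |xi - xj| is the spring's displacement from its rest length.
    """
    diff = [a - b for a, b in zip(xi, xj)]
    current = math.sqrt(sum(d * d for d in diff))
    if current == 0.0:
        # Coincident coordinates: push apart in a random direction.
        diff = [rng.uniform(-1.0, 1.0) for _ in xi]
    norm = math.sqrt(sum(d * d for d in diff))
    unit = [d / norm for d in diff]
    error = rtt - current  # > 0: too close, move away; < 0: too far, move closer
    return [a + delta * error * u for a, u in zip(xi, unit)]


# Hypothetical round-trip times (ms) between three hosts.
rtts = {(0, 1): 10.0, (0, 2): 20.0, (1, 2): 15.0}
coords = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
sampler = random.Random(42)
for _ in range(500):
    i, j = sampler.choice(list(rtts))
    if sampler.random() < 0.5:
        i, j = j, i  # either endpoint may be the one that measures and moves
    coords[i] = vivaldi_update(coords[i], coords[j], rtts[min(i, j), max(i, j)], rng=sampler)
```

No host ever needs the full latency matrix; each one only reacts to its own measurements, which is what lets these systems avoid exhaustive end-to-end probing.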

451 | The geometry of graphs and some of its algorithmic applications
- Linial, London, et al.
- 1995
Citation Context ...ORDINATE SYSTEM A number of recent projects have shown that hyperbolic spaces can more accurately capture distances on a network graph [18], [19], [20]. We also empirically compute distortion metrics [30] on our social graphs for different coordinate systems, and find that the hyperbolic space is in fact significantly more accurate than Euclidean and spherical alternatives. The results are omitted here...
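To make "distortion metrics" concrete, the sketch below computes the classic worst-case distortion of an embedding from pairs of true and embedded distances, plus the mean relative error as a second, per-query accuracy measure. We assume the standard expansion-times-contraction definition; the exact metrics the authors computed are not given here, and the sample pairs are hypothetical.

```python
def distortion(pairs):
    """Distortion of an embedding, given (graph_distance, embedded_distance) pairs.

    expansion   = max embedded / graph   (how much distances are stretched)
    contraction = max graph / embedded   (how much distances are shrunk)
    distortion  = expansion * contraction, so 1.0 means a perfect (possibly
    uniformly rescaled) embedding. All distances must be positive.
    """
    pairs = list(pairs)
    expansion = max(e / g for g, e in pairs)
    contraction = max(g / e for g, e in pairs)
    return expansion * contraction


def mean_relative_error(pairs):
    """Average |embedded - graph| / graph over all sampled pairs."""
    pairs = list(pairs)
    return sum(abs(e - g) / g for g, e in pairs) / len(pairs)


# Hypothetical (graph distance, embedded distance) samples.
sample = [(1, 1.0), (2, 3.0), (4, 2.0)]
```

Because distortion multiplies the worst stretch by the worst shrink, a single badly placed pair dominates it, whereas the mean relative error reflects typical query accuracy.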

358 | What is Twitter, a social network or a news media?
- Kwak, Lee, et al.
- 2010
Citation Context ...wling the Facebook network [5]. Jiang et al. [7] used the same methodology to generate a large social graph of 43 million users on Renren, the Chinese Facebook clone. Finally, Twitter was analyzed in [26], and other studies modeled behavior of social network users using network-level data measurements [27], [28]. Our focus. We focus on the problem of designing and building a real system for analyzing ...

344 | Measurement and analysis of online social networks
- Mislove, Marcon, et al.
- 2007
Citation Context ...enren social networks, each with millions of nodes and edges. We use them to test the efficiency and scalability of our system. The LiveJournal, Flickr, and Orkut datasets were shared by the authors of [6]. With 43 million nodes and more than 1 billion edges, our largest dataset is a snapshot of Renren, the largest online social network in China. We obtained this graph after seeking permission from Ren...

216 | Community detection in graphs
- Fortunato
Citation Context ...ny other social applications rely on shortest path computations. For instance, information dissemination [22] can use node distances to find the most influential nodes. Community detection algorithms [23] can use distances between nodes to cluster them. Algorithms for detecting Sybil attacks rely on strategies similar to community detection [24], and hence can also leverage node distance information. N...

161 | Virtual Landmarks for the Internet
- Tang, Crovella
Citation Context ...metric spaces from both measurement and theoretical studies. Euclidean embedding was first used on simple graphs [11], and was widely used to predict routing latency between Internet hosts [8], [10], [12], [13]. These systems calibrate nodes’ geometric positions based on Internet round-trip time (RTT). A recent result in [14] proves the tightest upper bound, $O(\sqrt{\log n}\,\log\log n)$, for an $n$-point Euclidea...

133 | Big-bang simulation for embedding network distances in Euclidean space
- Shavitt, Tankel
Citation Context ... spaces from both measurement and theoretical studies. Euclidean embedding was first used on simple graphs [11], and was widely used to predict routing latency between Internet hosts [8], [10], [12], [13]. These systems calibrate nodes’ geometric positions based on Internet round-trip time (RTT). A recent result in [14] proves the tightest upper bound, $O(\sqrt{\log n}\,\log\log n)$, for an $n$-point Euclidean embe...

123 | An architecture for a global internet host distance estimation service
- Francis, Jamin, et al.
- 1999
Citation Context ... most recent and well-known use of embedding techniques was in the context of network coordinate systems used to estimate Internet latencies without performing exhaustive end-to-end measurements [8], [9], [10]. We summarize prior experiences in embedding in geometric spaces from both measurement and theoretical studies. Euclidean embedding was first used on simple graphs [11], and was widely used to ...

107 | User interactions in social networks and their implications
- Wilson, Boe, et al.
- 2009

90 | On the Accuracy of Embeddings for Internet Coordinate Systems
- Lua, Griffin, et al.
- 2005
Citation Context ...rical coordinate systems. A hyperbolic space can be thought of as a space with a tightly connected core through which all paths between nodes pass. Experimental systems for embedding Internet distances [17], [18], [15] generally showed improved accuracy over analogous systems that used Euclidean spaces. Kleinberg proposed a routing algorithm in ad hoc networks that works by greedily embedding the network ...

62 | Small Distortion and Volume Preserving Embeddings for Planar and Euclidean Metrics
- Rao
- 1999
Citation Context ...-to-end measurements [8], [9], [10]. We summarize prior experiences in embedding in geometric spaces from both measurement and theoretical studies. Euclidean embedding was first used on simple graphs [11], and was widely used to predict routing latency between Internet hosts [8], [10], [12], [13]. These systems calibrate nodes’ geometric positions based on Internet round-trip time (RTT). A recent result...

61 | Efficient influence maximization in social networks, KDD ’09
- Chen, Wang, et al.
- 2009
Citation Context ...ns: graph separation metrics, graph centrality, and distance-ranked social search [2], [21]. Many other social applications rely on shortest path computations. For instance, information dissemination [22] can use node distances to find the most influential nodes. Community detection algorithms [23] can use distances between nodes to cluster them. Algorithms for detecting Sybil attacks rely on strategie...

46 | Geographic routing using hyperbolic space
- Kleinberg
Citation Context ...ork users using network-level data measurements [27], [28]. Our focus. We focus on the problem of designing and building a real system for analyzing today’s massive networks. As with prior work [14], [29], [19], it is extremely challenging to prove bounds on these probabilistic approaches. Instead, we use a wide range of empirical data to verify that our system works accurately for network graphs up t...

45 | An analysis of social network-based sybil defenses
- Viswanath, Post, et al.
- 2010
Citation Context ...he most influential nodes. Community detection algorithms [23] can use distances between nodes to cluster them. Algorithms for detecting Sybil attacks rely on strategies similar to community detection [24], and hence can also leverage node distance information. The neighborhood function [25] uses node distance distributions to predict whether two graphs are similar. Finally, users in the Overstock auction ...

41 | Exploiting social networks for Internet search
- Mislove, Gummadi, et al.
- 2006
Citation Context ...lds. In Section V, we will evaluate our proposed system using three of the most common social analysis applications: graph separation metrics, graph centrality, and distance-ranked social search [2], [21]. Many other social applications rely on shortest path computations. For instance, information dissemination [22] can use node distances to find the most influential nodes. Community detection algorit...

39 | The RDF-3X Engine for Scalable Management of RDF Data
- Neumann, Weikum

25 | Fast shortest path distance estimation in large networks
- Potamias, Bonchi, et al.
- 2009
Citation Context ... (100 million) and Twitter (200 million), computing the shortest path distance between a single pair of nodes can take a minute or more using traditional algorithms such as breadth-first search (BFS) [2]. Similarly, variants such as Dijkstra and Floyd-Warshall also fail to scale to these network sizes. Without an efficient alternative for node distance computation, recent work has focused on exploring...
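For reference, the exact baseline this context describes is a single-pair BFS over hop counts. The sketch below is a minimal Python version with early exit; the adjacency-list graph is a hypothetical toy example.

```python
from collections import deque


def bfs_distance(adj, source, target):
    """Exact hop distance from source to target by breadth-first search.

    Returns None when target is unreachable. Cost is O(V + E) per query in the
    worst case, which is what makes per-pair BFS slow on graphs with hundreds of
    millions of nodes.
    """
    if source == target:
        return 0
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == target:
                    return dist[v]
                frontier.append(v)
    return None


# A tiny undirected graph as adjacency lists (node 5 is isolated).
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3], 5: []}
```

Early exit helps only when the target is close; for typical pairs in a social graph the search still expands a large fraction of the network before reaching it.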

17 | Understanding latent interactions in online social networks
- Jiang, Wilson, et al.
- 2010
Citation Context ... and more than 1 billion edges, our largest dataset is a snapshot of Renren, the largest online social network in China. We obtained this graph after seeking permission from Renren and the authors of [7]. While these graphs are still significantly smaller than the current user populations of Facebook (600 million) and LinkedIn (80 million), we believe our graphs are large enough to demonstrate the sc...

17 | Hyperbolic embedding and routing for dynamic graphs
- Cvetkovski, Crovella
- 2009
Citation Context ... improved accuracy over analogous systems that used Euclidean spaces. Kleinberg proposed a routing algorithm in ad hoc networks that works by greedily embedding the network into a hyperbolic space, and [19] proposed a similar approach for dynamic graphs. However, their focus is on smaller graphs of wireless or synthetic networks (∼50 nodes as in [19]). [20] proposes a model using hyperbolic spaces to pr...

16 | A sketch-based distance oracle for web-scale graphs
- Sarma, Gollapudi, et al.
Citation Context ...oximations of shortest paths by using node distance queries as a tool. We first describe how this extension to Rigel computes short paths between any two nodes. Next, we describe the Sketch algorithm [33], an efficient algorithm for shortest path estimation, and its follow-up algorithms including SketchCE, SketchCESC, and TreeSketch [34]. Finally, we compare Rigel’s shortest path algorithm against thes...
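The context says only that the path extension uses node distance queries as a tool. One simple way to do that is greedy forwarding: from the current node, step to the unvisited neighbor whose estimated distance to the target is smallest. The sketch below assumes this greedy rule, which is our illustration and not necessarily Rigel's documented procedure; `greedy_path`, the toy graph, and the estimate table are hypothetical.

```python
def greedy_path(adj, estimate, source, target, max_hops=64):
    """Build a path by always stepping to the unvisited neighbor whose estimated
    distance to the target is smallest. Returns None if it gets stuck."""
    path = [source]
    visited = {source}
    current = source
    while current != target:
        if len(path) - 1 >= max_hops:
            return None  # give up on overly long walks
        candidates = [v for v in adj[current] if v not in visited]
        if not candidates:
            return None  # dead end: every neighbor was already visited
        if target in candidates:
            nxt = target
        else:
            nxt = min(candidates, key=lambda v: estimate(v, target))
        path.append(nxt)
        visited.add(nxt)
        current = nxt
    return path


graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}
# Hypothetical estimated distances to node 3, as a coordinate system might return.
est_to_3 = {0: 2.1, 1: 1.9, 2: 1.2, 3: 0.0, 4: 0.8}
route = greedy_path(graph, lambda v, t: est_to_3[v], 0, 3)
```

Each hop costs one estimate per neighbor rather than a graph search, so the quality of the returned path depends directly on how accurate the distance estimates are.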

13 | Do social networks improve e-commerce?: a study on social marketplaces
- Swamynathan, Wilson, et al.
- 2008
Citation Context ...l protein interaction networks, and analysis of the Internet router backbone. For example, a social game network might search for “central” users to help deploy new games, while a social auction site [1] wants to tell a buyer if a specific item is being auctioned by someone in her social circles. Ideally, such queries should be answered quickly, regardless of the size of the graph, or even if graphs ...

11 | Hyperbolic embedding of internet graph for distance estimation and overlay construction
- Shavitt, Tankel
- 2008
Citation Context ...coordinate systems. A hyperbolic space can be thought of as a space with a tightly connected core through which all paths between nodes pass. Experimental systems for embedding Internet distances [17], [18], [15] generally showed improved accuracy over analogous systems that used Euclidean spaces. Kleinberg proposed a routing algorithm in ad hoc networks that works by greedily embedding the network into a...

9 | Fast and accurate estimation of shortest paths in large graphs
- Gubichev, Bedathur, et al.
- 2010
Citation Context ...ths between any two nodes. Next, we describe the Sketch algorithm [33], an efficient algorithm for shortest path estimation, and its follow-up algorithms including SketchCE, SketchCESC, and TreeSketch [34]. Finally, we compare Rigel’s shortest path algorithm against these algorithms on a variety of social graphs in both accuracy and per-query runtime. We show that while Rigel requires similar preproces...
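The basic Sketch idea [33], as we understand it, is to pick random seed sets of sizes 1, 2, 4, and so on, record for every node its closest seed in each set and the distance to it, and answer a query by routing through a seed that both endpoints share. The sketch below estimates distances only (no paths) and omits the SketchCE, SketchCESC, and TreeSketch refinements [34]; all function names and the toy graph are hypothetical.

```python
import math
import random
from collections import deque


def nearest_seed(adj, seeds):
    """Multi-source BFS: map each reachable node to (closest seed, hop distance)."""
    found = {s: (s, 0) for s in seeds}
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        seed, d = found[u]
        for v in adj[u]:
            if v not in found:
                found[v] = (seed, d + 1)
                queue.append(v)
    return found


def build_sketches(adj, rounds=2, seed=0):
    """Per node, record (seed node -> distance) for the closest member of random
    seed sets of sizes 1, 2, 4, ..., repeated `rounds` times."""
    rng = random.Random(seed)
    nodes = list(adj)
    sketches = {u: {} for u in nodes}
    levels = int(math.log2(len(nodes))) + 1
    for _ in range(rounds):
        for i in range(levels):
            seeds = rng.sample(nodes, min(2 ** i, len(nodes)))
            for u, (s, d) in nearest_seed(adj, seeds).items():
                sketches[u][s] = min(d, sketches[u].get(s, d))
    return sketches


def sketch_estimate(sketches, u, v):
    """Upper bound on d(u, v): min over shared seeds w of d(u, w) + d(w, v)."""
    common = sketches[u].keys() & sketches[v].keys()
    if not common:
        return None
    return min(sketches[u][w] + sketches[v][w] for w in common)


# Toy example: a path graph 0 - 1 - ... - 7.
path_graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
sketches = build_sketches(path_graph)
```

By the triangle inequality the estimate can never undershoot the true distance, and on a connected graph the size-1 seed set guarantees that every pair shares at least one seed.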

7 | Measurement manipulation and space selection in network coordinates
- Lumezanu, Spring
- 2008
Citation Context ... calibrate nodes’ geometric positions based on Internet round-trip time (RTT). A recent result in [14] proves the tightest upper bound, $O(\sqrt{\log n}\,\log\log n)$, for an $n$-point Euclidean embedding. Vivaldi [15] was the first to investigate the accuracy of embedding a network into a spherical space. While morphing on spherical spaces is widely used in computer vision [16], there is little theoretical work in...

7 | Characterizing user behavior
- Benevenuto, Rodrigues, et al.
- 2009
Citation Context ...raph of 43 million users on Renren, the Chinese Facebook clone. Finally, Twitter was analyzed in [26], and other studies modeled behavior of social network users using network-level data measurements [27], [28]. Our focus. We focus on the problem of designing and building a real system for analyzing today’s massive networks. As with prior work [14], [29], [19], it is extremely challenging to prove bou...

6 | Volume distortion for subsets of Euclidean spaces, Discrete Comput. Geom.
- Lee
- 2009
Citation Context ...d was widely used to predict routing latency between Internet hosts [8], [10], [12], [13]. These systems calibrate nodes’ geometric positions based on Internet round-trip time (RTT). A recent result in [14] proves the tightest upper bound, $O(\sqrt{\log n}\,\log\log n)$, for an $n$-point Euclidean embedding. Vivaldi [15] was the first to investigate the accuracy of embedding a network into a spherical space. While ...

6 | Greedy forwarding in dynamic scale-free networks embedded in hyperbolic metric spaces
- Papadopoulos, Krioukov, et al.
- 2010
Citation Context ...dding the network into a hyperbolic space, and [19] proposed a similar approach for dynamic graphs. However, their focus is on smaller graphs of wireless or synthetic networks (∼50 nodes as in [19]). [20] proposes a model using hyperbolic spaces to produce synthetic graphs. C. Social Network Applications and Studies Here we briefly summarize other related projects on social applications and social net...

4 | shortest path estimation for large social graphs
- Zhao, Sala, et al.
Citation Context ...ydWarshall also fail to scale to these network sizes. Without an efficient alternative for node distance computation, recent work has focused on exploring efficient approximation algorithms [2], [3], [4]. Our prior work [4] described the idea of graph coordinate systems, which embed graph nodes into points in a coordinate space. The resulting coordinates can be used to quickly approximate node dis...

4 | Morphing planar graphs in spherical space
- Kobourov, Landis
Citation Context ...n-point Euclidean embedding. Vivaldi [15] was the first to investigate the accuracy of embedding a network into a spherical space. While morphing on spherical spaces is widely used in computer vision [16], there is little theoretical work investigating spherical coordinate systems. A hyperbolic space can be thought of as a space with a tightly connected core through which all paths between nodes pass. E...

3 | ANF: A Fast and Scalable Tool for Data Mining
- Palmer, Gibbons, et al.
Citation Context ...ween nodes to cluster them. Algorithms for detecting Sybil attacks rely on strategies similar to community detection [24], and hence can also leverage node distance information. The neighborhood function [25] uses node distance distributions to predict whether two graphs are similar. Finally, users in the Overstock auction site query the social graph to see how they are connected to sellers of a given pro...
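The neighborhood function is easiest to see in its exact form: N(h) counts the node pairs within h hops of each other. ANF [25] approximates this with probabilistic counting so that it scales to large graphs; the BFS-based version below is only feasible on small graphs and is meant to make the definition concrete. We assume the definition counts ordered pairs and includes each node paired with itself; `neighborhood_function` and the toy graphs are hypothetical.

```python
from collections import deque


def neighborhood_function(adj, max_h):
    """Exact N(h) for h = 0..max_h: the number of ordered node pairs (u, v),
    including u == v, with hop distance d(u, v) <= h. One BFS per node,
    truncated at depth max_h."""
    counts = [0] * (max_h + 1)
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == max_h:
                continue  # do not explore beyond the largest h we report
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist.values():
            counts[d] += 1
    # Turn per-distance counts into the cumulative N(h).
    result, running = [], 0
    for c in counts:
        running += c
        result.append(running)
    return result


path_3 = {0: [1], 1: [0, 2], 2: [1]}
triangle_with_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```

Two graphs can then be compared by comparing their N(h) curves, which summarize how quickly neighborhoods grow with distance.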

3 | Understanding online social network usage from a network perspective
- Schneider, et al.
- 2009
Citation Context ...f 43 million users on Renren, the Chinese Facebook clone. Finally, Twitter was analyzed in [26], and other studies modeled behavior of social network users using network-level data measurements [27], [28]. Our focus. We focus on the problem of designing and building a real system for analyzing today’s massive networks. As with prior work [14], [29], [19], it is extremely challenging to prove bounds on...

2 | Using structure indices for efficient approximation of network properties
- Rattigan, Maier, et al.
- 2006
Citation Context ...d Floyd-Warshall also fail to scale to these network sizes. Without an efficient alternative for node distance computation, recent work has focused on exploring efficient approximation algorithms [2], [3], [4]. Our prior work [4] described the idea of graph coordinate systems, which embed graph nodes into points in a coordinate space. The resulting coordinates can be used to quickly approximate nod...

2 | Fast and scalable analysis of massive social graphs
- Zhao, Sala, et al.
- 2011
Citation Context ... coordinate systems, and find that the hyperbolic space is in fact significantly more accurate than Euclidean and spherical alternatives. The results are omitted here for brevity, but are available online [31]. In this section, we describe Rigel, a hyperbolic graph coordinate system (GCS) for estimating node distance queries. Before answering queries on a particular graph, the graph must first be embedded ...