Results 1 - 10 of 32
Fast and accurate estimation of shortest paths in large graphs
In Proceedings of Conference on Information and Knowledge Management (CIKM), 2010
Cited by 12 (0 self)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large disk-resident instances of graph data. While a number of techniques exist for answering reachability queries and approximating node distances efficiently, determining actual shortest paths (i.e. the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real-world graphs. We evaluate our techniques, implemented within a fully functional RDF graph database system, over large real-world social and biological networks of sizes ranging from tens of thousands to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0% and 1% on average.
Incremental graph pattern matching
In SIGMOD, 2011
Cited by 8 (1 self)
Graph pattern matching has become a routine process in emerging applications such as social networks. In practice a data graph is typically large, and is frequently updated with small changes. It is often prohibitively expensive to recompute matches from scratch via batch algorithms when the graph is updated. With this comes the need for incremental algorithms that compute changes to the matches in response to updates, to minimize unnecessary recomputation. This paper investigates incremental algorithms for graph pattern matching defined in terms of graph simulation, bounded simulation and subgraph isomorphism. (1) For simulation, we provide incremental algorithms for unit updates and certain graph patterns. These algorithms are optimal: they run in linear time in the size of the changes in the input and output, which characterizes the cost that is inherent to the problem itself. For general patterns we show that the incremental matching problem is unbounded, i.e., its cost is not determined by the size of the changes alone. (2) For bounded simulation, we show that the problem is unbounded even for unit updates and path patterns. (3) For subgraph isomorphism, we show that the problem is intractable and unbounded for unit updates and path patterns. (4) For multiple updates, we develop an incremental algorithm for each of simulation, bounded simulation and subgraph isomorphism. We experimentally verify that these incremental algorithms significantly outperform their batch counterparts in response to small changes, using real-life data and synthetic data. Categories and Subject Descriptors: F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems [pattern matching]
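For orientation, the batch computation that incremental algorithms try to avoid repeating can be sketched as a simple fixpoint. This is a minimal illustration of the standard graph simulation relation, not the paper's incremental algorithm; the function name, argument layout and the toy graph are assumptions made for the example.

```python
def graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges):
    """Return, for each pattern node, the set of data nodes that simulate it.

    Hypothetical helper: labels are dicts node -> label, edges are lists of
    (source, target) pairs. A pattern has a match only if every returned
    set is non-empty.
    """
    # Successor sets of the data graph.
    succ = {v: set() for v in data_labels}
    for a, b in data_edges:
        succ[a].add(b)

    # Start from all data nodes carrying the same label as the pattern node.
    sim = {
        u: {v for v, lab in data_labels.items() if lab == pattern_labels[u]}
        for u in pattern_labels
    }

    # Repeatedly drop candidates that cannot follow some pattern edge.
    changed = True
    while changed:
        changed = False
        for u, u_child in pattern_edges:
            keep = {v for v in sim[u] if succ[v] & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Toy pattern A -> B matched against a three-node data graph.
pattern_labels = {"pa": "A", "pb": "B"}
pattern_edges = [("pa", "pb")]
data_labels = {1: "A", 2: "B", 3: "A"}
data_edges = [(1, 2)]
result = graph_simulation(pattern_labels, pattern_edges, data_labels, data_edges)
```

Here node 3 carries label A but has no edge to a B node, so it is pruned. Rerunning this whole loop after every small update is the cost that motivates an incremental approach.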
A Continuous Query System for Dynamic Route Planning
Cited by 5 (0 self)
In this paper, we address the problem of answering continuous route planning queries over a road network, in the presence of updates to the delay (cost) estimates of links. A simple approach to this problem would be to recompute the best path for all queries on arrival of every delay update. However, such a naive approach scales poorly when there are many users who have requested routes in the system. Instead, we propose two new classes of approximate techniques, K-paths and proximity measures, to substantially speed up processing of the set of designated routes specified by continuous route planning queries in the face of incoming traffic delay updates. Our techniques work through a combination of precomputation of likely good paths and by avoiding complete recalculations on every delay update, instead only sending the user new routes when delays change significantly. Based on an experimental evaluation with 7,000 drives from real taxi cabs, we found that the routes delivered by our techniques are within 5% of the best shortest path and have run times an order of magnitude or less compared to a naive approach.
Orion: Shortest Path Estimation for Large Social Graphs
Cited by 4 (1 self)
Through measurements, researchers continue to produce large social graphs that capture relationships, transactions, and social interactions between users. Efficient analysis of these graphs requires algorithms that scale well with graph size. We examine node distance computation, a critical primitive in graph problems such as computing node separation, centrality computation, mutual friend detection, and community detection. For large million-node social graphs, computing even a single shortest path using traditional breadth-first search can take several seconds. In this paper, we propose a novel node distance estimation mechanism that effectively maps nodes in high dimensional graphs to positions in low-dimension Euclidean coordinate spaces, thus allowing constant-time node distance computation. We describe Orion, a prototype graph coordinate system, and explore critical decisions in its design. Finally, we evaluate the accuracy of Orion’s node distance estimates, and show that it can produce accurate results in applications such as node separation, node centrality, and ranked social search.
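The constant-time query step described above can be illustrated with a short sketch. It assumes every node already has a precomputed coordinate; how the embedding is computed is not detailed in the abstract and is not shown here. The function name and the sample coordinates are hypothetical.

```python
import math


def estimate_distance(coords, u, v):
    """Estimate the graph distance between u and v from their coordinates.

    The cost depends only on the (fixed) number of dimensions, not on the
    size of the graph.
    """
    return math.dist(coords[u], coords[v])


# Hypothetical two-dimensional coordinates for three nodes.
coords = {"a": (0.0, 0.0), "b": (3.0, 4.0), "c": (1.0, 0.0)}
estimate_ab = estimate_distance(coords, "a", "b")
```

No graph traversal happens at query time, which is what makes the lookup independent of the number of nodes and edges.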
Neighborhood-privacy protected shortest distance computing in cloud
In SIGMOD Conference, 2011
Cited by 4 (0 self)
With the advent of cloud computing, it becomes desirable to utilize cloud computing to efficiently process complex operations in large graphs without compromising their sensitive information. This paper studies shortest distance computing in the cloud, which aims at the following goals: i) preventing outsourced graphs from neighborhood attack, ii) preserving shortest distances in outsourced graphs, iii) minimizing overhead on the client side. The basic idea of this paper is to transform an original graph G into a link graph Gl kept locally and a set of outsourced graphs Go. Each outsourced graph should meet the requirement of a new security model called 1-neighborhood-d-radius. In addition, the shortest distance query can be equivalently answered using Gl and Go. Our objective is to minimize the space cost on the client side when both security and utility requirements are satisfied. We devise a greedy method to produce Gl and Go, which can exactly answer the shortest distance queries. We also develop an efficient transformation method to support approximate shortest distance answering under a given additive error bound. The final experimental results illustrate the effectiveness and efficiency of our method.
Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs
In ACM Conference on Information and Knowledge Management (CIKM), 2011
Cited by 4 (0 self)
Computing the shortest path between a pair of vertices in a graph is a fundamental primitive in graph algorithmics. Classical exact methods for this problem do not scale up to contemporary, rapidly evolving social networks with hundreds of millions of users and billions of connections. A number of approximate methods have been proposed, including several landmark-based methods that have been shown to scale up to very large graphs with acceptable accuracy. This paper presents two improvements to existing landmark-based shortest path estimation methods. The first improvement relates to the use of shortest-path trees (SPTs). Together with appropriate shortcutting heuristics, the use of SPTs makes it possible to achieve higher accuracy with acceptable time and memory overhead. Furthermore, SPTs can be maintained incrementally under edge insertions and deletions, which allows for a fully dynamic algorithm. The second improvement is a new landmark selection strategy that seeks to maximize the coverage of all shortest paths by the selected landmarks. The improved method is evaluated on the DBLP, Orkut, Twitter and Skype social networks.
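The basic landmark scheme that this line of work builds on can be sketched as follows. This is only the plain triangle-inequality estimate for an unweighted, undirected graph; it does not include the SPT shortcutting, the incremental maintenance or the landmark selection strategy the abstract describes. All function names and the toy graph are assumptions.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def build_landmark_index(adj, landmarks):
    """Precompute one BFS distance table per landmark."""
    return {lm: bfs_distances(adj, lm) for lm in landmarks}


def estimate_distance(index, s, t):
    """Upper bound on d(s, t): the best detour through any landmark."""
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best


# Toy undirected path graph 0 - 1 - 2 - 3 - 4 with node 2 as the only landmark.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index = build_landmark_index(adj, [2])
```

The estimate is exact when some landmark lies on a shortest path between the two nodes (nodes 0 and 4 here) and an overestimate otherwise (nodes 0 and 1), which is why landmark placement and path-based refinements matter for accuracy.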
Benefits of bias: Towards better characterization of network sampling
In SIGKDD, 2011
Cited by 4 (0 self)
From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they “push” the sampling process towards inclusion of desired properties. Finally, we describe how these sampling biases can be exploited in several real-world applications including disease outbreak detection and market research.
On k-skip Shortest Paths
Cited by 3 (0 self)
Given two vertices s, t in a graph, let P be the shortest path (SP) from s to t, and P⋆ a subset of the vertices in P. P⋆ is a k-skip shortest path from s to t if it includes at least one vertex out of every k consecutive vertices in P. In general, P⋆ succinctly describes P by sampling the vertices in P with a rate of at least 1/k. This makes P⋆ a natural substitute in scenarios where reporting every single vertex of P is unnecessary or even undesired. This paper studies k-skip SP computation in the context of spatial network databases (SNDB). Our technique has two properties crucial for real-time query processing in SNDB. First, our solution is able to answer k-skip queries significantly faster than finding the original SPs in their entirety. Second, the previous objective is achieved with a structure that occupies less space than storing the underlying road network. The proposed algorithms are the outcome of a careful theoretical analysis that reveals valuable insight into the characteristics of the k-skip SP problem. Their efficiency has been confirmed by extensive experiments with real data.
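The k-skip definition above can be made concrete with a small checker. It only tests whether a given vertex subset satisfies the definition for a known path; it says nothing about how the paper computes such subsets without first finding P. The function name and the sample path are hypothetical.

```python
def is_k_skip(path, sample, k):
    """Check whether `sample` is a k-skip version of `path`.

    `path` is the vertex sequence of a shortest path (vertices distinct),
    `sample` is the candidate subset of its vertices.
    """
    chosen = set(sample)
    # The sample may only contain vertices that lie on the path.
    if not chosen <= set(path):
        return False
    # Every window of k consecutive path vertices must contain a sampled one.
    for start in range(len(path) - k + 1):
        if not chosen.intersection(path[start:start + k]):
            return False
    return True


# Hypothetical shortest path s -> a -> b -> c -> d -> t.
path = ["s", "a", "b", "c", "d", "t"]
dense_enough = is_k_skip(path, ["b", "d"], 3)
too_sparse = is_k_skip(path, ["s", "t"], 3)
```

With k = 3, keeping only b and d is enough to cover every window of three consecutive vertices, while keeping only the endpoints leaves the window a, b, c uncovered.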
Efficient Shortest Paths on Massive Social Graphs
Cited by 3 (3 self)
Analysis of large networks is a critical component of many of today’s application environments, including online social networks, protein interactions in biological networks, and Internet traffic analysis. The arrival of massive network graphs with hundreds of millions of nodes, e.g. social graphs, presents a unique challenge to graph analysis applications. Most of these applications rely on computing distances between node pairs, which for large graphs can take minutes to compute using traditional algorithms such as breadth-first search (BFS). In this paper, we study ways to enable scalable graph processing for today’s massive networks. We explore the design space of graph coordinate systems, a new approach that accurately approximates node distances in constant time by embedding graphs into coordinate spaces. We show that a hyperbolic embedding produces relatively low distortion error, and propose Rigel, a hyperbolic graph coordinate system that lends itself to efficient parallelization across a compute cluster. Rigel produces significantly more accurate results than prior systems, and is naturally parallelizable across compute clusters, allowing it to provide accurate results for graphs up to 43 million nodes. Finally, we show that Rigel’s functionality can be easily extended to locate (near) shortest paths between node pairs. After a one-time preprocessing cost, Rigel answers node-distance queries in tens of microseconds, and also produces shortest path results up to 18 times faster than prior shortest-path systems with similar levels of accuracy.
IS-LABEL: an Independent-Set based Labeling Scheme for Point-to-Point Distance Querying
Cited by 3 (2 self)
We study the problem of computing shortest path or distance between two query vertices in a graph, which has numerous important applications. Quite a number of indexes have been proposed to answer such distance queries. However, all of these indexes can only process graphs of size barely up to 1 million vertices, which is rather small in view of many of the fast-growing real-world graphs today such as social networks and Web graphs. We propose an efficient index, which is a novel labeling scheme based on the independent set of a graph. We show that our method can handle graphs of size orders of magnitude larger than existing indexes.