Results 1  10
of
57
On Graph Query Optimization in Large Networks
"... The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such largescale graphstructured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph ..."
Abstract

Cited by 31 (3 self)
 Add to MetaCart
The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such largescale graphstructured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NPcomplete nature of subgraph isomorphism. It becomes even challenging when the network examined is large and diverse. In this paper, we present a high performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhood as basic indexing units, which prove to be both effective in graph search space pruning and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertexatatime fashion to a more efficient pathatatime way: the query is first decomposed to a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; Candidate paths are further joined together to help recover the query graph to finalize the graph query processing. We evaluate SPath with the stateoftheart GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks. 1.
Adding Regular Expressions to Graph Reachability and Pattern Queries
 Frontiers of Computer Science
, 2012
"... Abstract—It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressin ..."
Abstract

Cited by 27 (5 self)
 Add to MetaCart
(Show Context)
Abstract—It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and are in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubictime algorithms for evaluating graph pattern queries based on extended graph simulation, as opposed to the NPcompleteness of graph pattern matching via subgraph isomorphism. (4) The effectiveness, efficiency and scalability of these algorithms are experimentally verified using reallife data and synthetic data. I.
Distanceconstraint reachability computation in uncertain graphs
 PVLDB
"... Driven by the emerging network applications, querying and mining uncertain graphs has become increasingly important. In this paper, we investigate a fundamental problem concerning uncertain graphs, which we call the distanceconstraint reachability (DCR) problem: Given two vertices s and t, what is ..."
Abstract

Cited by 25 (5 self)
 Add to MetaCart
Driven by the emerging network applications, querying and mining uncertain graphs has become increasingly important. In this paper, we investigate a fundamental problem concerning uncertain graphs, which we call the distanceconstraint reachability (DCR) problem: Given two vertices s and t, what is the probability that the distance from s to t is less than or equal to a userdefined threshold d in the uncertain graph? Since this problem is #PComplete, we focus on efficiently and accurately approximating DCR online. Our main results include two new estimators for the probabilistic reachability. One is a HorvitzThomson type estimator based on the unequal probabilistic sampling scheme, and the other is a novel recursive sampling estimator, which effectively combines a deterministic recursive computational procedure with a sampling process to boost the estimation accuracy. Both estimators can produce much smaller variance than the direct sampling estimator, which considers each trial to be either 1 or 0. We also present methods to make these estimators more computationally efficient. The comprehensive experiment evaluation on both real and synthetic datasets demonstrates the efficiency and accuracy of our new estimators. 1.
Fast and accurate estimation of shortest paths in large graphs
 In Proceedings of Conference on Information and Knowledge Management (CIKM
, 2010
"... Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large diskresident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual short ..."
Abstract

Cited by 25 (1 self)
 Add to MetaCart
(Show Context)
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large diskresident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual shortest paths (i.e. the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketchbased index structure that not only supports estimation of node distances, but also computes corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to nearexact shortestpath approximations in real world graphs. We evaluate our techniques – implemented within a fully functional RDF graph database system – over large realworld social and biological networks of sizes ranging from tens of thousand to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0 % and 1 % on average.
A SURVEY OF ALGORITHMS FOR DENSE SUBGRAPH DISCOVERY
"... In this chapter, we present a survey of algorithms for dense subgraph discovery. The problem of dense subgraph discovery is closely related to clustering though the two problems also have a number of differences. For example, the problem of clustering is largely concerned with that of finding a fixe ..."
Abstract

Cited by 18 (1 self)
 Add to MetaCart
In this chapter, we present a survey of algorithms for dense subgraph discovery. The problem of dense subgraph discovery is closely related to clustering though the two problems also have a number of differences. For example, the problem of clustering is largely concerned with that of finding a fixed partition in the data, whereas the problem of dense subgraph discovery defines these dense components in a much more flexible way. The problem of dense subgraph discovery may wither be defined over single or multiple graphs. We explore both cases. In the latter case, the problem is also closely related to the problem of the frequent subgraph discovery. This chapter will discuss and organize the literature on this topic effectively in order to make it much more accessible to the reader.
On Querying Historical Evolving Graph Sequences
"... In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graphstructured data be maintained for analytical processing. We call a historical evolving graph sequ ..."
Abstract

Cited by 17 (1 self)
 Add to MetaCart
(Show Context)
In many applications, information is best represented as graphs. In a dynamic world, information changes and so the graphs representing the information evolve with time. We propose that historical graphstructured data be maintained for analytical processing. We call a historical evolving graph sequence an EGS. We observe that in many applications, graphs of an EGS are large and numerous, and they often exhibit much redundancy among them. We study the problem of efficient query processing on an EGS and put forward a solution framework called FVF. Through extensive experiments on both real and synthetic datasets, we show that our FVF framework is highly efficient in EGS query processing. 1.
Query preserving graph compression
 In SIGMOD
, 2012
"... It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Q of queries of users ’ choice. We c ..."
Abstract

Cited by 17 (7 self)
 Add to MetaCart
(Show Context)
It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. These motivate us to propose query preserving graph compression, to compress graphs relative to a class Q of queries of users ’ choice. We compute a small Gr from a graph G such that (a) for any query Q ∈ Q, Q(G) = Q′(Gr), where Q ′ ∈ Q can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q ′ on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Q. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using reallife data, we experimentally verify that our compression techniques could reduce graphs in average by 95% for reachability and 57 % for graph pattern matching, and that our incremental maintenance algorithms are efficient. Categories and Subject Descriptors F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems—graph compression
Densest Subgraph in Streaming and MapReduce
"... The problem of finding locally dense components of a graph is an important primitive in data analysis, with wideranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in t ..."
Abstract

Cited by 16 (3 self)
 Add to MetaCart
(Show Context)
The problem of finding locally dense components of a graph is an important primitive in data analysis, with wideranging applications from community mining to spam detection and the discovery of biological network modules. In this paper we present new algorithms for finding the densest subgraph in the streaming model. For any ɛ> 0, our algorithms make O(log 1+ɛ n) passes over the input and find a subgraph whose density is guaranteed to be within a factor 2(1 + ɛ) of the optimum. Our algorithms are also easily parallelizable and we illustrate this by realizing them in the MapReduce model. In addition we perform extensive experimental evaluation on massive realworld graphs showing the performance and scalability of our algorithms in practice. 1.