Adding Regular Expressions to Graph Reachability and Pattern Queries
Frontiers of Computer Science, 2012
Cited by 30 (6 self)
It is increasingly common to find graphs in which edges bear different types, indicating a variety of relationships. For such graphs we propose a class of reachability queries and a class of graph patterns, in which an edge is specified with a regular expression of a certain form, expressing the connectivity in a data graph via edges of various types. In addition, we define graph pattern matching based on a revised notion of graph simulation. On graphs in emerging applications such as social networks, we show that these queries are capable of finding more sensible information than their traditional counterparts. Better still, their increased expressive power does not come with extra complexity. Indeed, (1) we investigate their containment and minimization problems, and show that these fundamental problems are in quadratic time for reachability queries and in cubic time for pattern queries. (2) We develop an algorithm for answering reachability queries, in quadratic time as for their traditional counterpart. (3) We provide two cubic-time algorithms for evaluating graph pattern queries based on extended graph simulation, as opposed to the NP-completeness of graph pattern matching via subgraph isomorphism. (4) The effectiveness, efficiency and scalability of these algorithms are experimentally verified using real-life data and synthetic data.
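To make the idea of a reachability query constrained by a regular expression over edge types concrete, here is a minimal sketch. It is not the algorithm from the paper, and it does not model the paper's restricted class of expressions: it assumes the expression has already been compiled by hand into a small deterministic automaton, and it runs a breadth-first search over (node, automaton state) pairs. The function name `regex_reachable`, the example graph, and the edge types `friend` and `colleague` are all hypothetical.

```python
from collections import deque


def regex_reachable(edges, dfa, start_state, accepting, source, target):
    """Return True if some path source -> target spells a word the DFA accepts.

    edges: iterable of (u, label, v) triples of a directed, edge-labeled graph.
    dfa:   dict mapping (state, label) -> next state; a missing entry rejects.
    """
    adjacency = {}
    for u, label, v in edges:
        adjacency.setdefault(u, []).append((label, v))

    start = (source, start_state)
    seen = {start}
    queue = deque([start])
    while queue:
        node, state = queue.popleft()
        if node == target and state in accepting:
            return True
        for label, nxt in adjacency.get(node, []):
            nxt_state = dfa.get((state, label))
            if nxt_state is None:
                continue  # this edge type is not allowed at this point
            pair = (nxt, nxt_state)
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return False


# Hypothetical social graph with two edge types.
edges = [
    ("ann", "friend", "bob"),
    ("bob", "friend", "cat"),
    ("cat", "colleague", "dan"),
    ("ann", "colleague", "eve"),
]
# Hand-built DFA for the expression: friend+ colleague
dfa = {(0, "friend"): 1, (1, "friend"): 1, (1, "colleague"): 2}

print(regex_reachable(edges, dfa, 0, {2}, "ann", "dan"))
print(regex_reachable(edges, dfa, 0, {2}, "ann", "eve"))
```

Each (node, state) pair is visited at most once, so the search is bounded by the number of edges times the number of automaton states. Here `dan` is reachable from `ann` through one or more `friend` edges followed by a `colleague` edge, while `eve` is not.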
Neighborhood based fast graph search in large networks
In SIGMOD, 2011
Cited by 26 (1 self)
Searching complex social and information networks is becoming important in a variety of applications. At the core of these applications lies a common and critical problem: given a labeled network and a query graph, how to efficiently search for the query graph in the target network. The presence of noise and the incomplete knowledge about the structure and content of the target network make it unrealistic to find an exact match. Rather, it is more appealing to find the top-k approximate matches. In this paper, we propose a neighborhood-based similarity measure that avoids costly graph isomorphism and edit distance computation. Under this new measure, we prove that subgraph similarity search is NP-hard, while graph similarity match is polynomial. By studying the principles behind this measure, we found an information propagation model that is able to convert a large net
Query preserving graph compression
In SIGMOD, 2012
Cited by 22 (10 self)
It is common to find graphs with millions of nodes and billions of edges in, e.g., social networks. Queries on such graphs are often prohibitively expensive. This motivates us to propose query preserving graph compression, to compress graphs relative to a class Q of queries of users' choice. We compute a small Gr from a graph G such that (a) for any query Q ∈ Q, Q(G) = Q′(Gr), where Q′ ∈ Q can be efficiently computed from Q; and (b) any algorithm for computing Q(G) can be directly applied to evaluating Q′ on Gr as is. That is, while we cannot lower the complexity of evaluating graph queries, we reduce data graphs while preserving the answers to all the queries in Q. To verify the effectiveness of this approach, (1) we develop compression strategies for two classes of queries: reachability and graph pattern queries via (bounded) simulation. We show that graphs can be efficiently compressed via a reachability equivalence relation and graph bisimulation, respectively, while preserving query answers. (2) We provide techniques for maintaining the compressed graph Gr in response to changes ΔG to the original graph G. We show that the incremental maintenance problems are unbounded for the two classes of queries, i.e., their costs are not a function of the size of ΔG and changes in Gr. Nevertheless, we develop incremental algorithms that depend only on ΔG and Gr, independent of G, i.e., we do not have to decompress Gr to propagate the changes. (3) Using real-life data, we experimentally verify that our compression techniques reduce graphs on average by 95% for reachability and 57% for graph pattern matching, and that our incremental maintenance algorithms are efficient. Categories and Subject Descriptors: F.2 [Analysis of algorithms and problem complexity]: Nonnumerical algorithms and problems (graph compression)
Managing large dynamic graphs efficiently
2012
Cited by 21 (1 self)
There is an increasing need to ingest, manage, and query large volumes of graph-structured data arising in applications like social networks, communication networks, biological networks, and so on. Graph databases that can explicitly reason about the graphical nature of the data, that can support flexible schemas and node-centric or edge-centric analysis and querying, are ideal for storing such data. However, although there is much work on single-site graph databases and on efficiently executing different types of queries over large graphs, to date there is little work on understanding the challenges in distributed graph databases, needed to handle the large scale of such data. In this paper, we propose the design of an in-memory, distributed graph data management system aimed at managing a large-scale dynamically changing graph, and supporting low-latency query processing over it. The key challenge in
Keyword search in graphs: Finding r-cliques
PVLDB, 2011
Cited by 17 (1 self)
Keyword search over a graph finds a substructure of the graph containing all or some of the input keywords. Most previous methods in this area find connected minimal trees that cover all the query keywords. Recently, it has been shown that finding subgraphs rather than trees can be more useful and informative for the users. However, the current tree- or graph-based methods may produce answers in which some content nodes (i.e., nodes that contain input keywords) are not very close to each other. In addition, when searching for answers, these methods may explore the whole graph rather than only the content nodes. This may lead to poor performance in execution time. To address the above problems, we propose the problem of finding r-cliques in graphs. An r-clique is a group of content nodes that cover all the input keywords and in which the distance between every two nodes is less than or equal to r. An exact algorithm is proposed that finds all r-cliques in the input graph. In addition, an approximation algorithm that produces r-cliques with 2-approximation in polynomial delay is proposed. Extensive performance studies using two large real data sets confirm the efficiency and accuracy of finding r-cliques in graphs.
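The r-clique definition above can be checked directly for a candidate set of nodes. The sketch below is only a verifier for that definition, not the exact or approximation algorithm from the paper. It assumes an unweighted, undirected graph given as an adjacency dictionary and measures distance in hops; the names `is_r_clique` and `bfs_distances` and the example data are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def is_r_clique(adjacency, node_keywords, nodes, query_keywords, r):
    """Check the r-clique conditions for a candidate list of nodes."""
    query = set(query_keywords)
    covered = set()
    for n in nodes:
        hits = node_keywords.get(n, set()) & query
        if not hits:
            return False  # not a content node for this query
        covered |= hits
    if covered != query:
        return False  # some input keyword is not covered
    for a, b in combinations(nodes, 2):
        if bfs_distances(adjacency, a).get(b, float("inf")) > r:
            return False  # this pair is farther apart than r
    return True


# Hypothetical path graph a - b - c - d with keywords on three nodes.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
node_keywords = {"a": {"graph"}, "c": {"query"}, "d": {"index"}}

print(is_r_clique(adjacency, node_keywords, ["a", "c"], {"graph", "query"}, 2))
print(is_r_clique(adjacency, node_keywords, ["a", "d"], {"graph", "index"}, 2))
```

Running one breadth-first search per pair keeps the sketch short; a real implementation would more likely reuse distances or rely on a precomputed index.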
Querying graph databases with XPath
2013
Cited by 16 (3 self)
XPath plays a prominent role as an XML navigational language due to several factors, including its ability to express queries of interest, its close connection to yardstick database query languages (e.g., first-order logic), and the low complexity of query evaluation for many fragments. Another common database model, graph databases, also requires a heavy use of navigation in queries; yet it largely adopts a different approach to querying, relying on reachability patterns expressed with regular constraints. Our goal here is to investigate the behavior and applicability of XPath-like languages for querying graph databases, concentrating on their expressiveness and complexity of query evaluation. We are particularly interested in a model of graph data that combines navigation through graphs with querying data held in the nodes, such as, for example, in a social network scenario. As navigational languages, we use analogs of core and regular XPath and augment them with various tests on data values. We relate these languages to first-order logic, its transitive closure extensions, and finite-variable fragments thereof, proving several capture results. In addition, we describe their relative expressive power. We then show that they behave very well computationally: they have a low-degree polynomial combined complexity, which becomes linear for several fragments. Furthermore, we introduce new types of tests for XPath languages that let them capture first-order logic with data comparisons and prove that the low complexity bounds continue to apply to such extended languages. Therefore, XPath-like languages seem to be very well suited to querying graphs.
Regular path queries on graphs with data
In ICDT’12
Cited by 16 (6 self)
Graph data models have received much attention lately due to applications in social networks, the semantic web, biological databases and other areas. Typical query languages for graph databases retrieve their topology, while actual data stored in them is usually queried using standard relational mechanisms. Our goal is to develop techniques that combine these two modes of querying, and give us query languages that can ask questions about both data and topology. As the basic querying mechanism we consider regular path queries, with the key difference that conditions on paths between nodes now talk not only about labels but also specify how data changes along the path. Paths that combine edge labels with data values are closely related to data words, so for stating conditions in queries, we look at several data-word formalisms developed recently. We show that many of them immediately lead to intractable data complexity for graph queries, with the notable exception of register automata, which can specify many properties of interest, and have NLOGSPACE data and PSPACE combined complexity. As register automata themselves are not easy to use in querying, we define two types of extensions of regular expressions that are more user-friendly, and develop query evaluation techniques for them. For one class, regular expressions with memory, we achieve the same bounds as for automata, and for the other class, regular expressions with equality, we also obtain tractable combined complexity of query evaluation. In addition, we show that the results extend to analogs of conjunctive regular path queries.
Querying Graph Patterns
Cited by 13 (5 self)
Graph data appears in a variety of application domains, and many uses of it, such as querying, matching, and transforming data, naturally result in incompletely specified graph data, i.e., graph patterns. While queries need to be posed against such data, techniques for querying patterns are generally lacking, and properties of such queries are not well understood. Our goal is to study the basics of querying graph patterns. We first identify key features of patterns, such as node and label variables and edges specified by regular expressions, and define a classification of patterns based on them. We then study standard graph queries on graph patterns, and give precise characterizations of both data and combined complexity for each class of patterns. If complexity is high, we do further analysis of features that lead to intractability, as well as lower-complexity restrictions. We introduce a new automata model for query answering with two modes of acceptance: one captures queries returning nodes, and the other queries returning paths. We study properties of such automata, and the key computational tasks associated with them. Finally, we provide additional restrictions for tractability, and show that some intractable cases can be naturally cast as instances of the constraint satisfaction problem.
Capturing Topology in Graph Pattern Matching
Cited by 13 (6 self)
Graph pattern matching is often defined in terms of subgraph isomorphism, an NP-complete problem. To lower its complexity, various extensions of graph simulation have been considered instead. These extensions allow pattern matching to be conducted in cubic time. However, they fall short of capturing the topology of data graphs, i.e., graphs may have a structure drastically different from pattern graphs they match, and the matches found are often too large to understand and analyze. To rectify these problems, this paper proposes a notion of strong simulation, a revision of graph simulation, for graph pattern matching. (1) We identify a set of criteria for preserving the topology of graphs matched. We show that strong simulation preserves the topology of data graphs and finds a bounded number of matches. (2) We show that strong simulation retains the same complexity as earlier extensions of simulation, by providing a cubic-time algorithm for computing strong simulation. (3) We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs. (4) We experimentally verify the effectiveness and efficiency of these algorithms, using real-life data and synthetic data.
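For readers unfamiliar with the baseline that strong simulation revises, the following is a minimal sketch of plain graph simulation, computed by a naive fixpoint refinement. It is the textbook notion only: it does not implement strong simulation or the cubic-time algorithm from the paper. The function name `max_simulation`, the node-label representation, and the example graphs are assumptions made for illustration.

```python
def max_simulation(q_labels, q_edges, g_labels, g_edges):
    """Compute the maximum simulation of a pattern graph by a data graph.

    q_labels / g_labels: dict node -> label for pattern and data graph.
    q_edges / g_edges:   lists of directed (source, target) edges.
    Returns a dict mapping each pattern node to its set of matching data nodes.
    """
    g_succ = {}
    for v, w in g_edges:
        g_succ.setdefault(v, set()).add(w)

    # Start from all label-compatible pairs, then prune until stable.
    sim = {
        u: {v for v, label in g_labels.items() if label == q_labels[u]}
        for u in q_labels
    }
    changed = True
    while changed:
        changed = False
        for u, u_child in q_edges:
            # v may match u only if some successor of v matches u_child.
            keep = {v for v in sim[u] if g_succ.get(v, set()) & sim[u_child]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Hypothetical pattern: an A node and a B node that point at each other.
q_labels = {"p": "A", "q": "B"}
q_edges = [("p", "q"), ("q", "p")]

# Hypothetical data graph: a1 <-> b1 form a cycle, a2 -> b2 is a dead end.
g_labels = {"a1": "A", "b1": "B", "a2": "A", "b2": "B"}
g_edges = [("a1", "b1"), ("b1", "a1"), ("a2", "b2")]

print(max_simulation(q_labels, q_edges, g_labels, g_edges))
```

The pattern matches the data graph when every pattern node ends up with a non-empty match set. In this example the dead-end pair `a2`, `b2` is pruned, leaving only the nodes on the cycle.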