Results 1 - 10
of
21
Graphs-at-a-time: Query Language and Access Methods for Graph Databases
, 2008
"... With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this l ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
With the prevalence of graph data in a variety of domains, there is an increasing need for a language to query and manipulate graphs with heterogeneous attributes and structures. We propose a query language for graph databases that supports arbitrary attributes on nodes, edges, and graphs. In this language, graphs are the basic unit of information and each query manipulates one or more collections of graphs. To allow for flexible compositions of graph structures, we extend the notion of formal languages from strings to the graph domain. We present a graph algebra extended from the relational algebra in which the selection operator is generalized to graph pattern matching and a composition operator is introduced for rewriting matched graphs. Then, we investigate access methods of the selection operator. Pattern matching over large graphs is challenging due to the NP-completeness of subgraph isomorphism. We address this by a combination of techniques: use of neighborhood subgraphs and profiles, joint reduction of the search space, and optimization of the search order. Experimental results on real and synthetic large graphs demonstrate that our graph specific optimizations outperform an SQL-based implementation by orders of magnitude.
Efficiently Answering Reachability Queries on Very Large Directed Graphs
"... Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
Efficiently processing queries against very large graphs is an important research topic largely driven by emerging real world applications, as diverse as XML databases, GIS, web mining, social network analysis, ontologies, and bioinformatics. In particular, graph reachability has attracted a lot of research attention as reachability queries are not only common on graph databases, but they also serve as fundamental operations for many other graph queries. The main idea behind answering reachability queries in graphs is to build indices based on reachability labels. Essentially, each vertex in the graph is assigned with certain labels such that the reachability between any two vertices can be determined by their labels. Several approaches have been proposed for building these reachability labels; among them are interval labeling (tree cover) and 2-hop labeling. However, due to the large number of vertices in many real world graphs (some graphs can easily contain millions of vertices), the computational cost and (index) size of the labels using existing methods would prove too expensive to be practical. In this paper, we introduce a novel graph structure, referred to as pathtree, to help labeling very large graphs. The path-tree cover is a spanning subgraph of G in a tree shape. We demonstrate both analytically and empirically the effectiveness of our new approaches.
Fast computation of reachability labeling for large graphs
- In Proc. of EDBT’06
, 2006
"... There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2-hop cover can compactly represent the whole edge transitive closure of a graph in O(|V | · |E | 1/2) space, and be used to answer reachability query efficiently. Howev ..."
Abstract
-
Cited by 6 (2 self)
- Add to MetaCart
There are numerous applications that need to deal with a large graph and need to query reachability between nodes in the graph. A 2-hop cover can compactly represent the whole edge transitive closure of a graph in O(|V | · |E | 1/2) space, and be used to answer reachability query efficiently. However, it is challenging to compute a 2-hop cover. The existing approaches suffer from either large resource consumption or low compression rate. In this paper, we propose a hierarchical partitioning approach to partition a large graph G into two subgraphs repeatedly in a top-down fashion. The unique feature of our approach is that we compute 2-hop cover while partitioning. In brief, in every iteration of top-down partitioning, we provide techniques to compute the 2-hop cover for connections between the two subgraphs first. A cover is computed to cut the graph into two subgraphs, which results in an overall cover with high compression for the entire graph G. Two approaches are proposed, namely a node-oriented approach and an edge-oriented approach. Our approach can efficiently compute 2-hop cover for a large graph with high compression rate. Our extensive experiment study shows that the 2-hop cover for a graph with 1,700,000 nodes and 169 billion connections can be obtained in less than 30 minutes with a compression rate about 40,000 using a PC.
Efficiently Indexing Shortest Paths by Exploiting Symmetry in Graphs
- In EDBT 2009
"... Shortest path queries (SPQ) are essential in many graph analysis and mining tasks. However, answering shortest path queries on-the-fly on large graphs is costly. To online answer shortest path queries, we may materialize and index shortest paths. However, a straightforward index of all shortest path ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
Shortest path queries (SPQ) are essential in many graph analysis and mining tasks. However, answering shortest path queries on-the-fly on large graphs is costly. To online answer shortest path queries, we may materialize and index shortest paths. However, a straightforward index of all shortest paths in a graph of N vertices takes O(N 2) space. In this paper, we tackle the problem of indexing shortest paths and online answering shortest path queries. As many large real graphs are shown richly symmetric, the central idea of our approach is to use graph symmetry to reduce the index size while retaining the correctness and the efficiency of shortest path query answering. Technically, we develop a framework to index a large graph at the orbit level instead of the vertex level so that the number of breadth-first
An Optimal Labeling Scheme for Workflow Provenance Using Skeleton Labels
"... We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the specification due to fork (parallel) and loop ex ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
We develop a compact and efficient reachability labeling scheme for answering provenance queries on workflow runs that conform to a given specification. Even though a workflow run can be structurally more complex and can be arbitrarily larger than the specification due to fork (parallel) and loop executions, we show that a compact reachability labeling for a run can be efficiently computed using the fact that it originates from a fixed specification. Our labeling scheme is optimal in the sense that it uses labels of logarithmic length, runs in linear time, and answers any reachability query in constant time. Our approach is based on using the reachability labeling for the specification as an effective skeleton for designing the reachability labeling for workflow runs. We also demonstrate empirically the effectiveness of our skeleton-based labeling approach.
Fast and accurate estimation of shortest paths in large graphs
- In Proceedings of Conference on Information and Knowledge Management (CIKM
, 2010
"... Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large disk-resident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual short ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Computing shortest paths between two given nodes is a fundamental operation over graphs, but known to be nontrivial over large disk-resident instances of graph data. While a numberoftechniquesexistfor answeringreachabilityqueries and approximating node distances efficiently, determining actual shortest paths (i.e. the sequence of nodes involved) is often neglected. However, in applications arising in massive online social networks, biological networks, and knowledge graphs it is often essential to find out many, if not all, shortest paths between two given nodes. In this paper, we address this problem and present a scalable sketch-based index structure that not only supports estimation of node distances, but also computes corresponding shortest paths themselves. Generating the actual path information allows for further improvements to the estimation accuracy of distances (and paths), leading to near-exact shortest-path approximations in real world graphs. We evaluate our techniques – implemented within a fully functional RDF graph database system – over large realworld social and biological networks of sizes ranging from tens of thousand to millions of nodes and edges. Experiments on several datasets show that we can achieve query response times providing several orders of magnitude speedup over traditional path computations while keeping the estimation errors between 0 % and 1 % on average.
Comparing Stars: On Approximating Graph Edit Distance
, 2009
"... Graph data have become ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer visio ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Graph data have become ubiquitous and manipulating them based on similarity is essential for many applications. Graph edit distance is one of the most widely accepted measures to determine similarities between graphs and has extensive applications in the fields of pattern recognition, computer vision etc. Unfortunately, the problem of graph edit distance computation is NP-Hard in general. Accordingly, in this paper we introduce three novel methods to compute the upper and lower bounds for the edit distance between two graphs in polynomial time. Applying these methods, two algorithms AppFull and AppSub are introduced to perform different kinds of graph search on graph databases. Comprehensive experimental studies are conducted on both real and synthetic datasets to examine various aspects of the methods for bounding graph edit distance. Result shows that these methods achieve good scalability in terms of both the number of graphs and the size of graphs. The effectiveness of these algorithms also confirms the usefulness of using our bounds in filtering and searching of graphs.
On Graph Query Optimization in Large Networks
"... The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
The dramatic proliferation of sophisticated networks has resulted in a growing need for supporting effective querying and mining methods over such large-scale graph-structured data. At the core of many advanced network operations lies a common and critical graph query primitive: how to search graph structures efficiently within a large network? Unfortunately, the graph query is hard due to the NP-complete nature of subgraph isomorphism. It becomes even challenging when the network examined is large and diverse. In this paper, we present a high performance graph indexing mechanism, SPath, to address the graph query problem on large networks. SPath leverages decomposed shortest paths around vertex neighborhood as basic indexing units, which prove to be both effective in graph search space pruning and highly scalable in index construction and deployment. Via SPath, a graph query is processed and optimized beyond the traditional vertex-at-a-time fashion to a more efficient path-at-a-time way: the query is first decomposed to a set of shortest paths, among which a subset of candidates with good selectivity is picked by a query plan optimizer; Candidate paths are further joined together to help recover the query graph to finalize the graph query processing. We evaluate SPath with the state-of-the-art GraphQL on both real and synthetic data sets. Our experimental studies demonstrate the effectiveness and scalability of SPath, which proves to be a more practical and efficient indexing method in addressing graph queries on large networks. 1.
STAR: Steiner Tree Approximation in Relationship-Graphs
, 2008
"... We would like to thank Gerard de Melo for the thorough proof reading of Large-scale graphs and networks are abundant in modern information systems: entity-relationship graphs over relational data or Web-extracted entities, biological networks, social online communities, knowledge bases, and many mor ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
We would like to thank Gerard de Melo for the thorough proof reading of Large-scale graphs and networks are abundant in modern information systems: entity-relationship graphs over relational data or Web-extracted entities, biological networks, social online communities, knowledge bases, and many more. Often such data comes with expressive node and edge labels that allow an interpretation as a semantic graph, and edge weights that reflect the strengths of semantic relations between entities. Finding close relationships between a given set of two, three, or more entities is an important building block for many search, ranking, and analysis tasks. From an algorithmic point of view, this translates into computing the best Steiner trees between the given nodes, a classical NP-hard problem. In this paper, we present a new approximation algorithm, coined STAR, for relationship queries over large graphs that do not fit into memory. We prove that for n query entities, STAR yields an O(log(n))-approximation of the optimal Steiner tree, and

