Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism
Cited by 50 (9 self)
Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve graphs, containing a given query graph, from a large set of graphs. Considering the problem of testing subgraph isomorphism is generally NP-hard, most of the existing techniques are based on the framework of filtering-and-verification to reduce the precise computation costs; consequently various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes a bottleneck when the query graph size increases. Motivated by this, in the paper we firstly propose a novel and efficient algorithm for testing subgraph isomorphism, QuickSI. Secondly, we develop a new feature-based index technique to accommodate QuickSI in the filtering phase. Our extensive experiments on real and synthetic data demonstrate the efficiency and scalability of the proposed techniques, which significantly improve the existing techniques.
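The filtering-and-verification framework named in this abstract can be illustrated with a minimal Python sketch. It is not QuickSI and not the paper's index: the filter here is a simple vertex-label count check and the verifier is plain backtracking, both chosen only for illustration. The helper names (graph, label_filter, is_subgraph, search) and the dictionary-based graph representation are assumptions of this sketch.

```python
from collections import Counter


def graph(labels, edges):
    """Build an undirected labeled graph (hypothetical representation for this sketch)."""
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return {"labels": labels, "adj": adj}


def label_filter(query, data):
    """Filtering: a cheap necessary condition.

    The data graph must have at least as many vertices of each label as the query.
    """
    need = Counter(query["labels"].values())
    have = Counter(data["labels"].values())
    return all(have[label] >= count for label, count in need.items())


def is_subgraph(query, data):
    """Verification: backtracking test for (non-induced) subgraph isomorphism."""
    q_vertices = list(query["labels"])
    mapping = {}  # query vertex -> data vertex

    def extend(i):
        if i == len(q_vertices):
            return True
        u = q_vertices[i]
        for v in data["labels"]:
            if v in mapping.values() or data["labels"][v] != query["labels"][u]:
                continue
            # Every already-mapped query neighbour of u must map to a data neighbour of v.
            if all(mapping[w] in data["adj"][v] for w in query["adj"][u] if w in mapping):
                mapping[u] = v
                if extend(i + 1):
                    return True
                del mapping[u]
        return False

    return extend(0)


def search(query, database):
    """Run the cheap filter first, then verify only the surviving candidates."""
    candidates = [name for name, g in database.items() if label_filter(query, g)]
    return [name for name in candidates if is_subgraph(query, database[name])]
```

With a query path C-C-O, a data graph holding only one C and one O is dropped by the filter without any isomorphism test, while a graph with the right labels but no edge to O passes the filter and is rejected only during verification. The second case is the cost the abstract describes as the bottleneck for larger queries.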
Neighborhood based fast graph search in large networks
In SIGMOD, 2011
Cited by 26 (1 self)
Complex social and information network search becomes important with a variety of applications. In the core of these applications, lies a common and critical problem: Given a labeled network and a query graph, how to efficiently search the query graph in the target network. The presence of noise and the incomplete knowledge about the structure and content of the target network make it unrealistic to find an exact match. Rather, it is more appealing to find the top-k approximate matches. In this paper, we propose a neighborhood-based similarity measure that could avoid costly graph isomorphism and edit distance computation. Under this new measure, we prove that subgraph similarity search is NP-hard, while graph similarity match is polynomial. By studying the principles behind this measure, we found an information propagation model that is able to convert a large net
GADDI: Distance index based subgraph matching in biological networks
In Proceedings of the 12th International Conference on Extending Database Technology (EDBT’09), 2009
Cited by 25 (2 self)
Currently, a huge amount of biological data can be naturally represented by graphs, e.g., protein interaction networks, gene regulatory networks, etc. The need for indexing large graphs is an urgent research problem of great practical importance. The main challenge is size. Each graph may contain thousands (or more) vertices. Most of the previous work focuses on indexing a set of small or medium sized database graphs (with only tens of vertices) and finding whether a query graph occurs in any of these. In this paper, we are interested in finding all the matches of a query graph in a given large graph of thousands of vertices, which is a very important task in many biological applications. This increases the complexity significantly. We propose a novel distance measurement which reintroduces the idea of frequent substructures in a single large graph. We devise the novel structure distance based approach (GADDI) to efficiently find matches of the query graph. GADDI is further optimized by the use of a dynamic matching scheme to minimize redundant calculations. Last but not least, a number of real and synthetic data sets are used to evaluate the efficiency and scalability of our proposed method.
A novel approach for efficient supergraph query processing on graph databases
In EDBT
Cited by 13 (0 self)
In recent years, large amount of data modeled by graphs, namely graph data, have been collected in various domains. Efficiently processing queries on graph databases has attracted a lot of research attentions. Supergraph query is a kind of new and important queries in practice. A supergraph query, q, on a graph database D is to retrieve all graphs in D such that q is a supergraph of them. Because the number of graphs in databases is large and subgraph isomorphism testing is NP-complete, efficiently processing such queries is a big challenge. This paper first proposes an optimal compact method for organizing graph databases. Common subgraphs of the graphs in a database are stored only once in the compact organization of the database, in order to reduce the overall cost of subgraph isomorphism testings from stored graphs to queries during query processing. Then, an exact algorithm and an approximate algorithm for generating significant feature set with optimal order are proposed to construct indices on graph databases. The optimal order on the feature set is to reduce the number of subgraph isomorphism testings during query processing. Based on the compact organization of graph databases, a novel algorithm of testing subgraph isomorphisms from multiple graphs to one graph is presented. Finally, based on all these techniques, a query processing method is proposed. Analytical and experimental results show that the proposed algorithms outperform the existing similar algorithms by one to two orders of magnitude.
Connected Substructure Similarity Search
Cited by 11 (5 self)
Substructure similarity search is to retrieve graphs that approximately contain a given query graph. It has many applications, e.g., detecting similar functions among chemical compounds. The problem is challenging as even testing subgraph containment between two graphs is NP-complete. Hence, existing techniques adopt the filtering-and-verification framework with the focus on developing effective and efficient techniques to remove non-promising graphs. Nevertheless, existing filtering techniques may be still unable to effectively remove many “low” quality candidates. To resolve this, in this paper we propose a novel indexing technique, GrafD-Index, to index graphs according to their “distances” to features. We characterize a tight condition under which the distance-based triangular inequality holds. We then develop lower and upper bounding techniques that exploit the GrafD-Index to (1) prune non-promising graphs and (2) include graphs whose similarities are guaranteed to exceed the given similarity threshold. Considering that the verification phase is not well studied and plays the dominant role in the whole process, we devise efficient algorithms to verify candidates. A comprehensive experiment using real datasets demonstrates that our proposed methods significantly outperform existing methods.
Finding Top-K Similar Graphs in Graph Databases
Cited by 5 (1 self)
Querying similar graphs in graph databases has been widely studied in graph query processing in recent years. Existing works mainly focus on subgraph similarity search and supergraph similarity search. In this paper, we study the problem of finding top-k graphs in a graph database that are most similar to a query graph. This problem has many applications, such as image retrieval and chemical compound structure search. Regarding the similarity measure, feature based and kernel based similarity measures have been used in the literature. But such measures are rough and may lose the connectivity information among substructures. In this paper, we introduce a new similarity measure based on the maximum common subgraph (MCS) of two graphs. We show that this measure can better capture the common and different structures of two graphs. Since computing the MCS of two graphs is NP-hard, we propose an algorithm to answer the top-k graph similarity query using two distance lower bounds with different computational costs, in order to reduce the number of MCS computations. We further introduce an indexing technique, which can better make use of the triangle property of similarities among graphs in the database to get tighter lower bounds. Three different indexing methods are proposed with different tradeoffs between pruning power and construction cost. We conducted extensive performance studies on large real datasets to evaluate the performance of our approaches.
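The idea of using a cheap distance lower bound to skip expensive MCS computations in a top-k query can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm. The abstract does not give the distance formula, so the sketch assumes a distance of the form 1 - |common part| / max(|q|, |g|). The function exact_distance is a stand-in that counts shared edges between graphs with fixed vertex ids instead of computing a real MCS, and only one lower bound (based on graph sizes) is used rather than the two the abstract mentions. All function names are hypothetical.

```python
import heapq


def size_lower_bound(q, g):
    """Cheap bound: the common part can never be larger than the smaller graph."""
    return 1 - min(len(q), len(g)) / max(len(q), len(g))


def exact_distance(q, g):
    """Stand-in for the expensive MCS-based distance (assumption of this sketch).

    Graphs are non-empty sets of edges with fixed vertex ids, so the "common
    subgraph" is just the set of shared edges.
    """
    return 1 - len(q & g) / max(len(q), len(g))


def top_k(query, database, k):
    """Return the k closest graphs and the number of exact distances computed."""
    # Visit graphs in increasing order of the cheap lower bound.
    order = sorted(database, key=lambda name: size_lower_bound(query, database[name]))
    best = []  # max-heap simulated with negated distances: (-distance, name)
    computed = 0
    for name in order:
        bound = size_lower_bound(query, database[name])
        if len(best) == k and bound >= -best[0][0]:
            break  # no remaining graph can beat the current k-th best
        d = exact_distance(query, database[name])
        computed += 1
        if len(best) < k:
            heapq.heappush(best, (-d, name))
        elif d < -best[0][0]:
            heapq.heapreplace(best, (-d, name))
    ranked = sorted((-neg_d, name) for neg_d, name in best)
    return ranked, computed
```

Because graphs are visited in order of their lower bound, the loop can stop as soon as the bound of the next graph reaches the current k-th best distance. Every graph after that point has an even larger bound and therefore cannot enter the result.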
An In-depth Comparison of Subgraph Isomorphism Algorithms in Graph Databases
Cited by 5 (0 self)
Finding subgraph isomorphisms is an important problem in many applications which deal with data modeled as graphs. While this problem is NP-hard, in recent years, many algorithms have been proposed to solve it in a reasonable time for real datasets using different join orders, pruning rules, and auxiliary neighborhood information. However, since they have not been empirically compared one another in most research work, it is not clear whether the later work outperforms the earlier work. Another problem is that reported comparisons were often done using the original authors’ binaries which were written in different programming environments. In this paper, we address these serious problems by re-implementing five state-of-the-art subgraph isomorphism algorithms in a common code base and by comparing them using many real-world datasets and their query loads. Through our in-depth analysis of experimental results, we report surprising empirical findings.
Similarity Search on Supergraph Containment
Cited by 5 (2 self)
A supergraph containment search is to retrieve the data graphs contained by a query graph. In this paper, we study the problem of efficiently retrieving all data graphs approximately contained by a query graph, namely similarity search on supergraph containment. We propose a novel and efficient index to boost the efficiency of query processing. We have studied the query processing cost and propose two index construction strategies aimed at optimizing the performance of different types of data graphs: top-down strategy and bottom-up strategy. Moreover, a novel indexing technique is proposed by effectively merging the indexes of individual data graphs; this not only reduces the index size but also further reduces the query processing time. We conduct extensive experiments on real data sets to demonstrate the efficiency and the effectiveness of our techniques.
GBLENDER: Towards blending visual query formulation and query processing in graph databases
In SIGMOD, 2010
Cited by 3 (0 self)
Given a graph database D and a query graph g, an exact subgraph matching query asks for the set S of graphs in D that contain g as a subgraph. This type of queries find important applications in several domains such as bioinformatics and chemoinformatics, where users are generally not familiar with complex graph query languages. Consequently, user-friendly visual interfaces which support query graph construction can reduce the burden of data retrieval for these users. Existing techniques for subgraph matching queries built on top of such visual framework are designed to optimize the time required in retrieving the result set S from D, assuming that the whole query graph has been constructed. This leads to suboptimal system response time as the query processing is initiated only after the user has finished drawing the query graph. In this paper, we take the first step towards exploring a novel graph query processing paradigm, where instead of processing a query graph after its construction, it interleaves visual query construction and processing to improve system response time. To realize this, we present an algorithm called GBLENDER that prunes false results and prefetches partial query results by exploiting the latency offered by the visual query formulation. It employs a novel action-aware indexing scheme that exploits users’ interaction characteristics with visual interfaces to support efficient retrieval. Extensive experiments on both real and synthetic datasets demonstrate the effectiveness and efficiency of our solution.
A Lattice-based Graph Index for Subgraph Search
Cited by 2 (0 self)
Given a query graph q, a “subgraph-search” algorithm retrieves from a graph database D all graphs that have q as a subgraph, D(q). Subgraph search is costly because of its involvement of a subgraph-isomorphism test, which is an NP-complete problem. Graph indexes are used to improve the algorithm efficiency by first filtering out a set of false answers and then verifying each graph that has passed the filtration with subgraph isomorphism tests. Many substructure features have been proposed to build the index aiming at improving the filtering power of the index. In this paper we improve the filtering power and query processing time by design of the index structure. We propose a lattice-like index, Lindex, which is generally applicable on all graph features. Lindex achieves a high filtering rate by organizing index subgraphs in a graph lattice and adopting a specific design of value sets. Besides finding the candidate set C(q) after filtering, Lindex can also find a set of true answers Tr(q) without involving subgraph isomorphism tests. Accordingly, only candidate graphs in C(q) − Tr(q) need to be verified. Our experiments show that Lindex outperforms other cutting-edge indexes on both frequent subgraph and infrequent subgraph queries.
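The bookkeeping behind "only candidate graphs in C(q) − Tr(q) need to be verified" can be sketched in a few lines of Python. The abstract does not say how Lindex derives Tr(q), so this sketch assumes one logically safe rule: a graph that contains an indexed feature which is itself a supergraph of q must contain q. The lattice traversal is not modelled. The relations between features and q are passed in as given, and all names (answer_query, sub_features, super_features, value_sets, verify) are hypothetical.

```python
def answer_query(all_graphs, sub_features, super_features, value_sets, verify):
    """Split a subgraph query into filtering, guaranteed answers, and verification.

    sub_features:   indexed features known to be subgraphs of q
    super_features: indexed features known to be supergraphs of q
    value_sets:     feature -> set of ids of database graphs containing that feature
    verify:         callable running the expensive subgraph isomorphism test
    """
    # C(q): an answer must contain every indexed feature that is a subgraph of q.
    candidates = set(all_graphs)
    for feature in sub_features:
        candidates &= value_sets[feature]

    # Tr(q): a graph containing a feature that is a supergraph of q contains q.
    true_answers = set()
    for feature in super_features:
        true_answers |= value_sets[feature]

    # Only C(q) - Tr(q) needs the expensive test.
    to_verify = candidates - true_answers
    verified = {g for g in to_verify if verify(g)}
    return true_answers | verified, to_verify
```

In this toy form the saving is visible directly: every graph that lands in true_answers is reported without a call to verify.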