Taming Verification Hardness: An Efficient Algorithm for Testing Subgraph Isomorphism
Cited by 47 (8 self)
Graphs are widely used to model complicated data semantics in many applications. In this paper, we aim to develop efficient techniques to retrieve, from a large set of graphs, the graphs that contain a given query graph. Since testing subgraph isomorphism is NP-hard in general, most existing techniques are based on a filtering-and-verification framework that reduces the cost of precise computation; consequently, various novel feature-based indexes have been developed. While the existing techniques work well for small query graphs, the verification phase becomes a bottleneck as the query graph size increases. Motivated by this, we first propose QuickSI, a novel and efficient algorithm for testing subgraph isomorphism. Second, we develop a new feature-based index technique to accommodate QuickSI in the filtering phase. Our extensive experiments on real and synthetic data demonstrate the efficiency and scalability of the proposed techniques, which significantly improve on the existing techniques.
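The filtering-and-verification framework described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not QuickSI and not the paper's feature-based index: the filter is a simple label-count check, and verification is a plain backtracking search for a label-preserving embedding of the query (non-induced subgraph isomorphism). The `(labels, adj)` graph representation and the names `make_graph`, `label_filter`, `is_subgraph_isomorphic`, and `query_database` are assumptions made for the example.

```python
from collections import Counter

# Hypothetical representation: a graph is (labels, adj), where labels maps
# vertex -> label and adj maps vertex -> set of neighbouring vertices.
def make_graph(labels, edges):
    adj = {v: set() for v in labels}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return labels, adj

def label_filter(query, target):
    """Filtering: a cheap necessary condition. The target must have at least
    as many vertices of each label as the query."""
    q_counts = Counter(query[0].values())
    t_counts = Counter(target[0].values())
    return all(t_counts[lab] >= c for lab, c in q_counts.items())

def is_subgraph_isomorphic(query, target):
    """Verification: backtracking search for an injective, label-preserving map
    from query vertices to target vertices that preserves every query edge."""
    q_labels, q_adj = query
    t_labels, t_adj = target
    # Try high-degree query vertices first so that failures surface early.
    order = sorted(q_labels, key=lambda v: -len(q_adj[v]))
    mapping, used = {}, set()

    def extend(i):
        if i == len(order):
            return True
        u = order[i]
        for v in t_labels:
            if v in used or t_labels[v] != q_labels[u]:
                continue
            # Each already-mapped neighbour of u must map to a neighbour of v.
            if all(mapping[w] in t_adj[v] for w in q_adj[u] if w in mapping):
                mapping[u] = v
                used.add(v)
                if extend(i + 1):
                    return True
                del mapping[u]
                used.discard(v)
        return False

    return extend(0)

def query_database(query, database):
    """Return the ids of database graphs that contain the query graph."""
    return [gid for gid, g in database.items()
            if label_filter(query, g) and is_subgraph_isomorphic(query, g)]

query = make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2), (0, 2)])
database = {
    "g1": make_graph({0: "D", 1: "A", 2: "B", 3: "C"}, [(0, 1), (1, 2), (2, 3), (1, 3)]),
    "g2": make_graph({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)]),  # path, no triangle
    "g3": make_graph({0: "A", 1: "B"}, [(0, 1)]),                    # no "C" vertex
}
print(query_database(query, database))  # ['g1']
```

Here `g2` passes the cheap filter and is rejected only by the expensive verification step, which is the situation the abstract identifies as a bottleneck when query graphs grow.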

Frequent Subgraph Mining in Outerplanar Graphs
Proc. 12th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, 2006
Cited by 39 (7 self)
In recent years there has been increased interest in frequent pattern discovery in large databases of graph-structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of the search space, but have not identified a practically relevant tractable graph class beyond trees. In this paper, we consider the class of outerplanar graphs, a strict generalization of trees, develop a frequent subgraph mining algorithm for outerplanar graphs, and show that it works in incremental polynomial time for the practically relevant subclass of well-behaved outerplanar graphs, i.e., those that have only polynomially many simple cycles. We evaluate the algorithm empirically on chemo- and bioinformatics applications.
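The notion of "well-behaved" outerplanar graphs, those with only polynomially many simple cycles, can be made concrete by counting cycles. The sketch below is a plain brute-force enumerator for small undirected graphs, offered only to illustrate the quantity involved; it is not part of the paper's mining algorithm, and the adjacency-dict input format and the helper `undirected` are assumptions.

```python
def count_simple_cycles(adj):
    """Count simple cycles (length >= 3) in an undirected graph given as
    {vertex: set_of_neighbours} with mutually comparable vertex ids.

    Each cycle is rooted at its smallest vertex; the search from that root only
    visits larger vertices, so every cycle is found exactly twice (once per
    direction) and the total is halved. Exponential time in the worst case."""
    count = 0
    for start in adj:
        stack = [(start, (start,))]
        while stack:
            v, path = stack.pop()
            for w in adj[v]:
                if w == start and len(path) >= 3:
                    count += 1
                elif w > start and w not in path:
                    stack.append((w, path + (w,)))
    return count // 2

def undirected(edges):
    """Build {vertex: set_of_neighbours} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

# Two triangles sharing an edge: an outerplanar graph with 3 simple cycles.
diamond = undirected([(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)])
print(count_simple_cycles(diamond))  # 3
```

A tree has no cycles at all, while a complete graph has exponentially many, which is why restricting to graphs with few cycles can restore tractability.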

Mining closed and maximal frequent subtrees from databases of labeled rooted trees
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 38 (1 self)
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, and computer networks. One important problem in mining databases of trees is to find frequently occurring subtrees. Because of combinatorial explosion, the number of frequent subtrees usually grows exponentially with their size, and mining all frequent subtrees therefore becomes infeasible for large tree sizes. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers only closed and maximal frequent subtrees in a database of labeled rooted trees, where the rooted trees can be either ordered or unordered. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all frequent subtrees. Several techniques are proposed to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. Heuristic techniques are used to order the computation so that relatively expensive computation is avoided as much as possible. We study the performance of our algorithm through extensive experiments, using both synthetic data and data sets from real applications. The experimental results show that our algorithm is very efficient in reducing the search space and quickly discovers all closed and maximal frequent subtrees.
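The difference between closed and maximal frequent patterns can be stated precisely in a few lines. The sketch below is a post-hoc filter over an already-mined set of patterns, not CMTreeMiner, which prunes its enumeration tree instead of enumerating everything first. For brevity, patterns are modelled as itemsets (`frozenset`) with proper superset as the "super-pattern" relation; this substitution and the name `closed_and_maximal` are assumptions, and for trees the `p < q` test would become a subtree-containment test.

```python
def closed_and_maximal(frequent):
    """Given ALL frequent patterns as {pattern: support}, return (closed, maximal).

    closed:  no proper super-pattern has the same support
    maximal: no proper super-pattern is frequent at all"""
    closed, maximal = set(), set()
    for p, sup in frequent.items():
        supers = [q for q in frequent if p < q]  # proper super-patterns
        if not supers:
            maximal.add(p)
        if all(frequent[q] != sup for q in supers):
            closed.add(p)
    return closed, maximal

# Supports from the toy database {A,B}, {A,B}, {A,B,C}, {A,C} with min support 2.
frequent = {
    frozenset("A"): 4, frozenset("B"): 3, frozenset("C"): 2,
    frozenset("AB"): 3, frozenset("AC"): 2,
}
closed, maximal = closed_and_maximal(frequent)
print(sorted("".join(sorted(p)) for p in closed))   # ['A', 'AB', 'AC']
print(sorted("".join(sorted(p)) for p in maximal))  # ['AB', 'AC']
```

Every maximal pattern is also closed, so the maximal set is never larger than the closed set; closed patterns additionally preserve the exact supports of all frequent patterns.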

A Survey of Frequent Subgraph Mining Algorithms
The Knowledge Engineering Review, 2004
Cited by 27 (1 self)
Graph mining is an important research area within the domain of data mining. The field concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at (i) effective mechanisms for generating candidate subgraphs without generating duplicates and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining and of the solutions proposed to address the main research issues.
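Goal (i), generating candidates without duplicates, usually rests on a canonical form: two candidates are duplicates exactly when their canonical codes match. Practical miners use far cheaper codes (gSpan, for example, uses minimum DFS codes); the sketch below instead uses a brute-force permutation code, an assumption chosen only because it is short and obviously correct for tiny vertex-labelled graphs. The names `canonical_form` and `deduplicate` are hypothetical.

```python
from itertools import permutations

def canonical_form(labels, edges):
    """Brute-force canonical code of a small vertex-labelled undirected graph:
    the lexicographically smallest (vertex labels, sorted edge list) over all
    vertex orderings. Two graphs get the same code iff they are isomorphic.
    Cost grows as n!, so this only suits tiny illustrative graphs."""
    best = None
    for perm in permutations(labels):
        pos = {v: i for i, v in enumerate(perm)}
        code = (
            tuple(labels[v] for v in perm),
            tuple(sorted(tuple(sorted((pos[a], pos[b]))) for a, b in edges)),
        )
        if best is None or code < best:
            best = code
    return best

def deduplicate(candidates):
    """Keep one candidate (labels, edges) per isomorphism class."""
    unique = {}
    for labels, edges in candidates:
        unique.setdefault(canonical_form(labels, edges), (labels, edges))
    return list(unique.values())

candidates = [
    ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)]),                # path A-B-C
    ({"x": "C", "y": "B", "z": "A"}, [("x", "y"), ("y", "z")]),  # same path, relabelled
    ({0: "A", 1: "C", 2: "B"}, [(0, 1), (1, 2)]),                # path A-C-B: different
]
print(len(deduplicate(candidates)))  # 2
```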

Fast frequent free tree mining in graph databases
In Technical Report of Department of SEEM, CUHK, 2006. Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06), 2006
Cited by 6 (2 self)
Free trees, i.e., graphs that are connected, undirected, and acyclic, are used extensively in domains such as computational biology, pattern recognition, computer networks, and XML databases. In this paper, we present F3TM (Fast Frequent Free Tree Mining), a computationally efficient algorithm that discovers all frequent free trees in a graph database. We focus on reducing the cost of candidate generation and minimizing the number of candidates generated. We prove that the completeness of the frequent free trees can be guaranteed by growing vertices from a limited range of vertices in a free tree. Two pruning techniques, automorphism-based pruning and pruning based on canonical mapping, are proposed; they significantly reduce the cost of candidate generation. Experimental studies on a real application dataset show that F3TM outperforms up-to-date algorithms by an order of magnitude.
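Because a free tree has no root, free-tree miners need some way to fix a reference point before canonical forms can be compared. One common choice, shown here as a minimal sketch, is to root the tree at its centre (or at each of its two adjacent centres), found by repeatedly peeling off leaves. We do not claim this is how F3TM builds its canonical mapping; `tree_centers`, `undirected`, and the adjacency-dict input are assumptions.

```python
def tree_centers(adj):
    """Centre(s) of a free tree given as {vertex: set_of_neighbours}.

    Leaves are stripped layer by layer; the last one or two vertices left are
    the centre. A free tree has either one centre or two adjacent centres."""
    if len(adj) <= 2:
        return sorted(adj)
    degree = {v: len(ns) for v, ns in adj.items()}
    leaves = [v for v, d in degree.items() if d == 1]
    remaining = len(adj)
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for w in adj[leaf]:
                degree[w] -= 1
                if degree[w] == 1:  # w has just become a leaf
                    next_leaves.append(w)
        leaves = next_leaves
    return sorted(leaves)

def undirected(edges):
    """Build {vertex: set_of_neighbours} from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

path4 = undirected([(0, 1), (1, 2), (2, 3)])
star = undirected([(0, 1), (0, 2), (0, 3)])
print(tree_centers(path4))  # [1, 2]
print(tree_centers(star))   # [0]
```

A tree with two centres yields two candidate roots, so a canonical form for the free tree would typically take the smaller of the two rooted encodings.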

Classification of galactograms using fractal properties of the breast ductal network
Proceedings of the IEEE International Symposium on Biomedical Imaging, 2006
Cited by 3 (2 self)
Several types of breast carcinomas tend to spread along the surface of the ductal lumen. Spontaneous nipple discharge can be an early symptom of such cancer development that does not otherwise result in visible mammographic changes. One imaging procedure that can visualize such symptoms is galactography. We focus on characterizing the topology of the ductal network in galactograms based on its fractal properties. Statistically significant differences in fractal properties were detected between healthy subjects and patients with reported galactographic findings. We performed receiver operating characteristic (ROC) curve analysis to assess the accuracy of regularization dimension values for distinguishing between ductal trees. The observed area under the ROC curve was 0.86.
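The reported area under the ROC curve has a simple rank interpretation that a short sketch makes concrete. Assuming each ductal tree is summarised by a single score (such as its regularization dimension) and that higher scores indicate findings, the AUC equals the fraction of (finding, healthy) pairs that the score orders correctly. The function name `roc_auc` and the score values below are invented for illustration and are not data from the paper.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via its rank interpretation: the probability that
    a randomly chosen positive case scores higher than a randomly chosen negative
    case, with ties counted as one half (the normalised Mann-Whitney U)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical per-tree scores; made-up numbers, not results from the study.
findings = [0.9, 0.8, 0.4]
healthy = [0.7, 0.3, 0.2]
print(roc_auc(findings, healthy))  # 8 / 9, about 0.889
```

If lower values of the feature indicate findings, swapping the two arguments gives the corresponding AUC. An AUC of 0.5 means the score separates the groups no better than chance, and 1.0 means perfect separation.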

Varro: An Algorithm and Toolkit for Regular Structure Discovery in
Cited by 2 (0 self)
The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and annotated natural language data in the form of tree structures: frequently recurring unordered subtrees. The software has been designed for use in linguistics, to be maximally applicable to existing treebanks and other stores of tree-structurable natural language data. It minimizes memory use so that moderately large treebanks are tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.
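The core idea behind canonically ordered trees, making unordered subtrees comparable by fixing a single sibling order, can be sketched as follows. This is the generic sorted-children encoding, not Varro's condensed data structure; the `(label, children)` tuple format and the names `canonical` and `canonically_ordered` are assumptions, and labels are assumed to contain no parentheses.

```python
def canonical(tree):
    """Canonical string of an unordered, labelled rooted tree given as
    (label, [children]). Children are encoded recursively and sorted, so trees
    that differ only in sibling order get the same string."""
    label, children = tree
    return "(" + label + "".join(sorted(canonical(c) for c in children)) + ")"

def canonically_ordered(tree):
    """Copy of the tree with every sibling list sorted into canonical order."""
    label, children = tree
    kids = sorted((canonically_ordered(c) for c in children), key=canonical)
    return (label, kids)

t1 = ("S", [("NP", []), ("VP", [("V", []), ("NP", [])])])
t2 = ("S", [("VP", [("NP", []), ("V", [])]), ("NP", [])])
print(canonical(t1))                   # (S(NP)(VP(NP)(V)))
print(canonical(t1) == canonical(t2))  # True
```

Once every subtree is stored in canonical order, counting recurring unordered subtrees reduces to counting identical canonical strings.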

Fast Extraction of Maximal Frequent Subtrees Using Bits Representation
Cited by 1 (1 self)
With the continuous growth in XML data sources over the Internet, the discovery of useful information from a collection of XML documents is currently one of the main research areas occupying the data mining community. The most commonly adopted approach to this task is to extract frequently occurring subtree patterns from XML trees. However, the number of frequent subtrees usually grows exponentially with the size of trees, and therefore mining all frequent subtrees becomes infeasible for large trees. A more practical and scalable alternative is to use maximal frequent subtrees, which are much fewer in number than frequent subtrees. Handling maximal frequent subtrees is an interesting challenge, though, and is the core of this paper. We present a novel, conceptually simple, yet effective algorithm, called EXiTB, that significantly simplifies the process of mining maximal frequent subtrees. This is achieved through two distinct features. First, EXiTB represents all string node labels of trees by bit strings of a specified length; fast bitwise operations then accelerate deciding which paths of trees contain a given node. Second, EXiTB avoids time-consuming tree join operations by using a specially devised data structure called PairSet. To the best of our knowledge, EXiTB is the first algorithm that discovers maximal frequent subtrees using a bit representation. We also demonstrate the performance of our algorithm through extensive experiments on synthetic datasets generated by a randomized tree-structure generator.
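The benefit of bit-level encodings for path-membership tests can be illustrated with a simpler, hypothetical scheme than the one in the abstract. Rather than EXiTB's fixed-length bit codes for labels, each label here gets a bitmap over the document's root-to-leaf paths, so finding the paths that contain a set of labels costs one bitwise AND per label. PairSet is not modelled, and the tree format and all function names are assumptions.

```python
def root_to_leaf_paths(tree, prefix=()):
    """All root-to-leaf label paths of a (label, [children]) tree."""
    label, children = tree
    path = prefix + (label,)
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(root_to_leaf_paths(child, path))
    return paths

def label_bitmaps(paths):
    """Map each label to an integer whose bit i is set iff path i contains it."""
    bitmaps = {}
    for i, path in enumerate(paths):
        for label in set(path):
            bitmaps[label] = bitmaps.get(label, 0) | (1 << i)
    return bitmaps

def paths_containing(bitmaps, labels):
    """Indices of the paths that contain every given label: one AND per label."""
    mask = None
    for label in labels:
        bits = bitmaps.get(label, 0)
        mask = bits if mask is None else mask & bits
    if not mask:
        return []
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

doc = ("book", [("title", []), ("author", [("name", [])]), ("chapter", [("title", [])])])
paths = root_to_leaf_paths(doc)  # 0: book/title, 1: book/author/name, 2: book/chapter/title
bitmaps = label_bitmaps(paths)
print(paths_containing(bitmaps, ["title"]))             # [0, 2]
print(paths_containing(bitmaps, ["chapter", "title"]))  # [2]
```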