Results 1  10
of
132
Frequent Subgraph Discovery
, 2001
"... Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of th ..."
Abstract

Cited by 406 (10 self)
 Add to MetaCart
(Show Context)
Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets, is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though we have to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.
An Aprioribased Algorithm for Mining Frequent Substructures from Graph Data
, 2000
"... This paper proposes a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set. A graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined through the exte ..."
Abstract

Cited by 310 (7 self)
 Add to MetaCart
(Show Context)
This paper proposes a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set. A graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined through the extended algorithm of the basket analysis. Its performance has been evaluated for the artificial simulation data and the carcinogenesis data of Oxford University and NTP. Its high efficiency has been confirmed for the size of a realworld problem. . . .
Efficiently Mining Frequent Trees in a Forest
, 2002
"... Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees ..."
Abstract

Cited by 213 (6 self)
 Add to MetaCart
(Show Context)
Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semistructured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scopelist. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scaleup properties. We also present an application of tree mining to analyze real web logs for usage patterns.
Efficient Mining of Frequent Subgraph in the Presence of Isomorphism
"... Frequent subgraph mining is an active research topic in the data mining community. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph related operations, such as subgr ..."
Abstract

Cited by 194 (23 self)
 Add to MetaCart
Frequent subgraph mining is an active research topic in the data mining community. A graph is a general model to represent data and has been used in many domains like cheminformatics and bioinformatics. Mining patterns from graph databases is challenging since graph related operations, such as subgraph testing, generally have higher time complexity than the corresponding operations on itemsets, sequences, and trees, which have been studied extensively. In this paper, we propose a novel frequent subgraph mining algorithm: FFSM, which employs a vertical search scheme within an algebraic graphical framework we have developed to reduce the number of redundant candidates proposed. Our empirical study on synthetic and real datasets demonstrates that FFSM achieves a substantial performance gain over the current startoftheart subgraph mining algorithm gSpan.
A quickstart in frequent structure mining can make a difference
 In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD2004
, 2004
"... Given a database, structure mining algorithms search for substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. Examples of substructures include graphs, trees and paths. For these substructures many mining algorithms have bee ..."
Abstract

Cited by 159 (5 self)
 Add to MetaCart
(Show Context)
Given a database, structure mining algorithms search for substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. Examples of substructures include graphs, trees and paths. For these substructures many mining algorithms have been proposed. In order to make graph mining more efficient, we investigate the use of the “quickstart principle”, which is based on the fact that these classes of structures are contained in each other, thus allowing for the development of structure mining algorithms that split the search into steps of increasing complexity. We introduce the GrAph/Sequence/Tree extractiON (Gaston) algorithm that implements this idea by searching first for frequent paths, then frequent free trees and finally cyclic graphs. We investigate two alternatives for computing the frequency of structures and present experimental results to relate these alternatives.
Algorithmics and Applications of Tree and Graph Searching
 In Symposium on Principles of Database Systems
, 2002
"... Modern search engines answer keywordbased queries extremely efficiently. The impressive speed is due to clever inverted index structures, caching, a domainindependent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree an ..."
Abstract

Cited by 146 (8 self)
 Add to MetaCart
(Show Context)
Modern search engines answer keywordbased queries extremely efficiently. The impressive speed is due to clever inverted index structures, caching, a domainindependent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching, because trees and graphs have many applications in nextgeneration database systems. This paper surveys both algorithms and applications, giving some emphasis to our own work.
Frequent SubStructureBased Approaches for Classifying Chemical Compounds
 In Proceedings of ICDM’03
, 2003
"... In this paper we study the problem of classifying chemical compound datasets. We present a substructurebased classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topologi ..."
Abstract

Cited by 140 (6 self)
 Add to MetaCart
In this paper we study the problem of classifying chemical compound datasets. We present a substructurebased classification algorithm that decouples the substructure discovery process from the classification model construction and uses frequent subgraph discovery algorithms to find all topological and geometric substructures present in the dataset. The advantage of our approach is that during classification model construction, all relevant substructures are available allowing the classifier to intelligently select the most discriminating ones. The computational scalability is ensured by the use of highly efficient frequent subgraph discovery algorithms coupled with aggressive feature selection. Our experimental evaluation on eight different classification problems shows that our approach is computationally scalable and outperforms existing schemes by 10% to 35%, on the average.
Efficient substructure discovery from large semistructured data
, 2002
"... By rapid progress of network and storage technologies, a huge amount of electronic data such as Web pages and XML data [23] has been available on intra and internet. These electronic data are heterogeneous collection of illstructured data that have no rigid structures, and often called semistructu ..."
Abstract

Cited by 139 (12 self)
 Add to MetaCart
(Show Context)
By rapid progress of network and storage technologies, a huge amount of electronic data such as Web pages and XML data [23] has been available on intra and internet. These electronic data are heterogeneous collection of illstructured data that have no rigid structures, and often called semistructured data [1]. Hence, there have been
Graph mining: laws, generators, and algorithms
 ACM COMPUT SURV (CSUR
, 2006
"... How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M: N relation in ..."
Abstract

Cited by 132 (7 self)
 Add to MetaCart
How does the Web look? How could we tell an abnormal social network from a normal one? These and similar questions are important in many fields where the data can intuitively be cast as a graph; examples range from computer networks to sociology to biology and many more. Indeed, any M: N relation in database terminology can be represented as a graph. A lot of these questions boil down to the following: “How can we generate synthetic but realistic graphs? ” To answer this, we must first understand what patterns are common in realworld graphs and can thus be considered a mark of normality/realism. This survey give an overview of the incredible variety of work that has been done on these problems. One of our main contributions is the integration of points of view from physics, mathematics, sociology, and computer science. Further, we briefly describe recent advances on some related and interesting graph problems.
Finding frequent patterns in a large sparse graph
 SIAM Data Mining Conference
, 2004
"... This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edgedisjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine ..."
Abstract

Cited by 130 (4 self)
 Add to MetaCart
(Show Context)
This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edgedisjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine the number of the edgedisjoint embeddings of a subgraph that are based on approximate and exact maximum independent set computations and use it to prune infrequent subgraphs. Experimental evaluation on real datasets from various domains show that both algorithms achieve good performance, scale well to sparse input graphs with more than 100,000 vertices, and significantly outperform a previously developed algorithm.