Results 1  10
of
85
Frequent Subgraph Discovery
, 2001
"... Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of th ..."
Abstract

Cited by 407 (14 self)
 Add to MetaCart
(Show Context)
Over the years, frequent itemset discovery algorithms have been used to solve various interesting problems. As data mining techniques are being increasingly applied to nontraditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets, is to use a graph to model the database objects. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. We evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. The empirical results show that our algorithm scales linearly with the number of input transactions and it is able to discover frequent subgraphs from a set of graph transactions reasonably fast, even though we have to deal with computationally hard problems such as canonical labeling of graphs and subgraph isomorphism which are not necessary for traditional frequent itemset discovery.
An Aprioribased Algorithm for Mining Frequent Substructures from Graph Data
, 2000
"... This paper proposes a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set. A graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined through the exte ..."
Abstract

Cited by 306 (7 self)
 Add to MetaCart
(Show Context)
This paper proposes a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set. A graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined through the extended algorithm of the basket analysis. Its performance has been evaluated for the artificial simulation data and the carcinogenesis data of Oxford University and NTP. Its high efficiency has been confirmed for the size of a realworld problem. . . .
Finding frequent patterns in a large sparse graph
 SIAM Data Mining Conference
, 2004
"... This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edgedisjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine ..."
Abstract

Cited by 131 (5 self)
 Add to MetaCart
(Show Context)
This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edgedisjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine the number of the edgedisjoint embeddings of a subgraph that are based on approximate and exact maximum independent set computations and use it to prune infrequent subgraphs. Experimental evaluation on real datasets from various domains show that both algorithms achieve good performance, scale well to sparse input graphs with more than 100,000 vertices, and significantly outperform a previously developed algorithm.
An efficient algorithm for discovering frequent subgraphs
 IEEE Transactions on Knowledge and Data Engineering
, 2002
"... Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approach cannot be used. This i ..."
Abstract

Cited by 120 (9 self)
 Add to MetaCart
(Show Context)
Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approach cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms — Data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.
Complete mining of frequent patterns from graphs: Mining graph data
 MACHINE LEARNING
, 2003
"... Basket Analysis, which is a standard method for data mining, derives frequent itemsets from database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g. chemical compounds and ..."
Abstract

Cited by 98 (4 self)
 Add to MetaCart
Basket Analysis, which is a standard method for data mining, derives frequent itemsets from database. However, its mining ability is limited to transaction data consisting of items. In reality, there are many applications where data are described in a more structural way, e.g. chemical compounds and Web browsing history. There are a few approaches that can discover characteristic patterns from graphstructured data in the field of machine learning. However, almost all of them are not suitable for such applications that require a complete search for all frequent subgraph patterns in the data. In this paper, we propose a novel principle and its algorithm that derive the characteristic patterns which frequently appear in graphstructured data. Our algorithm can derive all frequent induced subgraphs from both directed and undirected graph structured data having loops (including selfloops) with labeled or unlabeled nodes and links. Its performance is evaluated through the applications to Web browsing pattern analysis and chemical carcinogenesis analysis.
Graph database indexing using structured graph decomposition
 In ICDE
, 2007
"... We introduce a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries. The index is comprised of two major data structures. The primary structure is a directed acyclic graph which contains a node for each of the unique, induced subgraphs of the da ..."
Abstract

Cited by 56 (5 self)
 Add to MetaCart
(Show Context)
We introduce a novel method of indexing graph databases in order to facilitate subgraph isomorphism and similarity queries. The index is comprised of two major data structures. The primary structure is a directed acyclic graph which contains a node for each of the unique, induced subgraphs of the database graphs. The secondary structure is a hash table which crossindexes each subgraph for fast isomorphic lookup. In order to create a hash key independent of isomorphism, we utilize a codebased canonical representation of adjacency matrices, which we have further refined to improve computation speed. We validate the concept by demonstrating its effectiveness in answering queries for two practical datasets. Our experiments show that for subgraph isomorphism queries, our method outperforms existing methods by more than an order of magnitude. 1.
A novel spectral coding in a large graph database
 In Proceedings of the International Conference on Extending Database Technology
, 2008
"... Retrieving related graphs containing a query graph from a large graph database is a key issue in many graphbased applications, such as drug discovery and structural pattern recognition. Because subgraph isomorphism is a NPcomplete problem [4], we have to employ a filterandverification framework ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
(Show Context)
Retrieving related graphs containing a query graph from a large graph database is a key issue in many graphbased applications, such as drug discovery and structural pattern recognition. Because subgraph isomorphism is a NPcomplete problem [4], we have to employ a filterandverification framework to speed up the search efficiency, that is, using an effective and efficient pruning strategy to filter out the false positives (graphs that are not possible in the results) as many as possible first, then validating the remaining candidates by subgraph isomorphism checking. In this paper, we propose a novel filtering method, a spectral encoding method, i.e. GCoding. Specifically, we assign a signature to each vertex based on its local structures. Then, we generate a spectral graph code by combining all vertex signatures in a graph. Based on spectral graph codes, we derive a necessary condition for subgraph isomorphism. Then we propose two pruning rules for subgraph search problem, and prove that they satisfy the nofalsenegative requirement (no dismissal in answers). Since graph codes are in numerical space, we take this advantage and conduct efficient filtering over graph codes. Extensive experiments show that GCoding outperforms existing counterpart methods. 1.
A Survey of Frequent Subgraph Mining Algorithms
 THE KNOWLEDGE ENGINEERING REVIEW
, 2004
"... Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplica ..."
Abstract

Cited by 27 (1 self)
 Add to MetaCart
Graph mining is an important research area within the domain of data mining. The field of study concentrates on the identification of frequent subgraphs within graph data sets. The research goals are directed at: (i) effective mechanisms for generating candidate subgraphs (without generating duplicates) and (ii) how best to process the generated candidate subgraphs so as to identify the desired frequent subgraphs in a way that is computationally efficient and procedurally effective. This paper presents a survey of current research in the field of frequent subgraph mining, and proposed solutions to address the main research issues.
Polynomial Equivalence Problems: Algorithmic and Theoretical Aspects
 In EUROCRYPT
, 2006
"... Abstract. The Isomorphism of Polynomials (IP) [28], which is the main concern of this paper, originally corresponds to the problem of recovering the secret key of a C ∗ scheme [26]. Besides, the security of various other schemes (signature, authentication [28], traitor tracing [5],...) also depends ..."
Abstract

Cited by 25 (10 self)
 Add to MetaCart
(Show Context)
Abstract. The Isomorphism of Polynomials (IP) [28], which is the main concern of this paper, originally corresponds to the problem of recovering the secret key of a C ∗ scheme [26]. Besides, the security of various other schemes (signature, authentication [28], traitor tracing [5],...) also depends on the practical hardness of IP. Due to its numerous applications, the Isomorphism of Polynomials is thus one of the most fundamental problems in multivariate cryptography. In this paper, we address two complementary aspects of IP, namely its theoretical and practical difficulty. We present an upper bound on the theoretical complexity of “IPlike ” problems, i.e. a problem consisting in recovering a particular transformation between two sets of multivariate polynomials. We prove that these problems are not NPHard (provided that the polynomial hierarchy does not collapse). Concerning the practical aspect, we present a new algorithm for solving IP. In a nutshell, the idea is to generate a suitable algebraic system of equations whose zeroes correspond to a solution of IP. From a practical point of view, we employed a fast Gröbner basis algorithm, namely F5 [17], for solving this system. This approach is efficient in practice and obliges to modify the current security criteria for IP. We have indeed broken several challenges proposed in literature [28, 29,5]. For instance, we solved a challenge proposed by O. Billet and H. Gilbert at Asiacrypt’03 [5] in less than one second.
Improved Algorithms for Isomorphisms of Polynomials
 Advances in Cryptology – EUROCRYPT’98 (Kaisa Nyberg, Ed
, 1998
"... This paper is about the design of improved algorithms to solve Isomorphisms of Polynomials (IP) problems. These problems were first explicitly related to the problem of finding the secret key of some asymmetric cryptographic algorithms (such as Matsumoto and Imai's C # scheme of [13], or some v ..."
Abstract

Cited by 23 (2 self)
 Add to MetaCart
This paper is about the design of improved algorithms to solve Isomorphisms of Polynomials (IP) problems. These problems were first explicitly related to the problem of finding the secret key of some asymmetric cryptographic algorithms (such as Matsumoto and Imai's C # scheme of [13], or some variations of Patarin's HFE scheme of [15]). Moreover, in [15], it was shown that IP can be used in order to design an asymmetric authentication or signature scheme in a straightforward way. We also introduce the more general Morphisms of Polynomials problem (MP). As we see in this paper, these problems IP and MP have deep links with famous problems such as the Isomorphism of Graphs problem or the problem of fast multiplication of n n matrices. The complexities of our algorithms for IP are still not polynomial, but they are much more e#cient than the previously known algorithms. For example, for the IP problem of finding the two secret matrices of a MatsumotoImai C # scheme over K = F q , the complexity of our algorithms is O(q n/2 ) instead of O(q (n 2 ) ) for previous algorithms. (In [14], the C # scheme was broken, but the secret key was not found). Moreover, we have algorithms to achieve a complexity O(q 3 2 n ) on any system of n quadratic equations with n variables over K = F q (not only equations from C # ). We also show that the problem of deciding whether a polynomial isomorphism exists between two sets of equations is not NPcomplete (assuming the classical hypothesis about ArthurMerlin games), but solving IP is at least as di#cult as the Graph Isomorphism problem (GI) (and perhaps much more di#cult), so that IP ! is unl ikely to be solvable in polynomial time. Moreover, the more general Morphisms of Polynomials problem (MP) is NPhard. Finally, we suggest...