Results 1 - 10 of 16
An efficient algorithm for enumerating closed patterns in transaction databases
- In Proc. DS’04, LNAI 3245
, 2004
Cited by 17 (7 self)
Abstract: The class of closed patterns is a well-known condensed representation of frequent patterns and has recently attracted considerable interest. In this paper, we propose an efficient algorithm LCM (Linear time Closed pattern Miner) for mining frequent closed patterns from large transaction databases. The main theoretical contribution is our proposed prefix-preserving closure extension of closed patterns, which enables us to search all frequent closed patterns in a depth-first manner, in time linear in the number of frequent closed patterns. Our algorithm does not need any storage space for previously obtained patterns, whereas existing algorithms do. Performance comparisons of LCM with straightforward algorithms demonstrate the advantages of our prefix-preserving closure extension.
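The prefix-preserving closure extension mentioned in this abstract can be illustrated with a small sketch. It assumes integer items and a minimum support of at least 1; the names `cover` and `closed_patterns` are hypothetical, and occurrence lists are recomputed naively, so this sketch shows the duplicate-free depth-first search but not the linear-time bound claimed for LCM.

```python
from typing import FrozenSet, Iterable, List


def cover(db: List[FrozenSet[int]], itemset: FrozenSet[int]) -> List[FrozenSet[int]]:
    """Return the transactions that contain every item of `itemset`."""
    return [t for t in db if itemset <= t]


def closed_patterns(transactions: Iterable[Iterable[int]], minsup: int) -> List[FrozenSet[int]]:
    """Enumerate frequent closed itemsets depth-first, without storing
    previously found patterns for duplicate checks (illustrative sketch)."""
    if minsup < 1:
        raise ValueError("minsup must be at least 1")
    db = [frozenset(t) for t in transactions]
    items = sorted(set().union(*db))
    found: List[FrozenSet[int]] = []

    def expand(p: FrozenSet[int], core: int) -> None:
        found.append(p)
        for e in items:
            # Only extend with items larger than the core index and not in p.
            if e <= core or e in p:
                continue
            occ = cover(db, p | {e})
            if len(occ) < minsup:
                continue
            # Closure: intersection of all transactions containing p + {e}.
            q = frozenset.intersection(*occ)
            # Prefix-preserving test: items smaller than e must be unchanged.
            if {x for x in q if x < e} == {x for x in p if x < e}:
                expand(q, e)

    if db and len(db) >= minsup:
        # The root is the closure of the empty itemset (it may be empty).
        root = frozenset.intersection(*db)
        expand(root, items[0] - 1)
    return found
```

For example, `closed_patterns([{1, 2, 3}, {1, 2}, {2, 3}, {1, 2, 3, 4}], 2)` visits each frequent closed itemset once; the prefix test is what prevents the same closed set from being reached along two different branches.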
HybridTreeMiner: An Efficient Algorithm for Mining Frequent Rooted Trees and Free Trees Using Canonical Forms
, 2004
Cited by 17 (4 self)
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we present HybridTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of rooted unordered trees. The algorithm mines frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees. The enumeration tree is defined based on a novel canonical form for rooted unordered trees -- the breadth-first canonical form (BFCF). By extending the definitions of our canonical form and enumeration tree to free trees, our algorithm can efficiently handle databases of free trees as well. We study the performance of our algorithms through extensive experiments based on both synthetic data and datasets from real applications. The experiments show that our algorithm is competitive in comparison to known rooted tree mining algorithms and is faster by one to two orders of magnitude compared to a known algorithm for mining frequent free trees.
CMTreeMiner: Mining Both Closed and Maximal Frequent Subtrees
- In The Eighth Pacific Asia Conference on Knowledge Discovery and Data Mining (PAKDD’04)
, 2003
Cited by 16 (2 self)
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. One important problem in mining databases of trees is to find frequently occurring subtrees. However, because of the combinatorial explosion, the number of frequent subtrees usually grows exponentially with the size of the subtrees. In this paper, we present CMTreeMiner, a computationally efficient algorithm that discovers all closed and maximal frequent subtrees in a database of rooted unordered trees. The algorithm mines both closed and maximal frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees, while using an enumeration DAG to prune the branches of the enumeration tree that do not correspond to closed or maximal frequent subtrees. The enumeration tree and the enumeration DAG are defined based on a canonical form for rooted unordered trees--the depth-first canonical form (DFCF).
An output-polynomial time algorithm for mining frequent closed attribute trees
- In Proc. ILP’05, LNAI 3625, 1–19
, 2005
Cited by 12 (4 self)
Abstract. Frequent closed pattern discovery is one of the most important topics in the study of compact representations for data mining. In this paper, we consider the frequent closed pattern discovery problem for a class of structured data, called attribute trees (AT), which is a subclass of labeled ordered trees and can also be regarded as a fragment of description logic with functional roles only. We present an efficient algorithm for discovering all frequent closed patterns appearing in a given collection of attribute trees. By using a new enumeration method, called the prefix-preserving closure extension, which enables efficient depth-first search over all closed patterns without duplicates, we show that this algorithm works in polynomial time both in the total size of the input database and the number of output trees generated by the algorithm. To our knowledge, this is one of the first results on output-sensitive algorithms for frequent closed substructure discovery from trees and graphs.
The Design and Evolution of Fiducials for the reacTIVision System
- Proc. 3rd International Conference on Generative Systems in the Electronic Arts
, 2005
Cited by 8 (3 self)
The reacTIVision system is software for tracking specially designed fiducials (markers) in a real-time video stream. ReacTIVision was designed to enable expressive gestural control of musical sound, and can track many markers at a high frame rate. The development of reacTIVision involved not only computer vision algorithms, but also the design of a new marker system. We co-designed the computer vision system and markers, applying evolutionary computation to minimise marker size while meeting geometric constraints required to efficiently compute the location and 2D orientation of the markers. The computer vision techniques adopted would not have been practical had the genetic algorithms not been able to solve these constraints. This paper focuses on the design and evolution of the markers; the computer vision aspects of reacTIVision are documented elsewhere.
Canonical Forms for Labeled Trees and Their Applications in Frequent Subtree Mining
, 2004
Cited by 8 (1 self)
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we first present two canonical forms for labeled rooted unordered trees--the breadth-first canonical form (BFCF) and the depth-first canonical form (DFCF). Then the canonical forms are applied to the frequent subtree mining problem. Based on the BFCF, we develop a vertical mining algorithm, RootedTreeMiner, to discover all frequently occurring subtrees in a database of labeled rooted unordered trees. The RootedTreeMiner algorithm uses an enumeration tree to enumerate all (frequent) labeled rooted unordered subtrees. Next, we extend the definition of the DFCF to labeled free trees and present an Apriori-like algorithm, FreeTreeMiner, to discover all frequently occurring subtrees in a database of labeled free trees. Finally, we study the performance and the scalability of our algorithms through extensive experiments based on both synthetic data and datasets from real applications.
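As a rough illustration of what a canonical form for labeled rooted unordered trees provides, the sketch below encodes a tree as a string that does not depend on the order of siblings. It is not the BFCF or DFCF defined in the paper; it is a simpler sorted-encoding scheme in the same spirit. The tuple representation `(label, children)` and the function name `canonical` are assumptions, and labels are assumed to contain no parentheses.

```python
from typing import List, Tuple

# A tree is a (label, children) pair; this representation is an assumption.
Tree = Tuple[str, List["Tree"]]


def canonical(tree: Tree) -> str:
    """Return a string that is identical for all sibling orderings of `tree`."""
    label, children = tree
    # Encode each child recursively, then sort the encodings so that
    # the result is independent of the order in which children are listed.
    encoded_children = sorted(canonical(child) for child in children)
    return "(" + label + "".join(encoded_children) + ")"
```

Two trees that differ only in sibling order, such as `("a", [("b", []), ("c", [("d", [])])])` and `("a", [("c", [("d", [])]), ("b", [])])`, receive the same encoding, so a miner can detect that it has already enumerated an equivalent subtree.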
Mining Frequent Rooted Trees and Free Trees Using Canonical Forms
, 2003
Cited by 8 (4 self)
Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we present HybridTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of rooted unordered trees. The algorithm mines frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees. The enumeration tree is defined based on a novel canonical form for rooted unordered trees--the breadth-first canonical form (BFCF). By extending the definitions of our canonical form and enumeration tree to free trees, our algorithm can efficiently handle databases of free trees as well. We study the performance of our algorithms through extensive experiments based on both synthetic data and datasets from real applications. The experiments show that our algorithm is competitive in comparison to known rooted tree mining algorithms and is faster by one to two orders of magnitude compared to known algorithms for mining frequent free trees.
Mining frequent closed unordered trees through natural representations
- Proceedings of ICCS 2007, 15th International Conference on Conceptual Structures
, 2007
Cited by 7 (6 self)
Abstract. Many knowledge representation mechanisms consist of link-based structures; they may be studied formally by means of unordered trees. Here we consider the case where labels on the nodes are nonexistent or unreliable, and propose data mining processes focusing on just the link structure. We propose a representation of ordered trees, describe a combinatorial characterization and some properties, and use them to propose an efficient algorithm for mining frequent closed subtrees from a set of input trees. Then we focus on unordered trees, and show that intrinsic characterizations of our representation provide for a way of avoiding the repeated exploration of unordered trees, and then we give an efficient algorithm for mining frequent closed unordered trees.
Mining XML-Enabled Association Rule with Templates
- In Proceedings of KDID04
, 2004
Cited by 5 (1 self)
Abstract. The XML-enabled association rule framework [8] extends the notion of associated items to XML fragments to present associations among trees rather than simple-structured items of atomic values. Such rules are more flexible and powerful in representing both simple and complex structured association relationships inherent in XML data. Compared with traditional association mining in the well-structured world, mining from XML data, however, is confronted with more challenges due to the inherent flexibilities of XML in both structure and semantics. The primary challenges include 1) a more complicated hierarchical data structure; 2) an ordered data context; and 3) a much bigger data size. In order to make XML-enabled association rule mining truly practical and computationally tractable, in this study, we present a template model to help users specify the interesting XML-enabled associations to be mined. Techniques for template-guided mining of association rules from large XML data are also described in the paper. We demonstrate the effectiveness of these techniques through a set of experiments on both synthetic and real-life data.
W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees
- In Proceedings of the Second International Conference on Pattern Analysis
Cited by 2 (2 self)
Abstract: Mining frequent tree patterns has many useful applications in XML mining, bioinformatics, network routing, etc. Most frequent subtree mining algorithms (e.g. FREQT, TreeMiner and CMTreeMiner) use the anti-monotone property in the candidate subtree generation phase. However, none of these algorithms has verified the correctness of this property in tree-structured data. In this research it is shown that anti-monotonicity does not generally hold when using weighted support in tree pattern discovery. As a result, tree mining algorithms that are based on this property would probably miss some of the valid frequent subtree patterns in a collection of trees. In this paper, we investigate the correctness of the anti-monotone property for the problem of weighted frequent subtree mining. In addition, we propose W3-Miner, a new algorithm for full extraction of frequent subtrees. The experimental results confirm that W3-Miner finds some frequent subtrees that the previously proposed algorithms are not able to discover.
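The claim that anti-monotonicity can fail under weighted support can be seen in a toy setting. The sketch below uses itemsets rather than trees, and it assumes one common definition of weighted support (average item weight times plain support); this definition, the function names, and the data are all assumptions for illustration and are not taken from W3-Miner.

```python
from typing import Dict, List, Set


def support(db: List[Set[str]], itemset: Set[str]) -> int:
    """Plain support: the number of transactions containing `itemset`."""
    return sum(1 for t in db if itemset <= t)


def weighted_support(db: List[Set[str]], weights: Dict[str, float], itemset: Set[str]) -> float:
    """Assumed definition: average item weight multiplied by plain support."""
    average_weight = sum(weights[i] for i in itemset) / len(itemset)
    return average_weight * support(db, itemset)


# Hypothetical data: item "b" is much heavier than item "a".
db = [{"a", "b"}, {"a", "b"}, {"a"}]
weights = {"a": 0.2, "b": 0.9}

small = weighted_support(db, weights, {"a"})
large = weighted_support(db, weights, {"a", "b"})
```

Here plain support is anti-monotone (`{"a"}` occurs 3 times, `{"a", "b"}` only 2), yet the weighted support of the larger pattern exceeds that of its sub-pattern. With a threshold between the two values, pruning `{"a"}` would wrongly discard the frequent pattern `{"a", "b"}`.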

