Results 1 - 10 of 309
CHARM: An efficient algorithm for closed itemset mining
, 2002
Cited by 317 (14 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any “non-closed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.
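To make the notion of a closed itemset concrete, here is a minimal brute-force sketch. It is not CHARM itself: the toy transaction database and the helper names (`tidset`, `closure`, `closed_frequent_itemsets`, `support_from_closed`) are assumptions made for illustration. An itemset is treated as closed when it equals the intersection of all transactions that contain it, and the support of any frequent itemset is recovered as the largest support among its closed supersets.

```python
from itertools import combinations

# Hypothetical toy database: transaction id -> set of items.
TRANSACTIONS = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def tidset(itemset, db):
    """Return the ids of all transactions that contain every item of itemset."""
    return {tid for tid, items in db.items() if itemset <= items}


def closure(itemset, db):
    """Return the largest itemset shared by all transactions containing itemset."""
    tids = tidset(itemset, db)
    if not tids:
        return set(itemset)
    return set.intersection(*(db[tid] for tid in tids))


def closed_frequent_itemsets(db, min_sup):
    """Enumerate every itemset (exponential!) and keep the frequent closed ones."""
    items = sorted(set().union(*db.values()))
    closed = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = len(tidset(candidate, db))
            if support >= min_sup and closure(candidate, db) == candidate:
                closed[candidate] = support
    return closed


def support_from_closed(itemset, closed):
    """Support of a frequent itemset = max support over its closed supersets.

    Returns 0 when no closed superset exists, i.e. the itemset is infrequent.
    """
    return max((sup for c, sup in closed.items() if itemset <= c), default=0)


closed = closed_frequent_itemsets(TRANSACTIONS, min_sup=3)
print(len(closed))                                   # 7 closed frequent itemsets
print(support_from_closed(frozenset("AW"), closed))  # 4, same as a direct count
```

The exhaustive enumeration is only meant to show the definitions on a six-transaction example; the point of the paper's search strategy is to avoid this kind of enumeration.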
CloSpan: Mining Closed Sequential Patterns in Large Datasets
 In SDM
, 2003
Cited by 215 (18 self)
Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a min_sup threshold in a sequence database. However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining will generate an explosive number of frequent subsequences for long patterns, which is prohibitively expensive in both time and space.
Efficiently Mining Frequent Trees in a Forest
, 2002
Cited by 209 (6 self)
Mining frequent trees is very useful in domains like bioinformatics, web mining, mining semi-structured data, and so on. We formulate the problem of mining (embedded) subtrees in a forest of rooted, labeled, and ordered trees. We present TreeMiner, a novel algorithm to discover all frequent subtrees in a forest, using a new data structure called scope-list. We contrast TreeMiner with a pattern matching tree mining algorithm (PatternMatcher). We conduct detailed experiments to test the performance and scalability of these methods. We find that TreeMiner outperforms the pattern matching approach by a factor of 4 to 20, and has good scale-up properties. We also present an application of tree mining to analyze real web logs for usage patterns.
Closet+: searching for the best strategies for mining frequent closed itemsets
, 2003
Cited by 183 (20 self)
Mining frequent closed itemsets provides complete and non-redundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadth-first search, vertical formats vs. horizontal formats, tree-structure vs. other data structures, top-down vs. bottom-up traversal, pseudo projection vs. physical projection of conditional database, etc. It is the right time to ask “what are the pros and cons of the strategies?” and “what and how can we pick and integrate the best strategies to achieve higher performance in general cases?” In this study, we answer the above questions by a systematic study of the search strategies and develop a winning algorithm CLOSET+. CLOSET+ integrates the advantages of the previously proposed effective strategies as well as some ones newly developed here. A thorough performance study on synthetic and real data sets has shown the advantages of the strategies and the improvement of CLOSET+ over existing mining algorithms, including CLOSET, CHARM and OP, in terms of runtime, memory usage and scalability.
Efficiently Using Prefix-trees in Mining Frequent Itemsets
, 2003
Cited by 171 (1 self)
Efficient algorithms for mining frequent itemsets are crucial for mining association rules. Methods for mining frequent itemsets and for iceberg data cube computation have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper we present a novel array-based technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse datasets. Furthermore,
Efficiently mining maximal frequent itemsets
 In ICDM
, 2001
Cited by 162 (11 self)
We present GenMax, a backtrack search based algorithm for mining maximal frequent itemsets. GenMax uses a number of optimizations to prune the search space. It uses a novel technique called progressive focusing to perform maximality checking, and diffset propagation to perform fast frequency computation. Systematic experimental comparison with previous work indicates that different methods have varying strengths and weaknesses based on dataset characteristics. We found GenMax to be a highly efficient method to mine the exact set of maximal patterns.
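As an illustration of what "maximal" means here, the sketch below filters a given collection of frequent itemsets down to those with no frequent proper superset. It is a naive quadratic check, not GenMax's progressive focusing; the function name `maximal_itemsets` and the sample input are hypothetical.

```python
def maximal_itemsets(frequent):
    """Keep only the itemsets that have no proper superset in `frequent`."""
    sets = [frozenset(s) for s in frequent]
    # `s < other` is Python's proper-subset test on (frozen)sets.
    return [s for s in sets if not any(s < other for other in sets)]


# Hypothetical list of frequent itemsets produced by some earlier mining step.
frequent = [{"A"}, {"C"}, {"D"}, {"W"}, {"A", "C"}, {"C", "D"}, {"A", "C", "W"}]
maximal = maximal_itemsets(frequent)
print(sorted(sorted(s) for s in maximal))  # [['A', 'C', 'W'], ['C', 'D']]
```

Every frequent itemset is a subset of some maximal one, but unlike closed itemsets the maximal sets alone do not preserve the supports of their subsets.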
Fast Vertical Mining Using Diffsets
, 2001
Cited by 154 (5 self)
A number of vertical mining algorithms have been proposed recently for association mining, which have been shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches arises when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm's scalability.
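The vertical format and the diffset idea named in the title can be sketched as follows. This is an illustrative example rather than the paper's implementation: the toy database and the function names are assumptions, and a diffset is assumed to be defined in the usual way, as the tids present in the prefix's tidset but missing from the extension's tidset. Under that definition, for items X and Y under a common prefix, d(XY) = d(Y) - d(X) and support(XY) = support(X) - |d(XY)|.

```python
# Hypothetical toy database in horizontal format: tid -> items.
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def to_vertical(db):
    """Convert tid -> items into item -> tidset (the vertical format)."""
    vertical = {}
    for tid, items in db.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def support_by_intersection(items, vertical):
    """Support of an itemset = size of the intersection of its items' tidsets."""
    return len(set.intersection(*(vertical[i] for i in items)))


def support_by_diffset(x, y, vertical, all_tids):
    """Support of {x, y} using diffsets relative to the empty prefix."""
    d_x = all_tids - vertical[x]   # tids that do NOT contain x
    d_y = all_tids - vertical[y]   # tids that do NOT contain y
    d_xy = d_y - d_x               # tids that contain x but not y
    return len(vertical[x]) - len(d_xy)


vertical = to_vertical(DB)
all_tids = set(DB)
print(support_by_intersection(["A", "T"], vertical))    # 3
print(support_by_diffset("A", "T", vertical, all_tids)) # 3
```

In this example the diffset for {A, T} holds a single tid while the corresponding tidset holds three, which hints at why storing differences can shrink intermediate results on dense data.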
Spin: Mining maximal frequent subgraphs from graph databases
 IN KDD
, 2004
Cited by 97 (12 self)
One fundamental challenge for mining recurring subgraphs from semi-structured data sets is the overwhelming abundance of such patterns. In large graph databases, the total number of frequent subgraphs can become too large to allow a full enumeration using reasonable computational resources. In this paper, we propose a new algorithm that mines only maximal frequent subgraphs, i.e. subgraphs that are not a part of any other frequent subgraphs. This may exponentially decrease the size of the output set in the best case; in our experiments on practical data sets, mining maximal frequent subgraphs reduces the total number of mined patterns by two to three orders of magnitude. Our method first mines all frequent trees from a general graph database and then reconstructs all maximal subgraphs from the mined trees. Using two chemical structure benchmarks and a set of synthetic graph data sets, we demonstrate that, in addition to decreasing the output size, our algorithm can achieve a five-fold speedup over the current state-of-the-art subgraph mining algorithms.
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2005
Cited by 80 (7 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "non-closed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints
, 2003
Cited by 79 (1 self)
Recently, constraint-based mining of itemsets for questions like "find all frequent itemsets whose total price is at least $50" has attracted much attention. Two classes of constraints, monotone and anti-monotone, have been very useful in this area. There exist algorithms that efficiently take advantage of either one of these two classes, but no previous algorithms can efficiently handle both types of constraints simultaneously. In this paper, we present DualMiner, the first algorithm that efficiently prunes its search space using both monotone and anti-monotone constraints. We complement a theoretical analysis and proof of correctness of DualMiner with an experimental study that shows the efficacy of DualMiner compared to previous work.
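The two constraint classes in the example query can be illustrated with a small brute-force sketch. It does not reproduce DualMiner's pruning; the prices, transactions, and function names are all hypothetical. Minimum support is anti-monotone (if an itemset satisfies it, so does every subset), while "total price is at least 50" is monotone when prices are non-negative (if an itemset satisfies it, so does every superset).

```python
from itertools import combinations

# Hypothetical prices and transactions, chosen only for illustration.
PRICES = {"pen": 5, "book": 30, "lamp": 45, "bag": 60}
DB = [
    {"pen", "book"},
    {"pen", "book", "lamp"},
    {"book", "lamp"},
    {"pen", "lamp", "bag"},
]


def powerset(items):
    """Yield every non-empty subset of items as a frozenset."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def is_frequent(itemset, min_sup=2):
    """Anti-monotone constraint: support is at least min_sup."""
    return sum(1 for t in DB if itemset <= t) >= min_sup


def is_expensive(itemset, min_price=50):
    """Monotone constraint: total price is at least min_price."""
    return sum(PRICES[i] for i in itemset) >= min_price


def holds_for_subsets(pred, universe):
    """True if every subset of a satisfying itemset also satisfies pred."""
    sets = list(powerset(universe))
    return all(pred(s) for x in sets if pred(x) for s in sets if s < x)


def holds_for_supersets(pred, universe):
    """True if every superset of a satisfying itemset also satisfies pred."""
    sets = list(powerset(universe))
    return all(pred(s) for x in sets if pred(x) for s in sets if s > x)


# Itemsets that satisfy both constraints, found by exhaustive enumeration.
answers = sorted(
    sorted(x) for x in powerset(PRICES) if is_frequent(x) and is_expensive(x)
)
print(answers)  # [['book', 'lamp'], ['lamp', 'pen']]
```

An anti-monotone constraint lets a search discard all supersets of a failing itemset, and a monotone one lets it accept all supersets of a satisfying itemset without re-checking; the abstract's claim is that the two kinds of pruning can be combined.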