Results 1 - 10
of
414
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
- DATA MINING AND KNOWLEDGE DISCOVERY
, 2004
"... Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still co ..."
Abstract
-
Cited by 1700 (64 self)
- Add to MetaCart
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been studied popularly in data mining research. Most of the previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there exist a large number of patterns and/or long patterns. In this study, we propose a novel
frequent-pattern tree
(FP-tree) structure, which is an extended prefix-tree
structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-
based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth.
Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed,
smaller data structure, FP-tree which avoids costly, repeated database scans, (2) our FP-tree-based mining adopts
a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets, and (3) a
partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for
mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance
study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns,
and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported
new frequent-pattern mining methods
CHARM: An efficient algorithm for closed itemset mining
, 2002
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets usin ..."
Abstract
-
Cited by 317 (14 self)
- Add to MetaCart
(Show Context)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally it uses a fast hash-based approach to remove any “non-closed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.
CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets
, 2000
"... Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces th ..."
Abstract
-
Cited by 313 (29 self)
- Add to MetaCart
(Show Context)
Association mining may often derive an undesirably large set of frequent itemsets and association rules. Recent studies have proposed an interesting alternative: mining frequent closed itemsets and their corresponding rules, which has the same power as association mining but substantially reduces the number of rules to be presented. In this paper, we propose an efficient algorithm, CLOSET, for mining closed itemsets, with the development of three techniques: (1) applying a compressed, frequent pattern tree FP-tree structure for mining closed itemsets without candidate generation, (2) developing a single prefix path compression technique to identify frequent closed itemsets quickly, and (3) exploring a partition-based projection mechanism for scalable mining in large databases. Our performance study shows that CLOSET is efficient and scalable over large databases, and is faster than the previously proposed methods. 1 Introduction It has been well recognized that frequent pattern minin...
MAFIA: A maximal frequent itemset algorithm for transactional databases
- In ICDE
, 2001
"... We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning ..."
Abstract
-
Cited by 312 (3 self)
- Add to MetaCart
(Show Context)
We present a new algorithm for mining maximal frequent itemsets from a transactional database. Our algorithm is especially efficient when the itemsets in the database are very long. The search strategy of our algorithm integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms. Our implementation of the search strategy combines a vertical bitmap representation of the database with an efficient relative bitmap compression schema. In a thorough experimental analysis of our algorithm on real data, we isolate the effect of the individual components of the algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of three to five. 1
CloSpan: Mining Closed Sequential Patterns in Large Datasets
- In SDM
, 2003
"... Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a rain_sup threshold in a sequence database. However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining will generate an explosive number of frequent ..."
Abstract
-
Cited by 215 (18 self)
- Add to MetaCart
(Show Context)
Previous sequential pattern mining algorithms mine the full set of frequent subsequences satisfying a rain_sup threshold in a sequence database. However, since a frequent long sequence contains a combinatorial number of frequent subsequences, such mining will generate an explosive number of frequent subsequences for long patterns, which is prohibitively expensive in both time and space.
Mining sequential patterns by pattern-growth: The PrefixSpan approach
- IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2004
"... Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining ..."
Abstract
-
Cited by 194 (10 self)
- Add to MetaCart
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of the pattern growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PSP, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudoprojection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the a priori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudoprojection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
Closet+: searching for the best strategies for mining frequent closed itemsets
, 2003
"... Mining frequent closed itemsets provides complete and nonredundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadthfirst search, vertical formats vs. horizontal formats, tree ..."
Abstract
-
Cited by 183 (20 self)
- Add to MetaCart
Mining frequent closed itemsets provides complete and nonredundant results for frequent pattern analysis. Extensive studies have proposed various strategies for efficient frequent closed itemset mining, such as depth-first search vs. breadthfirst search, vertical formats vs. horizontal formats, treestructure vs. other data structures, top-down vs. bottomup traversal, pseudo projection vs. physical projection of conditional database, etc. It is the right time to ask “what are the pros and cons of the strategies? ” and “what and how can we pick and integrate the best strategies to achieve higher performance in general cases?” In this study, we answer the above questions by a systematic study of the search strategies and develop a winning algorithm CLOSET+. CLOSET+ integrates the advantages of the previously proposed effective strategies as well as some ones newly developed here. A thorough performance study on synthetic and real data sets has shown the advantages of the strategies and the improvement of CLOSET+ over existing mining algorithms, including CLOSET, CHARM and OP, in terms of runtime, memory usage and scalability.
Fast Vertical Mining Using Diffsets
, 2001
"... A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction i ..."
Abstract
-
Cited by 154 (5 self)
- Add to MetaCart
(Show Context)
A number of vertical mining algorithms have been proposed recently for association mining, which have shown to be very effective and usually outperform horizontal approaches. The main advantage of the vertical format is support for fast frequency counting via intersection operations on transaction ids (tids) and automatic pruning of irrelevant data. The main problem with these approaches is when intermediate results of vertical tid lists become too large for memory, thus affecting the algorithm scalability.
BIDE: Efficient Mining of Frequent Closed Sequences
"... Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed ..."
Abstract
-
Cited by 150 (12 self)
- Add to MetaCart
Previous studies have presented convincing arguments that a frequent pattern mining algorithm should not mine all frequent patterns but only the closed ones because the latter leads to not only more compact yet complete result set but also better efficiency. However, most of the previously developed closed pattern mining algorithms work under the candidate maintenance-and-test paradigm which is inherently costly in both runtime and space usage when the support threshold is low or the patterns become long.
Algorithms for Association Rule Mining -- A General Survey and Comparison
, 2000
"... Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. Actually, these algorithms are more or less described on their own. In this paper we explain the fundamentals of association rule mining and moreover derive a genera ..."
Abstract
-
Cited by 150 (5 self)
- Add to MetaCart
Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. Actually, these algorithms are more or less described on their own. In this paper we explain the fundamentals of association rule mining and moreover derive a general framework. Based on this we describe today's approaches in context by pointing out common aspects and differences. After that we thoroughly investigate their strengths and weaknesses and carry out several runtime experiments. It turns out that the runtime behavior of the algorithms is much more similar as to be expected.