Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Data Mining and Knowledge Discovery
2004
Cited by 1145 (56 self)
Mining frequent patterns in transaction databases, time-series databases, and many other kinds of databases has been widely studied in data mining research. Most previous studies adopt an Apriori-like candidate set generation-and-test approach. However, candidate set generation is still costly, especially when there are a large number of patterns and/or long patterns. In this study, we propose a novel frequent-pattern tree (FP-tree) structure, which is an extended prefix-tree structure for storing compressed, crucial information about frequent patterns, and develop an efficient FP-tree-based mining method, FP-growth, for mining the complete set of frequent patterns by pattern fragment growth. Efficiency of mining is achieved with three techniques: (1) a large database is compressed into a condensed, smaller data structure, the FP-tree, which avoids costly, repeated database scans; (2) our FP-tree-based mining adopts a pattern-fragment growth method to avoid the costly generation of a large number of candidate sets; and (3) a partitioning-based, divide-and-conquer method is used to decompose the mining task into a set of smaller tasks for mining confined patterns in conditional databases, which dramatically reduces the search space. Our performance study shows that the FP-growth method is efficient and scalable for mining both long and short frequent patterns, and is about an order of magnitude faster than the Apriori algorithm and also faster than some recently reported new frequent-pattern mining methods.
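To make the FP-tree idea concrete, here is a minimal Python sketch, not the authors' implementation. It inserts each transaction's frequent items in descending-frequency order so that shared prefixes collapse into one path, then mines recursively from conditional pattern bases. Assumptions made for brevity: input is a list of (itemset, count) pairs, `min_support` is an absolute count, and the header table keeps a plain list of nodes per item instead of node-links.

```python
from collections import Counter, defaultdict


class FPNode:
    """An FP-tree node: item label, count, parent link and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree from (itemset, count) pairs.

    Returns the root, a header table (item -> list of its nodes) and the
    support of every frequent item.
    """
    counts = Counter()
    for items, n in transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)
    for items, n in transactions:
        # Keep only frequent items, in descending frequency order
        # (ties broken by item so that shared prefixes line up).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += n
            node = child
    return root, header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) for every frequent itemset."""
    _, header, frequent = build_fp_tree(transactions, min_support)
    for item, support in frequent.items():
        pattern = suffix + (item,)
        yield frozenset(pattern), support
        # Conditional pattern base: the prefix path above each node of
        # `item`, weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on the smaller conditional database.
        yield from fp_growth(base, min_support, pattern)
```

For example, on the five transactions `"facdgimp"`, `"abcflmo"`, `"bfhjo"`, `"bcksp"`, `"afcelpmn"` (each character an item) with a minimum support of 3, `dict(fp_growth([(set(t), 1) for t in db], 3))` returns 18 frequent itemsets, among them `{f, c, a, m}` with support 3. No candidate itemset is ever generated and tested; each pattern is grown from a conditional database.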
SPIRIT: Sequential Pattern Mining with Regular Expression Constraints
Bell Labs Tech. Memorandum BL011237099022303TM
1999
Cited by 158 (2 self)
Discovering sequential patterns is an important problem in data mining with a host of application domains including medicine, telecommunications, and the World Wide Web. Conventional mining systems provide users with only a very restricted mechanism (based on minimum support) for specifying patterns of interest. In this paper, we propose the use of Regular Expressions (REs) as a flexible constraint specification tool that enables user-controlled focus to be incorporated into the pattern mining process. We develop a family of novel algorithms (termed SPIRIT – Sequential Pattern mIning with Regular expressIon consTraints) for mining frequent sequential patterns that also satisfy user-specified RE constraints. The main distinguishing factor among the proposed schemes is the degree to which the RE constraints are enforced to prune the search space of patterns during computation. Our solutions provide valuable insights into the tradeoffs that arise when constraints that do not subscribe to nice properties (like anti-monotonicity) are integrated into the mining process. A quantitative exploration of these tradeoffs is conducted through an extensive experimental study on synthetic and real-life data sets.
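The sketch below only illustrates what an RE constraint means for sequential patterns. It uses the weakest point of the spectrum the abstract describes: frequent sequences are mined with support-based pruning alone, and the RE is applied afterwards as a filter. It is not one of the SPIRIT algorithms, which push the constraint into candidate generation to varying degrees. Assumptions: every item is a single character so Python's `re` module can match a pattern string directly, and the hypothetical `max_len` parameter bounds pattern length.

```python
import re
from itertools import product


def is_subsequence(pattern, sequence):
    """True if `pattern` occurs in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(symbol in it for symbol in pattern)


def frequent_sequences(db, min_support, max_len):
    """Level-wise mining of frequent sequences over single-character items.

    Only the anti-monotone support constraint prunes the search: a
    length-k candidate is built by extending a frequent length-(k-1) one.
    """
    alphabet = sorted(set("".join(db)))
    result = {}
    level = [""]
    for _ in range(max_len):
        next_level = []
        for prefix, symbol in product(level, alphabet):
            candidate = prefix + symbol
            support = sum(is_subsequence(candidate, s) for s in db)
            if support >= min_support:
                result[candidate] = support
                next_level.append(candidate)
        level = next_level
    return result


def constrained_sequences(db, min_support, max_len, regex):
    """Naive strategy: mine by support alone, then keep RE matches."""
    constraint = re.compile(regex)
    mined = frequent_sequences(db, min_support, max_len)
    return {p: n for p, n in mined.items() if constraint.fullmatch(p)}
```

With `db = ["abcd", "abd", "acd", "bcd"]`, a minimum support of 2 and the constraint `a[bc]*d`, only `ad`, `abd` and `acd` survive. The unconstrained run also produces patterns such as `bd` and `bcd`, which is exactly the wasted work that pushing the RE deeper into the search aims to avoid.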
Constraint-based rule mining in large, dense databases
1999
Cited by 150 (3 self)
Constraint-based rule miners find all rules in a given dataset meeting user-specified constraints such as minimum support and confidence. We describe a new algorithm that directly exploits all user-specified constraints including minimum support, minimum confidence, and a new constraint that ensures every mined rule offers a predictive advantage over any of its simplifications. Our algorithm maintains efficiency even at low supports on data that is dense (e.g., relational data). Previous approaches such as Apriori and its variants exploit only the minimum support constraint, and as a result are ineffective on dense data due to a combinatorial explosion of “frequent itemsets”.
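The constraint on predictive advantage can be illustrated by computing a rule's improvement: the smallest amount by which its confidence exceeds that of any simplified rule. This minimal sketch assumes that the simplifications of A → C are all rules A' → C where A' is a proper subset of A, including the empty antecedent, and that transactions are Python sets. It does not show how the algorithm pushes a threshold on this quantity into its search.

```python
from itertools import combinations


def confidence(db, antecedent, consequent):
    """conf(A -> C) = sup(A u C) / sup(A); db is a list of item sets."""
    covered = [t for t in db if antecedent <= t]
    if not covered:
        return 0.0
    return sum(consequent <= t for t in covered) / len(covered)


def improvement(db, antecedent, consequent):
    """Smallest confidence gain of A -> C over every simplification A' -> C.

    Assumption: simplifications are all proper subsets A' of A, including
    the empty set. The antecedent must be non-empty.
    """
    conf = confidence(db, antecedent, consequent)
    simpler = (frozenset(s) for k in range(len(antecedent))
               for s in combinations(antecedent, k))
    return min(conf - confidence(db, s, consequent) for s in simpler)
```

On the toy database `[{a, b, c}, {a, b, c}, {a, c}, {b}, {a, b}]`, the rule {a, b} → {c} has confidence 2/3 while its simplification {a} → {c} has 3/4, so its improvement is negative: adding b makes the rule longer without making it more predictive, and a positive improvement threshold would discard it.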
Bottom-up computation of sparse and Iceberg CUBE
In Proceedings of the 5th ACM International Workshop on Data Warehousing and OLAP, DOLAP ’02
1999
Cited by 150 (3 self)
We introduce the Iceberg-CUBE problem as a reformulation of the data cube (CUBE) problem. The Iceberg-CUBE problem is to compute only those group-by partitions with an aggregate value (e.g., count) above some minimum support threshold. The result of Iceberg-CUBE can be used (1) to answer group-by queries with a clause such as HAVING COUNT(*) >= X, where X is greater than the threshold, (2) for mining multidimensional association rules, and (3) to complement existing strategies for identifying interesting subsets of the CUBE for precomputation. We present a new algorithm (BUC) for Iceberg-CUBE computation. BUC builds the CUBE bottom-up; i.e., it builds the CUBE by starting from a group-by on a single attribute, then a group-by on a pair of attributes, then a group-by on three attributes, and so on. This is the opposite of all techniques proposed earlier for computing the CUBE, and has an important practical advantage: BUC avoids computing the larger group-bys that do not meet minimum support. The pruning in BUC is similar to the pruning in the Apriori algorithm for association rules, except that BUC trades some pruning for locality of reference and reduced memory requirements. BUC uses the same pruning strategy when computing sparse, complete CUBEs. We present a thorough performance evaluation over a broad range of workloads. Our evaluation demonstrates that (in contrast to earlier assumptions) minimizing the aggregations or the number of sorts is not the most important aspect of the sparse CUBE problem. The pruning in BUC, combined with an efficient sort method, enables BUC to outperform all previous algorithms for sparse CUBEs, even for computing entire CUBEs, and to dramatically improve Iceberg-CUBE computation.
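A minimal recursive sketch of the bottom-up strategy with COUNT(*) as the aggregate: each call outputs the cell for its current partition, then partitions that input on every remaining dimension and recurses only into partitions that meet the minimum support, mirroring the Apriori-style pruning described above. Assumptions: rows are tuples, partitioning uses a Python dict instead of the sorting the abstract mentions, and `"*"` marks a dimension that is aggregated away.

```python
from collections import defaultdict


def buc(rows, min_support, num_dims, dim=0, cell=None, out=None):
    """Bottom-up Iceberg-CUBE with COUNT(*) as the aggregate.

    rows: list of tuples, one value per dimension.
    Returns {cell: count}; "*" in a cell marks an aggregated dimension.
    """
    if cell is None:
        cell = ("*",) * num_dims
    if out is None:
        out = {}
    if len(rows) < min_support:
        return out
    # Aggregate the current partition and output its cell.
    out[cell] = len(rows)
    for d in range(dim, num_dims):
        # Partition the current rows on dimension d.
        partitions = defaultdict(list)
        for row in rows:
            partitions[row[d]].append(row)
        for value, part in partitions.items():
            # Apriori-style pruning: a partition below min_support cannot
            # produce any qualifying finer group-by, so it is not expanded.
            if len(part) >= min_support:
                child = cell[:d] + (value,) + cell[d + 1:]
                buc(part, min_support, num_dims, d + 1, child, out)
    return out
```

With rows `("a1", "b1")`, `("a1", "b2")`, `("a1", "b1")`, `("a2", "b1")` and a minimum support of 2, the result is `{("*", "*"): 4, ("a1", "*"): 3, ("a1", "b1"): 2, ("*", "b1"): 3}`, i.e. exactly the cells a `HAVING COUNT(*) >= 2` query would return; the `a2` and `b2` partitions are never refined.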
Efficient mining of partial periodic patterns in time series database
Proc. Int. Conf. on Data Engineering
1999
Cited by 127 (16 self)
Partial periodicity search, i.e., search for partial periodic patterns in time-series databases, is an interesting data mining problem. Previous studies on periodicity search mainly consider finding full periodic patterns, where every point in time contributes (precisely or approximately) to the periodicity. However, partial periodicity is very common in practice, since often only some of the time episodes exhibit periodic patterns. We present several algorithms for efficient mining of partial periodic patterns, by exploring some interesting properties related to partial periodicity, such as the Apriori property and the max-subpattern hit set property, and by shared mining of multiple periods. The max-subpattern hit set property is a vital new property which allows us to derive the counts of all frequent patterns from a relatively small subset of patterns existing in the time series. We show that mining partial periodicity needs only two scans over the time series database, even for mining multiple periods. The performance study shows our proposed methods are very efficient in mining long periodic patterns.
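The two-scan idea for a single period can be sketched as follows. The first scan counts every (position, symbol) pair and keeps the frequent ones as the candidate max-pattern `cmax`, relying on the Apriori property. The second scan records, for each period segment, the largest subpattern of `cmax` it matches (its hit). The count of any pattern is then the sum of the counts of the hits that contain it, so no further scan is needed. Assumptions: the series is a string of single-character symbols, the hypothetical `min_conf` is the fraction of segments a pattern must match, and hits live in a `Counter` rather than a dedicated tree.

```python
from collections import Counter


def segments(series, period):
    """Cut the series into whole period-length segments."""
    return [series[i:i + period]
            for i in range(0, len(series) - period + 1, period)]


def mine_hit_set(series, period, min_conf):
    """Two scans: frequent 1-patterns, then the max-subpattern hit set."""
    segs = segments(series, period)
    threshold = min_conf * len(segs)
    # Scan 1: count each (position, symbol) pair.
    ones = Counter((i, s[i]) for s in segs for i in range(period))
    cmax = [{sym for (pos, sym), n in ones.items()
             if pos == i and n >= threshold}
            for i in range(period)]
    # Scan 2: each segment hits the largest subpattern of cmax it matches.
    hits = Counter(
        tuple(s[i] if s[i] in cmax[i] else "*" for i in range(period))
        for s in segs)
    return hits, len(segs)


def support(pattern, hits):
    """Count of a pattern = sum of the counts of hits that contain it."""
    return sum(n for hit, n in hits.items()
               if all(p == "*" or p == h for p, h in zip(pattern, hit)))
```

For the series `"abcabdabcxbc"` with period 3 and `min_conf = 0.5`, the segments are `abc`, `abd`, `abc`, `xbc`; only three distinct hits are stored, yet `support(("a", "b", "*"), hits)` correctly returns 3.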
Algorithms for Association Rule Mining: A General Survey and Comparison
2000
Cited by 121 (5 self)
Today there are several efficient algorithms that cope with the popular and computationally expensive task of association rule mining. So far, however, these algorithms have mostly been described in isolation. In this paper we explain the fundamentals of association rule mining and derive a general framework. Based on this framework, we describe today's approaches in context by pointing out their common aspects and differences. We then thoroughly investigate their strengths and weaknesses and carry out several runtime experiments. It turns out that the runtime behavior of the algorithms is much more similar than expected.
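One dimension along which such algorithms are commonly compared is how they determine the support of a candidate itemset. The sketch below is an illustration, not the paper's framework: it contrasts counting over a horizontal list of transactions with intersecting per-item transaction-id (tid) sets in a vertical layout. Both strategies must return the same support; they differ in memory layout and in the cost of each lookup. Itemsets are assumed to be non-empty Python sets.

```python
def support_by_counting(itemset, transactions):
    """Horizontal layout: scan all transactions and count containment."""
    return sum(itemset <= t for t in transactions)


def vertical_layout(transactions):
    """Map each item to the set of transaction ids (tids) containing it."""
    tids = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tids.setdefault(item, set()).add(tid)
    return tids


def support_by_intersection(itemset, tids):
    """Vertical layout: intersect the tid-sets of the itemset's items.

    The itemset must be non-empty; unknown items have an empty tid-set.
    """
    sets = [tids.get(item, set()) for item in itemset]
    return len(set.intersection(*sets))
```

Counting touches every transaction for every candidate, while intersection only touches the tid-sets of the candidate's items, which is why algorithms built on the vertical layout can be attractive when tid-sets are short.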
Pruning and Summarizing the Discovered Associations
1999
Cited by 119 (8 self)
Association rules are a fundamental class of patterns that exist in data. The key strength of association rule mining is its completeness: it finds all associations in the data that satisfy the user-specified minimum support and minimum confidence constraints. This strength, however, comes with a major drawback: it often produces a huge number of associations. This is particularly true for data sets whose attributes are highly correlated. The huge number of associations makes it very difficult, if not impossible, for a human user to analyze them and identify the interesting or useful ones. In this paper, we propose a novel technique to overcome this problem. The technique first prunes the discovered associations to remove insignificant ones, and then finds a special subset of the unpruned associations to form a summary of the discovered associations. We call this subset of associations the direction setting (DS) rules as they set the directions that are followed by the...
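Deciding whether an association is "insignificant" needs a statistical test. The abstract does not name one; a common choice, assumed here, is Pearson's chi-square test of independence on the 2x2 contingency table of a rule X → y. The sketch tests X against y in isolation only; how the paper's pruning compares a rule with its more general rules is not reproduced. The 3.841 cutoff is the standard chi-square critical value for one degree of freedom at the 95% level.

```python
def chi_square_2x2(n_xy, n_x, n_y, n):
    """Pearson chi-square statistic for the 2x2 table of X vs. y.

    n_xy: records containing X and y; n_x: records containing X;
    n_y: records containing y; n: total number of records.
    """
    observed = [
        [n_xy, n_x - n_xy],
        [n_y - n_xy, n - n_x - n_y + n_xy],
    ]
    row = [n_x, n - n_x]
    col = [n_y, n - n_y]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


CRITICAL_95 = 3.841  # chi-square critical value, 1 degree of freedom, 95% level


def is_significant(n_xy, n_x, n_y, n):
    """True if X and y are dependent at the 95% level."""
    return chi_square_2x2(n_xy, n_x, n_y, n) > CRITICAL_95
```

With 100 records in which X and y each occur 50 times, a co-occurrence count of 25 is exactly what independence predicts (statistic 0), so the rule would be pruned, while a count of 40 gives a statistic of 36 and the rule is kept.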
Mining Association Rules with Multiple Minimum Supports
In Knowledge Discovery and Data Mining
1999
Cited by 108 (6 self)
Association rule mining is an important model in data mining. Its mining algorithms discover all item associations (or rules) in the data that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf) constraints. Minsup controls the minimum number of data cases that a rule must cover. Minconf controls the predictive strength of the rule. Since only one minsup is used for the whole database, the model implicitly assumes that all items in the data are of the same nature and/or have similar frequencies in the data. This is, however, seldom the case in real-life applications. In many applications, some items appear very frequently in the data, while others rarely appear. If minsup is set too high, those rules that involve rare items will not be found. To find rules that involve both frequent and rare items, minsup has to be set very low. This may cause combinatorial explosion because those frequent items will be associated with one another in all possible ways. T...
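A minimal sketch of the multiple-minimum-support idea, under the assumption that each item i has its own minimum item support `mis[i]` and that an itemset (or rule) must reach the lowest MIS among its items. The item names and thresholds are hypothetical.

```python
def itemset_min_support(itemset, mis):
    """Threshold for an itemset: the lowest MIS among its items."""
    return min(mis[item] for item in itemset)


def is_frequent(itemset, transactions, mis):
    """True if the itemset's relative support reaches its own threshold."""
    support = sum(itemset <= t for t in transactions) / len(transactions)
    return support >= itemset_min_support(itemset, mis)


# Hypothetical items and MIS values: the rare item gets a lower threshold.
mis = {"bread": 0.5, "milk": 0.5, "caviar": 0.05}
transactions = ([{"bread", "milk"}] * 6 + [{"bread", "caviar"}]
                + [{"milk"}] * 2 + [{"caviar"}])
```

Here {bread, caviar} has support 0.1: it is rejected under a single minsup of 0.5 but accepted under its own threshold of 0.05, while {bread, milk} (support 0.6) is still held to 0.5. One consequence of this design: dropping the lowest-MIS item from an itemset can raise its threshold, so the plain Apriori downward-closure argument no longer applies directly.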
Mining sequential patterns by pattern-growth: The PrefixSpan approach
IEEE Transactions on Knowledge and Data Engineering
2004
Cited by 107 (8 self)
Sequential pattern mining is an important data mining problem with broad applications. However, it is also a difficult problem since the mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. Most of the previously developed sequential pattern mining methods, such as GSP, explore a candidate generation-and-test approach [1] to reduce the number of candidates to be examined. However, this approach may not be efficient in mining large sequence databases having numerous patterns and/or long patterns. In this paper, we propose a projection-based, sequential pattern-growth approach for efficient mining of sequential patterns. In this approach, a sequence database is recursively projected into a set of smaller projected databases, and sequential patterns are grown in each projected database by exploring only locally frequent fragments. Based on an initial study of pattern-growth-based sequential pattern mining, FreeSpan [8], we propose a more efficient method, called PrefixSpan, which offers ordered growth and reduced projected databases. To further improve the performance, a pseudo-projection technique is developed in PrefixSpan. A comprehensive performance study shows that PrefixSpan, in most cases, outperforms the Apriori-based algorithm GSP, FreeSpan, and SPADE [29] (a sequential pattern mining algorithm that adopts vertical data format), and PrefixSpan integrated with pseudo-projection is the fastest among all the tested algorithms. Furthermore, this mining methodology can be extended to mining sequential patterns with user-specified constraints. The high promise of the pattern-growth approach may lead to its further extension toward efficient mining of other kinds of frequent patterns, such as frequent substructures.
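A compact sketch of prefix-projected pattern growth with pseudo-projection. Simplifying assumption: every sequence element is a single item (sequences are strings), so the itemset-valued elements the full method supports are not handled. A projected database is kept as (sequence index, start offset) pairs pointing into the original database instead of copied suffixes, which is the essence of pseudo-projection; each frequent item's projection starts right after its first occurrence.

```python
from collections import defaultdict


def prefixspan(db, min_support):
    """Return {pattern: support} for sequences of single-item elements."""
    results = {}

    def mine(prefix, projected):
        # For each item, record its first position in every projected suffix;
        # the number of sequences recorded is the item's local support.
        first_pos = defaultdict(dict)
        for sid, start in projected:
            seq = db[sid]
            for pos in range(start, len(seq)):
                first_pos[seq[pos]].setdefault(sid, pos)
        for item in sorted(first_pos):
            occurrences = first_pos[item]
            if len(occurrences) < min_support:
                continue  # only locally frequent items extend the prefix
            pattern = prefix + (item,)
            results[pattern] = len(occurrences)
            # Pseudo-projection: point just past the item's first occurrence.
            mine(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    mine((), [(sid, 0) for sid in range(len(db))])
    return results
```

For `prefixspan(["abc", "acb", "bc", "ab"], 2)` the result contains the single items a, b, c and the patterns (a, b) with support 3, (a, c) with support 2 and (b, c) with support 2; no candidate such as (c, b) is ever generated, because b is not frequent in the c-projected database.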
Mining Frequent Itemsets with Convertible Constraints
Proc. of 2001 Int. Conf. on Data Engineering
2001
Cited by 96 (16 self)
Recent work has highlighted the importance of the constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. In this paper, we study constraints which cannot be handled with existing theory and techniques. For example, $\mathrm{avg}(S)\,\theta\,v$, $\mathrm{median}(S)\,\theta\,v$, and $\mathrm{sum}(S)\,\theta\,v$ ($S$ can contain items of arbitrary values, $\theta \in \{\ge, \le\}$) are customarily regarded as “tough” constraints in that they cannot be pushed inside an algorithm such as Apriori. We develop a notion of convertible constraints and systematically analyze, classify, and characterize this class. We also develop techniques which enable them to be readily pushed deep inside the recently developed FP-growth algorithm for frequent itemset mining. Results from our detailed experiments show the effectiveness of the techniques developed.
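The convertible idea can be illustrated with the constraint avg(S) ≥ v. If items are added in descending value order, appending an item never raises the average, so once a prefix violates the constraint every extension of it does too: the constraint becomes anti-monotone with respect to that order and can prune the search. The sketch pushes this into a plain depth-first enumeration with direct support counting; it is not the FP-growth integration the paper develops, and the item values are hypothetical.

```python
def mine_avg_constraint(items, value, transactions, min_support, v):
    """Frequent itemsets S with avg(value[i] for i in S) >= v.

    Items are grown in descending value order, under which avg(S) >= v is
    anti-monotone: a failing prefix cannot be rescued by smaller items.
    """
    order = sorted(items, key=lambda i: -value[i])
    results = {}

    def support(itemset):
        return sum(itemset <= t for t in transactions)

    def grow(prefix, total, start):
        for k in range(start, len(order)):
            item = order[k]
            candidate = prefix | {item}
            new_total = total + value[item]
            if new_total / len(candidate) < v:
                # Later items are no larger, so every remaining sibling
                # and every extension also fails: prune the whole branch.
                break
            n = support(candidate)
            if n >= min_support:
                results[candidate] = n
                grow(candidate, new_total, k + 1)

    grow(frozenset(), 0, 0)
    return results
```

With values a = 40, b = 30, c = 20, d = 10, transactions {a, b, c}, {a, c, d}, {b, c, d}, {a, b, c, d}, a minimum support of 2 and v = 25, the search returns {a}, {b}, {a, b}, {a, c}, {a, d}, {b, c} and {a, b, c}, the same answer as filtering all frequent itemsets afterwards, but branches such as {a, c, d} and everything below {c} are cut off as soon as the running average drops below 25.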