Summarizing itemset patterns: a profile-based approach
In KDD, 2005
Cited by 48 (6 self)
Frequent-pattern mining has been studied extensively on scalable methods for mining various kinds of patterns including itemsets, sequences, and graphs. However, the bottleneck of frequent-pattern mining is not at the efficiency but at the interpretability, due to the huge number of patterns generated by the mining process. In this paper, we examine how to summarize a collection of itemset patterns using only K representatives, a small number of patterns that a user can handle easily. The K representatives should not only cover most of the frequent patterns but also approximate their supports. A generative model is built to extract and profile these representatives, under which the supports of the patterns can be easily recovered without consulting the original dataset. Based on the restoration error, we propose a quality measure function to determine the optimal value of parameter K. Polynomial time algorithms are developed together with several optimization heuristics for efficiency improvement. Empirical studies indicate that we can obtain compact summarization in real datasets.
Survey on frequent pattern mining
2002
Cited by 48 (1 self)
Frequent itemsets play an essential role in many data mining tasks that try to find interesting patterns from databases, such as association rules, correlations, sequences, episodes, classifiers, clusters and many more of which the mining of association rules is one of the most popular problems. The
Roddick, “Association Mining”
Article 5, 2006
Cited by 47 (1 self)
The task of finding correlations between items in a dataset, association mining, has received considerable attention over the last decade. This article presents a survey of association mining fundamentals, detailing the evolution of association mining algorithms from the seminal to the state-of-the-art. This survey focuses on the fundamental principles of association mining, that is, itemset identification, rule generation, and their generic optimizations.
Discovering significant patterns
2007
Cited by 46 (3 self)
Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some user-specified constraints. Due to the large number of patterns considered, they suffer from an extreme risk of type-1 error, that is, of finding patterns that appear due to chance alone to satisfy the constraints on the sample data. This paper proposes techniques to overcome this problem by applying well-established statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard pattern discovery techniques can discover numerous spurious patterns when applied to random data and when applied to real-world data result in large numbers of patterns that are rejected when subjected to sound statistical evaluation. They also reveal that a number of pragmatic choices about how such tests are performed can greatly affect their power.
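One way to picture a strict upper limit on experimentwise error is a Bonferroni correction, a standard statistical practice. The sketch below is only an illustration under that assumption: the abstract does not say which correction the paper uses, and the function name, the pattern labels, and the p-values are all hypothetical.

```python
def bonferroni_filter(p_values, alpha=0.05, n_tests=None):
    """Keep patterns whose p-value passes a Bonferroni-adjusted threshold.

    p_values maps a pattern label to the p-value of some significance
    test (the test itself is outside this sketch). n_tests is the number
    of hypotheses considered; it defaults to the number of p-values
    supplied, which is an assumption of this sketch.
    """
    if not p_values:
        return {}
    if n_tests is None:
        n_tests = len(p_values)
    # Dividing alpha by the number of tests bounds the probability of
    # accepting at least one spurious pattern by alpha.
    threshold = alpha / n_tests
    return {pat: p for pat, p in p_values.items() if p <= threshold}


# Hypothetical p-values for four candidate rules.
candidates = {"A->B": 0.0004, "A->C": 0.02, "B->C": 0.3, "C->D": 0.011}
accepted = bonferroni_filter(candidates, alpha=0.05)
print(sorted(accepted))
```

With four tests the adjusted threshold is 0.05 / 4 = 0.0125, so a rule with p = 0.02 that would pass an unadjusted 0.05 cutoff is rejected.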
Depth-first non-derivable itemset mining
In SIAM Int. Conf. on Data Mining (SDM’05), 2005
Cited by 45 (7 self)
Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the support of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm, called NDI, to find all non-derivable itemsets was proposed. In this paper we present a depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that dfNDI outperforms NDI by an order of magnitude.
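The notion of a derivable itemset can be made concrete with a small sketch. It assumes the inclusion-exclusion deduction rules commonly given in the non-derivable itemsets literature (the abstract itself does not state them): for an itemset I and each subset X, the alternating sum of the supports of all J with X ⊆ J ⊊ I gives an upper bound on supp(I) when |I \ X| is odd and a lower bound when it is even. The function names and the toy supports are hypothetical.

```python
from itertools import combinations


def subsets_between(x, itemset):
    """Yield every J with x ⊆ J ⊊ itemset (J is a proper subset)."""
    extra = sorted(itemset - x)
    for r in range(len(extra)):  # r < len(extra) keeps J proper
        for combo in combinations(extra, r):
            yield x | frozenset(combo)


def support_bounds(itemset, support):
    """Return (lower, upper) bounds on supp(itemset) from its subsets.

    support maps every proper subset (as a frozenset, including the
    empty set, whose support is the number of transactions) to its
    support count.
    """
    itemset = frozenset(itemset)
    lower, upper = 0, float("inf")
    for r in range(len(itemset) + 1):
        for combo in combinations(sorted(itemset), r):
            x = frozenset(combo)
            total = sum(
                (-1) ** (len(itemset - j) + 1) * support[j]
                for j in subsets_between(x, itemset)
            )
            if len(itemset - x) % 2 == 1:
                upper = min(upper, total)  # odd gap: upper bound
            else:
                lower = max(lower, total)  # even gap: lower bound
    return lower, upper


def is_derivable(itemset, support):
    lower, upper = support_bounds(itemset, support)
    return lower == upper


# Toy supports over 10 transactions (hypothetical numbers).
loose = {frozenset(): 10, frozenset("a"): 6, frozenset("b"): 7}
tight = {frozenset(): 10, frozenset("a"): 10, frozenset("b"): 7}
print(support_bounds("ab", loose))  # (3, 6): not derivable
print(support_bounds("ab", tight))  # (7, 7): derivable
```

In the second case item a occurs in every transaction, so the support of {a, b} must equal the support of {b}; the bounds coincide and the itemset carries no new information.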
Weighted Association Rule Mining using Weighted Support and Significance Framework
ACM SIGKDD, 2003
Cited by 42 (0 self)
We address the issues of discovering significant binary relationships in transaction datasets in a weighted setting. The traditional model of association rule mining is adapted to handle weighted association rule mining problems where each item is allowed to have a weight. The goal is to steer the mining focus to those significant relationships involving items with significant weights rather than being flooded in the combinatorial explosion of insignificant relationships. We identify the challenge of using weights in the iterative process of generating large itemsets. The problem of invalidation of the “downward closure property” in the weighted setting is solved by using an improved model of weighted support measurements and exploiting a “weighted downward closure property”. A new algorithm called WARM (Weighted Association Rule Mining) is developed based on the improved model. The algorithm is both scalable and efficient in discovering significant relationships in weighted settings as illustrated by experiments performed on simulated datasets.
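A minimal sketch of how a weighted support measure can preserve downward closure follows. It assumes one plausible formulation, not confirmed by the abstract: each transaction is weighted by the average weight of its items, and the weighted support of an itemset is the total weight of the transactions containing it divided by the total weight of all transactions. The function names, item weights, and transactions are hypothetical.

```python
def transaction_weight(transaction, item_weight):
    """Average item weight of a non-empty transaction (assumed definition)."""
    return sum(item_weight[i] for i in transaction) / len(transaction)


def weighted_support(itemset, transactions, item_weight):
    """Weight of transactions containing itemset over total weight."""
    itemset = set(itemset)
    total = sum(transaction_weight(t, item_weight) for t in transactions)
    covered = sum(
        transaction_weight(t, item_weight)
        for t in transactions
        if itemset <= t  # transaction contains every item of the itemset
    )
    return covered / total


# Hypothetical item weights and transactions.
item_weight = {"a": 0.9, "b": 0.1, "c": 0.5}
transactions = [{"a", "b"}, {"a", "c"}, {"b"}]

ws_a = weighted_support({"a"}, transactions, item_weight)
ws_ab = weighted_support({"a", "b"}, transactions, item_weight)
print(ws_a, ws_ab)
```

Under this definition every transaction that contains a superset also contains the subset, and weights are non-negative, so the weighted support of {a, b} can never exceed that of {a}. That is the closure property an Apriori-style level-wise search relies on.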
A survey on condensed representations for frequent sets
In: Constraint Based Mining and Inductive Databases, Springer-Verlag, LNAI, 2005
Cited by 33 (4 self)
Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. The research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, the size of condensed representations can be several orders of magnitude smaller than the size of frequent set collections. Most of the proposals concern exact representations while it is also possible to consider approximated ones, i.e., to trade computational complexity with a bounded approximation on the computed support values. This paper surveys the core concepts used in the recent works on condensed representation for frequent sets.
Minimal k-Free Representations of Frequent Sets
Cited by 29 (8 self)
Due to the potentially immense amount of frequent sets that can be generated from transactional databases, recent studies have demonstrated the need for concise representations of all frequent sets.
Extracting redundancy-aware top-k patterns
In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006
Cited by 25 (3 self)
Observed in many applications, there is a potential need of extracting a small set of frequent patterns having not only high significance but also low redundancy. The significance is usually defined by the context of applications. Previous studies have been concentrating on how to compute top-k significant patterns or how to remove redundancy among patterns separately. There is limited work on finding those top-k patterns which demonstrate high significance and low redundancy simultaneously. In this paper, we study the problem of extracting redundancy-aware top-k patterns from a large collection of frequent patterns. We first examine the evaluation functions for measuring the combined significance of a pattern set and propose the MMS (Maximal Marginal Significance) as the problem formulation. The problem is known as NP-hard. We further present a greedy algorithm which approximates the optimal solution with performance bound O(log k) (with conditions on redundancy), where k is the number of reported patterns. The direct usage of redundancy-aware top-k patterns is illustrated through two real applications: disk block prefetch and document theme extraction. Our method can also be applied to processing redundancy-aware top-k queries in traditional database.
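The greedy idea can be sketched roughly as follows. This is not the paper's exact MMS definition, which the abstract does not spell out. It assumes the marginal gain of a candidate is its significance minus its largest redundancy with any already selected pattern, and it uses a hypothetical redundancy measure (Jaccard overlap scaled by the smaller significance). All names and numbers are illustrative.

```python
def jaccard_redundancy(significance):
    """Build a hypothetical redundancy function over item tuples."""
    def redundancy(p, q):
        a, b = set(p), set(q)
        overlap = len(a & b) / len(a | b)
        return overlap * min(significance[p], significance[q])
    return redundancy


def greedy_top_k(significance, redundancy, k):
    """Greedily pick k patterns, trading significance against redundancy."""
    selected = []
    remaining = set(significance)
    while remaining and len(selected) < k:
        def gain(p):
            # Penalize by the worst-case redundancy with chosen patterns.
            penalty = max((redundancy(p, q) for q in selected), default=0.0)
            return significance[p] - penalty
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical patterns (tuples of items) with significance scores.
significance = {("a", "b"): 0.9, ("a", "b", "c"): 0.85, ("d",): 0.5}
top2 = greedy_top_k(significance, jaccard_redundancy(significance), k=2)
print(top2)
```

A plain top-2 by significance would return the two heavily overlapping patterns ("a", "b") and ("a", "b", "c"); with the redundancy penalty the second pick becomes ("d",), whose gain of 0.5 beats the penalized gain of about 0.28.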
On Inverse Frequent Set Mining
2003
Cited by 23 (2 self)
Frequent set mining is a well-known technique to summarize binary data. However, it is an open problem how difficult it is to invert the frequent set mining, i.e., how difficult it is to find a binary data set that is compatible with frequent set mining results, the frequent sets. This inverse data mining problem is related to the questions of how well privacy is preserved in the frequent sets and how well the frequent sets characterize the original data set. In this paper we analyze the computational complexity of the problem of finding a binary data set compatible with a given collection of frequent sets and show that in many cases the problem is computationally very difficult.