Results 1–10 of 104
CHARM: An efficient algorithm for closed itemset mining, 2002
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets usin ..."
Abstract

Cited by 272 (14 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets over a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any “non-closed” sets found during computation. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM significantly outperforms previous methods. It is also linearly scalable in the number of transactions.
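The diffset idea mentioned in the abstract can be sketched in a few lines: instead of carrying the full tidset of an extended itemset, only the tids *lost* from the prefix's tidset are kept. The transactions and the `tidset` helper below are illustrative toys, not CHARM's actual data structures.

```python
# Hypothetical toy database; keys are transaction identifiers (tids).
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "b"},
    3: {"a", "c"},
    4: {"b", "c"},
}

def tidset(items):
    """Tids of the transactions that contain every item in `items`."""
    return {tid for tid, t in transactions.items() if items <= t}

# Extend the prefix P = {a} with item b. Instead of storing the full
# tidset of {a, b}, a diffset stores only what was lost from t(P):
t_a = tidset({"a"})                      # {1, 2, 3}
diffset_ab = t_a - tidset({"a", "b"})    # {3}: has 'a' but not 'b'
support_ab = len(t_a) - len(diffset_ab)  # support recovered without t(ab)
print(support_ab)  # 2
```

On dense datasets the diffset is typically much smaller than the tidset it replaces, which is the source of the memory savings.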
Mining All Non-Derivable Frequent Itemsets, 2002
"... Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently sev ..."
Abstract

Cited by 113 (12 self)
Recent studies on frequent itemset mining algorithms have resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, several proposals have recently been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The main goal of this paper is to identify redundancies in the set of all frequent itemsets and to exploit these redundancies in order to reduce the result of a mining operation. We present deduction rules to derive tight bounds on the support of candidate itemsets. We show how the deduction rules allow for constructing a minimal representation of all frequent itemsets. We also present connections between our proposal and recent proposals for concise representations, and we give the results of experiments on real-life datasets that show the effectiveness of the deduction rules. In fact, the experiments even show that in many cases, first mining the concise representation and then creating the frequent itemsets from this representation outperforms existing frequent set mining algorithms.
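The deduction rules can be illustrated with a small inclusion-exclusion computation. For each subset X of a candidate I, summing signed supports of the sets between X and I yields an upper or lower bound on supp(I), depending on the parity of |I \ X|; I is derivable when the best bounds coincide. The subset supports below are made-up toy values.

```python
from itertools import combinations

# Hypothetical supports for every proper subset of I = {a, b, c}.
supp = {
    frozenset(): 10,
    frozenset("a"): 6, frozenset("b"): 6, frozenset("c"): 5,
    frozenset("ab"): 4, frozenset("ac"): 3, frozenset("bc"): 3,
}
I = frozenset("abc")

def bound(X):
    """Signed inclusion-exclusion sum over all J with X <= J < I."""
    s = 0
    for r in range(len(I)):                    # proper subsets only
        for J in map(frozenset, combinations(I, r)):
            if X <= J:
                s += (-1) ** (len(I - J) + 1) * supp[J]
    # even |I \ X| yields a lower bound on supp(I), odd an upper bound
    return s, len(I - X) % 2 == 0

lows, ups = [], []
for r in range(len(I)):
    for X in map(frozenset, combinations(I, r)):
        value, is_lower = bound(X)
        (lows if is_lower else ups).append(value)

lo, hi = max(lows), min(ups)
print(lo, hi)  # supp(abc) must lie in [1, 3]; derivable iff lo == hi
```

Here the rules pin supp(abc) between 1 and 3, so {a, b, c} is non-derivable and its support must still be counted from the data.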
Computing Iceberg Concept Lattices with TITANIC, 2002
"... We introduce the notion of iceberg concept lattices... ..."
Abstract

Cited by 93 (13 self)
We introduce the notion of iceberg concept lattices...
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure, IEEE Transactions on Knowledge and Data Engineering, 2005
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets u ..."
Abstract

Cited by 60 (6 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets over a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "non-closed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
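The closedness notion underlying both CHARM and CHARM-L can be stated very compactly: an itemset is closed exactly when it equals the intersection of all transactions that contain it. The data below is a made-up example, not taken from the paper.

```python
# Hypothetical toy database.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "b", "d"},
]

def closure(itemset):
    """Intersection of all transactions that contain `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    result = set(covering[0])
    for t in covering[1:]:
        result &= t
    return result

# {a} is not closed here: every transaction with 'a' also contains 'b',
# so its closure {a, b} is the closed set reported in its place.
print(closure({"a"}))  # the set {'a', 'b'} (element order may vary)
```

Because closure preserves support, reporting only closed sets loses no frequency information, which is why the closed collection can stand in for the full one.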
Roddick, “Association Mining”, Article 5, 2006
"... The task of finding correlations between items in a dataset, association mining, has received considerable attention over the last decade. This article presents a survey of association mining fundamentals, detailing the evolution of association mining algorithms from the seminal to the stateofthe ..."
Abstract

Cited by 47 (1 self)
The task of finding correlations between items in a dataset, association mining, has received considerable attention over the last decade. This article presents a survey of association mining fundamentals, detailing the evolution of association mining algorithms from the seminal to the state-of-the-art. The survey focuses on the fundamental principles of association mining, that is, itemset identification, rule generation, and their generic optimizations.
Depth-first non-derivable itemset mining, in SIAM Int. Conf. on Data Mining (SDM’05), 2005
"... Mining frequent itemsets is one of the main problems in data mining. Much effort went into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, indepe ..."
Abstract

Cited by 45 (7 self)
Mining frequent itemsets is one of the main problems in data mining. Much effort has gone into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the supports of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm, called NDI, was proposed to find all non-derivable itemsets. In this paper we present a depth-first algorithm, dfNDI, based on Eclat, for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that it outperforms NDI by an order of magnitude.
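A bare-bones Eclat-style depth-first search, the scheme dfNDI builds on, can be sketched as follows. This toy version keeps only the vertical tidset layout and recursive intersection; it omits the derivability pruning that distinguishes dfNDI, and the data is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy database.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
minsup = 2

# Vertical layout: item -> set of tids of transactions containing it.
vertical = defaultdict(set)
for tid, t in enumerate(transactions):
    for item in t:
        vertical[item].add(tid)

frequent = {}

def eclat(prefix, items):
    """Depth-first search over extensions of `prefix`."""
    for i, (item, tids) in enumerate(items):
        if len(tids) < minsup:
            continue
        itemset = prefix | {item}
        frequent[frozenset(itemset)] = len(tids)
        # Conditional database: intersect tidsets of the remaining items.
        suffix = [(jt, tids & jtids) for jt, jtids in items[i + 1:]]
        eclat(itemset, suffix)

eclat(set(), sorted(vertical.items()))
print(len(frequent))  # 7 frequent itemsets at minsup = 2
```

dfNDI would additionally compute derivability bounds for each extension and cut the recursion wherever the bounds already determine the support.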
A survey on condensed representations for frequent sets, in Constraint-Based Mining and Inductive Databases, Springer-Verlag, LNAI, 2005
"... Abstract. Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Freq ..."
Abstract

Cited by 33 (4 self)
Solving inductive queries that have to return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task, but also feature construction, association-based classification, clustering, etc. Research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, the size of condensed representations can be several orders of magnitude smaller than the size of frequent set collections. Most of the proposals concern exact representations, while it is also possible to consider approximate ones, i.e., to trade a bounded approximation on the computed support values for lower computational complexity. This paper surveys the core concepts used in recent works on condensed representations for frequent sets.
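As a concrete instance of "discovering every frequent set and its support without looking back at the data": with frequent closed itemsets as the condensed representation, any itemset's support equals the support of its smallest closed superset, i.e. the maximum support among its closed supersets. The closed sets and their supports below are hypothetical values.

```python
# Hypothetical condensed representation: closed itemset -> support.
closed = {
    frozenset("a"): 4,
    frozenset("ab"): 3,
    frozenset("abc"): 2,
    frozenset("c"): 3,
}

def support(itemset):
    """supp(X) = max support over closed supersets of X (0 if none)."""
    sups = [s for C, s in closed.items() if itemset <= C]
    return max(sups) if sups else 0  # no closed superset: infrequent

print(support(frozenset("b")))  # 3, inherited from its closure {a, b}
```

The maximum is correct because the closure of X is itself a closed superset with the same support as X, while every other closed superset can only have smaller support.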
Constraint-based concept mining and its application to microarray data analysis, Intell. Data Anal., 2005
"... data analysis ..."
Adaptive and Resource-Aware Mining of Frequent Sets, 2002
"... The performance of an algorithm that mines frequent sets from transactional databases may severely depend on the specific features of the data being analyzed. Moreover, some architectural characteristics of the computational platform used  e.g. the available main memory  can dramatically chang ..."
Abstract

Cited by 30 (8 self)
The performance of an algorithm that mines frequent sets from transactional databases may depend severely on the specific features of the data being analyzed. Moreover, some architectural characteristics of the computational platform used (e.g., the available main memory) can dramatically change the runtime behavior of the algorithm. In this paper we present DCI (Direct Count & Intersect), an efficient data mining algorithm for discovering frequent sets from large databases, which effectively addresses the issues mentioned above. DCI adopts a classical level-wise approach based on candidate generation to extract frequent sets, but uses a hybrid method to determine candidate supports. The most innovative contribution of DCI lies in the multiple heuristic strategies employed, which permit DCI to adapt its behavior not only to the features of the specific computing platform, but also to the features of the dataset being mined, so that it is effective in mining both short and long patterns from sparse and dense datasets. The large number of tests conducted permits us to state that DCI considerably outperforms state-of-the-art algorithms on both synthetic and real-world datasets. Finally, we also discuss the parallelization strategies adopted in the design of ParDCI, a distributed and multithreaded implementation of DCI.
Minimal k-Free Representations of Frequent Sets
"... Due to the potentially immense amount of frequent sets that can be generated from transactional databases, recent studies have demonstrated the need for concise representations of all frequent sets. ..."
Abstract

Cited by 29 (8 self)
Due to the potentially immense amount of frequent sets that can be generated from transactional databases, recent studies have demonstrated the need for concise representations of all frequent sets.