Results 1-10 of 108
Computing Iceberg Concept Lattices with TITANIC
2002
Cited by 115 (15 self)
We introduce the notion of iceberg concept lattices...
CHARM: An Efficient Algorithm for Closed Association Rule Mining
 COMPUTER SCIENCE, RENSSELAER POLYTECHNIC INSTITUTE
1999
Cited by 86 (7 self)
The task of mining association rules consists of two main steps. The first involves finding the set of all frequent itemsets. The second step involves testing and generating all high confidence rules among itemsets. In this paper we show that it is not necessary to mine all frequent itemsets in the first step, instead it is sufficient to mine the set of closed frequent itemsets, which is much smaller than the set of all frequent itemsets. It is also not necessary to mine the set of all possible rules. We show that any rule between itemsets is equivalent to some rule between closed itemsets. Thus many redundant rules can be eliminated. Furthermore, we present CHARM, an efficient algorithm for mining all closed frequent itemsets. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM outperforms previous methods by an order of magnitude or more. It is also linearly scalable in the number of transactions and the number of closed itemsets found.
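To make the claim about closed itemsets concrete, here is a minimal brute-force sketch (not CHARM itself, and only practical for tiny inputs). It assumes the usual definition: an itemset is closed if no proper superset has the same support. The function names and the toy transactions are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, minsup):
    """Enumerate all non-empty frequent itemsets by brute force."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= minsup:
                result[candidate] = s
    return result

def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {
        x: s for x, s in frequent.items()
        if not any(x < y and s == sy for y, sy in frequent.items())
    }

# Hypothetical toy database of four transactions.
transactions = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
frequent = frequent_itemsets(transactions, minsup=2)
closed = closed_itemsets(frequent)
print(len(frequent), len(closed))
```

On this toy input the closed sets are a strict subset of the frequent sets, which is the size reduction the abstract refers to.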
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
2005
Cited by 80 (7 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "non-closed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
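The diffset idea can be illustrated with a small sketch. It assumes the basic set identity behind it: if t(X) is the set of transaction ids containing X, then the diffset d(XY) = t(X) \ t(Y) is enough to recover support(XY) = |t(X)| - |d(XY)| without storing the intersection. The helper `tidset` and the toy data are hypothetical; this is not the paper's implementation.

```python
def tidset(item, transactions):
    """Ids of the transactions that contain `item`."""
    return {tid for tid, t in enumerate(transactions) if item in t}

# Hypothetical toy database; the list index serves as the transaction id.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]

t_a = tidset("a", transactions)
t_b = tidset("b", transactions)

# Diffset of {a, b} relative to {a}: ids that contain a but not b.
diff_ab = t_a - t_b

# The support follows from the parent's support and the diffset size.
support_ab = len(t_a) - len(diff_ab)
print(diff_ab, support_ab)
```

When tidsets are large and overlap heavily, the diffset is much smaller than the intersection, which is where the memory saving comes from.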
New Algorithms for Enumerating All Maximal Cliques
2004
Cited by 58 (1 self)
In this paper, we consider the problems of generating all maximal (bipartite) cliques in a given (bipartite) graph G = (V, E) with n vertices and m edges. We propose two algorithms for enumerating all maximal cliques. One runs with O(M(n)) time delay and in O(n^2) space and the other runs with O(∆^4) time delay and in O(n + m) space, where ∆ denotes the maximum degree of G, M(n) denotes the time needed to multiply two n × n matrices, and the latter one requires O(nm) time as a preprocessing. For a given bipartite graph G, we propose three algorithms for enumerating all maximal bipartite cliques. The first algorithm runs with O(M(n)) time delay and in O(n^2) space, which immediately follows from the algorithm for the non-bipartite case. The second one runs with O(∆^3) time delay and in O(n + m) space, and the last one runs with O(∆^2) time delay and in O(n + m + N∆) space, where N denotes the number of all maximal bipartite cliques in G and both algorithms require O(nm) time as a preprocessing. Our algorithms improve upon all the existing algorithms, when G is either dense or sparse. Furthermore, computational experiments show that our algorithms for sparse graphs have significantly good performance for graphs which are generated randomly and appear in real-world problems.
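For readers unfamiliar with the enumeration problem itself, the following sketch lists all maximal cliques with the classical Bron-Kerbosch recursion (without pivoting). This is a different, textbook technique, not one of the algorithms proposed in the paper, and it carries none of the delay or space bounds stated above. The adjacency-dictionary format and the example graph are assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    `adj` maps each vertex to the set of its neighbours.
    """
    def expand(r, p, x):
        # r: current clique, p: candidates to extend it, x: already-explored vertices.
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())

# Hypothetical example: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
cliques = set(maximal_cliques(adj))
print(cliques)
```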
Frequent Closures as a Concise Representation for Binary Data Mining
2000
Cited by 52 (22 self)
Frequent set discovery from binary data is an important problem in data mining. It concerns the discovery of a concise representation of large tables from which descriptive rules can be derived, e.g., the popular association rules. Our work concerns the study of two representations, namely frequent sets and frequent closures. N. Pasquier and colleagues designed the algorithm that provides frequent sets via the discovery of frequent closures. When one mines highly correlated data, algorithms clearly fail while remains tractable. We discuss our implementation of and the experimental evidence we got from two real-life binary data mining processes. Then, we introduce the concept of almost-closure (generation of every frequent set from frequent almost-closures remains possible but with a bounded error on frequency). To the best of our knowledge, this is a new concept and, here again, we provide some experimental evidence of its added value.
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets
2002
Cited by 47 (11 self)
Let A be an m × n binary matrix, t ∈ {1, ..., m} be a threshold, and ε > 0 be a positive parameter. We show that given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (m − t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
Generating a condensed representation for association rules
 JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, KLUWER ACADEMIC PUBLISHER
2005
Cited by 46 (5 self)
Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantics based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant association rules having minimal antecedent and maximal consequent, called min-max association rules. We think that these rules are the most relevant since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules, their supports and their confidences, and all of them can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets (such as APRIORI, for instance) is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.
Item sets that compress
2006
Cited by 44 (23 self)
One of the major problems in frequent item set mining is the explosion of the number of results: it is difficult to find the most interesting frequent item sets. The cause of this explosion is that large sets of frequent item sets describe essentially the same set of transactions. In this paper we approach this problem using the MDL principle: the best set of frequent item sets is that set that compresses the database best. We introduce four heuristic algorithms for this task, and the experiments show that these algorithms give a dramatic reduction in the number of frequent item sets. Moreover, we show how our approach can be used to determine the best value for the minsup threshold.
Closed Set Based Discovery of Small Covers for Association Rules
 PROC. 15EMES JOURNEES BASES DE DONNEES AVANCEES, BDA
1999
Cited by 40 (5 self)
In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases yield most of the time several thousands of rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers, or bases, for exact and approximate rules. Once frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
A survey on condensed representations for frequent sets
 In: Constraint Based Mining and Inductive Databases, Springer-Verlag, LNAI
2005
Cited by 37 (4 self)
Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. The research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, the size of condensed representations can be several orders of magnitude smaller than the size of frequent set collections. Most of the proposals concern exact representations while it is also possible to consider approximated ones, i.e., to trade computational complexity with a bounded approximation on the computed support values. This paper surveys the core concepts used in the recent works on condensed representation for frequent sets.