Results 1-10 of 96
Computing Iceberg Concept Lattices with TITANIC
2002
Cited by 86 (13 self)
We introduce the notion of iceberg concept lattices...
CHARM: An Efficient Algorithm for Closed Association Rule Mining
COMPUTER SCIENCE, RENSSELAER POLYTECHNIC INSTITUTE, 1999
Cited by 72 (7 self)
The task of mining association rules consists of two main steps. The first involves finding the set of all frequent itemsets. The second involves testing and generating all high-confidence rules among itemsets. In this paper we show that it is not necessary to mine all frequent itemsets in the first step; instead, it is sufficient to mine the set of closed frequent itemsets, which is much smaller than the set of all frequent itemsets. It is also not necessary to mine the set of all possible rules. We show that any rule between itemsets is equivalent to some rule between closed itemsets, so many redundant rules can be eliminated. Furthermore, we present CHARM, an efficient algorithm for mining all closed frequent itemsets. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM outperforms previous methods by an order of magnitude or more. It also scales linearly in the number of transactions and the number of closed itemsets found.
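To make the notion of a closed frequent itemset concrete, here is a minimal brute-force sketch. It is not CHARM: it enumerates every candidate itemset, which is only workable on toy data. The transaction list `TRANSACTIONS`, the function names, and the threshold are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of single-letter items.
TRANSACTIONS = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
    frozenset("ABCE"),
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset, transactions):
    """Intersection of all transactions that contain `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return frozenset(itemset)
    return frozenset.intersection(*covering)


def frequent_itemsets(transactions, min_support):
    """Brute force: test every non-empty subset of the item universe."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            sup = support(candidate, transactions)
            if sup >= min_support:
                result[candidate] = sup
    return result


def closed_frequent_itemsets(transactions, min_support):
    """Keep only the frequent itemsets that equal their own closure."""
    freq = frequent_itemsets(transactions, min_support)
    return {s: sup for s, sup in freq.items() if closure(s, transactions) == s}


if __name__ == "__main__":
    closed = closed_frequent_itemsets(TRANSACTIONS, min_support=2)
    for itemset, sup in sorted(closed.items(), key=lambda kv: sorted(kv[0])):
        print("".join(sorted(itemset)), sup)
```

On this toy data with a minimum support of 2, 15 itemsets are frequent but only 5 are closed, and the support of every frequent itemset equals the support of its closure, which is why the closed sets suffice.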
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2005
Cited by 57 (6 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "non-closed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
Frequent Closures as a Concise Representation for Binary Data Mining
2000
Cited by 51 (22 self)
Frequent set discovery from binary data is an important problem in data mining. It concerns the discovery of a concise representation of large tables from which descriptive rules can be derived, e.g., the popular association rules. Our work concerns the study of two representations, namely frequent sets and frequent closures. N. Pasquier and colleagues designed the algorithm that provides frequent sets via the discovery of frequent closures. When one mines highly correlated data, algorithms clearly fail while remains tractable. We discuss our implementation of and the experimental evidence we got from two real-life binary data mining processes. Then, we introduce the concept of almost-closure (generation of every frequent set from frequent almost-closures remains possible but with a bounded error on frequency). To the best of our knowledge, this is a new concept and, here again, we provide some experimental evidence of its added value.
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets
2002
Cited by 39 (9 self)
Let A be an $m \times n$ binary matrix, $t \in \{1, \ldots, m\}$ be a threshold, and $\epsilon > 0$ be a positive parameter. We show that given a family of $O(n^{\epsilon})$ maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets, or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality $\alpha \le (m - t + 1)\beta$, where $\alpha$ and $\beta$ are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
Closed Set Based Discovery of Small Covers for Association Rules
PROC. 15EMES JOURNEES BASES DE DONNEES AVANCEES, BDA, 1999
Cited by 33 (4 self)
In this paper, we address the problem of the usefulness of the set of discovered association rules. This problem is important since real-life databases most of the time yield several thousand rules with high confidence. We propose new algorithms based on Galois closed sets to reduce the extraction to small covers, or bases, for exact and approximate rules. Once frequent closed itemsets, which constitute a generating set for both frequent itemsets and association rules, have been discovered, no additional database pass is needed to derive these bases. Experiments conducted on real-life databases show that these algorithms are efficient and valuable in practice.
New Algorithms for Enumerating All Maximal Cliques
2004
Cited by 33 (1 self)
In this paper, we consider the problems of generating all maximal (bipartite) cliques in a given (bipartite) graph G = (V, E) with n vertices and m edges. We propose two algorithms for enumerating all maximal cliques. One runs with $O(M(n))$ time delay and in $O(n^2)$ space and the other runs with $O(\Delta^4)$ time delay and in $O(n + m)$ space, where $\Delta$ denotes the maximum degree of G, $M(n)$ denotes the time needed to multiply two $n \times n$ matrices, and the latter one requires $O(nm)$ time as a preprocessing. For a given bipartite graph G, we propose three algorithms for enumerating all maximal bipartite cliques. The first algorithm runs with $O(M(n))$ time delay and in $O(n^2)$ space, which immediately follows from the algorithm for the non-bipartite case. The second one runs with $O(\Delta^3)$ time delay and in $O(n + m)$ space, and the last one runs with $O(\Delta^2)$ time delay and in $O(n + m + N\Delta)$ space, where N denotes the number of all maximal bipartite cliques in G and both algorithms require $O(nm)$ time as a preprocessing. Our algorithms improve upon all the existing algorithms, when G is either dense or sparse. Furthermore, computational experiments show that our algorithms for sparse graphs have significantly good performance for graphs which are generated randomly and appear in real-world problems.
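For readers unfamiliar with the enumeration problem itself, the following sketch lists all maximal cliques of a small graph. It uses the classical Bron-Kerbosch recursion without pivoting, which is a swapped-in textbook technique and not one of the algorithms proposed in the paper; it carries none of the delay or space bounds stated above. The adjacency-dictionary input format and the example graph are assumptions made for illustration.

```python
def maximal_cliques(adj):
    """Yield every maximal clique of an undirected graph.

    `adj` maps each vertex to the set of its neighbours (assumed symmetric,
    with no self-loops). Plain Bron-Kerbosch, no pivoting.
    """

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend r,
        # x: vertices already tried, used to detect non-maximality.
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


if __name__ == "__main__":
    # Hypothetical example: a triangle 1-2-3 with a pendant vertex 4 attached to 3.
    graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
    for clique in maximal_cliques(graph):
        print(sorted(clique))
```

On the example graph the maximal cliques are {1, 2, 3} and {3, 4}. The link to itemset mining is that closed itemsets of a binary matrix correspond to maximal bipartite cliques of the transaction-item graph.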
A survey on condensed representations for frequent sets
In: Constraint Based Mining and Inductive Databases, Springer-Verlag, LNAI, 2005
Cited by 31 (4 self)
Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. The research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, the size of condensed representations can be several orders of magnitude smaller than the size of frequent set collections. Most of the proposals concern exact representations, while it is also possible to consider approximate ones, i.e., to trade computational complexity for a bounded approximation on the computed support values. This paper surveys the core concepts used in the recent works on condensed representations for frequent sets.
Generating a condensed representation for association rules
JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, KLUWER ACADEMIC PUBLISHER, 2005
Cited by 20 (0 self)
Association rule extraction from operational datasets often produces several tens of thousands, and even millions, of association rules. Moreover, many of these rules are redundant and thus useless. Using a semantics based on the closure of the Galois connection, we define a condensed representation for association rules. This representation is characterized by frequent closed itemsets and their generators. It contains the non-redundant association rules having minimal antecedent and maximal consequent, called min-max association rules. We think that these rules are the most relevant since they are the most general non-redundant association rules. Furthermore, this representation is a basis, i.e., a generating set for all association rules, their supports and their confidences, and all of them can be retrieved without accessing the data. We introduce algorithms for extracting this basis and for reconstructing all association rules. Results of experiments carried out on real datasets show the usefulness of this approach. In order to generate this basis when an algorithm for extracting frequent itemsets (such as APRIORI, for instance) is used, we also present an algorithm for deriving frequent closed itemsets and their generators from frequent itemsets without using the dataset.
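The last point, deriving closed itemsets and generators from frequent itemsets alone, can be illustrated with a short sketch. It is not the paper's algorithm; it relies only on two standard characterizations: a frequent itemset is closed when no proper superset has the same support, and it is a generator when no proper subset has the same support. The support table `RAW_SUPPORTS` is a hypothetical toy example, and the empty itemset is ignored for simplicity.

```python
# Hypothetical supports of all frequent itemsets of a toy dataset
# (minimum support 2); keys are written as strings of single-letter items.
RAW_SUPPORTS = {
    "A": 3, "B": 4, "C": 4, "E": 4,
    "AB": 2, "AC": 3, "AE": 2, "BC": 3, "BE": 4, "CE": 3,
    "ABC": 2, "ABE": 2, "ACE": 2, "BCE": 3,
    "ABCE": 2,
}


def closed_and_generators(supports):
    """Split frequent itemsets into closed sets and generators.

    `supports` maps frozenset -> support and must contain every frequent
    itemset. No access to the original transactions is needed.
    """
    closed, generators = set(), set()
    for itemset, sup in supports.items():
        # Closed: no proper superset with the same support.
        if not any(itemset < other and sup == osup
                   for other, osup in supports.items()):
            closed.add(itemset)
        # Generator: no proper (non-empty) subset with the same support.
        if not any(other < itemset and sup == osup
                   for other, osup in supports.items()):
            generators.add(itemset)
    return closed, generators


if __name__ == "__main__":
    table = {frozenset(k): v for k, v in RAW_SUPPORTS.items()}
    closed, generators = closed_and_generators(table)
    print("closed:    ", sorted("".join(sorted(s)) for s in closed))
    print("generators:", sorted("".join(sorted(s)) for s in generators))
```

Checking only frequent supersets is enough for closedness, because a superset with the same support as a frequent itemset is itself frequent. The quadratic scan is chosen for clarity, not efficiency.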
Generalizing the Notion of Support
In KDD’04, 2004
Cited by 19 (7 self)
The goal of this paper is to show that generalizing the notion of support can be useful in extending association analysis to non-traditional types of patterns and non-binary data. To that end, we describe a framework for generalizing support that is based on the simple but useful observation that support can be viewed as the composition of two functions: a function that evaluates the strength or presence of a pattern in each object (transaction) and a function that summarizes these evaluations with a single number. A key goal of any framework is to allow people to more easily express, explore, and communicate ideas, and hence we illustrate how our support framework can be used to describe support for a variety of commonly used association patterns, such as frequent itemsets, general Boolean patterns, and error-tolerant itemsets. We also present two examples of the practical usefulness of generalized support. One example shows the usefulness of support functions for continuous data. Another example shows how the hyperclique pattern, an association pattern originally defined for binary data, can be extended to continuous data by generalizing a support function.
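The two-function view of support described above can be sketched in a few lines. The names `generalized_support`, `contains_all`, and `min_value`, as well as the choice of `min` as the evaluation function for continuous data, are assumptions made for illustration and are not claimed to be the paper's definitions.

```python
def generalized_support(evaluate, summarize, pattern, data):
    """Support as a composition: evaluate the pattern on each object,
    then summarize the per-object evaluations into one number."""
    return summarize([evaluate(pattern, obj) for obj in data])


def contains_all(pattern, transaction):
    """Binary evaluation: 1 if the transaction holds every item, else 0."""
    return 1 if all(item in transaction for item in pattern) else 0


def min_value(pattern, row):
    """One possible continuous evaluation: the smallest attribute value
    among the attributes named in the pattern (an illustrative choice)."""
    return min(row[attr] for attr in pattern)


if __name__ == "__main__":
    # Classical itemset support: presence test, summarized by a sum (a count).
    transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
    print(generalized_support(contains_all, sum, {"B", "E"}, transactions))

    # Same framework on hypothetical continuous data.
    rows = [{"x": 0.5, "y": 0.25}, {"x": 1.0, "y": 0.75}]
    print(generalized_support(min_value, sum, {"x", "y"}, rows))
```

Swapping `sum` for a mean, or `contains_all` for a tolerance-based test, changes the kind of pattern being measured without touching the surrounding code, which is the practical appeal of separating the two functions.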