Results 1-10 of 243
Discovering Frequent Closed Itemsets for Association Rules
, 1999
Cited by 354 (10 self)
In this paper, we address the problem of finding frequent itemsets in a database. Using the closed itemset lattice framework, we show that this problem can be reduced to the problem of finding frequent closed itemsets. Based on this statement, we can construct efficient data mining algorithms by limiting the search space to the closed itemset lattice rather than the subset lattice. Moreover, we show that the set of all frequent closed itemsets suffices to determine a reduced set of association rules, thus addressing another important data mining problem: limiting the number of rules produced without information loss. We propose a new algorithm, called A-Close, using a closure mechanism to find frequent closed itemsets. We realized experiments to compare our approach to the commonly used frequent itemset search approach. Those experiments showed that our approach is very valuable for dense and/or correlated data that represent an important part of existing databases.
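To make the notion of a frequent closed itemset concrete, here is a minimal brute-force sketch. It is not the algorithm proposed in the paper; it assumes the usual definition (an itemset is closed when no proper superset has the same support), and the function names and toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_closed_itemsets(transactions, min_support):
    """Brute-force enumeration; only practical for tiny examples."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[candidate] = s
    # Assumed definition: closed = no proper superset with equal support.
    # A superset with equal support is itself frequent, so checking
    # against the frequent itemsets is sufficient.
    return {
        x: s
        for x, s in frequent.items()
        if not any(x < y and s == frequent[y] for y in frequent)
    }


# Toy data (hypothetical): four transactions over items a, b, c.
transactions = [frozenset(t) for t in ("abc", "ab", "abc", "c")]
closed = frequent_closed_itemsets(transactions, min_support=2)
```

On this toy data all seven non-empty itemsets are frequent, but only three are closed, which illustrates why restricting the search to closed itemsets can shrink the output.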
Discovery of frequent episodes in event sequences
 Data Min. Knowl. Discov
, 1997
Cited by 317 (14 self)
Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management. Keywords: event sequences, frequent episodes, sequence analysis
Efficient Mining of Emerging Patterns: Discovering Trends and Differences
, 1999
Cited by 265 (31 self)
We introduce a new kind of patterns, called emerging patterns (EPs), for knowledge discovery from databases. EPs are defined as itemsets whose supports increase significantly from one dataset to another. EPs can capture emerging trends in timestamped databases, or useful contrasts between data classes. EPs have been proven useful: we have used them to build very powerful classifiers, which are more accurate than C4.5 and CBA, for many datasets. We believe that EPs with low to medium support, such as 1%-20%, can give useful new insights and guidance to experts, in even "well understood" applications. The efficient mining of EPs is a challenging problem, since (i) the Apriori property no longer holds for EPs, and (ii) there are usually too many candidates for high dimensional databases or for small support thresholds such as 0.5%. Naive algorithms are too costly. To solve this problem, (a) we promote the description of large collections of itemsets using their concise borders (the pa...
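The definition of an emerging pattern can be illustrated with a small sketch. It is the kind of naive enumeration the abstract calls too costly, not the border-based method it goes on to describe. It assumes that "increase significantly" is measured as a growth rate, i.e. the ratio of relative supports, compared against a user-chosen threshold; `min_growth`, the function names, and the toy datasets are hypothetical.

```python
from itertools import combinations


def rel_support(itemset, dataset):
    """Fraction of transactions in `dataset` that contain `itemset`."""
    return sum(1 for t in dataset if itemset <= t) / len(dataset)


def emerging_patterns(d1, d2, min_growth):
    """Naive search for itemsets whose support grows from d1 to d2."""
    items = sorted(set().union(*d1, *d2))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            s1, s2 = rel_support(x, d1), rel_support(x, d2)
            if s2 == 0:
                continue  # absent from the target dataset, nothing emerges
            # Assumption: growth rate is s2 / s1, infinite when s1 is zero.
            growth = float("inf") if s1 == 0 else s2 / s1
            if growth >= min_growth:
                result[x] = growth
    return result


# Toy data (hypothetical): an "earlier" and a "later" dataset.
d1 = [frozenset(t) for t in ("a", "ab", "b", "a")]
d2 = [frozenset(t) for t in ("ab", "ab", "b", "ab")]
eps = emerging_patterns(d1, d2, min_growth=2)
```

In this toy run `{b}` doubles its support and `{a, b}` triples it, while `{a}` stays flat and is not reported.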
An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data
, 2000
Cited by 261 (7 self)
This paper proposes a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph data set. A graph transaction is represented by an adjacency matrix, and the frequent patterns appearing in the matrices are mined through the extended algorithm of the basket analysis. Its performance has been evaluated for the artificial simulation data and the carcinogenesis data of Oxford University and NTP. Its high efficiency has been confirmed for the size of a real-world problem.
Finding Frequent Substructures in Chemical Compounds
, 1998
Cited by 123 (9 self)
The discovery of the relationships between chemical structure and biological function is central to biological science and medicine. In this paper we apply data mining to the problem of predicting chemical carcinogenicity. This toxicology application was launched at IJCAI'97 as a research challenge for artificial intelligence. Our approach to the problem is descriptive rather than based on classification; the goal being to find common substructures and properties in chemical compounds, and in this way to contribute to scientific insight. This approach contrasts with previous machine learning research on this problem, which has mainly concentrated on predicting the toxicity of unknown chemicals. Our contribution to the field of data mining is the ability to discover useful frequent patterns that are beyond the complexity of association rules or their known variants. This is vital to the problem, which requires the discovery of patterns that are out of the reach of simple transformations...
Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set
 In 6th Intl. Conf. Extending Database Technology
, 1997
Cited by 106 (2 self)
Discovering frequent itemsets is a key problem in important data mining applications, such as the discovery of association rules, strong rules, episodes, and minimal keys. Typical algorithms for solving this problem operate in a bottom-up breadth-first search direction. The computation starts from frequent 1-itemsets (minimal length frequent itemsets) and continues until all maximal (length) frequent itemsets are found. During the execution, every frequent itemset is explicitly considered. Such algorithms perform reasonably well when all maximal frequent itemsets are short. However, performance drastically decreases when some of the maximal frequent itemsets are relatively long. We present a new algorithm which combines both the bottom-up and top-down directions. The main search direction is still bottom-up but a restricted search is conducted in the top-down direction. This search is used only for maintaining and updating a new data structure we designed, the maximum frequent candidat...
Computing Iceberg Concept Lattices with TITANIC
, 2002
Cited by 93 (13 self)
We introduce the notion of iceberg concept lattices...
Free-sets: a condensed representation of Boolean data for the approximation of frequency queries
 Data Mining and Knowledge Discovery
, 2003
Cited by 91 (20 self)
Given a large collection of transactions containing items, a basic common data mining problem is to extract the so-called frequent itemsets (i.e., sets of items appearing in at least a given number of transactions). In this paper, we propose a structure called free-sets, from which we can approximate any itemset support (i.e., the number of transactions containing the itemset) and we formalize this notion in the framework of ε-adequate representations (H. Mannila and H. Toivonen, 1996. In Proc. of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 189–194). We show that frequent free-sets can be efficiently extracted using pruning strategies developed for frequent itemset discovery, and that they can be used to approximate the support of any frequent itemset. Experiments on real dense data sets show a significant reduction of the size of the output when compared with standard frequent itemset extraction. Furthermore, the experiments show that the extraction of frequent free-sets is still possible when the extraction of frequent itemsets becomes intractable, and that the supports of the frequent free-sets can be used to approximate very closely the supports of the frequent itemsets. Finally, we consider the effect of this approximation on association rules (a popular kind of patterns that can be derived from frequent itemsets) and show that the corresponding errors remain very low in practice.
Detecting group differences: Mining contrast sets
 Data Mining and Knowledge Discovery
, 2001
Cited by 84 (3 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we postprocess the results to present a subset that are surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies
, 1999
Cited by 81 (0 self)
... this paper, we also consider the approximate dependency inference task: given a relation r and a threshold ε, find all minimal nontrivial approximate dependencies ...
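As a rough illustration of what testing one approximate dependency against a threshold could look like, here is a minimal sketch. The excerpt does not say how the error of a dependency is measured. This sketch assumes it is the smallest fraction of rows that must be removed for the dependency to hold exactly. The function name, column names, and toy relation are hypothetical, and the sketch checks a single candidate only, without searching for all minimal dependencies.

```python
from collections import Counter, defaultdict


def dependency_error(rows, lhs, rhs):
    """Assumed error measure for the dependency lhs -> rhs: the fraction
    of rows that must be dropped so that equal lhs values always imply
    equal rhs values."""
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[attr] for attr in lhs)
        groups[key][row[rhs]] += 1
    # Within each lhs group, keep only the most common rhs value.
    kept = sum(max(counts.values()) for counts in groups.values())
    return 1 - kept / len(rows)


# Toy relation (hypothetical data).
rows = [
    {"city": "Oslo", "zip": "0150", "country": "NO"},
    {"city": "Oslo", "zip": "0151", "country": "NO"},
    {"city": "Bergen", "zip": "5003", "country": "NO"},
    {"city": "Bergen", "zip": "5003", "country": "SE"},
]
threshold = 0.25
city_country_error = dependency_error(rows, ["city"], "country")
holds_approximately = city_country_error <= threshold
```

Here `city -> country` is violated by one of four rows, so it holds approximately at a threshold of 0.25, while `zip -> city` holds exactly with error 0.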