Results 1–10 of 96
Constraint-based rule mining in large, dense databases, 1999
Cited by 151 (3 self)
Constraint-based rule miners find all rules in a given dataset meeting user-specified constraints such as minimum support and confidence. We describe a new algorithm that directly exploits all user-specified constraints, including minimum support, minimum confidence, and a new constraint that ensures every mined rule offers a predictive advantage over any of its simplifications. Our algorithm maintains efficiency even at low supports on data that is dense (e.g. relational data). Previous approaches such as Apriori and its variants exploit only the minimum support constraint, and as a result are ineffective on dense data due to a combinatorial explosion of “frequent itemsets”.
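As a toy illustration of the support and confidence constraints this abstract refers to (a minimal sketch of the definitions only, not the paper's algorithm; the market-basket data below is invented):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) over the transactions."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Rule {bread} -> {milk}: support 2/4 = 0.5, confidence 2/3.
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 0.666...
```

A miner would keep the rule only if both values clear the user-specified minimums.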
Mining the Most Interesting Rules, 1999
Cited by 124 (1 self)
Several algorithms have been proposed for finding the “best,” “optimal,” or “most interesting” rule(s) in a database according to a variety of metrics, including confidence, support, gain, chi-squared value, gini, entropy gain, laplace, lift, and conviction. In this paper, we show that the best rule according to any of these metrics must reside along a support/confidence border. Further, in the case of conjunctive rule mining within categorical data, the number of rules along this border is conveniently small, and can be mined efficiently from a variety of real-world datasets. We also show how this concept can be generalized to mine all rules that are best according to any of these criteria with respect to an arbitrary subset of the population of interest. We argue that by returning a broader set of rules than previous algorithms, our techniques allow for improved insight into the data and support more user interaction in the optimized rule-mining process.
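The support/confidence border can be made concrete with a small sketch: a rule lies on the border when no other rule beats it on both support and confidence at once. This illustrates only the border concept, not the paper's mining algorithm; the rules and numbers are invented:

```python
def sc_border(rules):
    """Return the names of rules on the support/confidence border.
    rules: list of (name, support, confidence) triples."""
    border = []
    for name, s, c in rules:
        # A rule is dominated if some other rule is at least as good
        # on both support and confidence (and not the identical point).
        dominated = any(s2 >= s and c2 >= c and (s2, c2) != (s, c)
                        for _, s2, c2 in rules)
        if not dominated:
            border.append(name)
    return border

rules = [("r1", 0.5, 0.6), ("r2", 0.3, 0.9),
         ("r3", 0.2, 0.8), ("r4", 0.5, 0.4)]
print(sc_border(rules))  # r3 and r4 are dominated -> ['r1', 'r2']
```

The paper's claim is that, for the listed metrics, the single best rule always sits somewhere on this (typically small) border.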
How to Summarize the Universe: Dynamic Maintenance of Quantiles. In VLDB, 2002
Cited by 104 (13 self)
Order statistics, i.e., quantiles, are frequently used in databases, both at the database server and at the application level. For example, they are useful in selectivity estimation during query optimization, in partitioning large relations, in estimating query result sizes when building user interfaces, and in characterizing the data distribution of evolving datasets in the process of data mining.
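For reference, the exact quantile definition that such summaries approximate can be sketched as follows (the paper's contribution is maintaining compact approximate summaries under insertions and deletions, which this toy code does not attempt):

```python
import math

def quantile(values, phi):
    """Exact phi-quantile: the element of rank ceil(phi * n)
    in sorted order, for 0 < phi <= 1."""
    ordered = sorted(values)
    rank = max(1, math.ceil(phi * len(ordered)))
    return ordered[rank - 1]

data = [9, 1, 8, 4, 7, 3, 6, 2, 5, 10]
print(quantile(data, 0.5))   # median -> 5
print(quantile(data, 0.25))  # first quartile -> 3
```

Computing this exactly requires sorting all the data; the summaries in the paper answer such queries approximately in far less space.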
Detecting group differences: Mining contrast sets. Data Mining and Knowledge Discovery, 2001
Cited by 78 (3 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that is surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
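A rough sketch of the style of test involved, assuming a 2x2 chi-square statistic per candidate set and a plain Bonferroni division of alpha (the paper's actual pruning and correction scheme is more involved; the counts and candidate-set total below are invented):

```python
def chi2_2x2(a, b, c, d):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]:
    rows = has/lacks the contrast set, cols = group 1 / group 2."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# 40/100 group-1 members match the candidate contrast set vs. 20/100
# in group 2; is the difference significant after correction?
stat = chi2_2x2(40, 20, 60, 80)
alpha = 0.05
m = 250                     # hypothetical number of candidate sets tested
adjusted_alpha = alpha / m  # Bonferroni: each test is run at this level
print(round(stat, 3))       # 9.524
```

The statistic would then be compared against the chi-square critical value for the adjusted level, so that the error rate over all candidate sets stays below alpha.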
Detecting Change in Categorical Data: Mining Contrast Sets. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, 1999
Cited by 64 (5 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 versus 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide an algorithm for mining contrast sets as well as several pruning rules to reduce the computational complexity. Once the deviations are found, we post-process the results to present a subset that is surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
Efficient Search for Association Rules, 2000
Cited by 59 (10 self)
This paper argues that for some applications direct search for association rules can be more efficient than the two-stage process of the Apriori algorithm, which first finds large itemsets that are then used to identify associations. In particular, it is argued, Apriori can impose large computational overheads when the number of frequent itemsets is very large. This will often be the case when association rule analysis is performed on domains other than basket analysis, or when it is performed for basket analysis with basket information augmented by other customer information. An algorithm is presented that is computationally efficient for association rule analyses in which the number of rules to be found can be constrained and all data can be maintained in memory.
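A minimal sketch of the idea of searching directly for a constrained number of rules, here the top-k single-antecedent rules by confidence under a minimum-support constraint (an invented toy, not the paper's algorithm, whose search space and pruning are far more sophisticated):

```python
import heapq
from itertools import combinations

def top_k_rules(transactions, k, min_support):
    """Scan candidate one-item -> one-item rules directly, keeping
    only the k most confident that meet the support constraint."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    heap = []  # min-heap of (confidence, rule), capped at size k
    for a, c in combinations(items, 2):
        for ante, cons in ((a, c), (c, a)):
            cover = [t for t in transactions if ante in t]
            both = sum(1 for t in cover if cons in t)
            if both / n < min_support:
                continue  # rule fails the support constraint
            heapq.heappush(heap, (both / len(cover), (ante, cons)))
            if len(heap) > k:
                heapq.heappop(heap)  # discard the weakest rule so far
    return sorted(heap, reverse=True)

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"a", "b"}]
for conf, (ante, cons) in top_k_rules(transactions, k=2, min_support=0.4):
    print(f"{ante} -> {cons}  confidence={conf:.2f}")
```

Because at most k rules are retained at any time, memory stays bounded regardless of how many frequent itemsets the data contains.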
Ontology Learning. In Handbook on Ontologies
Cited by 49 (3 self)
... we show in this paper some exemplary techniques in the ontology learning cycle that we have implemented in our ontology learning environment, KAON TextToOnto.
Discovering significant patterns, 2007
Cited by 41 (3 self)
Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some user-specified constraints. Due to the large number of patterns considered, they suffer from an extreme risk of type-1 error, that is, of finding patterns that appear to satisfy the constraints on the sample data due to chance alone. This paper proposes techniques to overcome this problem by applying well-established statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard pattern discovery techniques can discover numerous spurious patterns when applied to random data, and, when applied to real-world data, result in large numbers of patterns that are rejected when subjected to sound statistical evaluation. They also reveal that a number of pragmatic choices about how such tests are performed can greatly affect their power.
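One way to picture such a screen, assuming a one-sided binomial test per pattern and a Bonferroni-adjusted threshold of alpha divided by the size of the search space (the specific test and every number below are illustrative assumptions, not the paper's exact procedure):

```python
import math

def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing a count
    this extreme if the pattern held only by chance."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

n = 1000              # transactions in the sample
k = 100               # transactions matching the discovered pattern
p0 = 0.05             # support expected under the chance hypothesis
alpha = 0.05
search_space = 10_000  # hypothetical number of patterns examined

p_value = binomial_tail(n, k, p0)
significant = p_value <= alpha / search_space  # experimentwise control
print(significant)  # True
```

Dividing alpha by the number of patterns examined is what caps the experimentwise risk: even if every pattern were spurious, the expected number of false discoveries stays below alpha.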
Java Programming for High-Performance Numerical Computing, 2000
Cited by 40 (8 self)
Figure 5: Simple Array construction operations.

    // Simple 3 x 3 array of integers
    intArray2D A = new intArray2D(3,3);
    // This new array has a copy of the data in A,
    // and the same rank and shape.
On the Complexity of Generating Maximal Frequent and Minimal Infrequent Sets, 2002
Cited by 39 (9 self)
Let A be an m × n binary matrix, t ∈ {1, . . . , m} be a threshold, and ε > 0 be a positive parameter. We show that given a family of O(n^ε) maximal t-frequent column sets for A, it is NP-complete to decide whether A has any further maximal t-frequent sets or not, even when the number of such additional maximal t-frequent column sets may be exponentially large. In contrast, all minimal t-infrequent sets of columns of A can be enumerated in incremental quasi-polynomial time. The proof of the latter result follows from the inequality α ≤ (t + 1)β, where α and β are respectively the numbers of all maximal t-frequent and all minimal t-infrequent sets of columns of the matrix A. We also discuss the complexity of generating all closed t-frequent column sets for a given binary matrix.
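The objects in this abstract can be made concrete with a brute-force sketch: on a tiny binary matrix, enumerate all maximal t-frequent and minimal t-infrequent column sets and check that the (t + 1) bound relating their counts holds on this example (exponential toy code for the definitions only, not the paper's enumeration algorithms):

```python
from itertools import combinations

def frequent_column_sets(A, t):
    """Return (maximal t-frequent, minimal t-infrequent) column sets
    of binary matrix A, by exhaustive enumeration."""
    n = len(A[0])
    def is_frequent(cols):
        # t-frequent: at least t rows have a 1 in every column of `cols`
        return sum(all(row[j] for j in cols) for row in A) >= t
    all_sets = [frozenset(c) for r in range(n + 1)
                for c in combinations(range(n), r)]
    frequent = {s for s in all_sets if is_frequent(s)}
    maximal = [s for s in frequent if not any(s < f for f in frequent)]
    infrequent = [s for s in all_sets if s not in frequent]
    minimal = [s for s in infrequent
               if all(frozenset(sub) in frequent
                      for sub in combinations(s, len(s) - 1))]
    return maximal, minimal

A = [[1, 1, 0],
     [1, 1, 1],
     [1, 0, 1],
     [0, 1, 1]]
maximal, minimal = frequent_column_sets(A, t=2)
print(len(maximal), len(minimal))  # 3 1
```

Here every pair of columns is 2-frequent but the full triple is not, so α = 3 and β = 1, and the bound α ≤ (t + 1)β holds with equality.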