Generating Non-Redundant Association Rules
2000
Cited by 159 (10 self)
The traditional association rule mining framework produces many redundant rules, and the extent of redundancy is far larger than previously suspected. We present a new framework for associations based on the novel concept of closed frequent itemsets. The number of non-redundant rules produced by the new approach is exponentially smaller (in the length of the longest frequent itemset) than the rule set produced by the traditional approach. Experiments using several "hard" real and synthetic databases confirm the utility of our framework, both in terms of the reduction in the number of rules presented to the user and in terms of time.
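To make the notion of a closed frequent itemset concrete, here is a brute-force Python sketch. It is an illustration of the definition only, not the paper's mining algorithm, which this listing does not describe. An itemset is closed if no proper superset has the same support. The toy transactions, the `min_support` threshold, and the function names `frequent_itemsets` and `closed_itemsets` are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute force: map every itemset whose support (the number of
    transactions containing it) is >= min_support to that support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result

def closed_itemsets(frequent):
    """Keep an itemset only if no proper superset has the same support.
    Checking frequent supersets suffices: a superset with equal support
    would itself be frequent."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < other and sup == frequent[other] for other in frequent)
    }

# Hypothetical toy database; each string is one transaction of single-letter items.
transactions = [frozenset(t) for t in ("ACTW", "CDW", "ACTW", "ACDW", "ACDTW", "CDT")]
freq = frequent_itemsets(transactions, min_support=3)
closed = closed_itemsets(freq)
```

On this toy database the sketch finds 19 frequent itemsets but only 7 closed ones. The support of any frequent itemset equals the largest support among its closed supersets, so the closed sets lose no information, and rules built on them can omit redundant ones.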
Selecting the Right Interestingness Measure for Association Patterns
2002
Cited by 143 (6 self)
Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties, which make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.
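As a rough illustration of how measures can disagree, the sketch below computes four of the quantities named above for a rule A -> B from a 2x2 contingency table: support, confidence, lift, and the phi coefficient, used here as the correlation measure. The counts and the function name `interestingness` are hypothetical.

```python
import math

def interestingness(f11, f10, f01, f00):
    """A few measures for a rule A -> B from a 2x2 contingency table.

    f11: transactions containing both A and B
    f10: A but not B
    f01: B but not A
    f00: neither
    """
    n = f11 + f10 + f01 + f00
    p_ab = f11 / n
    p_a = (f11 + f10) / n
    p_b = (f11 + f01) / n
    return {
        "support": p_ab,                      # P(A, B)
        "confidence": p_ab / p_a,             # P(B | A)
        "lift": p_ab / (p_a * p_b),           # P(A, B) / (P(A) P(B))
        # phi: Pearson correlation between the two binary variables
        "phi": (p_ab - p_a * p_b) / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b)),
    }

# Hypothetical counts: A and B co-occur often, but only as often as chance predicts.
m = interestingness(f11=60, f10=20, f01=15, f00=5)
```

With these counts the rule has 60% support and 75% confidence. Yet lift is 1 and phi is 0 (up to floating-point rounding), which by those two measures means A and B are independent. This is exactly the kind of conflicting information the abstract describes.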
Detecting group differences: Mining contrast sets
Data Mining and Knowledge Discovery, 2001
Cited by 61 (3 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g., freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets, with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that is surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
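The sketch below shows one way a single candidate contrast set might be screened when there are two groups. The candidate qualifies if its support differs between the groups by at least `min_diff` and a chi-square test of independence stays significant after a Bonferroni correction over `n_tests` candidates. The two-group restriction, the choice of chi-square test, the thresholds, and the function names are all assumptions. The paper's search algorithm, pruning rules, and exact correction schedule are not reproduced here.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the table
    [[a, b], [c, d]]: rows are groups, columns are 'set holds' / 'set fails'."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    if den == 0:
        return 0.0  # a degenerate margin carries no evidence of a difference
    return n * (a * d - b * c) ** 2 / den

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

def is_contrast_set(hits1, size1, hits2, size2, alpha, n_tests, min_diff):
    """hits_i: number of members of group i that satisfy the candidate conjunction."""
    support_diff = abs(hits1 / size1 - hits2 / size2)
    chi2 = chi_square_2x2(hits1, size1 - hits1, hits2, size2 - hits2)
    # Bonferroni: testing each of n_tests candidates at alpha / n_tests
    # bounds the probability of any false positive by alpha.
    significant = p_value_df1(chi2) < alpha / n_tests
    return significant and support_diff >= min_diff

# Hypothetical groups: 60 of 100 students in group 1 satisfy the set, 30 of 100 in group 2.
result = is_contrast_set(60, 100, 30, 100, alpha=0.05, n_tests=100, min_diff=0.1)
```

Here the chi-square statistic is 200/11 (about 18.2), giving a p-value near 2e-5. That passes a threshold of 0.05/100 but would fail if the search had considered 100,000 candidates, which shows why the size of the search matters.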
Interestingness Measures for Association Patterns: A Perspective
2000
Cited by 41 (1 self)
Department of Computer Science, University of Minnesota
Understanding the crucial role of attribute interaction in data mining
Artif. Intell. Rev., 2001
Cited by 39 (11 self)
This is a review paper whose goal is to significantly improve our understanding of the crucial role of attribute interaction in data mining. The main contributions of this paper are as follows. Firstly, we show that the concept of attribute interaction has a crucial role across different kinds of problems in data mining, such as attribute construction, coping with small disjuncts, induction of first-order logic rules, detection of Simpson’s paradox, and finding several types of interesting rules. Hence, a better understanding of attribute interaction can lead to a better understanding of the relationship between these kinds of problems, which are usually studied separately from each other. Secondly, we draw attention to the fact that most rule induction algorithms are based on a greedy search, which does not cope well with the problem of attribute interaction, and we point out some alternative kinds of rule discovery methods that tend to cope better with this problem. Thirdly, we discuss several algorithms and methods for discovering interesting knowledge that, implicitly or explicitly, are based on the concept of attribute interaction.
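Simpson's paradox is one of the phenomena listed above, and it gives a compact picture of attribute interaction: an association between two attributes can reverse once a third attribute is taken into account. The sketch below checks for such a reversal using hypothetical success counts for two treatments, X and Y, split by a stratifying attribute with values `s1` and `s2`. The data and function names are made up for illustration.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for each treatment within each stratum.
data = {
    "X": {"s1": (9, 10), "s2": (30, 100)},
    "Y": {"s1": (80, 100), "s2": (2, 10)},
}

def rate(successes, trials):
    """Exact success rate, so comparisons are not affected by rounding."""
    return Fraction(successes, trials)

def overall_rate(strata):
    """Success rate after pooling all strata, i.e. ignoring the third attribute."""
    successes = sum(s for s, _ in strata.values())
    trials = sum(t for _, t in strata.values())
    return Fraction(successes, trials)

def shows_simpsons_paradox(data, a, b):
    """True if a beats b within every stratum but loses once strata are pooled."""
    wins_every_stratum = all(rate(*data[a][k]) > rate(*data[b][k]) for k in data[a])
    loses_overall = overall_rate(data[a]) < overall_rate(data[b])
    return wins_every_stratum and loses_overall
```

X has the higher success rate in each stratum (9/10 vs 80/100, and 30/100 vs 2/10) but the lower pooled rate (39/110 vs 82/110). The reversal happens because most of X's trials fall in the harder stratum `s2`. A greedy learner that looks at the treatment attribute alone would draw the opposite conclusion.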
Analyzing the Subjective Interestingness of Association Rules
2000
Cited by 35 (1 self)
Association rules are an important class of regularities in databases. They have been found to be very useful in practical applications. However, association rule mining algorithms tend to produce a huge number of rules, most of which are of no interest to the user. Because of the large number of rules, it is very difficult for the user to analyze them manually in order to identify the truly interesting ones. In this paper, we propose a new approach to assist the user in finding interesting rules (in particular, unexpected rules) from a set of discovered association rules. This technique analyzes the discovered association rules using the user's existing knowledge about the domain and then ranks them according to various interestingness criteria, e.g., conformity and various types of unexpectedness. This technique has been implemented and successfully used in a number of applications.
Keywords: subjective interestingness, association rules, interestingness analysis in data mining.
Discovering significant patterns
2007
Cited by 25 (3 self)
Pattern discovery techniques, such as association rule discovery, explore large search spaces of potential patterns to find those that satisfy some user-specified constraints. Because of the large number of patterns considered, they run an extreme risk of Type I error, that is, of finding patterns that appear to satisfy the constraints on the sample data due to chance alone. This paper proposes techniques to overcome this problem by applying well-established statistical practices. These allow the user to enforce a strict upper limit on the risk of experimentwise error. Empirical studies demonstrate that standard pattern discovery techniques can discover numerous spurious patterns when applied to random data and, when applied to real-world data, produce large numbers of patterns that are rejected when subjected to sound statistical evaluation. They also reveal that a number of pragmatic choices about how such tests are performed can greatly affect their power.
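The effect described above is easy to reproduce on random data. The sketch below generates independent random binary items, so every association it finds is spurious by construction. It tests every pair of items with a one-sided Fisher exact test and counts how many pairs pass at `alpha` with no adjustment and how many pass at `alpha / n_tests`. The second count applies a Bonferroni adjustment over the pairs tested, which is one standard way of bounding experimentwise error. The data sizes, the seed, and the function names are hypothetical, and the paper's own procedures (including the search-space size it adjusts for) may differ.

```python
import math
import random
from itertools import combinations

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact test for positive association in [[a, b], [c, d]]:
    the probability, with all margins fixed, of a top-left count >= a."""
    row1, col1, n = a + b, a + c, a + b + c + d
    hits = sum(
        math.comb(row1, k) * math.comb(n - row1, col1 - k)
        for k in range(a, min(row1, col1) + 1)
    )
    return hits / math.comb(n, col1)

def count_discoveries(rows, alpha):
    """Test every pair of items (columns) for positive association.

    Returns (raw, adjusted): the number of pairs with p <= alpha, and the
    number with p <= alpha / n_tests (Bonferroni over all pairs tested)."""
    pairs = list(combinations(range(len(rows[0])), 2))
    n_tests = len(pairs)
    raw = adjusted = 0
    for i, j in pairs:
        a = sum(1 for r in rows if r[i] and r[j])
        b = sum(1 for r in rows if r[i] and not r[j])
        c = sum(1 for r in rows if not r[i] and r[j])
        d = len(rows) - a - b - c
        p = fisher_one_sided(a, b, c, d)
        raw += int(p <= alpha)
        adjusted += int(p <= alpha / n_tests)
    return raw, adjusted

# 200 records over 30 independent items, each present with probability 0.5.
rng = random.Random(0)
rows = [[rng.random() < 0.5 for _ in range(30)] for _ in range(200)]
raw, adjusted = count_discoveries(rows, alpha=0.05)
```

With 435 pairs tested at 0.05, roughly 5% of them (about 20) would be expected to pass by chance even though no real association exists. The adjusted threshold caps the chance of reporting even one such pair at about 0.05.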
Assessing data mining results via swap randomization
ACM Transactions on Knowledge Discovery from Data
Cited by 22 (6 self)
The problem of assessing the significance of data mining results on high-dimensional 0–1 data sets has been studied extensively in the literature. For problems such as mining frequent sets and finding correlations, significance testing can be done by, e.g., chi-square tests or many other methods. However, the results of such tests depend only on the specific attributes and not on the dataset as a whole. Moreover, the tests are more difficult to apply to sets of patterns or other complex results of data mining. In this paper, we consider a simple randomization technique that deals with this shortcoming. The approach consists of producing random datasets that have the same row and column margins as the given dataset, computing the results of interest on the randomized instances, and comparing them against the results on the actual data. This randomization technique can be used to assess the results of many different types of data mining algorithms, such as frequent sets, clustering, and rankings. To generate random datasets with given margins, we use variations of a Markov chain approach based on a simple swap operation. We give theoretical results on the efficiency of different randomization methods, and apply the swap randomization method to several well-known datasets. Our results indicate that for some datasets the structure discovered by the data mining algorithms is a random artifact, while for other datasets the discovered structure conveys meaningful information.
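The sketch below implements the swap operation described above for a 0-1 matrix stored as a list of lists. It picks two 1-entries (r1, c1) and (r2, c2) whose cross cells (r1, c2) and (r2, c1) are both 0, then moves the ones onto the cross cells. Each accepted swap leaves every row sum and column sum unchanged. The function name, the fixed number of proposals, and the choice to count a rejected proposal as a step in which the matrix stays unchanged are assumptions. The paper compares several variants of the chain and their efficiency.

```python
import random

def swap_randomize(matrix, num_swaps, seed=None):
    """Return a randomized copy of a 0-1 matrix with the same row and column sums."""
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    ones = [(r, c) for r in range(n_rows) for c in range(n_cols) if matrix[r][c]]
    if not ones:
        return [row[:] for row in matrix]  # nothing to swap
    present = set(ones)
    for _ in range(num_swaps):
        i, j = rng.randrange(len(ones)), rng.randrange(len(ones))
        (r1, c1), (r2, c2) = ones[i], ones[j]
        # A swap needs two distinct rows, two distinct columns, and empty cross cells.
        if r1 == r2 or c1 == c2 or (r1, c2) in present or (r2, c1) in present:
            continue  # rejected proposal: the matrix is unchanged for this step
        present.difference_update({(r1, c1), (r2, c2)})
        present.update({(r1, c2), (r2, c1)})
        ones[i], ones[j] = (r1, c2), (r2, c1)
    return [[1 if (r, c) in present else 0 for c in range(n_cols)] for r in range(n_rows)]

# Hypothetical 4x4 data set.
data = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
randomized = swap_randomize(data, num_swaps=1000, seed=1)
```

To assess a result, you would compute the same statistic on many randomized copies, for example the number of frequent sets. You would then check where the value from the real data falls in that empirical distribution.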
Mining top-k covering rule groups for gene expression data
In the 24th ACM SIGMOD International Conference on Management of Data, 2005