Results 1 to 6 of 6
Selecting the Right Interestingness Measure for Association Patterns, 2002
Cited by 189 (9 self)
Abstract: Many techniques for association rule mining and feature selection require a suitable metric to capture the dependencies among variables in a data set. For example, metrics such as support, confidence, lift, correlation, and collective strength are often used to determine the interestingness of association patterns. However, many such measures provide conflicting information about the interestingness of a pattern, and the best metric to use for a given application domain is rarely known. In this paper, we present an overview of various measures proposed in the statistics, machine learning, and data mining literature. We describe several key properties one should examine in order to select the right measure for a given application domain. A comparative study of these properties is made using twenty-one of the existing measures. We show that each measure has different properties that make it useful for some application domains but not for others. We also present two scenarios in which most of the existing measures agree with each other, namely, support-based pruning and table standardization. Finally, we present an algorithm to select a small set of tables such that an expert can select a desirable measure by looking at just this small set of tables.
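The measures named in the abstract have simple definitions over a 2x2 contingency table. A minimal sketch for three of them (function name and counts are illustrative, not from the paper):

```python
def rule_measures(n11, n10, n01, n00):
    """Measures for a rule A -> B given 2x2 contingency counts:
    n11 = |A and B|, n10 = |A, not B|, n01 = |not A, B|, n00 = neither."""
    n = n11 + n10 + n01 + n00
    p_ab = n11 / n           # P(A, B)
    p_a = (n11 + n10) / n    # P(A)
    p_b = (n11 + n01) / n    # P(B)
    support = p_ab
    confidence = p_ab / p_a          # P(B | A)
    lift = p_ab / (p_a * p_b)        # a.k.a. interest factor
    return support, confidence, lift

s, c, l = rule_measures(30, 10, 20, 40)
# support = 0.3, confidence = 0.75, lift = 1.5
```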
Selecting the right objective measure for association analysis
 Information Systems
Cited by 61 (6 self)
Abstract: Objective measures such as support, confidence, interest factor, correlation, and entropy are often used to evaluate the interestingness of association patterns. However, in many situations, these measures may provide conflicting information about the interestingness of a pattern. Data mining practitioners also tend to apply an objective measure without realizing that there may be better alternatives available for their application. In this paper, we describe several key properties one should examine in order to select the right measure for a given application. A comparative study of these properties is made using twenty-one measures that were originally developed in diverse fields such as statistics, social science, machine learning, and data mining. We show that depending on its properties, each measure is useful for some applications, but not for others. We also demonstrate two scenarios in which many existing measures become consistent with each other, namely, when support-based pruning and a technique known as table standardization are applied. Finally, we present an algorithm for selecting a small set of patterns such that domain experts can find a measure that best fits their requirements by ranking this small set of patterns.
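The conflict the abstract describes is easy to reproduce: two patterns can be ordered oppositely by confidence and by interest factor (lift). A small illustration with made-up counts:

```python
def conf_and_lift(n11, n10, n01, n00):
    """Confidence and lift for A -> B from 2x2 contingency counts."""
    n = n11 + n10 + n01 + n00
    p_b = (n11 + n01) / n
    conf = n11 / (n11 + n10)   # P(B | A)
    return conf, conf / p_b    # lift = P(B | A) / P(B)

c1, l1 = conf_and_lift(80, 20, 820, 80)   # conf 0.80, lift ~0.89: B is common everywhere
c2, l2 = conf_and_lift(50, 50, 200, 700)  # conf 0.50, lift 2.00: A doubles B's frequency
# confidence ranks the first pattern higher; lift ranks the second higher
```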
Heuristics for Ranking the Interestingness of Discovered Knowledge
 Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'99), 1999
Cited by 14 (6 self)
Abstract: We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique, and therefore, can be considered to be a population described by some probability distribution.
Data Mining in Large Databases Using Domain Generalization Graphs
 Journal of Intelligent Information Systems, 1999
Cited by 13 (3 self)
Abstract: Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs (DGGs) for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
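The two ranking measures the abstract mentions, variance and relative entropy, can be sketched over a summary's tuple counts. This is a hypothetical sketch using standard definitions; the paper's exact formulations may differ:

```python
import math

def variance_index(counts):
    """Variance of the tuple probabilities about the uniform mean 1/m."""
    n, m = sum(counts), len(counts)
    return sum((c / n - 1 / m) ** 2 for c in counts) / (m - 1)

def relative_entropy(counts):
    """KL divergence (bits) of the tuple distribution from the uniform one."""
    n, m = sum(counts), len(counts)
    return sum((c / n) * math.log2(c * m / n) for c in counts if c)

even, skewed = [25, 25, 25, 25], [70, 10, 10, 10]
# both measures score the skewed summary as farther from uniform,
# i.e., more interesting under this style of ranking
```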
Heuristic Measures of Interestingness
 Proceedings of the Third European Conference on the Principles of Data Mining and Knowledge Discovery (PKDD'99), 1999
Cited by 11 (3 self)
Abstract: The tuples in a generalized relation (i.e., a summary generated from a database) are unique, and therefore, can be considered to be a population with a structure that can be described by some probability distribution. In this paper, we present and empirically compare sixteen heuristic measures that evaluate the structure of a summary to assign a single real-valued index that represents its interestingness relative to other summaries generated from the same database. The heuristics are based upon well-known measures of diversity, dispersion, dominance, and inequality used in several areas of the physical, social, ecological, management, information, and computer sciences. Their use for ranking summaries generated from databases is a new application area. All sixteen heuristics rank less complex summaries (i.e., those with few tuples and/or few non-ANY attributes) as most interesting. We demonstrate that for sample data sets, the order in which some of the measures rank summaries is highly correlated.
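Two widely used diversity measures of this family, Shannon entropy and Simpson's index, show how a summary's tuple distribution collapses to a single real-valued index. These are the textbook definitions with illustrative counts, not necessarily among the paper's sixteen heuristics:

```python
import math

def shannon(counts):
    """Shannon entropy (bits) of the tuple probability distribution."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def simpson(counts):
    """Simpson's index: probability that two sampled tuples coincide."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

balanced, dominated = [50, 50], [90, 5, 5]
# shannon(balanced) = 1.0 and simpson(balanced) = 0.5; the dominated
# distribution scores lower entropy, reflecting less diversity
```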
A study on interestingness measures for associative classifiers, 2009
Cited by 2 (0 self)
Abstract: Associative classification is a rule-based approach to classifying data that relies on association rule mining, discovering associations between a set of features and a class label. Support and confidence are the de facto “interestingness measures” used for discovering relevant association rules. The support-confidence framework has also been used in most, if not all, associative classifiers. Although support and confidence are appropriate measures for building a strong model in many cases, they are not always ideal, and other measures may be better suited. There are many other rule interestingness measures already used in machine learning, data mining, and statistics. This work focuses on using 53 different objective measures for associative classification rules. A wide range of UCI datasets is used to study the impact of different “interestingness measures” on different phases of associative classifiers, based on the number of rules generated and the accuracy obtained. The results show that there are interestingness measures that can significantly reduce the number of rules for almost all datasets while the accuracy of the model is hardly jeopardized, or is even improved. However, no single measure emerges as an obvious winner.
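As a sketch of the rule-pruning phase such a study varies, here lift replaces a confidence threshold when filtering candidate classification rules. Rule ids and counts are hypothetical:

```python
def lift(n11, n10, n01, n00):
    """Lift of "features -> class" from 2x2 contingency counts."""
    n = n11 + n10 + n01 + n00
    return (n11 / n) / (((n11 + n10) / n) * ((n11 + n01) / n))

# candidate rules with their contingency counts
candidates = [("r1", (30, 10, 20, 40)), ("r2", (5, 45, 20, 30))]
kept = [rid for rid, counts in candidates if lift(*counts) > 1.0]
# r2 is pruned: its lift of 0.4 indicates a negative feature-class association
```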