The geometry of ROC space: understanding machine learning metrics through ROC isometrics
In Proceedings of the Twentieth International Conference on Machine Learning, 2003
Cited by 48 (5 self)
Many different metrics are used in machine learning and data mining to build and evaluate models. However, there is no general theory of machine learning metrics that could answer questions such as: When we simultaneously want to optimise two criteria, how can or should they be traded off? Some metrics are inherently independent of class and misclassification cost distributions, while others are not; can this be made more precise? This paper provides a derivation of ROC space from first principles through 3D ROC space and the skew ratio, and redefines metrics in these dimensions. The paper demonstrates that the graphical depiction of machine learning metrics by means of ROC isometrics gives many useful insights into the characteristics of these metrics, and provides a foundation on which a theory of machine learning metrics can be built.
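As a rough illustration of what redefining metrics in ROC coordinates can look like, the sketch below writes accuracy and precision as functions of the true positive rate, the false positive rate, and a skew ratio `c`. Here `c` is assumed to be the ratio of negatives to positives; the function names are hypothetical, and the formulas are derived from the standard confusion-matrix definitions rather than quoted from the paper.

```python
def accuracy_roc(tpr: float, fpr: float, c: float) -> float:
    """Accuracy from ROC coordinates, with c = negatives / positives (assumed)."""
    # (TP + TN) / (P + N), divided through by P
    return (tpr + c * (1.0 - fpr)) / (1.0 + c)


def precision_roc(tpr: float, fpr: float, c: float) -> float:
    """Precision from ROC coordinates, with c = negatives / positives (assumed)."""
    denom = tpr + c * fpr
    if denom == 0.0:
        # No example is predicted positive, so precision is undefined.
        return float("nan")
    # TP / (TP + FP), divided through by P
    return tpr / denom


# Example: 100 positives, 300 negatives, 80 true positives, 30 false positives.
tpr, fpr, c = 80 / 100, 30 / 300, 300 / 100
acc = accuracy_roc(tpr, fpr, c)
prec = precision_roc(tpr, fpr, c)
```

Under these definitions, two points that differ by `d` in false positive rate and by `c * d` in true positive rate have the same accuracy, so lines of equal accuracy in ROC space have slope `c`.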
ROC 'n' Rule Learning - Towards a Better Understanding of Covering Algorithms
Machine Learning, 2005
Cited by 39 (11 self)
This paper provides an analysis of the behavior of separate-and-conquer or covering rule learning algorithms by visualizing their evaluation metrics and their dynamics in PN-space, a variant of ROC-space. Our results show that most commonly used search heuristics, including accuracy, weighted relative accuracy, entropy, and Gini index, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes. Furthermore, our results show that stopping and filtering criteria like CN2's significance test focus on identifying significant deviations from random classification, which does not necessarily avoid overfitting. We also identify a problem with Foil's MDL-based encoding length restriction, which proves to be largely equivalent to a variable threshold on the recall of the rule. In general, we interpret these results as evidence that, contrary to common conception, pre-pruning heuristics are not very well understood and deserve more investigation.
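The two prototypes named in this abstract can be sketched in terms of the number of positive (`p`) and negative (`n`) examples a rule covers. This is a minimal illustration with hypothetical function names; the m-estimate shown is the standard form, assumed here as a stand-in, and not the generalization the paper proposes.

```python
def precision(p: float, n: float) -> float:
    """Fraction of covered examples that are positive."""
    return p / (p + n) if (p + n) > 0 else 0.0


def cost_weighted_difference(p: float, n: float, c: float = 1.0) -> float:
    """Covered positives minus covered negatives, weighted by an assumed cost c."""
    return p - c * n


def m_estimate(p: float, n: float, P: float, N: float, m: float) -> float:
    """Standard m-estimate; P and N are the total positives and negatives.

    Assumes p + n + m > 0.
    """
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)


# Two hypothetical rules on a data set with 100 positives and 100 negatives.
rule_a = (5, 0)    # small but pure
rule_b = (60, 20)  # large but impure

prefers_a_by_precision = precision(*rule_a) > precision(*rule_b)
prefers_b_by_difference = (
    cost_weighted_difference(*rule_b) > cost_weighted_difference(*rule_a)
)
# m = 0 reduces to precision; a large m shifts preference toward coverage.
prefers_a_m0 = m_estimate(*rule_a, 100, 100, 0) > m_estimate(*rule_b, 100, 100, 0)
prefers_b_m100 = m_estimate(*rule_b, 100, 100, 100) > m_estimate(*rule_a, 100, 100, 100)
```

In this toy setting the parameter `m` moves the preferred rule from the pure one to the high-coverage one, which is one way to read the trade-off the abstract describes.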
An Analysis of Rule Evaluation Metrics
Proceedings of the 20th International Conference on Machine Learning (ICML-03), 2003
Cited by 32 (9 self)
In this paper we analyze the most popular evaluation metrics for separate-and-conquer rule learning algorithms. Our results show that all commonly used heuristics, including accuracy, weighted relative accuracy, entropy, Gini index and information gain, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes.
Propositionalization-based relational subgroup discovery with RSD
Machine Learning, 2006
Cited by 16 (3 self)
Relational rule learning algorithms are typically designed to construct classification and prediction rules. However, relational rule learning can also be adapted to subgroup discovery. This paper proposes a propositionalization approach to relational subgroup discovery, achieved through appropriately adapting rule learning and first-order feature construction. The proposed approach was successfully applied to standard ILP problems (East-West trains, King-Rook-King chess endgame and mutagenicity prediction) and two real-life problems (analysis of telephone calls and traffic accident analysis).
Decision support through subgroup discovery: Three case studies and the lessons learned
Machine Learning
Cited by 11 (3 self)
This paper presents ways to use subgroup discovery to generate actionable knowledge for decision support. Actionable knowledge is explicit symbolic knowledge, typically presented in the form of rules, that allows the decision maker to recognize some important relations and to perform an appropriate action, such as targeting a direct marketing campaign, or planning a population screening campaign aimed at detecting individuals with high disease risk. Different subgroup discovery approaches are outlined, and their advantages over using standard classification rule learning are discussed. Three case studies, one medical and two in marketing, are used to present the lessons learned in solving problems requiring actionable knowledge generation for decision support.
Keywords: data mining, subgroup discovery, decision support, actionability, lessons learned
Profiling Examiners using Intelligent Subgroup Mining
In Proc. 10th Intl. Workshop on Intelligent Data Analysis in Medicine and Pharmacology (IDAMAP-2005), 2005
Cited by 9 (5 self)
The demand for effective knowledge discovery methods in a clinical setting is growing: the number of hospital information systems and medical documentation systems in routine use is increasing rapidly. As a result, high-quality collections of electronic patient records are often available for statistical analysis. One interesting issue concerns the quality of the examination records, which depends both on the examination quality and on the documentation habits of the individual examiners. We apply a subgroup mining approach for explorative and descriptive data mining to tackle this issue, and we provide a case study of the proposed approach using data from a fielded system in the medical domain. Purely automatic data mining methods often suffer from the limitation that too many uninteresting results are presented to the user. To improve upon this situation, we propose two strategies: we use background knowledge, if available, and provide suitable visualizations for guiding the discovery process. The context of the presented approach is a knowledge-based documentation and consultation system.
Closed Sets for Labeled Data
Cited by 8 (0 self)
Closed sets are being successfully applied in the context of compacted data representation for association rule learning. However, their use is mainly descriptive. This paper shows that, when considering labeled data, closed sets can be adapted for prediction and discrimination purposes by conveniently contrasting covering properties on positive and negative examples. We formally justify that these sets characterize the space of relevant combinations of features for discriminating the target class. In practice, identifying relevant/irrelevant combinations of features through closed sets is useful in many applications. Here we apply it to compacting emerging patterns and essential rules and to learn descriptions for subgroup discovery.
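To make the idea of contrasting coverage on positive and negative examples concrete, the sketch below computes the closure of an itemset on the positive examples and then counts how many negative examples the closed set covers. This is only an assumed, minimal reading of the abstract: the function names and the toy data are hypothetical, and it is not the algorithm from the paper.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(itemset: FrozenSet[str], transactions: List[Transaction]) -> int:
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def closure(itemset: FrozenSet[str], transactions: List[Transaction]) -> FrozenSet[str]:
    """Largest itemset shared by all transactions that contain the itemset."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # Assumption for this sketch: an uncovered itemset is returned unchanged.
        return frozenset(itemset)
    return frozenset.intersection(*covering)


# Hypothetical labeled data: each example is a set of features.
positives = [frozenset("abc"), frozenset("abd"), frozenset("abcd")]
negatives = [frozenset("ac"), frozenset("bd")]

closed = closure(frozenset("a"), positives)   # closure taken on the positives
covered_pos = support(closed, positives)
covered_neg = support(closed, negatives)
```

In this toy data the closed set covers the same positive examples as the original itemset while covering no negative example, which is the kind of contrast the abstract refers to.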
Quality Measures for Semi-Automatic Learning of Simple Diagnostic Rule Bases
In Proceedings of the 15th International Conference on Applications of Declarative Programming and Knowledge Management (INAP), 2004
Cited by 6 (4 self)
Semi-automatic data mining approaches often yield better results than plain automatic methods, due to the early integration of the user's goals. For example, in the medical domain, experts are likely to favor simpler models over more complex ones. Thus, the accuracy of discovered patterns is often not the only criterion to consider. Instead, the simplicity of the discovered knowledge is of prime importance, since this directly relates to the understandability and the interpretability of the learned knowledge.
From local pattern mining to relevant bi-cluster characterization
In Proceedings IDA’05, volume 3646 of LNCS, 2005
Cited by 4 (2 self)
Clustering or bi-clustering techniques have proved quite useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. We consider potentially large Boolean data sets which record properties of objects and we assume that a bi-partition is available. We introduce a generic cluster characterization technique which is based on collections of bi-sets (i.e., sets of objects associated to sets of properties) which satisfy some user-defined constraints, and a measure of the accuracy of a given bi-set as a bi-cluster characterization pattern. The method is illustrated on both formal concepts (i.e., “maximal rectangles of true values”) and the new type of δ-bi-sets (i.e., “rectangles of true values with a bounded number of exceptions per column”). The added value is illustrated on benchmark data and two real data sets which are intrinsically noisy: medical data about meningitis and Plasmodium falciparum gene expression data.
Supporting bi-cluster interpretation in 0/1 data by means of local patterns
2006
Cited by 1 (0 self)
Clustering or co-clustering techniques have proved useful in many application domains. A weakness of these techniques remains the poor support for grouping characterization. As a result, interpreting clustering results and discovering knowledge from them can be quite hard. We consider potentially large Boolean data sets which record properties of objects and we assume the availability of a bi-partition which has to be characterized by means of a symbolic description. Our generic approach exploits collections of local patterns which satisfy some user-defined constraints in the data, and a measure of the accuracy of a given local pattern as a bi-cluster characterization pattern. We consider local patterns which are bi-sets, i.e., sets of objects associated to sets of properties. Two concrete examples are formal concepts (i.e., associated closed sets) and the so-called δ-bi-sets (i.e., an extension of formal concepts towards fault-tolerance). We introduce the idea of a characterizing query, which experts can use to support knowledge discovery from bi-partitions thanks to available local patterns. The added value is illustrated on benchmark data and three real data sets: a medical data set and two gene expression data sets.

