Results 1 - 9 of 9
Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
- In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, 1997
Cited by 225 (13 self)
Applications of inductive learning algorithms to real-world data mining problems have shown repeatedly that using accuracy to compare classifiers is not adequate because the underlying assumptions rarely hold. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. Introduction: When mining data with inductive methods, we often experiment with a wide variety of learning algorithms, using different algorithm parameters, varying output threshold values, and using different training regimens. Such experimentation yields a large number of classifiers to be evaluated a...
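The ROC convex hull idea summarized in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `roc_convex_hull` and `best_operating_point` are hypothetical, each classifier is assumed to be given as a (false positive rate, true positive rate) pair, and the expected-cost formula is the standard one for a two-class problem with a known positive-class prior and per-error costs.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of ROC points, ordered from (0, 0) to (1, 1).

    `points` is an iterable of (fpr, tpr) pairs, one per classifier.
    """
    # The two trivial classifiers (always negative, always positive)
    # are always available, so they are always part of the input.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the segment
        # from the point before it to p (no clockwise turn).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def best_operating_point(hull, p_pos, cost_fp, cost_fn):
    """Pick the hull vertex with the lowest expected cost (assumed cost model)."""
    def expected_cost(pt):
        fpr, tpr = pt
        return p_pos * (1.0 - tpr) * cost_fn + (1.0 - p_pos) * fpr * cost_fp
    return min(hull, key=expected_cost)


if __name__ == "__main__":
    classifiers = [(0.1, 0.6), (0.3, 0.5), (0.4, 0.9)]
    hull = roc_convex_hull(classifiers)
    print(hull)
    print(best_operating_point(hull, p_pos=0.8, cost_fp=1.0, cost_fn=1.0))
```

In this sketch, a classifier that falls below the hull, such as (0.3, 0.5), is never the minimum-cost choice for any combination of class prior and costs, which is why only the hull vertices need to be kept.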
Robust Classification for Imprecise Environments
, 1989
Cited by 209 (12 self)
In real-world environments it is usually difficult to specify target operating conditions precisely. This uncertainty makes building robust classification systems problematic. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. We then show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and ...
ROC Graphs: Notes and Practical Considerations for Researchers
, 2004
Cited by 150 (1 self)
Receiver Operating Characteristics (ROC) graphs are a useful technique for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been increasingly adopted in the machine learning and data mining research communities. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. This article serves both as a tutorial introduction to ROC graphs and as a practical guide for using them in research.
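As a rough illustration of how an ROC graph is produced from a scoring classifier, the sketch below sweeps a threshold over the scores and records one (false positive rate, true positive rate) point per distinct score value. The function names `roc_points` and `auc` are hypothetical and not taken from the article; labels are assumed to be 1 for positive and 0 for negative, and tied scores are processed together so that ties do not create an artificially optimistic curve.

```python
def roc_points(scores, labels):
    """Return ROC points [(fpr, tpr), ...] from (0, 0) to (1, 1).

    `scores` are classifier outputs (higher means more likely positive);
    `labels` are 1 for positive and 0 for negative instances.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative instance")

    # Rank instances from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)

    points = [(0.0, 0.0)]
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so that all
        # instances with the same score are counted together.
        if prev_score is not None and score != prev_score:
            points.append((fp / neg, tp / pos))
        if label:
            tp += 1
        else:
            fp += 1
        prev_score = score
    points.append((fp / neg, tp / pos))  # always (1.0, 1.0)
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
    print(pts)
    print(auc(pts))
```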
ROC graphs: Notes and practical considerations for data mining researchers
, 2003
Cited by 122 (0 self)
Receiver Operating Characteristics (ROC) graphs are a useful technique for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been increasingly adopted in the machine learning and data mining research communities. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. This article serves both as a tutorial introduction to ROC graphs and as a practical guide for using them in research.
Network-based marketing: Identifying likely adopters via consumer networks
- Statistical Science
Cited by 48 (10 self)
Network-based marketing refers to a collection of marketing techniques that take advantage of links between consumers to increase sales. We concentrate on the consumer networks formed using direct interactions (e.g., communications) between consumers. We survey the diverse literature on such marketing with an emphasis on the statistical methods used and the data to which these methods have been applied. We also provide a discussion of challenges and opportunities for this burgeoning research topic. Our survey highlights a gap in the literature. Because of inadequate data, prior studies have not been able to provide direct, statistical support for the hypothesis that network linkage can directly affect product/service adoption. Using a new data set that represents the adoption of a new telecommunications service, we show very strong support for the hypothesis. Specifically, we show three main results: (1) “Network neighbors” (those consumers linked to a prior customer) adopt the service at a rate 3–5 times greater than baseline groups selected by the best practices of the firm’s marketing team. In addition, analyzing the network allows the firm to acquire new customers who otherwise would have fallen through the cracks, because they would not have been identified based on traditional attributes. (2) Statistical models, built with a very large amount of geographic, demographic and prior purchase data, are significantly and substantially improved by including network information. (3) More detailed network information allows the ranking of the network neighbors so as to permit the selection of small sets of individuals with very high probabilities of adoption. Key words and phrases: Viral marketing, word of mouth, targeted marketing, network analysis, classification, statistical relational learning.
Rule-Space Search for Knowledge-Based Discovery
- CIIO Working Paper IS 99-012, Stern School of Business, 1999
Cited by 7 (0 self)
Because the knowledge discovery process is ill-defined, iterative, and requires intense interaction, algorithm flexibility is crucial. In this paper, we present a straightforward, heuristic generate-and-test search algorithm for knowledge discovery. An analysis of the literature shows that this basic algorithm underlies many of the systems that have had practical success in data mining and knowledge discovery over the past twenty years. We argue that this search algorithm has persevered because it is flexible and well behaved as background knowledge is introduced in various forms - exactly what is needed to support the ill-defined knowledge discovery process.
Learning to live with false alarms
- In Proceedings of the KDD Data Mining Methods for Anomaly Detection workshop, 2005
Cited by 5 (2 self)
Anomalies are rare events. For anomaly detection, severe class imbalance is the norm. Although there has been much research into imbalanced classes, there are surprisingly few examples of dealing with severe imbalance. Alternative performance measures have superseded error rate, or accuracy, for algorithm comparison. But whatever their other merits, they tend to obscure the severe imbalance problem. We use the relative cost reduction of a classifier over a trivial classifier that chooses the less costly class. We show that for applications that are inherently noisy there is a limit to the cost reduction achievable. Even a Bayes optimal classifier has a vanishingly small reduction in costs as imbalance increases. If events are rare and not too costly, the unpalatable conclusion is that our learning algorithms can do little. If the events have a higher cost, then a large number of false alarms must be tolerated, even if the end user finds that undesirable.
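The measure named in this abstract, the cost reduction of a classifier relative to a trivial classifier that always chooses the less costly class, can be sketched as follows. The exact formulation in the paper is not shown here, so this is an assumed version: the function name `relative_cost_reduction`, its parameters, and the expected-cost model (prior times error rate times cost per error) are all illustrative choices.

```python
def relative_cost_reduction(tpr, fpr, p_pos, cost_fp, cost_fn):
    """Fraction of the trivial classifier's expected cost that is saved.

    Assumed cost model (not taken from the paper):
      - always predicting negative costs p_pos * cost_fn per instance
      - always predicting positive costs (1 - p_pos) * cost_fp per instance
      - the trivial classifier picks whichever of the two is cheaper
    A value of 1.0 means all cost is removed; a value below 0.0 means the
    classifier is worse than the trivial one.
    """
    trivial_cost = min(p_pos * cost_fn, (1.0 - p_pos) * cost_fp)
    if trivial_cost == 0.0:
        raise ValueError("trivial classifier already has zero cost")
    classifier_cost = (p_pos * (1.0 - tpr) * cost_fn
                       + (1.0 - p_pos) * fpr * cost_fp)
    return 1.0 - classifier_cost / trivial_cost


if __name__ == "__main__":
    # The same classifier (80% detection, 5% false alarms) at two
    # levels of imbalance, with a miss costing ten false alarms.
    for p_pos in (0.01, 0.001):
        r = relative_cost_reduction(0.8, 0.05, p_pos, cost_fp=1.0, cost_fn=10.0)
        print(p_pos, round(r, 3))
```

Under these assumed numbers the reduction is positive when positives make up 1% of the data and negative when they make up 0.1%, which is consistent with the abstract's point that false alarms dominate as events become rarer.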
Intelligent Enterprise Technologies Laboratory
, 2003
We present a means to represent utility, the measure of goodness of a possible deal. This representation includes a number of features necessary to represent complex requirements, such as time dependence, explicit combinations of terms, and cross dependences. The formulation is closely tied to the form used to represent contracts, which makes it useful for automated negotiation software.

