Results 1 - 10 of 126
Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions
- In Proceedings of the Third International Conference on Knowledge Discovery and Data Mining
, 1997
Cited by 225 (13 self)
Applications of inductive learning algorithms to real-world data mining problems have shown repeatedly that using accuracy to compare classifiers is not adequate because the underlying assumptions rarely hold. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses.
Introduction
When mining data with inductive methods, we often experiment with a wide variety of learning algorithms, using different algorithm parameters, varying output threshold values, and using different training regimens. Such experimentation yields a large number of classifiers to be evaluated a...
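The geometric step behind the ROC convex hull can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each classifier is summarized by one (false positive rate, true positive rate) point, and the function names `roc_convex_hull` and `_cross` are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points):
    """Return the upper convex hull of ROC points, ordered left to right.

    Each point is a (false_positive_rate, true_positive_rate) tuple.
    The trivial classifiers (0, 0) and (1, 1) are always included.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Drop the last hull point while it lies on or below the
        # segment joining its predecessor to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


classifiers = [(0.1, 0.5), (0.3, 0.6), (0.4, 0.9), (0.5, 0.5)]
hull = roc_convex_hull(classifiers)
```

Classifiers whose points fall below the hull, here (0.3, 0.6) and (0.5, 0.5), are the ones that cannot be optimal under any class distribution or cost ratio.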
Robust Classification for Imprecise Environments
, 1989
Cited by 209 (12 self)
In real-world environments it is usually difficult to specify target operating conditions precisely. This uncertainty makes building robust classification systems problematic. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. We then show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and ...
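One way to see how target conditions pick out a classifier is to compute expected cost at each ROC point. The sketch below assumes the standard expected-cost formula for a binary problem; the names `expected_cost` and `best_operating_point` and the example numbers are illustrative assumptions, not taken from the paper.

```python
def expected_cost(point, p_pos, cost_fp, cost_fn):
    """Expected misclassification cost per instance at one ROC point.

    point   -- (false_positive_rate, true_positive_rate)
    p_pos   -- prior probability of the positive class
    cost_fp -- cost of a false positive
    cost_fn -- cost of a false negative
    """
    fpr, tpr = point
    return p_pos * (1.0 - tpr) * cost_fn + (1.0 - p_pos) * fpr * cost_fp


def best_operating_point(points, p_pos, cost_fp=1.0, cost_fn=1.0):
    """Return the ROC point with the lowest expected cost."""
    return min(points, key=lambda pt: expected_cost(pt, p_pos, cost_fp, cost_fn))


# Hypothetical hull vertices, including the two trivial classifiers.
hull = [(0.0, 0.0), (0.1, 0.5), (0.4, 0.9), (1.0, 1.0)]
balanced = best_operating_point(hull, p_pos=0.5)
rare_positives = best_operating_point(hull, p_pos=0.01)
```

With balanced classes the point (0.4, 0.9) wins; when positives are rare and costs are equal, the "always negative" classifier at (0, 0) is cheapest, which is why a single accuracy figure can mislead.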
Heterogeneous Uncertainty Sampling for Supervised Learning
- In Proceedings of the Eleventh International Conference on Machine Learning
, 1994
Cited by 194 (3 self)
Uncertainty sampling methods iteratively request class labels for training instances whose classes are uncertain despite the previous labeled instances. These methods can greatly reduce the number of instances that an expert need label. One problem with this approach is that the classifier best suited for an application may be too expensive to train or use during the selection of instances. We test the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program). Despite being chosen by this heterogeneous approach, the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger.
1 Introduction
Machine learning algorithms have been used to build classification rules from data sets consisting of hundreds of thousands of instances [4]. In some applications unlabeled training instances are abundant but the cost of labeling an instance with its class is high. In the informatio...
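The selection step of uncertainty sampling can be illustrated as follows. This is a simplified sketch for a binary problem: it assumes a cheap probabilistic classifier has already produced P(positive) for each unlabeled instance, and `select_uncertain` is a hypothetical helper name.

```python
def select_uncertain(probs, k):
    """Return indices of the k instances whose predicted probability
    of the positive class is closest to 0.5 (most uncertain first)."""
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:k]


# Hypothetical outputs of the cheap classifier on five unlabeled instances.
pool_probs = [0.9, 0.55, 0.1, 0.48, 0.7]
to_label = select_uncertain(pool_probs, k=2)
```

The selected instances would then be labeled by the expert and added to the training set of the more expensive learner, and the loop repeats.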
The DET curve in assessment of detection task performance
, 1997
Cited by 183 (4 self)
We introduce the DET Curve as a means of representing performance on detection tasks that involve a tradeoff of error types. We discuss why we prefer it to the traditional ROC Curve and offer several examples of its use in speaker recognition and language recognition. We explain why it is likely to produce approximately linear curves. We also note special points that may be included on these curves, how they are used with multiple targets, and possible further applications.
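DET plots conventionally place both error rates on a normal deviate (probit) scale, which is what makes curves from roughly Gaussian score distributions look close to linear. The sketch below shows only that axis transform, using the Python standard library; `det_point` is a hypothetical name and both rates must lie strictly between 0 and 1.

```python
from statistics import NormalDist


def det_point(miss_rate, false_alarm_rate):
    """Map a (miss rate, false alarm rate) pair to DET plot coordinates.

    Both axes use the inverse of the standard normal CDF.
    Returns (x, y) with x for false alarms and y for misses.
    """
    probit = NormalDist().inv_cdf
    return probit(false_alarm_rate), probit(miss_rate)


x, y = det_point(miss_rate=0.1, false_alarm_rate=0.5)
```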
ROC graphs: Notes and practical considerations for data mining researchers
, 2003
Cited by 122 (0 self)
Receiver Operating Characteristics (ROC) graphs are a useful technique for organizing classifiers and visualizing their performance. ROC graphs are commonly used in medical decision making, and in recent years have been increasingly adopted in the machine learning and data mining research communities. Although ROC graphs are apparently simple, there are some common misconceptions and pitfalls when using them in practice. This article serves both as a tutorial introduction to ROC graphs and as a practical guide for using them in research.
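As a concrete illustration of how an ROC graph is produced from a scoring classifier, the sketch below sweeps a threshold over the scores from highest to lowest. It is a minimal version under stated assumptions: labels are 0/1, at least one instance of each class is present, and `roc_points` is a hypothetical name.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr) obtained by lowering the threshold
    through every distinct score. Tied scores share a single point."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = []
    tp = fp = 0
    prev_score = None
    for score, label in ranked:
        # Emit a point only when the score changes, so ties are
        # not given an arbitrary (and misleading) ordering.
        if score != prev_score:
            points.append((fp / neg, tp / pos))
            prev_score = score
        if label:
            tp += 1
        else:
            fp += 1
    points.append((fp / neg, tp / pos))
    return points


curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```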
Evaluating intrusion detection systems: The 1998 DARPA off-line intrusion detection evaluation
- in Proceedings of the 2000 DARPA Information Survivability Conference and Exposition
, 2000
Cited by 101 (2 self)
An intrusion detection evaluation test bed was developed which generated normal traffic similar to that on a government site containing hundreds of users on thousands of hosts. More than 300 instances of 38 different automated attacks were launched against victim UNIX hosts in seven weeks of training data and two weeks of test data. Six research groups participated in a blind evaluation and results were analyzed for probe, denial-of-service (DoS), remote-to-local (R2L), and user-to-root (U2R) attacks. The best systems detected old attacks included in the training data, at moderate detection rates ranging from 63% to 93% at a false alarm rate of 10 false alarms per day. Detection rates were much worse for new and novel R2L and DoS attacks included only in the test data. The best systems failed to detect roughly half these new attacks which included damaging access to root-level privileges by remote users. These results suggest that further research should focus on developing techniques to find new attacks instead of extending existing rule-based approaches.
Measuring the Effects of Internet Path Faults on Reactive Routing
- in Proc. ACM SIGMETRICS
, 2003
Cited by 76 (13 self)
Empirical evidence suggests that reactive routing systems improve resilience to Internet path failures. They detect and route around faulty paths based on measurements of path performance. This paper seeks to understand why and under what circumstances these techniques are effective. To do so, this paper correlates end-to-end active probing experiments, loss-triggered traceroutes of Internet paths, and BGP routing messages. These correlations shed light on three questions about Internet path failures: (1) Where do failures appear? (2) How long do they last? (3) How do they correlate with BGP routing instability? Data collected over 13 months from an Internet testbed of 31 topologically diverse hosts suggests that most path failures last less than fifteen minutes. Failures that appear in the network core correlate better with BGP instability than failures that appear close to end hosts. On average, most failures precede BGP messages by about four minutes, but there is often increased BGP traffic both before and after failures. Our findings suggest that reactive routing is most effective between hosts that have multiple connections to the Internet. The data set also suggests that passive observations of BGP routing messages could be used to predict about 20% of impending failures, allowing re-routing systems to react more quickly to failures.
AUC optimization vs. error rate minimization
- in Advances in Neural Information Processing Systems
, 2004
Cited by 66 (2 self)
The area under an ROC curve (AUC) is a criterion used in many applications to measure the quality of a classification algorithm. However, the objective function optimized in most of these algorithms is the error rate and not the AUC value. We give a detailed statistical analysis of the relationship between the AUC and the error rate, including the first exact expression of the expected value and the variance of the AUC for a fixed error rate. Our results show that the average AUC is monotonically increasing as a function of the classification accuracy, but that the standard deviation for uneven distributions and higher error rates is noticeable. Thus, algorithms designed to minimize the error rate may not lead to the best possible AUC values. We show that, under certain conditions, the global function optimized by the RankBoost algorithm is exactly the AUC. We report the results of our experiments with RankBoost in several datasets demonstrating the benefits of an algorithm specifically designed to globally optimize the AUC over other existing algorithms optimizing an approximation of the AUC or only locally optimizing the AUC.
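The gap between AUC and error rate is easy to reproduce on a toy example. The sketch below computes AUC through its pairwise-ranking interpretation (the fraction of positive-negative pairs ranked correctly, ties counted as one half) and shows two rankings with the same error rate but different AUC. The function names, the fixed 0.5 threshold, and the data are illustrative assumptions.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive instance receives the higher score; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Fraction of instances misclassified at a fixed score threshold."""
    wrong = sum((s >= threshold) != bool(l) for s, l in zip(scores, labels))
    return wrong / len(labels)


labels = [1, 1, 0, 0]
scores_a = [0.9, 0.4, 0.3, 0.2]  # one positive just under the threshold
scores_b = [0.9, 0.1, 0.3, 0.2]  # same positive ranked last
```

Both score vectors misclassify exactly one of four instances at the threshold, yet the first ranks every positive above every negative and the second does not.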
Combining filtering and statistical methods for anomaly detection
- In Proceedings of IMC
, 2005
Cited by 53 (10 self)
In this work we develop an approach for anomaly detection for large scale networks such as that of an enterprise or an ISP. The traffic patterns we focus on for analysis are those of a network-wide view of the traffic state, called the traffic matrix. In the first step a Kalman filter is used to filter out the “normal” traffic. This is done by comparing our future predictions of the traffic matrix state to an inference of the actual traffic matrix that is made using more recent measurement data than those used for prediction. In the second step the residual filtered process is then examined for anomalies. We explain here how any anomaly detection method can be viewed as a problem in statistical hypothesis testing. We study and compare four different methods for analyzing residuals, two of which are new. These methods focus on different aspects of the traffic pattern change. One focuses on instantaneous behavior, another focuses on changes in the mean of the residual process, a third on changes in the variance behavior, and a fourth examines variance changes over multiple timescales. We evaluate and compare all of these methods using ROC curves that illustrate the full tradeoff between false positives and false negatives for the complete spectrum of decision thresholds.
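The two-step idea (filter out normal behavior, then test the residual) can be sketched on a single scalar series. This is a deliberately reduced stand-in, not the paper's method: it assumes a random-walk state model instead of a traffic matrix model, applies only an instantaneous threshold test on the innovation, and skips the state update on flagged samples. The function name and the parameters `q`, `r`, and `threshold` are assumptions chosen for illustration.

```python
def flag_anomalies(measurements, q=0.01, r=0.1, threshold=3.0):
    """Flag samples whose Kalman innovation is unusually large.

    q         -- assumed process noise variance (random-walk model)
    r         -- assumed measurement noise variance
    threshold -- number of innovation standard deviations that counts
                 as anomalous
    Returns the indices of flagged measurements.
    """
    x = measurements[0]  # state estimate, initialized from the first sample
    p = 1.0              # variance of the state estimate
    flagged = []
    for i, z in enumerate(measurements[1:], start=1):
        p_pred = p + q            # predict step: state unchanged, variance grows
        innovation = z - x        # residual between measurement and prediction
        s = p_pred + r            # variance of the innovation
        if innovation ** 2 > threshold ** 2 * s:
            flagged.append(i)
            p = p_pred            # do not let the anomaly pull the estimate
            continue
        k = p_pred / s            # Kalman gain
        x = x + k * innovation    # update step
        p = (1.0 - k) * p_pred
    return flagged


series = [10.0, 10.1, 9.9, 10.0, 20.0, 10.0, 10.1]
anomalies = flag_anomalies(series)
```

Sweeping `threshold` and recording false positives against false negatives at each setting is what would trace out an ROC curve for this detector.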
Fusion Via a Linear Combination of Scores
, 1999
Cited by 53 (1 self)
We present a thorough analysis of the capabilities of the linear combination (LC) model for fusion of information retrieval systems. The LC model combines the results lists of multiple IR systems by scoring each document using a weighted sum of the scores from each of the component systems. We first present both empirical and analytical justification for the hypotheses that such a model should only be used when the systems involved have high performance, a large overlap of relevant documents, and a small overlap of nonrelevant documents. The empirical approach allows us to very accurately predict the performance of a combined system. We also derive a formula for a theoretically optimal weighting scheme for combining two systems. We introduce d, the difference between the average score on relevant documents and the average score on nonrelevant documents, as a performance measure which not only allows mathematical reasoning about system performance, but also allows the selection of w...
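The weighted-sum scoring of the LC model can be sketched as below. The weights here are arbitrary placeholders rather than the optimal weights derived in the paper, and treating a document missing from a run as scoring zero is an assumption of this sketch. `fuse_scores` is a hypothetical name.

```python
def fuse_scores(runs, weights):
    """Combine per-system document scores with a weighted sum.

    runs    -- list of dicts mapping document id to that system's score
    weights -- one weight per system
    Returns (ranking, fused) where ranking lists document ids by
    decreasing fused score.
    """
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, score in run.items():
            # A document absent from a run simply contributes nothing.
            fused[doc] = fused.get(doc, 0.0) + weight * score
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, fused


system_a = {"d1": 0.9, "d2": 0.4}
system_b = {"d2": 0.8, "d3": 0.5}
ranking, fused = fuse_scores([system_a, system_b], weights=[0.5, 0.5])
```

Document d2 rises to the top because both systems retrieve it, which mirrors the abstract's point that overlap among relevant documents is what makes fusion pay off.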

