Results 21–30 of 414
An Immunity-Based Technique to Characterize Intrusions in Computer Networks
, 2002
Abstract

Cited by 103 (19 self)
This paper presents a technique inspired by the negative selection mechanism of the immune system that can detect foreign patterns in the complement (nonself) space. In particular, the novel pattern detectors (in the complement space) are evolved using a genetic search, which could differentiate varying degrees of abnormality in network traffic. The paper demonstrates the usefulness of such a technique to detect a wide variety of intrusive activities on networked computers. We also used a positive characterization method based on nearest-neighbor classification.
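The negative-selection idea in this abstract can be sketched in a few lines. The binary alphabet, the r-contiguous-bits matching rule, and random detector generation below are illustrative assumptions; the paper itself evolves detectors with a genetic search rather than filtering random candidates:

```python
import random

def matches(detector, pattern, r):
    # r-contiguous-bits rule: match if the two strings agree on r or more
    # consecutive positions.
    run = 0
    for d, p in zip(detector, pattern):
        run = run + 1 if d == p else 0
        if run >= r:
            return True
    return False

def generate_detectors(self_set, n_detectors, length, r, rng):
    # Negative selection: draw random candidates and keep only those that
    # match NO self pattern, so surviving detectors cover nonself space.
    detectors = []
    while len(detectors) < n_detectors:
        cand = tuple(rng.randrange(2) for _ in range(length))
        if not any(matches(cand, s, r) for s in self_set):
            detectors.append(cand)
    return detectors

def is_anomalous(pattern, detectors, r):
    # A pattern matched by any detector lies in the complement (nonself) space.
    return any(matches(d, pattern, r) for d in detectors)
```

By construction the detectors never flag a self pattern; anything they do match is reported as foreign.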
Explicitly representing expected cost: an alternative to ROC representation
 KDD
, 2000
Abstract

Cited by 93 (10 self)
This paper proposes an alternative to ROC representation, in which the expected cost of a classifier is represented explicitly. This expected cost representation maintains many of the advantages of ROC representation, but is easier to understand. It allows the experimenter to immediately see the range of costs and class frequencies where a particular classifier is the best and quantitatively how much better it is than other classifiers. This paper demonstrates there is a point/line duality between the two representations. A point in ROC space representing a classifier becomes a line segment spanning the full range of costs and class frequencies. This duality produces equivalent operations in the two spaces, allowing most techniques used in ROC analysis to be readily reproduced in the cost space.
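The point/line duality is easy to state concretely. Assuming the usual normalized-expected-cost formulation, an ROC point (fpr, tpr) becomes a line over the operating-condition axis; this is a sketch, not the authors' code:

```python
def cost_line(tpr, fpr):
    # A classifier's ROC point (fpr, tpr) becomes a line in cost space:
    # normalized expected cost as a function of the probability-cost value
    # pc = p(+)C(-|+) / (p(+)C(-|+) + p(-)C(+|-)).
    # At pc = 0 the cost equals fpr; at pc = 1 it equals the miss rate 1 - tpr.
    return lambda pc: (1.0 - tpr - fpr) * pc + fpr

def best_cost(points, pc):
    # The lower envelope of the lines is the cost-space analogue of the ROC
    # convex hull: the best achievable cost at each operating condition.
    return min(cost_line(tpr, fpr)(pc) for tpr, fpr in points)
```

Reading off which line is lowest at a given pc shows at a glance which classifier is best there and by how much, which is the representation's main selling point.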
Using AUC and accuracy in evaluating learning algorithms
 IEEE Transactions on Knowledge and Data Engineering
, 2005
Abstract

Cited by 89 (1 self)
The area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has recently been proposed as an alternative single-number measure for evaluating the predictive ability of learning algorithms. However, no formal arguments were given as to why AUC should be preferred over accuracy. In this paper, we establish formal criteria for comparing two different measures for learning algorithms, and we show theoretically and empirically that AUC is, in general, a better measure (defined precisely) than accuracy. We then re-evaluate well-established claims in machine learning based on accuracy using AUC, and obtain interesting and surprising new results. We also show that AUC is more directly associated with the net profit than accuracy in direct marketing, suggesting that learning algorithms should optimize AUC instead of accuracy in real-world applications. The conclusions drawn in this paper may make a significant impact on machine learning and data mining applications. Note: This paper integrates results in our papers published in IJCAI 2003 [22] and ICDM 2003 [15]. It also includes many new results. For example, the concept of indifferency in Section II-B is new, and Sections III-B, III-C, IV-A, IV-D, and V are all new and unpublished. Index Terms: Evaluation of learning algorithms, AUC vs. accuracy, ROC
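For reference, the AUC being compared here reduces to the Mann-Whitney rank statistic; a minimal implementation (not tied to the paper's experiments) is:

```python
def auc(scores, labels):
    # AUC as the Mann-Whitney statistic: the probability that a randomly
    # chosen positive is scored above a randomly chosen negative, with
    # ties counting one half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike accuracy, this depends on the full ranking of examples, not on a single decision threshold.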
Learning when data sets are imbalanced and when costs are unequal and unknown
 ICML-2003 Workshop on Learning from Imbalanced Data Sets II
, 2003
Abstract

Cited by 87 (0 self)
The problem of learning from imbalanced data sets, while not the same problem as learning when misclassification costs are unequal and unknown, can be handled in a similar manner. That is, in both contexts, we can use techniques from ROC analysis to help with classifier design. We present results from two studies in which we dealt with skewed data sets and unequal, but unknown, costs of error. We also compare, for one domain, these results to those obtained by oversampling and undersampling the data set. The operations of sampling, moving the decision threshold, and adjusting the cost matrix produced sets of classifiers that fell on the same ROC curve.
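The threshold-moving operation compared here is a plain sweep over one scoring classifier's outputs, each threshold giving one ROC operating point; an illustrative helper (not the authors' code):

```python
def roc_points(scores, labels):
    # Sweep the decision threshold over the classifier's scores; each
    # threshold yields one (fpr, tpr) operating point on the ROC curve.
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / neg, tp / pos))
    return pts
```

The abstract's observation is that sampling and cost-matrix adjustment select among the same family of operating points that this sweep traces out.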
Tree induction vs. logistic regression: A learning-curve analysis
 CeDER Working Paper #IS0102, Stern School of Business
, 2001
Abstract

Cited by 86 (16 self)
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio.
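A learning-curve analysis of this kind needs only a harness that fits on growing prefixes of the training data and records test accuracy; the toy majority-class learner below is a placeholder assumption standing in for tree induction or logistic regression:

```python
def learning_curve(train_fn, predict_fn, train_data, test_data, sizes):
    # Fit on growing prefixes of the training data and record test accuracy;
    # plotting these (size, accuracy) points per algorithm shows where the
    # curves cross.
    curve = []
    for n in sizes:
        model = train_fn(train_data[:n])
        acc = sum(predict_fn(model, x) == y for x, y in test_data) / len(test_data)
        curve.append((n, acc))
    return curve

def train_majority(rows):
    # Trivial stand-in learner: remember the most frequent training label.
    labels = [y for _, y in rows]
    return max(set(labels), key=labels.count)

def predict_majority(model, x):
    return model
```

Comparing curves from two learners on the same prefixes, rather than a single fixed-size score, is exactly the methodological point of the paper.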
Active Sampling for Class Probability Estimation and Ranking
 Machine Learning
, 2004
Abstract

Cited by 78 (9 self)
In many cost-sensitive environments class probability estimates are used by decision makers to evaluate the expected utility from a set of alternatives. Supervised learning can be used to build class probability estimates; however, it often is very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to exhibit a certain estimation accuracy and provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV and show that it is significantly more competitive with BOOTSTRAP-LV compared to UNCERTAINTY SAMPLING. The analysis suggests more general implications for improving existing active sampling ...
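The variance-based scoring at the heart of this method can be sketched as follows, assuming the bootstrap models' probability estimates for each candidate example are already computed; BOOTSTRAP-LV's weighted-sampling step is omitted here:

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def most_informative(prob_estimates):
    # prob_estimates: {example_id: [p_1, ..., p_k]}, where p_i is the
    # class-probability estimate from the i-th bootstrap model.  The sketch
    # prefers the example whose estimates disagree most (highest variance),
    # i.e. where another label would most improve the probability estimates.
    return max(prob_estimates, key=lambda e: variance(prob_estimates[e]))
```

An example whose bootstrap models all agree contributes little; one where they disagree marks a region of the input space where labels are still informative.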
Simple Estimators for Relational Bayesian Classifiers
 In Proceedings of the 3rd IEEE International Conference on Data Mining
, 2003
Abstract

Cited by 76 (20 self)
This paper evaluates several modifications of the Simple Bayesian Classifier to enable estimation and inference over relational data. The resulting Relational Bayesian Classifiers are evaluated on three real-world datasets and compared to a baseline SBC using no relational information.
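One member of this family of modifications treats every attribute value from an instance's linked records as conditionally independent given the class, which reduces to a bag-of-values naive Bayes; the decomposition and Laplace smoothing below are an illustrative sketch, not the paper's exact estimators:

```python
import math
from collections import Counter

def train(bags, labels):
    # Each training instance is a *bag* of attribute values drawn from the
    # records linked to it; fit class priors and per-class value counts.
    priors = Counter(labels)
    counts = {c: Counter() for c in priors}
    for bag, c in zip(bags, labels):
        counts[c].update(bag)
    return priors, counts

def predict(bag, priors, counts, vocab_size):
    # Score each class as prior times a Laplace-smoothed likelihood for
    # every linked value, exactly as in ordinary naive Bayes.
    n = sum(priors.values())
    best, best_lp = None, -math.inf
    for c, pc in priors.items():
        total = sum(counts[c].values())
        lp = math.log(pc / n)
        for v in bag:
            lp += math.log((counts[c][v] + 1) / (total + vocab_size))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```

The baseline SBC corresponds to bags containing only the instance's own attribute values, with no linked records.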
A Framework for Detection and Measurement of Phishing Attacks
, 2006
Abstract

Cited by 74 (1 self)
Phishing is a form of identity theft that combines social engineering techniques and sophisticated attack vectors to harvest financial information from unsuspecting consumers. Often a phisher tries to lure her victim into clicking a URL pointing to a rogue page. In this paper, we focus on studying the structure of URLs employed in various phishing attacks. We find that it is often possible to tell whether or not a URL belongs to a phishing attack without requiring any knowledge of the corresponding page data. We describe several features that can be used to distinguish a phishing URL from a benign one. These features are used to model a logistic regression filter that is efficient and has a high accuracy. We use this filter to perform thorough measurements on several million URLs and quantify the prevalence of phishing on the Internet today.
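A filter of this kind might look like the sketch below: a few structural URL features fed to a logistic score. The specific feature set and any weights one plugs in are illustrative assumptions, not the trained filter from the paper:

```python
import math
import re
from urllib.parse import urlparse

def url_features(url):
    # Structural red flags of the kind studied: long URLs, many dots in the
    # hostname, an '@' (which hides the real host), a raw-IP host, hyphens.
    host = urlparse(url).hostname or ""
    return {
        "length": len(url),
        "dots_in_host": host.count("."),
        "has_at": int("@" in url),
        "ip_host": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
        "hyphens": url.count("-"),
    }

def phishing_score(url, weights, bias):
    # Logistic-regression filter: sigmoid of a weighted feature sum.
    f = url_features(url)
    z = bias + sum(weights[k] * f[k] for k in weights)
    return 1.0 / (1.0 + math.exp(-z))
```

Note that nothing here fetches the page: the score depends only on the URL string, which is the paper's point.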
Cyclic pattern kernels for predictive graph mining
, 2004
Abstract

Cited by 73 (2 self)
With applications in biology, the World Wide Web, and several other areas, mining of graph-structured objects has received significant interest recently. One of the major research directions in this field is concerned with predictive data mining in graph databases where each instance is represented by a graph. Some of the proposed approaches for this task rely on the excellent classification performance of support vector machines. To control the computational cost of these approaches, the underlying kernel functions are based on frequent patterns. In contrast to these approaches, we propose a kernel function based on a natural set of cyclic and tree patterns independent of their frequency, and discuss its computational aspects. To practically demonstrate the effectiveness of our approach, we use the popular NCI-HIV molecule dataset. Our experimental results show that cyclic pattern kernels can be computed quickly and offer predictive performance superior to recent graph kernels based on frequent patterns.
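Once each graph's cyclic patterns have been extracted (the extraction itself, e.g. from a cycle basis of the graph, is omitted here), the kernel computation is simple. A sketch of two pieces, under the assumption that a cycle is represented by its sequence of vertex labels: a canonical form so the same ring always yields the same pattern, and the set-intersection count:

```python
def canonical_cycle(labels):
    # Canonical form of a cycle of vertex labels: the lexicographic minimum
    # over all rotations of the sequence and of its reversal, so traversal
    # direction and starting vertex do not matter.
    n = len(labels)
    variants = []
    for seq in (list(labels), list(labels)[::-1]):
        for i in range(n):
            variants.append(tuple(seq[i:] + seq[:i]))
    return min(variants)

def intersection_kernel(patterns_a, patterns_b):
    # Kernel value between two graphs: the number of (canonical) patterns
    # they share, independent of pattern frequency.
    return len(patterns_a & patterns_b)
```

Because patterns are counted by presence rather than frequency, no frequency threshold has to be mined first, which is the contrast the abstract draws.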
AUC: a statistically consistent and more discriminating measure than accuracy
 In Proceedings of the 18th International Joint Conference on Artificial Intelligence (IJCAI-2003)
, 2003
Abstract

Cited by 73 (5 self)
Predictive accuracy has been used as the main and often only evaluation criterion for the predictive performance of classification learning algorithms. In recent years, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, has been proposed as an alternative single-number measure for evaluating learning algorithms. In this paper, we prove that AUC is a better measure than accuracy. More specifically, we present rigorous definitions of consistency and discriminancy in comparing two evaluation measures for learning algorithms. We then present empirical evaluations and a formal proof to establish that AUC is indeed statistically consistent and more discriminating than accuracy. Our result is quite significant since, for the first time, we formally prove that AUC is a better measure than accuracy in the evaluation of learning algorithms.
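The discriminancy claim can be illustrated directly with toy data (not from the paper): two score vectors that are indistinguishable by thresholded accuracy are still separated by AUC.

```python
def auc(scores, labels):
    # AUC as the Mann-Whitney statistic over positive/negative score pairs.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    # Accuracy sees only which side of the threshold each score falls on.
    return sum((s >= threshold) == y for s, y in zip(scores, labels)) / len(labels)

labels = [1, 1, 0, 0]
a = [0.9, 0.4, 0.3, 0.1]  # misses one positive, but ranks all pos above all neg
b = [0.9, 0.2, 0.3, 0.1]  # same predictions at 0.5, worse ranking
```

Here `accuracy(a, labels) == accuracy(b, labels)`, yet `auc` ranks classifier `a` strictly higher, so AUC discriminates between cases accuracy treats as ties.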