Results 11-20 of 214
Tree induction vs. logistic regression: A learning-curve analysis
 CeDER Working Paper #IS-01-02, Stern School of Business, 2001
Cited by 62 (16 self)
Abstract
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of the signal-to-noise ratio.
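The learning-curve methodology described above can be sketched generically in Python: train a learner on growing random subsets and record held-out accuracy, then compare the resulting curves across learners. Everything below (the synthetic data and the trivial majority-class stand-in learner) is illustrative only, not the paper's C4.5/logistic-regression setup:

```python
import random

def learning_curve(fit, predict, train, test, sizes, seed=0):
    """Held-out accuracy of a learner at each training-set size.
    `fit(subset)` returns a model; `predict(model, x)` returns a label."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        model = fit(rng.sample(train, n))
        acc = sum(predict(model, x) == y for x, y in test) / len(test)
        curve.append((n, acc))
    return curve

# Stand-in learner: always predict the most common training label.
def fit_majority(subset):
    labels = [y for _, y in subset]
    return max(set(labels), key=labels.count)

def predict_majority(model, x):
    return model

train = [(i, i % 3 != 0) for i in range(90)]        # ~2/3 positive
test = [(i, i % 3 != 0) for i in range(90, 120)]
curve = learning_curve(fit_majority, predict_majority, train, test, [10, 30, 60])
```

With two real learners, the "crossing curves" finding corresponds to one learner's curve overtaking the other's as the training-set sizes grow.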
Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals
 J. Comput. Biol., 2004
Benchmarking Anomaly-Based Detection Systems
 2000
Cited by 46 (5 self)
Abstract
Anomaly detection is a key element of intrusion-detection and other detection systems in which perturbations of normal behavior suggest the presence of intentionally or unintentionally induced attacks, faults, defects, etc. Because most anomaly detectors are based on probabilistic algorithms that exploit the intrinsic structure, or regularity, embedded in data logs, a fundamental question is whether or not such structure influences detection performance. If detector performance is indeed a function of environmental regularity, it would be critical to match detectors to environmental characteristics. In intrusion-detection settings, however, this is not done, possibly because such characteristics are not easily ascertained. This paper introduces a metric for characterizing structure in data environments, and tests the hypothesis that intrinsic structure influences probabilistic detection. In a series of experiments, an anomaly-detection algorithm was applied to a benchmark suite of 165 c...
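The paper's own structure metric is not reproduced here, but one entropy-style measure in the same spirit (an assumption for illustration, not the paper's definition) scores a stream as 1 minus its normalized first-order conditional entropy:

```python
from collections import Counter
from math import log2

def regularity(seq):
    """Regularity in [0, 1]: 1 minus the normalized first-order
    conditional entropy H(next | current) of the event stream.
    1.0 = perfectly predictable; 0.0 = successors look uniform.
    (Illustrative stand-in for the paper's structure metric.)"""
    pairs = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    total = sum(pairs.values())
    h = -sum((n / total) * log2(n / firsts[a])
             for (a, _), n in pairs.items())
    alphabet = len(set(seq))
    return (1.0 - h / log2(alphabet)) if alphabet > 1 else 1.0

regularity("abababababab")   # -> 1.0 (fully regular alternation)
regularity("aabbabbaabab")   # lower: successors are less predictable
```

A detector could then be benchmarked across synthetic logs of varying regularity to test whether its performance tracks this score.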
Minimum cut model for spoken lecture segmentation
 In Proceedings of the Annual Meeting of the Association for Computational Linguistics (COLING-ACL 2006), 2006
Cited by 45 (8 self)
Abstract
We consider the task of unsupervised lecture segmentation. We formalize segmentation as a graph-partitioning task that optimizes the normalized-cut criterion. Our approach moves beyond localized comparisons and takes into account long-range cohesion dependencies. Our results demonstrate that global analysis improves segmentation accuracy and is robust in the presence of speech recognition errors.
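The normalized-cut criterion mentioned above has a direct closed form. A minimal sketch for a single binary split of a similarity matrix (the paper optimizes this over full multi-way segmentations, which this toy does not attempt):

```python
def normalized_cut(sim, boundary):
    """Normalized-cut score of splitting items [0, n) into
    A = [0, boundary) and B = [boundary, n):
    cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V),
    for a symmetric similarity matrix `sim`. Lower is better."""
    n = len(sim)
    A, B = range(boundary), range(boundary, n)
    cut = sum(sim[i][j] for i in A for j in B)
    assoc_a = sum(sim[i][j] for i in A for j in range(n))
    assoc_b = sum(sim[i][j] for i in B for j in range(n))
    return cut / assoc_a + cut / assoc_b

# Two cohesive blocks with weak cross-similarity: the true topic
# boundary (after item 2) should minimize the score.
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.9],
       [0.1, 0.1, 0.9, 1.0]]
best = min(range(1, 4), key=lambda b: normalized_cut(sim, b))   # -> 2
```

Because the association terms sum similarity to the whole set, the criterion penalizes cutting through a cohesive block, which is what gives the method its long-range behavior.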
Cost curves: an improved method for visualizing classifier performance
 Machine Learning, 2006
Cited by 44 (7 self)
Abstract
This paper introduces cost curves, a graphical technique for visualizing the performance (error rate or expected cost) of two-class classifiers over the full range of possible class distributions and misclassification costs. Cost curves are shown to be superior to ROC curves for visualizing classifier performance for most purposes. This is because they visually support several crucial types of performance assessment that cannot be done easily with ROC curves, such as showing confidence intervals on a classifier’s performance, and visualizing the statistical significance of the difference in performance of two classifiers. A software tool supporting all the cost-curve analysis described in this paper is available from the authors.
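The core construction behind cost curves is that each ROC point (FPR, TPR) corresponds to one line in cost space. A small sketch using Drummond and Holte's normalized-expected-cost formula:

```python
def cost_line(fpr, tpr):
    """Cost-curve line for one ROC point: normalized expected cost
    NE(PC) = (1 - TPR) * PC + FPR * (1 - PC), where PC is the
    probability-cost of the positive class (it collapses to the
    positive-class prior when misclassification costs are equal)."""
    return lambda pc: (1.0 - tpr) * pc + fpr * (1.0 - pc)

# Which of two classifiers is cheaper depends on the operating point:
a = cost_line(fpr=0.1, tpr=0.6)   # conservative classifier
b = cost_line(fpr=0.4, tpr=0.9)   # liberal classifier
a(0.2), b(0.2)   # a wins when PC is low  (0.16 vs 0.34)
a(0.8), b(0.8)   # b wins when PC is high (0.34 vs 0.16)
```

Plotting NE against PC for every classifier makes the crossover point, and hence the range of conditions under which each classifier dominates, directly visible, which is the comparison ROC curves make hard.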
Using Rule Sets to Maximize ROC Performance
 2001
Cited by 36 (3 self)
Abstract
Rules are commonly used for classification because they are modular, intelligible, and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications to maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed, or error costs are unequal, an accuracy-maximizing rule set can perform poorly. A more flexible use of a rule set is to produce instance scores indicating the likelihood that an instance belongs to a given class. With such an ability, we can apply rule sets effectively when distributions are skewed or error costs are unequal. This paper empirically investigates different strategies for evaluating rule sets when the goal is to maximize the scoring (ROC) performance.
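One common way to turn a rule set into instance scores (one plausible option among the strategies such work compares; the rules and names below are purely illustrative) is the Laplace-corrected precision of the first matching rule:

```python
def laplace_score(pos, neg, classes=2):
    """Laplace-corrected precision of a rule that covers `pos`
    positive and `neg` negative training instances."""
    return (pos + 1) / (pos + neg + classes)

def score_instance(x, rules, default=0.5):
    """Score = Laplace precision of the first rule whose condition
    matches x; rules are (condition, pos, neg) triples with
    `condition` a predicate over the instance."""
    for cond, pos, neg in rules:
        if cond(x):
            return laplace_score(pos, neg)
    return default

rules = [(lambda x: x["age"] > 60, 40, 5),    # strong rule: 40+/5-
         (lambda x: x["smoker"], 12, 10)]     # weaker rule: 12+/10-
score_instance({"age": 70, "smoker": False}, rules)   # -> 41/47
```

Sweeping a threshold over such scores, rather than taking each rule's categorical prediction, is what yields a full ROC curve from the rule set.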
BindN: a web-based tool for efficient prediction of DNA and RNA binding sites in amino acid sequences
 Nucleic Acids Res
AUC: A Better Measure than Accuracy in Comparing Learning Algorithms
 In Proc. of IJCAI'03, 2003
Cited by 30 (1 self)
Abstract
Predictive accuracy has been widely used as the main criterion for comparing the predictive ability of classification systems (such as C4.5, neural networks, and Naive Bayes). Most of these classifiers also produce probability estimations of the classification, but these are completely ignored in the accuracy measure. This is often taken for granted because both training and testing sets only provide class labels. In this paper we establish rigorously that, even in this setting, the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC, provides a better measure than accuracy. Our result is quite significant for three reasons. First, we establish, for the first time, rigorous criteria for comparing evaluation measures for learning algorithms. Second, it suggests that AUC should replace accuracy when measuring and comparing classification systems. Third, our result also prompts us to reevaluate many well-established conclusions based on accuracy in machine learning. For example, it is well accepted in the machine learning community that, in terms of predictive accuracy, Naive Bayes and decision trees are very similar. Using AUC, however, we show experimentally that Naive Bayes is significantly better than the decision-tree learning algorithms.
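AUC itself is the Wilcoxon-Mann-Whitney statistic: the probability that a randomly chosen positive is scored above a randomly chosen negative. A direct pairwise sketch, with accuracy at a fixed threshold for contrast:

```python
def auc(scores, labels):
    """AUC as the Wilcoxon-Mann-Whitney statistic: the fraction of
    positive/negative pairs ranked correctly (ties count 1/2).
    O(n_pos * n_neg) pairwise sketch; rank-based forms are faster."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Accuracy at a fixed decision threshold, for contrast."""
    return sum((s >= threshold) == (y == 1)
               for s, y in zip(scores, labels)) / len(labels)

labels = [1, 1, 0, 0]
auc([0.9, 0.8, 0.3, 0.2], labels)   # -> 1.0 (perfect ranking)
auc([0.9, 0.3, 0.8, 0.2], labels)   # -> 0.75 (one mis-ranked pair)
```

Unlike accuracy, the AUC value uses the full ordering induced by the probability estimates, which is exactly the information the abstract notes accuracy throws away.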
C4.5 and imbalanced data sets: investigating the effect of sampling method, probabilistic estimate, and decision tree structure
 In Proceedings of the ICML'03 Workshop on Class Imbalances, 2003
Cited by 30 (3 self)
Abstract
Imbalanced data sets are becoming ubiquitous, as many applications have very few instances of the “interesting” or “abnormal” class. Traditional machine learning algorithms can be biased towards the majority class due to its over-prevalence. It is desired that the interesting (minority) class prediction be improved, even if at the cost of additional majority-class errors. In this paper, we study three issues, usually considered separately, concerning decision trees and imbalanced data sets: the quality of probabilistic estimates, pruning, and the effect of preprocessing the imbalanced data set by over- or under-sampling methods such that a fairly balanced training set is provided to the decision trees. We consider each issue independently and in conjunction with the others, highlighting the scenarios where one method might be preferred over another for learning decision trees from imbalanced data sets.
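Random under-sampling, one of the preprocessing options such studies consider, can be sketched in a few lines: keep every minority-class instance and an equal-size random sample of the majority class (over-sampling would instead duplicate minority instances):

```python
import random

def undersample(data, seed=0):
    """Random under-sampling: keep all instances of the rarest class
    and an equal-size random sample of every other class, yielding a
    balanced training set. `data` is a list of (features, label)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append((x, y))
    k = min(len(v) for v in by_class.values())
    balanced = [item for v in by_class.values()
                for item in rng.sample(v, k)]
    rng.shuffle(balanced)
    return balanced

data = [(i, 0) for i in range(95)] + [(i, 1) for i in range(5)]
train_bal = undersample(data)   # 5 of each class
```

The cost of this simplicity is discarding majority-class data, which is one reason the trade-off against over-sampling and against adjusting the tree's probability estimates is worth studying empirically.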
ROC analysis of statistical methods used in functional MRI: Individual Subjects
 NeuroImage 9, 1999
Cited by 29 (5 self)
Abstract
The complicated structure of fMRI signals and associated noise sources makes it difficult to assess the validity of the various steps involved in the statistical analysis of brain activation. Most methods used for fMRI analysis assume that observations are independent and that the noise can be treated as white Gaussian noise. These assumptions are usually not true, but it is difficult to assess how severely they are violated and what their practical consequences are. In this study a direct comparison is made between the power of various analytical methods used to detect activations, without reference to estimates of statistical significance. The statistics used in fMRI are treated as metrics designed to detect activations and are not interpreted probabilistically. The receiver operating characteristic (ROC) method is used to compare the efficacy of various steps in calculating an activation map in the study of a single subject, based on optimizing the ratio of the number of detected activations to the number of false-positive findings. The main findings are as follows. Preprocessing: the removal of intensity drifts and high-pass filtering applied at the voxel time-course level is beneficial to the efficacy of analysis; temporal normalization of the global image intensity, smoothing in the temporal domain, and low-pass filtering do not improve the power of analysis. Choice of statistics: the cross-correlation coefficient and t-statistic, as well as the nonparametric Mann–Whitney statistic, prove to be the most effective and are similar in performance by our criterion. Task design: the proper design of task protocols is shown to be crucial; in an alternating block design the optimal block length is approximately 18 s. Spatial clustering: an initial spatial smoothing of images is more efficient than cluster filtering of the statistical parametric activation maps.
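The drift-removal preprocessing found beneficial above amounts, in its simplest form, to subtracting a least-squares line from each voxel time course. A pure-Python sketch of that simplest case (real pipelines also apply high-pass filtering, omitted here):

```python
def detrend(ts):
    """Remove a linear intensity drift from a voxel time course by
    subtracting its least-squares line fit (slope and intercept)."""
    n = len(ts)
    mx = (n - 1) / 2.0                        # mean of 0..n-1
    my = sum(ts) / n
    sxx = sum((x - mx) ** 2 for x in range(n))
    sxy = sum((x - mx) * (y - my) for x, y in enumerate(ts))
    slope = sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in enumerate(ts)]

# A pure ramp (drift only, no signal) detrends to all zeros,
# while any activation riding on the ramp would survive.
detrend([10 + 0.5 * t for t in range(8)])
```

Applying this per voxel before computing the detection statistic is the kind of step whose contribution the study quantifies with ROC curves rather than p-values.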