Results 1 -
4 of
4
On the Optimality of the Simple Bayesian Classifier under Zero-One Loss
, 1997
"... The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containin ..."
Abstract
-
Cited by 486 (20 self)
- Add to MetaCart
The simple Bayesian classifier is known to be optimal when attributes are independent given the class, but the question of whether other sufficient conditions for its optimality exist has so far not been explored. Empirical results showing that it performs surprisingly well in many domains containing clear attribute dependences suggest that the answer to this question may be positive. This article shows that, although the Bayesian classifier's probability estimates are only optimal under quadratic loss if the independence assumption holds, the classifier itself can be optimal under zero-one loss (misclassification rate) even when this assumption is violated by a wide margin. The region of quadratic-loss optimality of the Bayesian classifier is in fact a second-order infinitesimal fraction of the region of zero-one optimality. This implies that the Bayesian classifier has a much greater range of applicability than previously thought. For example, in this article it is shown to be opti...
Tree induction vs. logistic regression: A learning-curve analysis
- CEDER WORKING PAPER #IS-01-02, STERN SCHOOL OF BUSINESS
, 2001
"... Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classi cation. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership pr ..."
Abstract
-
Cited by 50 (16 self)
- Add to MetaCart
Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classi cation. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective atproducing probability-based rankings, although apparently comparatively less so foragiven training{set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable canbecharacterized surprisingly well by a simple measure of signal-to-noise ratio.
Regularized Gaussian Discriminant Analysis Through Eigenvalue Decomposition
- Journal of the American Statistical Association
, 1996
"... Friedman (1989) has proposed a regularization technique (RDA) of discriminant analysis in the Gaussian framework. RDA makes use of two regularization parameters to design an intermediate classi cation rule between linear and quadratic discriminant analysis. In this paper, we propose an alternative a ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
Friedman (1989) has proposed a regularization technique (RDA) of discriminant analysis in the Gaussian framework. RDA makes use of two regularization parameters to design an intermediate classi cation rule between linear and quadratic discriminant analysis. In this paper, we propose an alternative approach to design classi cation rules which have also a median position between linear and quadratic discriminant analysis. Our approach is based on the reparametrization of the covariance matrix k of a group Gk in terms of its eigenvalue decomposition, k = kDkAkD 0 k where k speci es the volume of Gk, Ak its shape, and Dk its orientation. Variations on constraints concerning k�Ak and Dk lead to 14 discrimination models of interest. For each model, we derived the maximum likelihood parameter estimates and our approach consists in selecting the model among the 14 possible models by minimizing the sample-based estimate of future misclassi cation risk by cross-validation. Numerical experiments show favorable behavior of this approach as compared to RDA.
Clustering Via Normal Mixture Models
"... We consider a model-based approach to clustering, whereby each observation is assumed to have arisen from an underlying mixture of a finite number of distributions. The number of components in this mixture model corresponds to the number of clusters to be imposed on the data. A common assumption is ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We consider a model-based approach to clustering, whereby each observation is assumed to have arisen from an underlying mixture of a finite number of distributions. The number of components in this mixture model corresponds to the number of clusters to be imposed on the data. A common assumption is to take the component distributions to be multivariate normal with perhaps some restrictions on the component covariance matrices. The model can be fitted to the data using maximum likelihood implemented via the EM algorithm. There is a number of computational issues associated with the fitting, including the specification of initial starting points for the EM algorithm and the carrying out of tests for the number of components in the final version of the model. We shall discuss some of these problems and describe an algorithm that attempts to handle them automatically. 1. INTRODUCTION In some applications of mixture models, questions related to clustering may arise only after the mixture mo...

