Results 1–10 of 18
Support-Vector Networks
 Machine Learning
, 1995
Abstract

Cited by 2155 (32 self)
The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data.
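The core idea of this abstract — map inputs non-linearly into a higher-dimensional feature space and fit a linear decision surface there — can be sketched on a toy problem. This is an illustrative sketch, not the authors' implementation: the quadratic feature map and the hinge-loss subgradient trainer are assumptions chosen for brevity.

```python
import numpy as np

def quad_features(X):
    # phi(x1, x2) = (x1, x2, x1^2, x2^2, x1*x2): a toy nonlinear map
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

def train_svm(X, y, lam=0.01, epochs=200, lr=0.1):
    # Soft-margin linear SVM in feature space: minimize
    # lam/2 ||w||^2 + mean hinge loss, by subgradient descent.
    Phi = quad_features(X)
    w, b = np.zeros(Phi.shape[1]), 0.0
    for _ in range(epochs):
        active = y * (Phi @ w + b) < 1          # margin violators
        gw = lam * w - (y[active, None] * Phi[active]).mean(0) if active.any() else lam * w
        gb = -y[active].mean() if active.any() else 0.0
        w -= lr * gw
        b -= lr * gb
    return w, b

# XOR-like data: not linearly separable in the input plane,
# but linearly separable after the quadratic map (via the x1*x2 feature).
X = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]], float)
y = np.array([1, 1, -1, -1], float)
w, b = train_svm(X, y)
pred = np.sign(quad_features(X) @ w + b)
```

On this data the learned weight concentrates on the x1*x2 coordinate, which is exactly the "linear surface in feature space, nonlinear in input space" picture the abstract describes.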
A robust minimax approach to classification
 Journal of Machine Learning Research
, 2002
Abstract

Cited by 61 (7 self)
When constructing a classifier, the probability of correct classification of future data points should be maximized. We consider a binary classification problem where the mean and covariance matrix of each class are assumed to be known. No further assumptions are made with respect to the class-conditional distributions. Misclassification probabilities are then controlled in a worst-case setting: that is, under all possible choices of class-conditional densities with given mean and covariance matrix, we minimize the worst-case (maximum) probability of misclassification of future data points. For a linear decision boundary, this desideratum is translated in a very direct way into a (convex) second-order cone optimization problem, with complexity similar to a support vector machine problem. The minimax problem can be interpreted geometrically as minimizing the maximum of the Mahalanobis distances to the two classes. We address the issue of robustness with respect to estimation errors (in the means and covariances of the …
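The second-order cone problem the abstract mentions can be written, for a linear boundary, as minimizing sqrt(a'S1 a) + sqrt(a'S2 a) subject to a'(m1 - m2) = 1; the worst-case misclassification probability is then 1/(1 + kappa^2) with kappa the reciprocal of the optimum. The sketch below is an assumption-laden toy: it solves a tiny instance with a general-purpose SLSQP routine rather than a dedicated SOCP solver, with made-up means and covariances.

```python
import numpy as np
from scipy.optimize import minimize

m1, m2 = np.array([2.0, 0.0]), np.array([-2.0, 0.0])
S1 = np.array([[1.0, 0.0], [0.0, 2.0]])
S2 = np.array([[2.0, 0.0], [0.0, 1.0]])

def objective(a):
    # sum of class "standard deviations" along direction a
    return np.sqrt(a @ S1 @ a) + np.sqrt(a @ S2 @ a)

cons = {"type": "eq", "fun": lambda a: a @ (m1 - m2) - 1.0}
res = minimize(objective, x0=np.array([0.3, 0.1]), method="SLSQP", constraints=cons)
a = res.x
kappa = 1.0 / objective(a)
# Over all distributions with these means/covariances, the probability of
# misclassifying a future point is at most:
worst_case_err = 1.0 / (1.0 + kappa**2)
# Offset placing the boundary a'x = b kappa "standard deviations" from each mean
b = a @ m1 - kappa * np.sqrt(a @ S1 @ a)
```

By symmetry the optimal direction here lies along the axis joining the means; the boundary sits at equal Mahalanobis distance kappa from both classes, matching the geometric interpretation in the abstract.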
Self-corrective character recognition system
 IEEE Trans. Information Theory
, 1966
Abstract

Cited by 26 (17 self)
Abstract—The output of a simple statistical categorizer is used to improve recognition performance on a homogeneous data set. An array of initial weights contains a coarse description of the various classes; as the system cycles through a set of characters from the same source (a typewritten or printed page), the weights are modified to correspond more closely with the observed distributions. The true identities of the characters remain inaccessible throughout the training cycle. This experimental study of the effect of the various parameters in the algorithm is based on ~30,000 characters from fourteen different font styles. A fivefold average decrease over the initial rates is obtained in both errors and rejects. THE SELF-CORRECTIVE character recognition system about to be described differs …
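The self-corrective loop described here — classify with coarse initial templates, then refit the templates from the machine's own decisions without ever seeing true labels — is a decision-directed adaptation scheme. The sketch below is an assumed, modernized miniature (nearest-mean templates on synthetic 2-D "characters"), not the 1966 system itself.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [3.0, 3.0]])
# A homogeneous "page": 100 samples per class from each true distribution
X = np.vstack([rng.normal(m, 1.0, size=(100, 2)) for m in true_means])

# Coarse initial weights: templates offset by a constant shift
templates = true_means + 1.0

for _ in range(5):                       # training cycles over the same page
    d = ((X[:, None, :] - templates[None]) ** 2).sum(-1)
    decisions = d.argmin(1)              # nearest-template classification
    for k in range(2):                   # decision-directed template update
        if (decisions == k).any():
            templates[k] = X[decisions == k].mean(0)
```

After a few cycles the templates migrate from their coarse starting values toward the observed class distributions, even though the true identities were never used, which is the mechanism behind the reported error/reject reduction.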
Minimax Probability Machine
 Advances in Neural Information Processing Systems 14
, 2001
Abstract

Cited by 22 (4 self)
When constructing a classifier, the probability of correct classification of future data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization. We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly.
Two variations on Fisher’s linear discriminant for pattern recognition
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
Abstract

Cited by 14 (0 self)
Abstract—Discriminants are often used in pattern recognition to separate clusters of points in some multidimensional “feature” space. This paper provides two fast and simple techniques for improving on the classification performance provided by Fisher’s linear discriminant for two classes. Both of these methods are also extended to nonlinear decision surfaces through the use of Mercer kernels. Index Terms—Linear discriminant, classification.
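The baseline both variants improve on is Fisher's two-class linear discriminant, w = Sw^{-1}(m1 - m2) with Sw the pooled within-class scatter. A minimal sketch on synthetic data (the data and midpoint threshold rule are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X1 = rng.normal([0, 0], [1.0, 0.5], size=(200, 2))   # class 1
X2 = rng.normal([2, 1], [1.0, 0.5], size=(200, 2))   # class 0

m1, m2 = X1.mean(0), X2.mean(0)
# Pooled within-class scatter (sum of per-class scatter matrices)
Sw = np.cov(X1.T) * (len(X1) - 1) + np.cov(X2.T) * (len(X2) - 1)
w = np.linalg.solve(Sw, m1 - m2)        # Fisher direction
threshold = w @ (m1 + m2) / 2           # midpoint rule on the 1-D projection

labels = np.r_[np.ones(200), np.zeros(200)]
acc = ((np.r_[X1, X2] @ w > threshold) == labels).mean()
```

Projecting onto w maximizes between-class separation relative to within-class spread; the paper's two variants (and their Mercer-kernel extensions) start from exactly this construction.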
Non-Iterative Heteroscedastic Linear Dimension Reduction for Two-Class Data: From Fisher to Chernoff
 In Proceedings of the Joint IAPR International Workshops SSPR 2002 and SPR 2002, volume LNCS 2396
, 2002
Abstract

Cited by 6 (3 self)
Linear discriminant analysis (LDA) is a traditional solution to the linear dimension reduction (LDR) problem, which is based on the maximization of the between-class scatter over the within-class scatter. This solution is incapable of dealing with heteroscedastic data in a proper way, because of the implicit assumption that the covariance matrices for all the classes are equal. Hence, discriminatory information in the difference between the covariance matrices is not used and, as a consequence, we can only reduce the data to a single dimension in the two-class case.
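The failure mode this paper addresses is easy to demonstrate: with (nearly) equal class means but very different covariances, the Fisher criterion finds essentially no direction at all, even though the classes differ clearly in spread. A small illustrative construction (the specific distributions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
# Equal means, strongly unequal (heteroscedastic) covariances
X1 = rng.normal(0.0, [3.0, 0.3], size=(500, 2))   # wide in x, narrow in y
X2 = rng.normal(0.0, [0.3, 3.0], size=(500, 2))   # narrow in x, wide in y

m_diff = X1.mean(0) - X2.mean(0)                   # between-class scatter ~ 0
Sw = np.cov(X1.T) * 499 + np.cov(X2.T) * 499       # pooled within-class scatter
w = np.linalg.solve(Sw, m_diff)                    # Fisher direction ~ zero vector
```

Because m_diff vanishes while Sw is large, w collapses to (numerically) nothing: all the discriminatory information sits in the covariance difference, which is exactly what a Chernoff-style criterion exploits and plain LDA discards.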
An Anticorrelation Kernel for Subsystem Training in Multiple Classifier Systems
Abstract

Cited by 4 (0 self)
We present a method for training support vector machine (SVM)-based classification systems for combination with other classification systems designed for the same task. Ideally, a new system should be designed such that, when combined with existing systems, the resulting performance is optimized. We present a simple model for this problem and use the understanding gained from this analysis to propose a method to achieve better combination performance when training SVM systems. We include a regularization term in the SVM objective function that aims to reduce the average class-conditional covariance between the resulting scores and the scores produced by the existing systems, introducing a tradeoff between such covariance and the system’s individual performance. That is, the new system “takes one for the team”, falling somewhat short of its best possible performance in order to increase the diversity of the ensemble. We report results on the NIST 2005 and 2006 speaker recognition evaluations (SREs) for a variety of subsystems. We show a gain of 19% on the equal error rate (EER) of a combination of four systems when applying the proposed method with respect to the performance obtained when the four systems are trained …
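The regularization idea can be sketched in miniature. This is a hypothetical simplification, not the paper's SVM formulation: a linear scorer with hinge loss stands in for the SVM, the data and the "existing system's scores" are synthetic, and the penalty is the squared class-conditional covariance between the two score streams (which, for a linear scorer, is linear in w and thus easy to differentiate).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=300))
s_old = X[:, 0] + 0.2 * rng.normal(size=300)   # existing system's scores

def class_cov_vec(X, t, y, c):
    # Vector v such that cov(X @ w, t | class c) == w @ v
    Xc = X[y == c] - X[y == c].mean(0)
    tc = t[y == c] - t[y == c].mean()
    return Xc.T @ tc / (len(tc) - 1)

# Average over the two classes: cond. covariance of scores with s_old is w @ v
v = np.mean([class_cov_vec(X, s_old, y, c) for c in (-1.0, 1.0)], axis=0)

def train(lam, epochs=400, lr=0.05):
    # Subgradient descent on: mean hinge loss + lam * (w @ v)**2
    w = np.zeros(3)
    for _ in range(epochs):
        active = y * (X @ w) < 1
        g = -(y[active, None] * X[active]).mean(0) if active.any() else np.zeros(3)
        w -= lr * (g + 2 * lam * (w @ v) * v)
    return w

w_plain = train(lam=0.0)   # ordinary hinge-loss scorer
w_decor = train(lam=5.0)   # "takes one for the team": decorrelated scorer
```

The penalized scorer's class-conditional covariance with the existing scores shrinks relative to the unpenalized one, at some cost to its individual margin, mirroring the diversity/performance tradeoff the abstract describes.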
Nonparametric Classification with Polynomial MPMC Cascades
 In: Proc. ICML
, 2004
Abstract

Cited by 2 (2 self)
This paper proposes a computationally efficient class of nonparametric binary classification algorithms that generate nonlinear separating boundaries, with minimal tuning of learning parameters. We avoid the computational pitfalls of using extensive cross-validation for model selection. For example, in Support Vector Machines (SVMs) [6], both the choice of kernel and the corresponding kernel parameters are based on extensive cross-validation experiments, making it computationally very difficult to generate good SVM models. Other algorithms, such as Minimax Probability Machine Classification (MPMC) [5], Neural Networks, and even ensemble methods such as Boosting, can suffer from the same computational pitfalls. The Minimax Probability Machine for Classification (MPMC), due to Lanckriet et al. [5], is a recent algorithm with this characteristic. Given the means and covariance matrices of two classes, MPMC calculates a hyperplane that separates the data by minimizing the maximum probability of misclassification. As such, it generates both a classification and a bound on the expected error for future data. In the same paper, the MPMC is also extended to nonlinear separating …
Probabilistic Random Forests: Predicting Data Point Specific Misclassification Probabilities
Abstract
Recently proposed classification algorithms give estimates or worst-case bounds for the probability of misclassification [Lanckriet et al., 2002] [L. …