Results 1–10 of 21
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
 ADVANCES IN LARGE MARGIN CLASSIFIERS
, 1999
"... The output of a classifier should be a calibrated posterior probability to enable postprocessing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. Howev ..."
Abstract

Cited by 699 (0 self)
The output of a classifier should be a calibrated posterior probability to enable postprocessing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score will produce non-sparse kernel machines. Instead, we train an SVM, then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities. This chapter compares classification error rate and likelihood scores for an SVM plus sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three data-mining-style data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness of the SVM.
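The calibration step this abstract describes can be sketched in a few lines. This is a simplified stand-in, not Platt's actual fitting procedure (which uses a more careful model-trust optimization and regularized targets); the function names and the plain gradient-descent fit are ours.

```python
import numpy as np

def fit_platt_sigmoid(scores, labels, n_iter=500, lr=0.1):
    """Fit P(y=1|f) = 1/(1 + exp(A*f + B)) to SVM outputs f by gradient
    descent on the negative log-likelihood. Labels are in {-1, +1}."""
    t = (labels + 1) / 2.0          # map labels {-1,+1} -> targets {0,1}
    A, B = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        # NLL gradient: with z = -(A*f + B) and p = sigmoid(z),
        # dNLL/dA = sum((t - p) * f), dNLL/dB = sum(t - p)
        gA = np.sum((t - p) * scores) / n
        gB = np.sum(t - p) / n
        A -= lr * gA
        B -= lr * gB
    return A, B

def predict_proba(scores, A, B):
    """Map raw SVM decision values to calibrated probabilities."""
    return 1.0 / (1.0 + np.exp(A * scores + B))
```

Because the sigmoid is fitted after the SVM, the support-vector expansion (and hence the sparseness) is untouched; only two extra scalars A and B are learned.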
The Connection between Regularization Operators and Support Vector Kernels
, 1998
"... In this paper a correspondence is derived between regularization operators used in Regularization Networks and Support Vector Kernels. We prove that the Green's Functions associated with regularization operators are suitable Support Vector Kernels with equivalent regularization properties. Moreover ..."
Abstract

Cited by 146 (43 self)
In this paper a correspondence is derived between regularization operators used in Regularization Networks and Support Vector Kernels. We prove that the Green's Functions associated with regularization operators are suitable Support Vector Kernels with equivalent regularization properties. Moreover, the paper provides an analysis of currently used Support Vector Kernels in view of regularization theory and corresponding operators associated with the classes of both polynomial kernels and translation-invariant kernels. The latter are also analyzed on periodic domains. As a byproduct we show that a large number of Radial Basis Functions, namely conditionally positive definite functions, may be used as Support Vector kernels.
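The correspondence claimed here can be stated compactly. In our own notation (a sketch of the statement, not the paper's exact theorem): if $P$ is a regularization operator and $G$ the Green's function of $P^{*}P$, then $k(x,x') := G(x,x')$ is an admissible SV kernel whose induced quadratic form reproduces the regularizer.

```latex
% Sketch in our notation: G is the Green's function of P*P, i.e.
(P^{*}P\,G)(x, x') = \delta(x - x'),
% and for the SV expansion the regularizer becomes the kernel form:
\langle P f, P f \rangle
  = \sum_{i,j} \alpha_i \alpha_j \, k(x_i, x_j)
\qquad \text{for } f(\cdot) = \sum_i \alpha_i \, k(x_i, \cdot).
```

The second identity follows by moving one $P$ across the inner product, $\langle P G(x_i,\cdot), P G(x_j,\cdot)\rangle = \langle G(x_i,\cdot), P^{*}P\,G(x_j,\cdot)\rangle = G(x_i,x_j)$.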
Estimating the Generalization Performance of an SVM Efficiently
, 2000
"... This paper proposes and analyzes an approach to estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation intensive resampling, the new estimators are computationally much more ecient than crossvalidation or bootstrap, since they ca ..."
Abstract

Cited by 95 (1 self)
This paper proposes and analyzes an approach to estimating the generalization performance of a support vector machine (SVM) for text classification. Without any computation-intensive resampling, the new estimators are computationally much more efficient than cross-validation or bootstrap, since they can be computed immediately from the form of the hypothesis returned by the SVM. Moreover, the estimators developed here address the special performance measures needed for text classification. While they can be used to estimate error rate, one can also estimate the recall, the precision, and the F1. A theoretical analysis and experiments on three text classification collections show that the new method can effectively estimate the performance of SVM text classifiers in a very efficient way.
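The idea of reading a performance estimate directly off the trained hypothesis has a classical, coarser relative: Vapnik's bound that the leave-one-out error of an SVM is at most the fraction of training points that are support vectors. The snippet below illustrates only that simpler bound, not this paper's xi-alpha estimators (which refine it using the slack variables); the function name is ours.

```python
import numpy as np

def loo_error_upper_bound(alphas, n_train):
    """Classical bound: leave-one-out error <= (#support vectors) / n.
    `alphas` are the dual variables of a trained SVM; a training point
    is a support vector exactly when its alpha is nonzero."""
    n_sv = int(np.count_nonzero(alphas))
    return n_sv / n_train

# hypothetical dual variables from a trained SVM: most are zero
alphas = np.array([0.0, 0.8, 0.0, 0.0, 1.2, 0.0, 0.3, 0.0])
print(loo_error_upper_bound(alphas, n_train=len(alphas)))  # 3/8 = 0.375
```

No refitting or resampling is involved: the estimate is a function of the already-computed solution, which is exactly the computational advantage the abstract claims for its (sharper) estimators.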
Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood
 Statistics and Computing
, 1998
"... Crossvalidated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modelling, particularly in the context of modelbased probabilistic clustering. The conceptual framework for the crossvalidation approach to mod ..."
Abstract

Cited by 65 (4 self)
Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modelling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is direct in the sense that models are judged directly on their out-of-sample predictive performance. The method is applied to a well-known clustering problem in the atmospheric science literature using historical records of upper atmosphere geopotential height in the Northern hemisphere. Cross-validated likelihood provides strong evidence for three clusters in the data set, providing an objective confirmation of earlier results derived using non-probabilistic clustering techniques.

1 Introduction

Cross-validation is a well-known technique in supervised learning to select a model from a family of candidate models. Examples include selecting the best classification tree using cr...
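The selection criterion is easy to sketch: fit a mixture for each candidate number of components on training data and score the held-out log-likelihood. The code below uses scikit-learn's `GaussianMixture` and a single train/validation split as a stand-in for the paper's full cross-validation loop; the synthetic data and candidate range are ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Two well-separated 1-D Gaussian clusters (synthetic illustration).
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-5, 1, (150, 1)), rng.normal(5, 1, (150, 1))])
X_tr, X_va = train_test_split(X, test_size=0.5, random_state=0)

# Score each candidate component count by held-out log-likelihood.
scores = {}
for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X_tr)
    scores[k] = gm.score(X_va)   # mean held-out log-likelihood per sample
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

The appeal the abstract highlights is exactly this directness: the winner is the model that best predicts unseen data, with no penalty term to choose.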
On Feature Selection: Learning with Exponentially many Irrelevant Features as Training Examples
 Proceedings of the Fifteenth International Conference on Machine Learning
, 1998
"... We consider feature selection in the "wrapper " model of feature selection. This typically involves an NPhard optimization problem that is approximated by heuristic search for a "good" feature subset. First considering the idealization where this optimization is performed exactly, we give a rigorou ..."
Abstract

Cited by 37 (4 self)
We consider feature selection in the "wrapper" model of feature selection. This typically involves an NP-hard optimization problem that is approximated by heuristic search for a "good" feature subset. First considering the idealization where this optimization is performed exactly, we give a rigorous bound for generalization error under feature selection. The search heuristics typically used are then immediately seen as trying to achieve the error given in our bounds, and succeeding to the extent that they succeed in solving the optimization. The bound suggests that, in the presence of many "irrelevant" features, the main source of error in wrapper model feature selection is from "overfitting" holdout or cross-validation data. This motivates a new algorithm that, again under the idealization of performing search exactly, has sample complexity (and error) that grows logarithmically in the number of "irrelevant" features, which means it can tolerate having a number of "irrelevant" f...
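For concreteness, here is the kind of heuristic wrapper search the abstract refers to: greedy forward selection scored by cross-validated accuracy. This is a generic sketch of the wrapper model (the search the paper idealizes as exact), not the paper's proposed algorithm; the inducer choice and names are ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def greedy_wrapper(X, y, max_feats=3, cv=3):
    """Greedy forward search in the wrapper model: repeatedly add the
    feature whose inclusion most improves cross-validated accuracy,
    stopping when no candidate improves the score."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_feats:
        scored = []
        for j in remaining:
            cols = selected + [j]
            acc = cross_val_score(LogisticRegression(max_iter=1000),
                                  X[:, cols], y, cv=cv).mean()
            scored.append((acc, j))
        acc, j = max(scored)
        if acc <= best:
            break
        best, selected = acc, selected + [j]
        remaining.remove(j)
    return selected, best
```

Note that the returned score is itself the quantity the paper warns about: it was maximized over many subsets, so it is an optimistically biased estimate of generalization error.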
Preventing "Overfitting" of CrossValidation Data
 In Proceedings of the Fourteenth International Conference on Machine Learning
, 1997
"... Suppose that, for a learning task, we have to select one hypothesis out of a set of hypotheses (that may, for example, have been generated by multiple applications of a randomized learning algorithm). A common approach is to evaluate each hypothesis in the set on some previously unseen crossvalidat ..."
Abstract

Cited by 36 (1 self)
Suppose that, for a learning task, we have to select one hypothesis out of a set of hypotheses (that may, for example, have been generated by multiple applications of a randomized learning algorithm). A common approach is to evaluate each hypothesis in the set on some previously unseen cross-validation data, and then to select the hypothesis that had the lowest cross-validation error. But when the cross-validation data is partially corrupted, such as by noise, and the set of hypotheses we are selecting from is large, then "folklore" also warns about "overfitting" the cross-validation data [Klockars and Sax, 1986; Tukey, 1949; Tukey, 1953]. In this paper, we explain how this "overfitting" really occurs, and show the surprising result that it can be overcome by selecting a hypothesis with a higher cross-validation error, over others with lower cross-validation errors. We give reasons for not selecting the hypothesis with the lowest cross-validation error, and propose a new algorithm, L...
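The effect the abstract describes is easy to demonstrate with a small simulation (ours, not the paper's; the paper's actual algorithm is truncated out of this abstract): when many hypotheses of identical quality are scored on the same finite validation set, the minimum observed error is far below the true error, so trusting the apparent winner overfits the validation data.

```python
import random

# 1000 hypotheses, ALL with true error rate 0.3, each scored on the
# same-size held-out sample of 100 points. The minimum observed
# cross-validation error badly underestimates the true 0.3.
random.seed(0)
TRUE_ERR, N_HYP, N_VAL = 0.3, 1000, 100
cv_errors = [
    sum(random.random() < TRUE_ERR for _ in range(N_VAL)) / N_VAL
    for _ in range(N_HYP)
]
print(f"true error: {TRUE_ERR},  min CV error: {min(cv_errors):.2f}")
```

Every hypothesis here is equally good, yet selection by minimum validation error would report an error estimate that is optimistic by roughly ten percentage points or more, which is exactly why a selection rule may rationally prefer a hypothesis whose measured error is not the lowest.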
Adaptive Regularization in Neural Network Modeling
, 1997
"... . In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [24]. The idea is to minimize an empirical estimate  like the crossvalidation estimate ..."
Abstract

Cited by 14 (2 self)
In this paper we address the important problem of optimizing regularization parameters in neural network modeling. The suggested optimization scheme is an extended version of the recently presented algorithm [24]. The idea is to minimize an empirical estimate, such as the cross-validation estimate, of the generalization error with respect to regularization parameters. This is done by employing a simple iterative gradient descent scheme using virtually no additional programming overhead compared to standard training. Experiments with feedforward neural network models for time series prediction and classification tasks showed the viability and robustness of the algorithm. Moreover, we provide some simple theoretical examples in order to illustrate the potential and limitations of the proposed regularization framework.

1 Introduction

Neural networks are flexible tools for time series processing and pattern recognition. By increasing the number of hidden neurons in a 2-layer architec...
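The core idea, treating the regularization strength itself as a parameter and descending the validation-error gradient, can be shown on ridge regression. This is a minimal stand-in for the paper's scheme: the paper derives analytic gradients for neural network models, whereas we use a finite-difference gradient in log(lambda), and all names below are ours.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (X^T X + lam I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_mse(lam, X_tr, y_tr, X_va, y_va):
    """Validation error of the model trained with this lambda."""
    r = X_va @ ridge_fit(X_tr, y_tr, lam) - y_va
    return float(r @ r) / len(y_va)

def adapt_lambda(X_tr, y_tr, X_va, y_va, lam0=100.0, lr=0.05, steps=300):
    """Gradient descent on the validation error w.r.t. log(lambda),
    tracking the best lambda seen along the trajectory."""
    loglam = float(np.log(lam0))
    best_lam, best_err = lam0, val_mse(lam0, X_tr, y_tr, X_va, y_va)
    eps = 1e-3
    for _ in range(steps):
        e_hi = val_mse(np.exp(loglam + eps), X_tr, y_tr, X_va, y_va)
        e_lo = val_mse(np.exp(loglam - eps), X_tr, y_tr, X_va, y_va)
        loglam -= lr * (e_hi - e_lo) / (2 * eps)   # step in log-space
        loglam = float(np.clip(loglam, -10.0, 10.0))
        err = val_mse(np.exp(loglam), X_tr, y_tr, X_va, y_va)
        if err < best_err:
            best_err, best_lam = err, float(np.exp(loglam))
    return best_lam, best_err
```

Working in log(lambda) keeps the parameter positive and makes the step size scale-free, which mirrors the "virtually no additional programming overhead" claim: each step costs one or two extra trainings of the inner model.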
Best-first decision tree learning
 University of Waikato
, 2007
"... Decision trees are potentially powerful predictors and explicitly represent the structure of a dataset. Standard decision tree learners such as C4.5 expand nodes in depthfirst order (Quinlan, 1993), while in bestfirst decision tree learners the ”best ” node is expanded first. The ”best ” node is t ..."
Abstract

Cited by 13 (0 self)
Decision trees are potentially powerful predictors and explicitly represent the structure of a dataset. Standard decision tree learners such as C4.5 expand nodes in depth-first order (Quinlan, 1993), while in best-first decision tree learners the "best" node is expanded first. The "best" node is the node whose split leads to maximum reduction of impurity (e.g. Gini index or information in this thesis) among all nodes available for splitting. The resulting tree will be the same when fully grown; just the order in which it is built is different. In practice, some branches of a fully-expanded tree do not truly reflect the underlying information in the domain. This problem is known as overfitting and is mainly caused by noisy data. Pruning is necessary to avoid overfitting the training data, and discards those parts that are not predictive of future data. Best-first node expansion enables us to investigate new pruning techniques by determining the number of expansions performed based on cross-validation. This thesis first introduces the algorithm for building binary best-first decision trees for classification problems. Then, it investigates two new pruning methods that
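The expansion order is the whole trick: a priority queue over candidate nodes, keyed by impurity reduction, so that "number of expansions" becomes a single tunable (cross-validatable) parameter. The toy sketch below captures only that mechanism, Gini impurity plus a max-heap of splittable nodes; it is our illustration, not the thesis's learner, which also handles pruning and much more.

```python
import heapq
import numpy as np

def gini(y):
    """Gini impurity; labels must be small non-negative integers."""
    if len(y) == 0:
        return 0.0
    p = np.bincount(y) / len(y)
    return 1.0 - float(np.sum(p ** 2))

def best_split(X, y):
    """Best axis-aligned split as (impurity_reduction, feature,
    threshold), or None if the node cannot be split."""
    n, base, best = len(y), gini(y), None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= t
            red = base - (left.sum() / n) * gini(y[left]) \
                       - ((~left).sum() / n) * gini(y[~left])
            if best is None or red > best[0]:
                best = (red, j, t)
    return best

def best_first_expand(X, y, n_expansions):
    """Expand nodes in best-first order: always split the queued node
    with the largest impurity reduction, up to n_expansions splits.
    Returns the splits performed, as (feature, threshold, reduction)."""
    heap, order, counter = [], [], 0
    def push(idx):
        nonlocal counter
        s = best_split(X[idx], y[idx])
        if s is not None and s[0] > 0:
            heapq.heappush(heap, (-s[0], counter, idx, s))  # max-heap
            counter += 1
    push(np.arange(len(y)))
    while heap and len(order) < n_expansions:
        _, _, idx, (red, j, t) = heapq.heappop(heap)
        order.append((j, t, red))
        mask = X[idx][:, j] <= t
        push(idx[mask])
        push(idx[~mask])
    return order
```

A fully grown tree comes out identical to depth-first growth, as the abstract says; the payoff is that truncating after k expansions yields a sensible partial tree, so k can be chosen by cross-validation.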
Additive Regularization: Fusion of Training and Validation Levels in Kernel Methods
 INTERNAL REPORT 03184, ESAT-SCD-SISTA, K.U.LEUVEN
, 2003
"... In this paper the training of Least Squares Support Vector Machines (LSSVMs) for classification and regression and the determination of its regularization constants is reformulated in terms of additive regularization. In contrast with the classical Tikhonov scheme, a major advantage of this additiv ..."
Abstract

Cited by 9 (7 self)
In this paper the training of Least Squares Support Vector Machines (LS-SVMs) for classification and regression and the determination of its regularization constants is reformulated in terms of additive regularization. In contrast with the classical Tikhonov scheme, a major advantage of this additive regularization mechanism is that it makes it possible to achieve computational fusion of the training and validation levels, leading to the solution of one single set of linear equations that characterizes the training and validation at once. The problem of avoiding overfitting on validation data is approached by explicitly restricting the degrees of freedom of the regularization constants. Different restriction schemes are investigated, including an ensemble model approach. The link between the Tikhonov scheme and additive regularization is explained and an efficient cross-validation method with additive regularization is proposed. The new methods are illustrated with several examples on synthetic and real-life data sets.