Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
- ADVANCES IN LARGE MARGIN CLASSIFIERS
, 1999
Cited by 503 (0 self)
The output of a classifier should be a calibrated posterior probability to enable post-processing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score will produce non-sparse kernel machines. Instead, we train an SVM, then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities. This chapter compares classification error rate and likelihood scores for an SVM plus sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three data-mining-style data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness of the SVM.
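The sigmoid step described in this abstract can be sketched as follows. This is a minimal illustration, assuming the parametric form P(y=1 | f) = 1 / (1 + exp(A f + B)) and plain gradient descent on the negative log-likelihood; the function names, learning rate, and iteration count are hypothetical choices, not the optimizer used in the chapter.

```python
import numpy as np

def fit_sigmoid(scores, labels, lr=0.5, n_iter=2000):
    """Fit A, B in P(y=1 | f) = 1 / (1 + exp(A*f + B)).

    scores: raw SVM decision values f; labels: 0/1 targets.
    Plain gradient descent is an assumption made for brevity.
    """
    f = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    A, B = 0.0, 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * f + B))
        # Gradient of the mean negative log-likelihood w.r.t. z = A*f + B is (y - p).
        grad_A = np.mean((y - p) * f)
        grad_B = np.mean(y - p)
        A -= lr * grad_A
        B -= lr * grad_B
    return A, B

def sigmoid_prob(scores, A, B):
    """Map SVM outputs to probabilities with the fitted sigmoid."""
    return 1.0 / (1.0 + np.exp(A * np.asarray(scores, dtype=float) + B))
```

The SVM itself is untouched by this step, so its sparse set of support vectors is retained; only the two scalars A and B are learned on top of its outputs.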
Support Vector Machines, Reproducing Kernel Hilbert Spaces and the Randomized GACV
, 1998
Cited by 122 (9 self)
In this paper we very briefly review some of these results. RKHS can be chosen tailored to the problem at hand in many ways, and we review a few of them, including radial basis function and smoothing spline ANOVA spaces. Girosi (1997), Smola and Schölkopf (1997), Schölkopf et al. (1997) and others have noted the relationship between SVMs and penalty methods as used in the statistical theory of nonparametric regression. In Section 1.2 we elaborate on this, and show how replacing the likelihood functional of the logit (log odds ratio) in penalized likelihood methods for Bernoulli [yes-no] data with certain other functionals of the logit (to be called SVM functionals) results in several of the SVMs that are of modern research interest. The SVM functionals we consider more closely resemble a "goodness-of-fit" measured by classification error than a "goodness-of-fit" measured by the comparative Kullback-Leibler distance, which is frequently associated with likelihood functionals. This observation is not new or profound, but it is hoped that the discussion here will help to bridge the conceptual gap between classical nonparametric regression via penalized likelihood methods and SVMs in RKHS. Furthermore, since SVMs can be expected to provide more compact representations of the desired classification boundaries than boundaries based on estimating the logit by penalized likelihood methods, they have potential as a prescreening or model selection tool in sifting through many variables or regions of attribute space to find influential quantities, even when the ultimate goal is not classification, but to understand how the logit varies as the important variables change throughout their range. This is potentially applicable to the variable/model selection problem in demographic m...
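The contrast drawn in this abstract between likelihood functionals and SVM functionals can be illustrated with a short sketch. It assumes labels coded as y in {-1, +1} and uses the standard hinge loss and Bernoulli negative log-likelihood as functions of the margin y f; whether these are the exact functionals studied in the paper is an assumption, and the function names are hypothetical.

```python
import numpy as np

def hinge_loss(y, f):
    """Standard SVM (hinge) functional: (1 - y*f)_+ ."""
    return np.maximum(0.0, 1.0 - y * f)

def neg_log_likelihood(y, f):
    """Bernoulli negative log-likelihood with f the logit: log(1 + exp(-y*f))."""
    return np.logaddexp(0.0, -y * f)

def misclassification(y, f):
    """0/1 classification error of the sign of f."""
    return (y * f <= 0).astype(float)

margins = np.linspace(-2.0, 2.0, 9)
ones = np.ones_like(margins)
for m, h, l, e in zip(margins,
                      hinge_loss(ones, margins),
                      neg_log_likelihood(ones, margins),
                      misclassification(ones, margins)):
    print(f"margin={m:+.1f}  hinge={h:.3f}  nll={l:.3f}  error={e:.0f}")
```

The hinge loss is exactly zero once the margin exceeds 1, which is what allows a sparse solution, whereas the likelihood loss stays strictly positive at every margin.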
Smoothing Spline ANOVA for Exponential Families, with Application to the Wisconsin Epidemiological Study of Diabetic Retinopathy
- ANN. STATIST
, 1995
Cited by 64 (34 self)
Let $y_i$, $i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where $b$ and $c$ are given functions and $b$ is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \in \mathcal{T}^{(1)} \otimes \cdots \otimes \mathcal{T}^{(d)} = \mathcal{T}$, the $\mathcal{T}^{(\alpha)}$ are measurable spaces of rather general form, and $f$ is an unknown function on $\mathcal{T}$ with some assumed `smoothness' properties. Given $\{y_i, t(i),\ i = 1, \dots, n\}$, it is desired to estimate $f(t)$ for $t$ in some region of interest contained in $\mathcal{T}$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha\beta}(t_{\alpha}, t_{\beta}) + \cdots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of $f$ is obtained as the minimizer...
Approximating Thin-Plate Splines for Elastic Registration: Integration of Landmark Errors and Orientation Attributes
- In Proc. of IPMI'99, volume 1613 of LNCS
, 1999
Cited by 17 (1 self)
We introduce an approach to elastic registration of tomographic images based on thin-plate splines. Central to this scheme is a well-defined minimizing functional for which the solution can be stated analytically. In this work, we consider the integration of anisotropic landmark errors as well as additional attributes at landmarks. As attributes we use orientations at landmarks and we incorporate the corresponding constraints through scalar products. With our approximation scheme it is thus possible to integrate statistical as well as geometric information as additional knowledge in elastic image registration. On the basis of synthetic as well as real tomographic images we show that this additional knowledge can significantly improve the registration result. In particular, we demonstrate that our scheme incorporating orientation attributes can preserve the shape of rigid structures (such as bone) embedded in an otherwise elastic material. This is achieved without selecting...
Generalization And Regularization in Nonlinear Learning Systems
- The Handbook of Brain Theory and Neural Networks
, 1994
Cited by 10 (3 self)
In this article we will describe generalization and regularization from the point of view of multivariate function estimation in a statistical context. Multivariate function estimation is not, in principle, distinguishable from supervised machine learning. However, until fairly recently supervised machine learning and multivariate function estimation had fairly distinct groups of practitioners, and small overlap in language, literature, and in the kinds of practical problems under study. In any case, we are given a training set consisting of pairs of input (feature) vectors and associated outputs $\{t(i), y_i\}$ for $n$ training or example subjects, $i = 1, \dots, n$. From this data, it is desired to construct a map which generalizes well; that is, given a new value of $t$, the map will provide a reasonable prediction for the unobserved output associated with this $t$.
Smoothing Spline ANOVA Fits for Very Large, Nearly Regular Data Sets, with Application to Historical Global Climate Data
, 1995
Cited by 10 (4 self)
... validation (GCV), provided that matrix decompositions of size $n \times n$ can be carried out, where $n$ is the sample size. We review the randomized trace technique and the backfitting algorithm, and remark that they can be combined to solve the variational problem while choosing the smoothing parameters by GCV for data sets that are much too large to use matrix decomposition methods directly. Some intermediate calculations to speed up the backfitting algorithm are given which are useful when the data has a tensor product structure. We describe an imputation procedure which can take advantage of data with a (nearly) tensor product structure. As an illustration of an application we discuss the algorithm in the context of fitting and smoothing historical global winter mean surface temperature data and examining the main effects and interactions for time and space.
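The idea behind a randomized trace estimate can be sketched as follows: the trace of an $n \times n$ matrix is approximated from matrix-vector products alone, so the matrix never has to be formed or decomposed. This sketch assumes random ±1 probe vectors and a caller-supplied `matvec` function; both are illustrative choices, and the paper's exact estimator may differ.

```python
import numpy as np

def randomized_trace(matvec, n, n_probes=200, seed=0):
    """Estimate trace(A) as the average of z^T (A z) over random probes z.

    matvec: function returning A @ z for a length-n vector z (hypothetical
    interface; in a smoothing problem it would apply the influence matrix).
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # random +/-1 probe, E[z z^T] = I
        total += z @ matvec(z)
    return total / n_probes

# Usage: compare against the exact trace on a small dense matrix.
rng = np.random.default_rng(1)
M = rng.normal(size=(50, 50))
A = M @ M.T / 50.0
estimate = randomized_trace(lambda z: A @ z, n=50)
print(estimate, np.trace(A))
```

In a GCV criterion the trace needed is that of the influence matrix, and each probe then costs one additional fit of the smoother to a random vector.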
Smoothing Spline Analysis Of Variance For Polychotomous Response Data
, 1998
Cited by 9 (1 self)
We consider the penalized likelihood method with smoothing spline ANOVA for fitting nonparametric functions to data involving a polychotomous response. The fitting procedure involves minimizing the penalized likelihood in a reproducing kernel Hilbert space. A one-step block SOR-Newton-Raphson algorithm is used to solve the minimization problem. Generalized cross-validation or unbiased risk estimation is used to empirically assess the amount of smoothing (which controls the bias and variance trade-off) at each one-step block SOR-Newton-Raphson iteration. Under some regular smoothness conditions, the one-step block SOR-Newton-Raphson algorithm will produce a sequence which converges to the minimizer of the penalized likelihood for the fixed smoothing parameters. Monte Carlo simulations are conducted to examine the performance of the algorithm. The method is applied to polychotomous data from the Wisconsin Epidemiological Study of Diabetic Retinopathy to estimate the risks of cause-specific mortality given several potential risk factors at the start of the study. Strategies to obtain smoothing spline estimates for large data sets with polychotomous response are also proposed in this thesis. Simulation studies are conducted to check the performance of the proposed method. Acknowledgements: I would like to express my sincerest gratitude to my advisor, Professor Grace Wahba, for her invaluable advice during the course of this dissertation. Appreciation is extended to Professors Michael Kosorok, Mary Lindstrom, Olvi Mangasarian, and Kam-Wah Tsui for their service on my final examination committee, their careful reading of this thesis and their valuable comments. I would like to thank Ronald Klein, MD and Barbara Klein, MD for providing the WESDR data. Fellow graduate students Fangy...
Approximate methods for propagation of uncertainty with Gaussian process models. Doctoral dissertation
, 2004
Cited by 5 (0 self)
This thesis presents extensions of the Gaussian Process (GP) model, based on approximate methods allowing the model to deal with input uncertainty. Zero-mean GPs with Gaussian covariance function are of particular interest, as they allow many derivations to be carried out exactly and have been shown to have modelling abilities and predictive performance comparable to those of neural networks (Rasmussen, 1996a). With this model, given observed data and a new input, making a prediction corresponds to computing the (Gaussian) predictive distribution of the associated output, whose mean can be used as an estimate. This way, the predictive variance provides error-bars or confidence intervals on this estimate: it quantifies the model’s degree of belief in its ‘best guess’. Using the knowledge of the predictive variance in an informative manner is at the centre of this thesis, as the problems of how to propagate it in the model, how to account for it when derivative observations are available, and how to derive a control law with a cautious behaviour are addressed. The task of making a prediction when the new input presented to the model is noisy is introduced. Assuming a normally distributed input, only the mean and variance of the corresponding non-Gaussian predictive distribution are computed (Gaussian approximation). Depending on the parametric form of
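The noise-free-input prediction step that this abstract builds on can be sketched as follows: a zero-mean GP with a Gaussian (squared exponential) covariance, returning the predictive mean and the variance of the latent function at new inputs. The hyperparameter names and default values are hypothetical, and the thesis's extension to noisy inputs is not covered here.

```python
import numpy as np

def gaussian_cov(X1, X2, length=1.0, signal_var=1.0):
    """Gaussian (squared exponential) covariance between rows of X1 and X2."""
    sq_dist = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=2)
    return signal_var * np.exp(-0.5 * sq_dist / length ** 2)

def gp_predict(X, y, X_new, length=1.0, signal_var=1.0, noise_var=1e-4):
    """Predictive mean and latent variance of a zero-mean GP at X_new."""
    K = gaussian_cov(X, X, length, signal_var) + noise_var * np.eye(len(X))
    k_new = gaussian_cov(X, X_new, length, signal_var)
    L = np.linalg.cholesky(K)                      # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = k_new.T @ alpha
    v = np.linalg.solve(L, k_new)
    var = signal_var - np.sum(v * v, axis=0)
    return mean, np.maximum(var, 0.0)              # clip tiny negative round-off

# Usage: the variance grows as the new input moves away from the data.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X[:, 0])
mean, var = gp_predict(X, y, np.array([[2.5], [100.0]]))
print(mean, var)
```

Far from the training inputs the mean reverts to zero and the variance returns to the prior value `signal_var`, which is the behaviour the abstract describes as the model's degree of belief in its best guess.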
Tree Structured Non-linear Signal Modeling and Prediction
- Proc. of the IEEE 1995 International Conference on Acoustics, Speech and Signal Processing
Cited by 3 (0 self)
In this paper, we develop a regression tree approach to identification and prediction of signals that evolve according to an unknown nonlinear state space model. In this approach, a tree is recursively constructed that partitions the $p$-dimensional state space into a collection of piecewise homogeneous regions utilizing a $2^p$-ary splitting rule with an entropy-based node impurity criterion. On this partition, the joint density of the state is approximately piecewise constant, leading to a nonlinear predictor that nearly attains minimum mean square error. This process decomposition is closely related to a generalized version of the thresholded AR signal model (ART), which we call piecewise constant AR (PCAR). We illustrate the method for two cases where classical linear prediction is ineffective: a chaotic “double-scroll” signal measured at the output of a Chua-type electronic circuit and a second-order ART model. We show that the prediction errors are comparable with the nearest neighbor approach to nonlinear prediction but with greatly reduced complexity. Index Terms: chaotic signal analysis, nonlinear and nonparametric modeling and prediction, piecewise constant AR models, recursive partitioning, regression trees.
Adaptive Tuning, Four Dimensional Variational Data Assimilation, and Representers in RKHS
- Department of Statistics, University of Wisconsin, Madison WI
, 1998
Cited by 1 (1 self)
In this paper we then (i) review the use of model errors as dual variables, (ii) review the GCV and generalized maximum likelihood (GML) tuning methods, and pinpoint sensitivity issues as tunable parameters are sprinkled liberally throughout the weak 4D-Var problem, noting that they can be studied in the influence matrix (or influence operator in the nonlinear case). Then (iii) we describe some simple models for correlated model errors and the simultaneous consideration of systematic (bias), short memory and long memory correlation. We end with (iv) a summary of some representer theory in reproducing kernel Hilbert space (RKHS) relevant to the weak 4D-Var setting. Let $t = 1, \dots, T$ denote discrete time and let $\Psi_t$, $t = 1, \dots, T$, be a sequence of state vectors representing (some part of) nature that evolves according to

