Results 1-10 of 47
Multicategory Support Vector Machines, theory, and application to the classification of microarray data and satellite radiance data
Journal of the American Statistical Association, 2004
Cited by 175 (17 self)
Two-category support vector machines (SVM) have been very popular in the machine learning community for classification problems. Solving multicategory problems by a series of binary classifiers is quite common in the SVM paradigm; however, this approach may fail under various circumstances. We propose the multicategory support vector machine (MSVM), which extends the binary SVM to the multicategory case and has good theoretical properties. The proposed method provides a unifying framework when there are either equal or unequal misclassification costs. As a tuning criterion for the MSVM, an approximate leave-one-out cross-validation function, called Generalized Approximate Cross Validation, is derived, analogous to the binary case. The effectiveness of the MSVM is demonstrated through the applications to cancer classification using microarray data and cloud classification with satellite radiance profiles.
Support Vector Machines, Reproducing Kernel Hilbert Spaces and the Randomized GACV
1998
Cited by 150 (11 self)
this paper we very briefly review some of these results. RKHS can be chosen tailored to the problem at hand in many ways, and we review a few of them, including radial basis function and smoothing spline ANOVA spaces. Girosi (1997), Smola and Scholkopf (1997), Scholkopf et al (1997) and others have noted the relationship between SVM's and penalty methods as used in the statistical theory of nonparametric regression. In Section 1.2 we elaborate on this, and show how replacing the likelihood functional of the logit (log odds ratio) in penalized likelihood methods for Bernoulli [yes-no] data with certain other functionals of the logit (to be called SVM functionals) results in several of the SVM's that are of modern research interest. The SVM functionals we consider more closely resemble a "goodness-of-fit" measured by classification error than a "goodness-of-fit" measured by the comparative Kullback-Leibler distance, which is frequently associated with likelihood functionals. This observation is not new or profound, but it is hoped that the discussion here will help to bridge the conceptual gap between classical nonparametric regression via penalized likelihood methods and SVM's in RKHS. Furthermore, since SVM's can be expected to provide more compact representations of the desired classification boundaries than boundaries based on estimating the logit by penalized likelihood methods, they have potential as a prescreening or model selection tool in sifting through many variables or regions of attribute space to find influential quantities, even when the ultimate goal is not classification, but to understand how the logit varies as the important variables change throughout their range. This is potentially applicable to the variable/model selection problem in demographic m...
Smoothing Spline ANOVA for Exponential Families, with Application to the Wisconsin Epidemiological Study of Diabetic Retinopathy
Ann. Statist., 1995
Cited by 83 (44 self)
Let $y_i$, $i = 1, \dots, n$, be independent observations with the density of $y_i$ of the form $h(y_i, f_i) = \exp[y_i f_i - b(f_i) + c(y_i)]$, where $b$ and $c$ are given functions and $b$ is twice continuously differentiable and bounded away from 0. Let $f_i = f(t(i))$, where $t = (t_1, \dots, t_d) \in \mathcal{T}^{(1)} \otimes \cdots \otimes \mathcal{T}^{(d)} = \mathcal{T}$, the $\mathcal{T}^{(\alpha)}$ are measurable spaces of rather general form, and $f$ is an unknown function on $\mathcal{T}$ with some assumed `smoothness' properties. Given $\{y_i, t(i), i = 1, \dots, n\}$, it is desired to estimate $f(t)$ for $t$ in some region of interest contained in $\mathcal{T}$. We develop the fitting of smoothing spline ANOVA models to this data of the form $f(t) = C + \sum_{\alpha} f_{\alpha}(t_{\alpha}) + \sum_{\alpha < \beta} f_{\alpha\beta}(t_{\alpha}, t_{\beta}) + \cdots$. The components of the decomposition satisfy side conditions which generalize the usual side conditions for parametric ANOVA. The estimate of f is obtained as the minimizer...
Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV
Ann. Statist.
Cited by 41 (19 self)
(ranGACV) method for choosing multiple smoothing parameters in penalized likelihood estimates for Bernoulli data. The method is intended for application with penalized likelihood smoothing spline ANOVA models. In addition we propose a class of approximate numerical methods for solving the penalized likelihood variational problem which, in conjunction with the ranGACV method, allows the application of smoothing spline ANOVA models with Bernoulli data to much larger data sets than previously possible. These methods are based on choosing an approximating subset of the natural (representer) basis functions for the variational problem. Simulation studies with synthetic data, including synthetic data mimicking demographic risk factor data sets, are used to examine the properties of the method and to compare the approach with the GRKPACK code of Wang (1997c). Bayesian “confidence intervals” are obtained for the fits and are shown in the simulation studies to have the “across the function” property usually claimed for these confidence intervals. Finally the method is applied
LASSO-Patternsearch Algorithm with Application to Ophthalmology and Genomic Data
2008
Cited by 29 (22 self)
The LASSO-Patternsearch algorithm is proposed to efficiently identify patterns of multiple dichotomous risk factors for outcomes of interest in demographic and genomic studies. The patterns considered are those that arise naturally from the log linear expansion of the multivariate Bernoulli density. The method is designed for the case where there is a possibly very large number of candidate patterns but it is believed that only a relatively small number are important. A LASSO is used to greatly reduce the number of candidate patterns, using a novel computational algorithm that can handle an extremely large number of unknowns simultaneously. The patterns surviving the LASSO are further pruned in the framework of (parametric) generalized linear models. A novel tuning procedure based on the GACV for Bernoulli outcomes, modified to act
Component Selection and Smoothing in Smoothing Spline Analysis of Variance Models
COSSO. Institute of Statistics Mimeo Series 2556, NCSU, 2003
Cited by 27 (9 self)
We propose a new method for model selection and model fitting in nonparametric regression models, in the framework of smoothing spline ANOVA. The "COSSO" is a method of regularization with the penalty functional being the sum of component norms, instead of the squared norm employed in the traditional smoothing spline method. The COSSO provides a unified framework for several recent proposals for model selection in linear models and smoothing spline ANOVA models. Theoretical properties, such as the existence and the rate of convergence of the COSSO estimator, are studied. In the special case of a tensor product design with periodic functions, a detailed analysis reveals that the COSSO applies a novel soft thresholding type operation to the function components and selects the correct model structure with probability tending to one. We give
Variable Selection and Model Building via Likelihood Basis Pursuit
Journal of the American Statistical Association, 2002
Cited by 22 (10 self)
This paper presents a nonparametric penalized likelihood approach for variable selection and model building, called likelihood basis pursuit (LBP). In the setting of a tensor product reproducing kernel Hilbert space, we decompose the log likelihood into the sum of different functional components such as main effects and interactions, with each component represented by appropriate basis functions. The basis functions are chosen to be compatible with variable selection and model building in the context of a smoothing spline ANOVA model. Basis pursuit is applied to obtain the optimal decomposition in terms of having the smallest $l_1$ norm on the coefficients. We use the functional $L_1$ norm to measure the importance of each component and determine the "threshold" value by a sequential Monte Carlo bootstrap test algorithm. As a generalized LASSO-type method, LBP produces shrinkage estimates for the coefficients, which greatly facilitates the variable selection process, and provides highly interpretable multivariate functional estimates at the same time. To choose the regularization parameters appearing in the LBP models, generalized approximate cross validation (GACV) is derived as a tuning criterion. To make GACV widely applicable to large data sets, its randomized version is proposed as well. A technique called "slice modeling" is used to solve the optimization problem and make the computation more efficient. LBP has great potential for a wide range of research and application areas such as medical studies, and in this paper we apply it to two large ongoing epidemiological studies: the Wisconsin Epidemiological Study of Diabetic Retinopathy (WESDR) and the Beaver Dam Eye Study (BDES).
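The shrinkage behaviour of LASSO-type penalties mentioned in the abstract above can be illustrated with a small sketch. It is not the LBP algorithm. It only shows the standard one-dimensional fact that the minimizer of 0.5 * (b - z)^2 + lam * |b| is the soft-thresholded value of z. The function name `soft_threshold` and the parameter `lam` are hypothetical names chosen for this illustration.

```python
def soft_threshold(z, lam):
    """Minimizer of 0.5 * (b - z)**2 + lam * abs(b) over b (illustrative only)."""
    if z > lam:
        return z - lam   # large positive values are shrunk toward zero
    if z < -lam:
        return z + lam   # large negative values are shrunk toward zero
    return 0.0           # small values are set exactly to zero


# Hypothetical coefficients: small ones are zeroed, large ones are shrunk.
coefficients = [3.0, -0.5, -2.5]
shrunk = [soft_threshold(z, 1.0) for z in coefficients]
```

Setting small coefficients exactly to zero is what makes an l1-type penalty useful for variable selection, in contrast to a squared-norm penalty, which shrinks coefficients without zeroing them.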
The Bias-Variance Tradeoff and the Randomized GACV
Advances in Neural Information Processing Systems, 1999
Cited by 15 (2 self)
We propose a new in-sample cross validation based method (randomized GACV) for choosing smoothing or bandwidth parameters that govern the bias-variance or fit-complexity tradeoff in `soft' classification. Soft classification refers to a learning procedure which estimates the probability that an example with a given attribute vector is in class 1 vs class 0. The target for optimizing the tradeoff is the Kullback-Leibler distance between the estimated probability distribution and the `true' probability distribution, representing knowledge of an infinite population. The method uses a randomized estimate of the trace of a Hessian and mimics cross validation at the cost of a single relearning with perturbed outcome data. 1 INTRODUCTION We propose and test a new in-sample cross-validation based method for optimizing the bias-variance tradeoff in `soft classification' (Wahba et al 1994), called ranGACV (randomized Generalized Approximate Cross Validation). Summarizing from Wahba et al(199...
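The idea of a randomized trace estimate, mentioned in the abstract above, can be sketched generically. The code below is not the ranGACV procedure, which relies on relearning with perturbed outcome data. It is a plain random-probe estimator under the assumption that matrix-vector products are available: for a random sign vector z, the average of z'Az over many draws approximates trace(A). The names `randomized_trace` and `dense_matvec` are hypothetical.

```python
import random


def randomized_trace(matvec, n, num_probes=200, seed=0):
    """Estimate trace(A) using only products A @ z with random +/-1 vectors z."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = matvec(z)
        # z' A z has expectation trace(A) when the entries of z are
        # independent with mean 0 and variance 1.
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_probes


def dense_matvec(a):
    """Wrap a list-of-lists matrix as a matrix-vector product (for the demo)."""
    def apply(z):
        return [sum(aij * zj for aij, zj in zip(row, z)) for row in a]
    return apply


# Demo matrix with diagonal 1..5 and small off-diagonal entries.
demo = [[i + 1.0 if i == j else 0.01 for j in range(5)] for i in range(5)]
estimate = randomized_trace(dense_matvec(demo), 5)
```

The appeal of this kind of estimator is that the matrix never has to be formed or inverted explicitly, so only a routine that applies it to a vector is needed.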
Optimal Properties and Adaptive Tuning of Standard and Nonstandard Support Vector Machines
In Proceedings of the MSRI Berkeley Workshop on, 2002
Cited by 14 (7 self)
We review some of the basic ideas of Support Vector Machines (SVM's) for classification, with the goal of describing how these ideas can sit comfortably inside the statistical literature in decision theory and penalized likelihood regression. We review recent work on adaptive tuning of SVMs, discussing generalizations to the nonstandard case where the training set is not representative and misclassification costs are not equal. Mention is made of recent results in the multicategory case.
Approximate Smoothing Spline Methods for Large Data Sets in the Binary Case
Department of Statistics, University of Wisconsin, Madison WI, 1997
Cited by 10 (4 self)
We consider the use of smoothing splines in generalized additive models with binary responses in the large data set situation. Xiang and Wahba (1996) proposed using the Generalized Approximate Cross Validation (GACV) function as a method to choose (multiple) smoothing parameters in the binary data case and demonstrated through simulation that the GACV method compares well to existing iterative methods, as judged by the Kullback-Leibler distance of the estimate to the true function being fitted. However, the calculation of the GACV function involves solving an n by n linear system, where n is the sample size. As the sample size increases, the calculation becomes numerically unstable and infeasible. To reduce these computational problems we propose a randomized version of the GACV function, which is numerically stable. Furthermore, we use a clustering algorithm to choose a set of basis functions with which to approximate the exact additive smoothing spline estimate, which has a basis function for every data point. Combining these two approaches, we are able to extend smoothing spline methods in the binary response case to much larger data sets without sacrificing much accuracy.