Results 1–10 of 61
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract

Cited by 309 (31 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
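A minimal worked form of the scheme this abstract describes, with notation assumed here rather than taken from the paper: a data-fit term plus a smoothness penalty, whose minimizer is a one-hidden-layer expansion such as an RBF network.

```latex
% Regularization functional: empirical error plus smoothness penalty
% (notation assumed; H is a suitable function space, P a smoothness operator)
\min_{f \in \mathcal{H}} \; \sum_{i=1}^{N} \bigl(y_i - f(\mathbf{x}_i)\bigr)^{2} + \lambda \,\|P f\|^{2}
% For radial smoothness functionals the minimizer is a Radial Basis
% Functions expansion, i.e. a network with one layer of hidden units:
f(\mathbf{x}) = \sum_{i=1}^{N} c_i \, G\!\bigl(\|\mathbf{x} - \mathbf{x}_i\|\bigr)
```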
Regularization networks and support vector machines
Advances in Computational Mathematics, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 266 (33 self)
 Add to MetaCart
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
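A sketch of the shared formulation reviewed here, in assumed notation: both techniques minimize a regularized functional in a reproducing kernel Hilbert space and differ only in the loss term.

```latex
% Common form (notation assumed): loss V plus RKHS norm penalty
\min_{f \in \mathcal{H}_K} \; \frac{1}{N} \sum_{i=1}^{N} V\bigl(y_i, f(\mathbf{x}_i)\bigr) + \lambda \,\|f\|_{K}^{2}
% Regularization Networks:  V(y, f(x)) = (y - f(x))^2            (squared loss)
% SVM regression:           V(y, f(x)) = |y - f(x)|_\varepsilon  (eps-insensitive)
% SVM classification:       V(y, f(x)) = (1 - y\,f(x))_+         (hinge loss)
```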
Support Vector Machines, Reproducing Kernel Hilbert Spaces and the Randomized GACV
1998
"... this paper we very briefly review some of these results. RKHS can be chosen tailored to the problem at hand in many ways, and we review a few of them, including radial basis function and smoothing spline ANOVA spaces. Girosi (1997), Smola and Scholkopf (1997), Scholkopf et al (1997) and others have ..."
Abstract

Cited by 150 (11 self)
 Add to MetaCart
In this paper we very briefly review some of these results. An RKHS can be tailored to the problem at hand in many ways, and we review a few of them, including radial basis function and smoothing spline ANOVA spaces. Girosi (1997), Smola and Scholkopf (1997), Scholkopf et al (1997) and others have noted the relationship between SVM's and penalty methods as used in the statistical theory of nonparametric regression. In Section 1.2 we elaborate on this, and show how replacing the likelihood functional of the logit (log odds ratio) in penalized likelihood methods for Bernoulli [yes-no] data, with certain other functionals of the logit (to be called SVM functionals), results in several of the SVM's that are of modern research interest. The SVM functionals we consider more closely resemble a "goodness-of-fit" measured by classification error than a "goodness-of-fit" measured by the comparative Kullback-Leibler distance, which is frequently associated with likelihood functionals. This observation is not new or profound, but it is hoped that the discussion here will help to bridge the conceptual gap between classical nonparametric regression via penalized likelihood methods, and SVM's in RKHS. Furthermore, since SVM's can be expected to provide more compact representations of the desired classification boundaries than boundaries based on estimating the logit by penalized likelihood methods, they have potential as a prescreening or model selection tool in sifting through many variables or regions of attribute space to find influential quantities, even when the ultimate goal is not classification, but to understand how the logit varies as the important variables change throughout their range. This is potentially applicable to the variable/model selection problem in demographic m...
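A schematic of the substitution described above, under assumed notation (y coded as ±1, f the logit): replacing the negative Bernoulli log-likelihood by an SVM functional such as the hinge loss yields a support vector machine in the same RKHS.

```latex
% Penalized likelihood for Bernoulli data (notation assumed):
I_\lambda(f) = \frac{1}{n} \sum_{i=1}^{n} \ln\!\bigl(1 + e^{-y_i f(x_i)}\bigr) + \lambda \,\|f\|_{\mathcal{H}_K}^{2}
% Replacing the log-likelihood term with the hinge loss (1 - y_i f(x_i))_+
% (an "SVM functional") gives the standard support vector machine.
```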
Solving ill-conditioned and singular linear systems: A tutorial on regularization
SIAM Rev., 1998
"... Abstract. It is shown that the basic regularization procedures for finding meaningful approximate solutions of illconditioned or singular linear systems can be phrased and analyzed in terms of classical linear algebra that can be taught in any numerical analysis course. Apart from rewriting many kn ..."
Abstract

Cited by 81 (2 self)
 Add to MetaCart
It is shown that the basic regularization procedures for finding meaningful approximate solutions of ill-conditioned or singular linear systems can be phrased and analyzed in terms of classical linear algebra that can be taught in any numerical analysis course. Apart from rewriting many known results in a more elegant form, we also derive a new two-parameter family of merit functions for the determination of the regularization parameter. The traditional merit functions from generalized cross validation (GCV) and generalized maximum likelihood (GML) are recovered as special cases.
Smoothing Spline Models for the Analysis of Nested and Crossed Samples of Curves
Journal of the American Statistical Association, 1998
"... We introduce a class of models for an additive decomposition of groups of curves strati ed by crossed and nested factors, generalizing smoothing splines to such samples by associating them with a corresponding mixed e ects model. The models are also useful for imputation of missing data and explorat ..."
Abstract

Cited by 80 (1 self)
 Add to MetaCart
We introduce a class of models for an additive decomposition of groups of curves stratified by crossed and nested factors, generalizing smoothing splines to such samples by associating them with a corresponding mixed effects model. The models are also useful for imputation of missing data and exploratory analysis of variance. We prove that the best linear unbiased predictors (BLUP) from the extended mixed effects model correspond to solutions of a generalized penalized regression where smoothing parameters are directly related to variance components, and we show that these solutions are natural cubic splines. The model parameters are estimated using a highly efficient implementation of the EM algorithm for restricted maximum likelihood (REML) estimation based on a preliminary eigenvector decomposition. Variability of computed estimates can be assessed with asymptotic techniques or with a novel hierarchical bootstrap resampling scheme for nested mixed effects models. Our methods are applied to menstrual cycle data from studies of reproductive function that measure daily urinary progesterone; the sample of progesterone curves is stratified by cycles nested within subjects nested within conceptive and nonconceptive groups.
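Schematically, the correspondence the abstract proves can be written as follows (notation assumed; the variance-component link shown is the standard cubic smoothing spline identity, not a quote from the paper):

```latex
% Penalized regression solved by the smoothing spline (notation assumed):
\min_{f} \; \sum_{i=1}^{n} \bigl(y_i - f(t_i)\bigr)^{2} + \lambda \int \bigl(f''(t)\bigr)^{2}\, dt
% Its solution is a natural cubic spline and coincides with the BLUP of an
% associated mixed effects model, with the smoothing parameter tied to the
% variance components, schematically \lambda = \sigma^{2} / \sigma_b^{2} .
```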
Comparison of Approximate Methods for Handling Hyperparameters
Neural Computation
"... I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evid ..."
Abstract

Cited by 67 (1 self)
 Add to MetaCart
I examine two approximate methods for computational implementation of Bayesian hierarchical models, that is, models which include unknown hyperparameters such as regularization constants and noise levels. In the 'evidence framework' the model parameters are integrated over, and the resulting evidence is maximized over the hyperparameters. The optimized ...
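A compact statement of the evidence framework the abstract examines, under assumed notation (w the model parameters, alpha a regularization constant, beta a noise level):

```latex
% Evidence framework (notation assumed): integrate out the parameters w,
% then maximize the resulting evidence over the hyperparameters:
p(D \mid \alpha, \beta) = \int p(D \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha)\, d\mathbf{w}
% A common alternative integrates out (\alpha, \beta) instead, yielding an
% effective prior on w; the comparison of such approximations is the topic here.
```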
A unified framework for Regularization Networks and Support Vector Machines
1999
"... This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Researc ..."
Abstract

Cited by 50 (13 self)
 Add to MetaCart
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T.
Contents:
1 Introduction
2 Overview of statistical learning theory
2.1 Uniform convergence and the Vapnik-Chervonenkis bound
2.2 The method of Structural Risk Minimization
2.3 epsilon-uniform convergence and the V-gamma dimension
2.4 Overview of our approach
3 Reproducing Kernel Hilbert Spaces: a brief overview
4 Regularization Networks
4.1 Radial Basis Functions
4.2 Regularization, generalized splines and kernel smoothers
4.3 Dual representation of Regularization Networks
4.4 From regression to classification
5 Support vector machines
5.1 SVM in RKHS
5.2 From regression to classification
6 SRM for RNs and SVMs
6.1 SRM for SVM Classification
6.1.1 Distribution dependent bounds for SVMC
7 A Bayesian Interpretation of Regularization and SRM?
7.1 Maximum A Posteriori Interpretation of RNs
7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals
7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals
7.4 Why a MAP interpretation may be misleading
Connections between SVMs and Sparse Ap...
Blur Identification by the Method of Generalized Cross-Validation
IEEE Trans. Image Processing, 1991
"... The pointspread function (PSF) of a blurred image is often unknown a priori  the blur must first be identified from the degraded image data before restoring the image. We introduce generalized crossvalidation (GCV) to address the blur identification problem. Motivated by the success of GCV in i ..."
Abstract

Cited by 46 (1 self)
 Add to MetaCart
The point-spread function (PSF) of a blurred image is often unknown a priori; the blur must first be identified from the degraded image data before restoring the image. We introduce generalized cross-validation (GCV) to address the blur identification problem. Motivated by the success of GCV in identifying optimal smoothing parameters for image restoration, we have extended the method to the problem of identifying blur parameters as well. The GCV criterion identifies model parameters for the blur, the image, and the regularization parameter, providing all the information necessary to restore the image. Experiments are presented which show that GCV is capable of yielding good identification results. Furthermore, a comparison of the GCV criterion to maximum likelihood (ML) estimation shows that GCV often outperforms ML in identifying the blur and image model parameters. To appear in IEEE Transactions on Image Processing. This work was supported in part by the Joint Services Electroni...
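The generic shape of the GCV criterion being extended here, in assumed notation (this is a sketch, not the paper's exact parameterization):

```latex
% GCV merit function (notation assumed): for blur parameters \theta and
% regularization parameter \lambda, with influence matrix A(\theta, \lambda),
V(\theta, \lambda) = \frac{\bigl\| \bigl(I - A(\theta, \lambda)\bigr)\, \mathbf{y} \bigr\|^{2}}
                          {\bigl[\operatorname{tr}\bigl(I - A(\theta, \lambda)\bigr)\bigr]^{2}}
% Minimizing V jointly over (\theta, \lambda) identifies the blur and the
% regularization parameter from the degraded image y alone.
```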
Variable Selection for Cox's Proportional Hazards Model and Frailty Model
Annals of Statistics, 2002
"... A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed in Fan and Li (2001a). It has been shown there that the resulting procedures perform as well as if the subset of significant variables were known in advance. Such a property is called an o ..."
Abstract

Cited by 46 (11 self)
 Add to MetaCart
A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed in Fan and Li (2001a). It has been shown there that the resulting procedures perform as well as if the subset of significant variables were known in advance. Such a property is called an oracle property. The proposed procedures were illustrated in the context of linear regression, robust linear regression and generalized linear models. In this paper, the nonconcave penalized likelihood approach is extended further to the Cox proportional hazards model and the Cox proportional hazards frailty model, two commonly used semiparametric models in survival analysis. As a result, new variable selection procedures for these two commonly used models are proposed. It is demonstrated how the rates of convergence depend on the regularization parameter in the penalty function. Further, with a proper choice of the regularization parameter and the penalty function, the proposed estimators possess an oracle property. Standard error formulae are derived and their accuracies are empirically tested. Simulation studies show that the proposed procedures are more stable in prediction and more effective in computation than the best subset variable selection, and they reduce model complexity as effectively as the best subset variable selection. Compared with the LASSO, which is the penalized likelihood method with the L1 penalty proposed by Tibshirani, the newly proposed approaches have better theoretical properties and finite sample performance.
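For concreteness, the penalized partial likelihood the abstract extends can be sketched as follows (notation assumed; the SCAD derivative is the standard Fan-Li form):

```latex
% Penalized Cox partial likelihood (notation assumed): maximize over \beta
\ell(\boldsymbol\beta) - n \sum_{j=1}^{d} p_{\lambda}\bigl(|\beta_j|\bigr)
% with a nonconcave penalty such as SCAD, defined via its derivative
p'_{\lambda}(\theta) = \lambda \Bigl\{ I(\theta \le \lambda)
  + \frac{(a\lambda - \theta)_{+}}{(a-1)\lambda}\, I(\theta > \lambda) \Bigr\},
  \qquad \theta > 0, \; a > 2
```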
Inference in Generalized Additive Mixed Models Using Smoothing Splines
1999
"... this paper, we propose generalized additive mixed models (GAMMs), which are an additive extension of generalized linear mixed models in the spirit of Hastie and Tibshirani (1990). This new class of models uses additive nonparametric functions to model covariate effects while accounting for overdispe ..."
Abstract

Cited by 45 (4 self)
 Add to MetaCart
In this paper, we propose generalized additive mixed models (GAMMs), which are an additive extension of generalized linear mixed models in the spirit of Hastie and Tibshirani (1990). This new class of models uses additive nonparametric functions to model covariate effects while accounting for overdispersion and correlation by adding random effects to the additive predictor. GAMMs encompass nested and crossed designs and are applicable to clustered, hierarchical and spatial data. We estimate the nonparametric functions using smoothing splines, and jointly estimate the smoothing parameters and the variance components using marginal quasi-likelihood. This marginal quasi-likelihood approach is an extension of the restricted maximum likelihood approach used by Wahba (1985) and Kohn, et al. (1991) in the classical nonparametric regression model (Kohn, et al. 1991, eq. 2.1), and by Zhang, et al. (1998) in Gaussian nonparametric mixed models, where they treated the smoothing parameter as an extra variance component. In view of the numerical integration often required to maximize the objective functions, double penalized quasi-likelihood (DPQL) is proposed to make approximate inference. Frequentist and Bayesian inferences are compared. A key feature of the proposed method is that it allows us to make systematic inference on all model components of GAMMs within a unified parametric mixed model framework. Specifically, our estimation of the nonparametric functions, the smoothing parameters and the variance components in GAMMs can proceed by fitting a working GLMM using existing statistical software, which iteratively fits a linear mixed model to a modified dependent variable. When the data are sparse (e.g., binary), the DPQL estimators of the variance components are found to be subject t...
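A schematic of the model class, in assumed notation (g a link function, f_k smooth covariate effects, b_i random effects):

```latex
% Generalized additive mixed model (notation assumed):
g(\mu_{ij}) = \beta_0 + \sum_{k=1}^{p} f_k\bigl(x_{kij}\bigr) + \mathbf{z}_{ij}^{\top} \mathbf{b}_i,
\qquad \mathbf{b}_i \sim N\bigl(\mathbf{0}, \mathbf{D}(\boldsymbol\theta)\bigr)
% The f_k are modeled by smoothing splines; smoothing parameters and variance
% components \theta are estimated jointly, with DPQL for approximate inference.
```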