Results 1  10
of
47
A tutorial on support vector regression
, 2004
"... In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing ..."
Abstract

Cited by 828 (3 self)
 Add to MetaCart
In this tutorial we give an overview of the basic ideas underlying Support Vector (SV) machines for function estimation. Furthermore, we include a summary of currently used algorithms for training SV machines, covering both the quadratic (or convex) programming part and advanced methods for dealing with large datasets. Finally, we mention some modifications and extensions that have been applied to the standard SV algorithm, and discuss the aspect of regularization from a SV perspective.
Regularization Theory and Neural Networks Architectures
 Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract

Cited by 397 (33 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Abstract

Cited by 366 (38 self)
 Add to MetaCart
(Show Context)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
A Survey of Dimension Reduction Techniques
, 2002
"... this paper, we assume that we have n observations, each being a realization of the p dimensional random variable x = (x 1 , . . . , x p ) with mean E(x) = = ( 1 , . . . , p ) and covariance matrix E{(x )(x = # pp . We denote such an observation matrix by X = i,j : 1 p, 1 ..."
Abstract

Cited by 140 (0 self)
 Add to MetaCart
(Show Context)
this paper, we assume that we have n observations, each being a realization of the p dimensional random variable x = (x 1 , . . . , x p ) with mean E(x) = = ( 1 , . . . , p ) and covariance matrix E{(x )(x = # pp . We denote such an observation matrix by X = i,j : 1 p, 1 n}. If i and # i = # (i,i) denote the mean and the standard deviation of the ith random variable, respectively, then we will often standardize the observations x i,j by (x i,j i )/ # i , where i = x i = 1/n j=1 x i,j , and # i = 1/n j=1 (x i,j x i )
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
"... ..."
(Show Context)
On a Kernelbased Method for Pattern Recognition, Regression, Approximation, and Operator Inversion
, 1997
"... We present a Kernelbased framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridgeregression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connection ..."
Abstract

Cited by 94 (24 self)
 Add to MetaCart
We present a Kernelbased framework for Pattern Recognition, Regression Estimation, Function Approximation and multiple Operator Inversion. Previous approaches such as ridgeregression, Support Vector methods and regression by Smoothing Kernels are included as special cases. We will show connections between the costfunction and some properties up to now believed to apply to Support Vector Machines only. The optimal solution of all the problems described above can be found by solving a simple quadratic programming problem. The paper closes with a proof of the equivalence between Support Vector kernels and Greene's functions of regularization operators.
Priors, Stabilizers and Basis Functions: from regularization to radial, tensor and additive splines
, 1993
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular we had discussed how standard smoothness functionals lead to a subclass of regularization networks, th ..."
Abstract

Cited by 86 (15 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular we had discussed how standard smoothness functionals lead to a subclass of regularization networks, the wellknown Radial Basis Functions approximation schemes. In this paper weshow that regularization networks encompass amuch broader range of approximation schemes, including many of the popular general additivemodels and some of the neural networks. In particular weintroduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same extension that leads from Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additivemodels to ridge approximation models, containing as special cases Breiman's hinge functions and some forms of Projection Pursuit Regression. We propose to use the term GeneralizedRegularization Networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to differenttypes of smoothness assumptions. In the final part of the paper, weshow the relation between activation functions of the Gaussian and sigmoidal type by considering the simple case of the kernel G(x)=x. In summary,
No Unbiased Estimator of the Variance of KFold CrossValidation
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important for them to also est ..."
Abstract

Cited by 60 (1 self)
 Add to MetaCart
(Show Context)
Most machine learning researchers perform quantitative experiments to estimate generalization error and compare the performance of different algorithms (in particular, their proposed algorithm). In order to be able to draw statistically convincing conclusions, it is important for them to also estimate the uncertainty around the error (or error difference) estimate. This paper studies the very commonly used Kfold crossvalidation estimator of generalization performance. The main theorem shows that there exists no universal (valid under all distributions) unbiased estimator of the variance of Kfold crossvalidation. The analysis that accompanies this result is based on the eigendecomposition of the covariance matrix of errors, which has only three different eigenvalues corresponding to three degrees of freedom of the matrix and three components of the total variance. This analysis helps to better understand the nature of the problem and how it can make naive estimators (that don't take into account the error correlations due to the overlap between training and test sets) grossly underestimate variance. This is confirmed by numerical experiments in which the three components of the variance are compared when the difficulty of the learning problem and the number of folds are varied.
A unified framework for Regularization Networks and Support Vector Machines
, 1999
"... This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Res ..."
Abstract

Cited by 57 (12 self)
 Add to MetaCart
(Show Context)
This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Research under contractN o.N 0001493 10385 and contractN o.N 000149510600. Partial support was also provided by DaimlerBenz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents Introductic 3 2 OverviF of stati.48EF learni4 theory 5 2.1 Unifo6 Co vergence and the VapnikChervo nenkis bo und ............. 7 2.2 The metho d o Structural Risk Minimizatio ..................... 10 2.3 #unifo8 co vergence and the V # ..................... 10 2.4 Overviewo fo urappro6 h ............................... 13 3 Reproduci9 Kernel HiT ert Spaces: a briL overviE 14 4RegulariEqq.L Networks 16 4.1 Radial Basis Functio8 ................................. 19 4.2 Regularizatioz generalized splines and kernel smo oxy rs .............. 20 4.3 Dual representatio o f Regularizatio Netwo rks ................... 21 4.4 Fro regressioto 5 Support vector machiT9 22 5.1 SVMin RKHS ..................................... 22 5.2 Fro regressioto 6SRMforRNsandSVMs 26 6.1 SRMfo SVMClassificatio .............................. 28 6.1.1 Distributio dependent bo undsfo SVMC .................. 29 7 A BayesiL Interpretatiq ofRegulariTFqEL and SRM? 30 7.1 Maximum A Po terio6 Interpretatio o f ............... 30 7.2 Bayesian interpretatio o f the stabilizer in the RN andSVMfunctio6I6 ...... 32 7.3 Bayesian interpretatio o f the data term in the Regularizatio andSVMfunctioy8 33 7.4 Why a MAP interpretatio may be misleading .................... 33 Connectine between SVMs and Sparse Ap...
Least absolute shrinkage is equivalent to quadratic penalization
 of Perspectives in Neural Computing
, 1998
"... Adaptive ridge is a special form of ridge regression, balancing the quadratic penalization on each parameter of the model. This paper shows the equivalence between adaptive ridge and lasso (least absolute shrinkage and selection operator). This equivalence states that both procedures produce the sam ..."
Abstract

Cited by 49 (6 self)
 Add to MetaCart
(Show Context)
Adaptive ridge is a special form of ridge regression, balancing the quadratic penalization on each parameter of the model. This paper shows the equivalence between adaptive ridge and lasso (least absolute shrinkage and selection operator). This equivalence states that both procedures produce the same estimate. Least absolute shrinkage can thus be viewed as a particular quadratic penalization. From this observation, we derive an EM algorithm to compute the lasso solution. We finally present a series of applications of this type of algorithm in regression problems: kernel regression, additive modeling and neural net training. 1