Results 1-10 of 110
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Cited by 309 (31 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
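The abstract above describes regularization networks with one layer of hidden units, with Radial Basis Functions as a special case. A minimal sketch of that idea, assuming a Gaussian basis function centred on every training point and coefficients obtained from the linear system (K + lam * I) c = y; the function names, the `sigma` and `lam` parameters, and the hand-written solver are illustrative assumptions, not code from the paper:

```python
import math


def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian radial basis function between two points given as sequences."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve_linear_system(A, b):
    """Solve A c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - tail) / M[i][i]
    return c


def fit_regularization_network(X, y, lam=1e-3, sigma=1.0):
    """Return a predictor f(x) = sum_i c_i * K(x, x_i) with (K + lam*I) c = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam  # the regularization term acts on the diagonal
    coeffs = solve_linear_system(K, y)

    def predict(x):
        return sum(c * gaussian_kernel(x, xi, sigma) for c, xi in zip(coeffs, X))

    return predict


# Hypothetical toy data: three one-dimensional points.
predict = fit_regularization_network([[0.0], [1.0], [2.0]], [0.0, 1.0, 0.0])
print(predict([1.5]))
```

A larger `lam` gives a smoother fit that no longer passes through the data; as `lam` approaches zero the network interpolates the training points.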
Regularization networks and support vector machines
Advances in Computational Mathematics, 2000
Cited by 266 (33 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Correlation-based feature selection for machine learning
1998
Cited by 139 (3 self)
A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set.
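The abstract above refers to a feature evaluation formula that rewards subsets whose features correlate with the class but not with each other. A minimal sketch, assuming the commonly cited form of the CFS merit, merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation and r_ff the mean feature-feature correlation; the function name and the way correlations are passed in are illustrative assumptions:

```python
import math


def cfs_merit(feature_class_corrs, feature_feature_corrs):
    """Merit of a feature subset from precomputed correlations.

    feature_class_corrs: one correlation per feature in the subset.
    feature_feature_corrs: one correlation per unordered pair of features.
    """
    k = len(feature_class_corrs)
    if k == 0:
        return 0.0
    r_cf = sum(abs(r) for r in feature_class_corrs) / k
    if feature_feature_corrs:
        r_ff = sum(abs(r) for r in feature_feature_corrs) / len(feature_feature_corrs)
    else:
        r_ff = 0.0  # a single feature has no inter-correlation
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


# Two equally predictive features: independent versus fully redundant.
print(cfs_merit([0.5, 0.5], [0.0]))
print(cfs_merit([0.5, 0.5], [1.0]))
```

Under this formula a fully redundant second feature leaves the merit where it was with one feature, while an uncorrelated one raises it, which matches the hypothesis stated in the abstract.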
A unified framework for Regularization Networks and Support Vector Machines
1999
Cited by 50 (13 self)
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents: 1 Introduction; 2 Overview of statistical learning theory; 2.1 Uniform convergence and the Vapnik-Chervonenkis bound; 2.2 The method of Structural Risk Minimization; 2.3 ...-uniform convergence and the V ...; 2.4 Overview of our approach; 3 Reproducing Kernel Hilbert Spaces: a brief overview; 4 Regularization Networks; 4.1 Radial Basis Functions; 4.2 Regularization, generalized splines and kernel smoothers; 4.3 Dual representation of Regularization Networks; 4.4 From regression to ...; 5 Support vector machines; 5.1 SVM in RKHS; 5.2 From regression to ...; 6 SRM for RNs and SVMs; 6.1 SRM for SVM Classification; 6.1.1 Distribution dependent bounds for SVMC; 7 A Bayesian Interpretation of Regularization and SRM?; 7.1 Maximum A Posteriori Interpretation of ...; 7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals; 7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals; 7.4 Why a MAP interpretation may be misleading; Connections between SVMs and Sparse Ap...
Blur Identification by the Method of Generalized Cross-Validation
IEEE Trans. Image Processing, 1991
Cited by 47 (1 self)
The point-spread function (PSF) of a blurred image is often unknown a priori; the blur must first be identified from the degraded image data before restoring the image. We introduce generalized cross-validation (GCV) to address the blur identification problem. Motivated by the success of GCV in identifying optimal smoothing parameters for image restoration, we have extended the method to the problem of identifying blur parameters as well. The GCV criterion identifies model parameters for the blur, the image, and the regularization parameter, providing all the information necessary to restore the image. Experiments are presented which show that GCV is capable of yielding good identification results. Furthermore, a comparison of the GCV criterion to maximum likelihood (ML) estimation shows that GCV often outperforms ML in identifying the blur and image model parameters. To appear in IEEE Transactions on Image Processing. This work was supported in part by the Joint Services Electroni...
Subspace information criterion for model selection
Neural Computation, 2001
Cited by 41 (28 self)
Model selection is important for achieving good generalization in supervised learning. In this paper, we propose a new criterion for model selection called the subspace information criterion (SIC), which is a generalization of Mallows' C_L. It is assumed that the learning target function belongs to a specified functional Hilbert space and the generalization error is defined as the Hilbert space squared norm of the difference between the learning result function and target function. SIC gives an unbiased estimate of the generalization error so defined. SIC assumes the availability of an unbiased estimate of the target function and the noise covariance matrix, which are generally unknown. A practical calculation method of SIC for least mean squares learning is provided under the assumption that the dimension of the Hilbert space is less than the number of training examples. Finally, computer simulations in two examples show that SIC works well even when the number of training examples is small.
Regularisation in the Selection of Radial Basis Function Centres
Neural Computation, 1995
Cited by 30 (7 self)
Subset selection and regularisation are two well-known techniques which can improve the generalisation performance of nonparametric linear regression estimators, such as radial basis function networks. This paper examines regularised forward selection (RFS), a combination of forward subset selection and zero-order regularisation. An efficient implementation of RFS into which either delete-1 or generalised cross-validation can be incorporated and a re-estimation formula for the regularisation parameter are also discussed. Simulation studies are presented which demonstrate improved generalisation performance due to regularisation in the forward selection of radial basis function centres.
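The abstract above combines zero-order regularisation with generalised cross-validation (GCV). A minimal sketch of how a GCV score might be computed for a ridge-regularised linear model, assuming the textbook form GCV = n * RSS / (n - trace(H))^2 with hat matrix H = Phi (Phi^T Phi + lam * I)^-1 Phi^T; this is not the paper's efficient RFS implementation, and the function names and the hand-written solver are illustrative assumptions:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def gcv_score(Phi, y, lam):
    """GCV score of ridge regression on design matrix Phi (list of rows)."""
    n, m = len(Phi), len(Phi[0])
    # Gram matrix Phi^T Phi and regularised system matrix A = Phi^T Phi + lam*I.
    gram = [[sum(Phi[i][a] * Phi[i][b] for i in range(n)) for b in range(m)]
            for a in range(m)]
    A = [[gram[a][b] + (lam if a == b else 0.0) for b in range(m)]
         for a in range(m)]
    rhs = [sum(Phi[i][a] * y[i] for i in range(n)) for a in range(m)]
    w = solve(A, rhs)
    rss = sum((y[i] - sum(Phi[i][a] * w[a] for a in range(m))) ** 2
              for i in range(n))
    # trace(H) = trace(A^-1 Phi^T Phi): solve A z = column j of gram, keep z[j].
    trace_h = sum(solve(A, [gram[a][j] for a in range(m)])[j] for j in range(m))
    return n * rss / (n - trace_h) ** 2


# Hypothetical usage: pick the candidate lam with the lowest score.
Phi = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.1, 0.9, 2.2, 2.8]
best_lam = min([0.01, 0.1, 1.0], key=lambda lam: gcv_score(Phi, y, lam))
print(best_lam)
```

The appeal of GCV in this setting is that it needs only one fit per candidate `lam`, rather than one fit per held-out point.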
Data-driven calibration of penalties for least-squares regression
2008
Cited by 29 (10 self)
Penalization procedures often suffer from their dependence on multiplying factors, whose optimal values are either unknown or hard to estimate from data. We propose a completely data-driven calibration algorithm for these parameters in the least-squares regression framework, without assuming a particular shape for the penalty. Our algorithm relies on the concept of minimal penalty, recently introduced by Birgé and Massart (2007) in the context of penalized least squares for Gaussian homoscedastic regression. On the positive side, the minimal penalty can be evaluated from the data themselves, leading to a data-driven estimation of an optimal penalty which can be used in practice; on the negative side, their approach heavily relies on the homoscedastic Gaussian nature of their stochastic framework. The purpose of this paper is twofold: stating a more general heuristic for designing a data-driven penalty (the slope heuristics) and proving that it works for penalized least-squares regression with a random design, even for heteroscedastic non-Gaussian data. For technical reasons, some exact mathematical results will be proved only for regressogram bin-width selection. This is at least a first step towards further results, since the approach and the method that we use are indeed general.
Efficient Leave-One-Out Cross-Validation of Kernel Fisher Discriminant Classifiers
Pattern Recognition, 2003
Cited by 27 (5 self)
Mika et al. [1] apply the "kernel trick" to obtain a nonlinear variant of Fisher's linear discriminant analysis method, demonstrating state-of-the-art performance on a range of benchmark datasets. We show that leave-one-out cross-validation of kernel Fisher discriminant classifiers can be implemented with a computational complexity of only O(l^3) operations rather than the O(l^4) of a naive implementation, where l is the number of training patterns. Leave-one-out cross-validation then becomes an attractive means of model selection in large-scale applications of kernel Fisher discriminant analysis, being significantly faster than conventional k-fold cross-validation procedures commonly used.
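The saving described in the abstract above comes from avoiding one full refit per held-out pattern. The sketch below is not the paper's kernel Fisher discriminant algorithm; it swaps in a much simpler model, ridge regression with a single input and no intercept, to illustrate the classical identity behind such shortcuts: the leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the i-th diagonal entry of the hat matrix. All function names and the toy data are illustrative assumptions:

```python
def ridge_weight(xs, ys, lam):
    """Ridge solution for y ~ w * x with a single input and no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def loo_residuals_brute_force(xs, ys, lam):
    """Refit the model once per held-out point (n separate fits)."""
    residuals = []
    for i in range(len(xs)):
        xs_rest = xs[:i] + xs[i + 1:]
        ys_rest = ys[:i] + ys[i + 1:]
        w = ridge_weight(xs_rest, ys_rest, lam)
        residuals.append(ys[i] - w * xs[i])
    return residuals


def loo_residuals_closed_form(xs, ys, lam):
    """Single fit: divide each residual by (1 - h_ii), h_ii = x_i^2 / (sum x^2 + lam)."""
    denom = sum(x * x for x in xs) + lam
    w = ridge_weight(xs, ys, lam)
    return [(y - w * x) / (1.0 - x * x / denom) for x, y in zip(xs, ys)]


# Hypothetical toy data; both routes should agree up to rounding error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.9]
print(loo_residuals_brute_force(xs, ys, 0.5))
print(loo_residuals_closed_form(xs, ys, 0.5))
```

The closed form reuses a single fit for every held-out point, which is the kind of reuse that makes leave-one-out practical for model selection.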