Results 1–10 of 139
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Cited by 309 (31 self)
Abstract:
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
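As a worked illustration of the scheme described above (a sketch in generic regularization-network notation; the stabilizer P, weight λ, and Green's function G are standard objects of that framework, not necessarily this paper's exact symbols):

```latex
% Regularization: minimize a data-fit term plus a smoothness penalty,
% where P is a differential "stabilizer" and \lambda > 0 the regularization weight:
H[f] \;=\; \sum_{i=1}^{N} \bigl( y_i - f(\mathbf{x}_i) \bigr)^2 \;+\; \lambda \,\lVert P f \rVert^2 .
% The minimizer is a network with one layer of hidden units: a superposition of
% basis functions G (the Green's function of P^{*}P) centered at the data points,
f(\mathbf{x}) \;=\; \sum_{i=1}^{N} c_i \, G(\mathbf{x} - \mathbf{x}_i).
% A rotation-invariant (radial) smoothness functional makes G a Radial Basis Function;
% other choices of stabilizer yield additive, tensor-product, or ridge schemes.
```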
Regularization networks and support vector machines
Advances in Computational Mathematics, 2000
"... Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization a ..."
Cited by 266 (33 self)
Abstract:
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
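A minimal sketch of the regularization-network side of this correspondence, assuming squared loss and a Gaussian radial kernel; the function names, hyperparameters (lam, sigma), and toy data are illustrative, not from the paper:

```python
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_regularization_network(X, y, lam=1e-2, sigma=1.0):
    """Squared-loss regularization network: solve (K + lam*N*I) c = y,
    giving f(x) = sum_i c_i k(x, x_i)."""
    N = len(y)
    K = gaussian_kernel(X, X, sigma)
    c = np.linalg.solve(K + lam * N * np.eye(N), y)
    return lambda Xnew: gaussian_kernel(Xnew, X, sigma) @ c

# Toy usage: approximate a multivariate function from sparse, noisy data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.standard_normal(30)
f = fit_regularization_network(X, y, lam=1e-3, sigma=0.5)
print(f(X[:3]), y[:3])
```

Replacing the squared loss with the ε-insensitive loss in the same variational problem gives the support vector regression machine, which is the correspondence the abstract reviews.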
A Theory of Networks for Approximation and Learning
Laboratory, Massachusetts Institute of Technology, 1989
"... Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, t ..."
Cited by 194 (24 self)
Abstract:
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
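A sketch of the GRBF form referred to above, written in generic notation (the symbols t_alpha and c_alpha are illustrative, not the memo's exact ones):

```latex
% GRBF approximant: a superposition of n radial basis functions centered at
% "prototype" vectors t_\alpha, with n typically much smaller than the number
% of examples N; both the coefficients c_\alpha and the centers t_\alpha are learned.
f(\mathbf{x}) \;=\; \sum_{\alpha=1}^{n} c_\alpha \, G\!\bigl(\lVert \mathbf{x} - \mathbf{t}_\alpha \rVert\bigr),
\qquad n \ll N .
```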
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
1997
"... Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the ne ..."
Cited by 177 (15 self)
Abstract:
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus A³√((log n)/m) (ignoring ...
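The bound sketched in the abstract, written out schematically (constants and log factors suppressed; m, the number of training examples, is not defined in the truncated snippet but is the natural reading):

```latex
% err: misclassification probability; \widehat{err}: error estimate from the training set;
% A: per-unit bound on the sum of weight magnitudes; n: input dimension; m: training examples.
\mathrm{err} \;\le\; \widehat{\mathrm{err}} \;+\; O\!\left( A^{3} \sqrt{\frac{\log n}{m}} \right).
```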
Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
2000
"... We are interested in estimating the average e#ect of a binary treatment on a scalar outcome. If assignment to the treatment is independent of the potential outcomes given pretreatment variables, biases associated with simple treatmentcontrol average comparisons can be removed by adjusting for di#er ..."
Cited by 167 (15 self)
Abstract:
We are interested in estimating the average effect of a binary treatment on a scalar outcome. If assignment to the treatment is independent of the potential outcomes given pretreatment variables, biases associated with simple treatment-control average comparisons can be removed by adjusting for differences in the pretreatment variables. Rosenbaum and Rubin (1983, 1984) show that adjusting solely for differences between treated and control units in a scalar function of the pretreatment variables, the propensity score, also removes the entire bias associated with differences in pretreatment variables. Thus it is possible to obtain unbiased estimates of the treatment effect without conditioning on a possibly high-dimensional vector of pretreatment variables. Although adjusting for the propensity score removes all the bias, this can come at the expense of efficiency. We show that weighting with the inverse of a nonparametric estimate of the propensity score, rather than the true propensity score...
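A minimal sketch of the inverse-propensity-weighting estimator the abstract describes; the logistic-regression propensity model is a parametric stand-in for the paper's nonparametric (series) estimate, and all names and data are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(X, T, Y):
    """Inverse-propensity-weighting estimate of the average treatment effect.

    The propensity score e(x) = P(T=1 | X=x) is estimated here with logistic
    regression; the paper's point is that weighting by an *estimated* score,
    rather than the true one, can improve efficiency.
    """
    e_hat = LogisticRegression().fit(X, T).predict_proba(X)[:, 1]
    e_hat = np.clip(e_hat, 1e-3, 1 - 1e-3)          # guard against extreme weights
    return np.mean(T * Y / e_hat - (1 - T) * Y / (1 - e_hat))

# Toy usage with simulated data (true average effect = 2).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 2))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))      # treatment assignment depends on X
Y = 2 * T + X @ np.array([1.0, -0.5]) + rng.normal(size=5000)
print(ipw_ate(X, T, Y))
```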
Large Sample Sieve Estimation of Semi-Nonparametric Models
Handbook of Econometrics, 2007
"... Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; seminonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method o ..."
Cited by 92 (17 self)
Abstract:
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite dimensional parameters. Examples are used to illustrate the general results.
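A minimal sketch of the sieve idea specialized to one concrete case, series least-squares regression; the polynomial sieve and the growth rule J(n) ~ n^(1/3) are illustrative choices, not the chapter's prescriptions:

```python
import numpy as np

def sieve_ls_regression(x, y, x_eval):
    """Sieve extremum estimation specialized to series least squares: minimize the
    empirical squared-error criterion over a finite-dimensional polynomial sieve
    whose dimension J(n) grows slowly with the sample size n."""
    n = len(y)
    J = max(2, int(np.ceil(n ** (1 / 3))))              # illustrative growth rule
    basis = lambda t: np.vander(t, J, increasing=True)   # 1, t, t^2, ..., t^{J-1}
    coef, *_ = np.linalg.lstsq(basis(x), y, rcond=None)
    return basis(x_eval) @ coef

# Toy usage: recover a smooth regression function from noisy observations.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 400)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(400)
grid = np.linspace(-1, 1, 5)
print(sieve_ls_regression(x, y, grid))
print(np.sin(2 * grid))
```

Swapping the squared-error criterion or the polynomial basis for other choices (splines, wavelets, likelihoods with shape constraints) gives the other members of the sieve family the abstract mentions.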
On the Gibbs phenomenon and its resolution
SIAM Rev., 1997
"... Abstract. The nonuniform convergence of the Fourier series for discontinuous functions, and in particular the oscillatory behavior of the finite sum, was already analyzed by Wilbraham in 1848. This was later named the Gibbs phenomenon. This article is a review of the Gibbs phenomenon from a differen ..."
Cited by 85 (2 self)
Abstract:
The nonuniform convergence of the Fourier series for discontinuous functions, and in particular the oscillatory behavior of the finite sum, was already analyzed by Wilbraham in 1848. This was later named the Gibbs phenomenon. This article is a review of the Gibbs phenomenon from a different perspective. The Gibbs phenomenon, as we view it, deals with the issue of recovering point values of a function from its expansion coefficients. Alternatively it can be viewed as the possibility of the recovery of local information from global information. The main theme here is not the structure of the Gibbs oscillations but the understanding and resolution of the phenomenon in a general setting. The purpose of this article is to review the Gibbs phenomenon and to show that the knowledge of the expansion coefficients is sufficient for obtaining the point values of a piecewise smooth function, with the same order of accuracy as in the smooth case. This is done by using the finite expansion series to construct a different, rapidly convergent, approximation.
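A small numerical illustration of the phenomenon, assuming the unit square wave and its Fourier sine partial sums (all names here are illustrative); the overshoot near the jump does not shrink as more terms are added:

```python
import numpy as np

def square_wave_partial_sum(x, N):
    """Fourier partial sum of the square wave sign(sin x): odd sine harmonics only."""
    k = np.arange(1, N + 1, 2)                        # harmonics 1, 3, 5, ...
    return (4 / np.pi) * np.sum(np.sin(np.outer(x, k)) / k, axis=1)

x = np.linspace(1e-4, np.pi / 2, 20000)               # just to the right of the jump at 0
for N in (9, 99, 999):
    overshoot = square_wave_partial_sum(x, N).max() - 1.0
    # Stays near 0.18 above the limit value 1, i.e. about 9% of the jump of size 2,
    # no matter how many terms are used.
    print(N, round(overshoot, 4))
```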
A unified framework for Regularization Networks and Support Vector Machines
1999
"... This report describers research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by theN ational Science Foundation under contractN o. IIS9800032, the O#ce ofN aval Researc ..."
Cited by 50 (13 self)
Abstract:
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents: 1 Introduction; 2 Overview of statistical learning theory; 2.1 Uniform Convergence and the Vapnik-Chervonenkis bound; 2.2 The method of Structural Risk Minimization; 2.3 ε-uniform convergence and the Vγ dimension; 2.4 Overview of our approach; 3 Reproducing Kernel Hilbert Spaces: a brief overview; 4 Regularization Networks; 4.1 Radial Basis Functions; 4.2 Regularization, generalized splines and kernel smoothers; 4.3 Dual representation of Regularization Networks; 4.4 From regression to classification; 5 Support vector machines; 5.1 SVM in RKHS; 5.2 From regression to classification; 6 SRM for RNs and SVMs; 6.1 SRM for SVM Classification; 6.1.1 Distribution dependent bounds for SVMC; 7 A Bayesian Interpretation of Regularization and SRM?; 7.1 Maximum A Posteriori Interpretation of ...; 7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals; 7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals; 7.4 Why a MAP interpretation may be misleading; Connections between SVMs and Sparse Ap...
On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions
Neural Computation, 1996
"... Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the ..."
Cited by 47 (6 self)
Abstract:
Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the generalization error can be decomposed into two terms: the approximation error, due to the insufficient representational capacity of a finite-sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of approximating functions belonging to certain Sobolev spaces with Gaussian Radial Basis Functions. Using the above mentioned decomposition we bound the generalization error in terms of the number of basis functions and number of examples. While the bound that we derive is specific for Radial Basis Functions, a number of observations deriving from it apply to any approximation t...
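The decomposition the abstract refers to, written schematically (the symbols are generic: f_0 is the target function, f_n the best network with n basis functions, \hat{f}_{n,m} the network estimated from m examples):

```latex
% Generalization error splits into an approximation part and an estimation part:
\mathbb{E}\bigl[(f_0 - \hat{f}_{n,m})^2\bigr]
\;\le\;
\underbrace{\lVert f_0 - f_n \rVert^2}_{\text{approximation error}}
\;+\;
\underbrace{\varepsilon(n, m)}_{\text{estimation error}} ,
% where the first term shrinks as the network grows (n increases) and the second
% shrinks as the number of examples m grows for a fixed network size n.
```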
GMRES/CR and Arnoldi/Lanczos as Matrix Approximation Problems
SIAM J. Sci. Comput.
"... The GMRES and Arnoldi algorithms, which reduce to the CR and Lanczos algorithms in the symmetric case, both minimize kp(A)bk over polynomials p of degree n. The difference is that p is normalized at z = 0 for GMRES and at z = 1 for Arnoldi. Analogous "ideal GMRES " and "ideal Arnoldi" problems are ..."
Cited by 47 (7 self)
Abstract:
The GMRES and Arnoldi algorithms, which reduce to the CR and Lanczos algorithms in the symmetric case, both minimize ‖p(A)b‖ over polynomials p of degree n. The difference is that p is normalized at z = 0 for GMRES and at z = ∞ for Arnoldi. Analogous "ideal GMRES" and "ideal Arnoldi" problems are obtained if one removes b from the discussion and minimizes ‖p(A)‖ instead. Investigation of these true and ideal approximation problems gives insight into how fast GMRES converges and how the Arnoldi iteration locates eigenvalues.
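A minimal sketch of the GMRES approximation problem stated above, posed directly as a dense least-squares problem over a Krylov basis (illustrative only, not a practical GMRES implementation; the monomial basis is ill-conditioned for large n):

```python
import numpy as np

def gmres_residual_norm(A, b, n):
    """min ||p(A) b|| over polynomials p of degree <= n with p(0) = 1.

    Writing p(z) = 1 - z q(z), the residual is b - A q(A) b, so the minimum is a
    least-squares problem over the Krylov space span{b, Ab, ..., A^{n-1} b}.
    """
    K = np.column_stack([np.linalg.matrix_power(A, k) @ b for k in range(n)])
    y, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    return np.linalg.norm(b - A @ K @ y)

rng = np.random.default_rng(0)
A = np.eye(40) + 0.3 * rng.standard_normal((40, 40))
b = rng.standard_normal(40)
# Residual norms are non-increasing in n, as the GMRES minimization guarantees.
print([round(gmres_residual_norm(A, b, n), 4) for n in range(1, 6)])
```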