Regularization Theory and Neural Networks Architectures
- Neural Computation, 1995
Cited by 257 (30 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Regularization networks and support vector machines
- Advances in Computational Mathematics, 2000
Cited by 215 (28 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
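To make the regression setting in the abstract above concrete, here is a minimal sketch of a regularization network with a Gaussian radial basis function kernel. It assumes the standard textbook form, in which the fitted function is a kernel expansion over the training points and the coefficients solve the linear system (K + lam * I) c = y. The function names, the kernel width `sigma`, the regularization parameter `lam`, and the toy data are all illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Z."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def fit_regularization_network(X, y, lam, sigma):
    """Solve (K + lam * I) c = y for the expansion coefficients c."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, c, X_new, sigma):
    """Evaluate f(x) = sum_i c_i * k(x, x_i) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ c

# Toy regression problem (hypothetical data): recover sin(x) from 40 samples.
X = np.linspace(-3.0, 3.0, 40)[:, None]
y = np.sin(X[:, 0])
c = fit_regularization_network(X, y, lam=1e-3, sigma=0.7)

# Compare the fitted function with the target away from the boundary.
X_test = np.linspace(-2.5, 2.5, 50)[:, None]
max_err = np.max(np.abs(predict(X, c, X_test, sigma=0.7) - np.sin(X_test[:, 0])))
print(max_err)
```

Larger values of `lam` trade fidelity to the data for smoothness. A support vector machine would instead replace the squared-error data term with a different loss, which this sketch does not attempt.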
A Theory of Networks for Approximation and Learning
- Laboratory, Massachusetts Institute of Technology, 1989
Cited by 170 (25 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multi-dimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
- 1997
Cited by 156 (14 self)
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus $A^3 \sqrt{(\log n)/m}$ (ignori...
Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
- 2000
Cited by 75 (7 self)
We are interested in estimating the average effect of a binary treatment on a scalar outcome. If assignment to the treatment is independent of the potential outcomes given pretreatment variables, biases associated with simple treatment-control average comparisons can be removed by adjusting for differences in the pre-treatment variables. Rosenbaum and Rubin (1983, 1984) show that adjusting solely for differences between treated and control units in a scalar function of the pre-treatment variables, the propensity score, also removes the entire bias associated with differences in pre-treatment variables. Thus it is possible to obtain unbiased estimates of the treatment effect without conditioning on a possibly high-dimensional vector of pre-treatment variables. Although adjusting for the propensity score removes all the bias, this can come at the expense of efficiency. We show that weighting with the inverse of a nonparametric estimate of the propensity score, rather than the true propensity scor...
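The weighting idea in the abstract above can be illustrated with a small sketch of a normalized inverse-propensity-weighted estimate of the average treatment effect. This is a generic textbook form rather than necessarily the exact estimator analysed in the paper, whose details are truncated here. The propensity score is estimated nonparametrically as the treated fraction within each cell of a discrete covariate. The function names and the eight-unit dataset are hypothetical.

```python
import numpy as np

def estimate_propensity_by_cell(x, t):
    """Nonparametric propensity estimate: treated fraction within each value of x."""
    e_hat = np.empty(len(x), dtype=float)
    for value in np.unique(x):
        mask = x == value
        e_hat[mask] = t[mask].mean()
    return e_hat

def ipw_ate(y, t, e_hat):
    """Normalized inverse-propensity-weighted estimate of the average treatment effect."""
    w1 = t / e_hat                # weights for treated units
    w0 = (1 - t) / (1 - e_hat)    # weights for control units
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Hypothetical data: outcome is 1 + 2*x + t, so the true effect is 1,
# and units with x = 1 are treated more often (confounding).
x = np.array([0, 0, 0, 0, 1, 1, 1, 1])
t = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y = 1.0 + 2.0 * x + t

naive = y[t == 1].mean() - y[t == 0].mean()
ate = ipw_ate(y, t, estimate_propensity_by_cell(x, t))
print(naive, ate)
```

On this toy dataset the simple treated-minus-control difference is 2, while the weighted estimate recovers the constructed effect of 1, because reweighting balances the covariate across the two groups.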
On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions
- Neural Computation, 1996
Cited by 42 (6 self)
Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the generalization error can be decomposed in two terms: the approximation error, due to the insufficient representational capacity of a finite sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of approximating functions belonging to certain Sobolev spaces with Gaussian Radial Basis Functions. Using the above mentioned decomposition we bound the generalization error in terms of the number of basis functions and number of examples. While the bound that we derive is specific for Radial Basis Functions, a number of observations deriving from it apply to any approximation t...
A unified framework for Regularization Networks and Support Vector Machines
- 1999
Cited by 40 (11 self)
This report describes research done at the Center for Biological & Computational Learning and the Artificial Intelligence Laboratory of the Massachusetts Institute of Technology. This research was sponsored by the National Science Foundation under contract No. IIS-9800032, the Office of Naval Research under contract No. N00014-93-1-0385 and contract No. N00014-95-1-0600. Partial support was also provided by Daimler-Benz AG, Eastman Kodak, Siemens Corporate Research, Inc., ATR and AT&T. Contents: Introduction; 2 Overview of statistical learning theory; 2.1 Uniform Convergence and the Vapnik-Chervonenkis bound; 2.2 The method of Structural Risk Minimization; 2.3 ε-uniform convergence and the V_γ ...; 2.4 Overview of our approach; 3 Reproducing Kernel Hilbert Spaces: a brief overview; 4 Regularization Networks; 4.1 Radial Basis Functions; 4.2 Regularization, generalized splines and kernel smoothers; 4.3 Dual representation of Regularization Networks; 4.4 From regression to ...; 5 Support vector machines; 5.1 SVM in RKHS; 5.2 From regression to ...; 6 SRM for RNs and SVMs; 6.1 SRM for SVM Classification; 6.1.1 Distribution dependent bounds for SVMC; 7 A Bayesian Interpretation of Regularization and SRM?; 7.1 Maximum A Posteriori Interpretation of ...; 7.2 Bayesian interpretation of the stabilizer in the RN and SVM functionals; 7.3 Bayesian interpretation of the data term in the Regularization and SVM functionals; 7.4 Why a MAP interpretation may be misleading; Connections between SVMs and Sparse Ap...
An extension of MATLAB to continuous functions and operators
- SIAM J. Sci. Comput.
Cited by 36 (9 self)
Abstract. An object-oriented MATLAB system is described for performing numerical linear algebra on continuous functions and operators rather than the usual discrete vectors and matrices. About eighty MATLAB functions from plot and sum to svd and cond have been overloaded so that one can work with our “chebfun” objects using almost exactly the usual MATLAB syntax. All functions live on [−1, 1] and are represented by values at sufficiently many Chebyshev points for the polynomial interpolant to be accurate to close to machine precision. Each of our overloaded operations raises questions about the proper generalization of familiar notions to the continuous context and about appropriate methods of interpolation, differentiation, integration, zerofinding, or transforms. Applications in approximation theory and numerical analysis are explored, and possible extensions for more substantial problems of scientific computing are mentioned.
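The representation described in the abstract above, a function on [−1, 1] stored through a polynomial interpolant at Chebyshev points, can be imitated loosely in Python with NumPy's `Chebyshev` class. This is only an analogy and not the chebfun system itself. The degree is fixed by hand at 20 rather than chosen adaptively, and `Chebyshev.interpolate` samples at Chebyshev points of the first kind, which may differ from the points chebfun uses.

```python
import numpy as np
from numpy.polynomial import Chebyshev

# Interpolate exp on [-1, 1] by a degree-20 polynomial through Chebyshev points.
# The degree is an illustrative, hand-picked assumption.
p = Chebyshev.interpolate(np.exp, 20)

xs = np.linspace(-1.0, 1.0, 201)

# The interpolant can then be evaluated, differentiated and integrated
# much like an ordinary function.
interp_err = np.max(np.abs(p(xs) - np.exp(xs)))
deriv_err = np.max(np.abs(p.deriv()(xs) - np.exp(xs)))  # d/dx exp(x) = exp(x)

P = p.integ()                    # an antiderivative of the interpolant
integral = P(1.0) - P(-1.0)      # definite integral over [-1, 1]

print(interp_err, deriv_err, integral)
```

For a smooth function such as exp, the Chebyshev coefficients decay so quickly that a modest degree already reproduces values, derivative and integral to roughly rounding-error accuracy, which is the property the abstract relies on.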
Polynomial Interpolation in Several Variables
- 1999
Cited by 30 (2 self)
In this paper we want to describe some recent developments in polynomial interpolation, especially those which lead to the construction of the interpolating polynomial, rather than verification of its mere existence. (M. Gasca and T. Sauer; partially supported by DGES Spain, PB 96-0730, and Programa Europa CAI-DGA, Zaragoza, Spain.)
Generalization Bounds for Function Approximation from Scattered Noisy Data
- 1998
Cited by 28 (1 self)
In this paper we investigate the problem of providing error bounds for approximation of an unknown function from scattered, noisy data. This problem has particular relevance in the field of machine learning, where the unknown function represents the task that has to be learned and the scattered data represent the examples of this task. An obvious quantity of interest for us is the generalization error -- a measure of how much the result of the approximation scheme differs from the unknown function -- typically studied as a function of the number of data points. Since the data are randomly generated and noisy, the analysis of the generalization error necessarily involves statistical considerations in addition to the traditional

