Results 1  10
of
117
Regularization Theory and Neural Networks Architectures
 Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract

Cited by 309 (31 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
"... Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, t ..."
Abstract

Cited by 194 (24 self)
 Add to MetaCart
Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. Wedevelop a theoretical framework for approximation based on regularization techniques that leads to a class of threelayer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the wellknown Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods suchasParzen windows and potential functions and to several neural network algorithms, suchas Kanerva's associative memory,backpropagation and Kohonen's topology preserving map. They also haveaninteresting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
Regularization tools – a matlab package for analysis and solution of discrete illposed problems
 Numerical Algorithms
, 1994
"... The software described in this report was originally published in Numerical Algorithms 6 (1994), pp. 1–35. The current version is published in Numer. Algo. 46 (2007), pp. 189–194, and it is available from www.netlib.org/numeralgo and www.mathworks.com/matlabcentral/fileexchangeContents ..."
Abstract

Cited by 192 (8 self)
 Add to MetaCart
The software described in this report was originally published in Numerical Algorithms 6 (1994), pp. 1–35. The current version is published in Numer. Algo. 46 (2007), pp. 189–194, and it is available from www.netlib.org/numeralgo and www.mathworks.com/matlabcentral/fileexchangeContents
Nonlinear solution of linear inverse problems by waveletvaguelette decomposition
, 1992
"... We describe the WaveletVaguelette Decomposition (WVD) of a linear inverse problem. It is a substitute for the singular value decomposition (SVD) of an inverse problem, and it exists for a class of special inverse problems of homogeneous type { such asnumerical di erentiation, inversion of Abeltype ..."
Abstract

Cited by 182 (12 self)
 Add to MetaCart
We describe the WaveletVaguelette Decomposition (WVD) of a linear inverse problem. It is a substitute for the singular value decomposition (SVD) of an inverse problem, and it exists for a class of special inverse problems of homogeneous type { such asnumerical di erentiation, inversion of Abeltype transforms, certain convolution transforms, and the Radon Transform. We propose to solve illposed linear inverse problems by nonlinearly \shrinking" the WVD coe cients of the noisy, indirect data. Our approach o ers signi cant advantages over traditional SVD inversion in the case of recovering spatially inhomogeneous objects. We suppose that observations are contaminated by white noise and that the object is an unknown element of a Besov space. We prove that nonlinear WVD shrinkage can be tuned to attain the minimax rate of convergence, for L 2 loss, over the entire Besov scale. The important case of Besov spaces Bp;q, p <2, which model spatial inhomogeneity, is included. In comparison, linear procedures { SVD included { cannot attain optimal rates of convergence over such classes in the case p<2. For example, our methods achieve faster rates of convergence, for objects known to lie in the Bump Algebra or in Bounded Variation, than any linear procedure.
Networks and the Best Approximation Property
 Biological Cybernetics
, 1989
"... Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989# Funahashi, 1989# Stinchcombe and White, 1989). Weprovethatnetworks derived from regularization theory and including Radial Bas ..."
Abstract

Cited by 95 (7 self)
 Add to MetaCart
Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989# Funahashi, 1989# Stinchcombe and White, 1989). Weprovethatnetworks derived from regularization theory and including Radial Basis Functions (Poggio and Girosi, 1989), have a similar property.From the point of view of approximation theory, however, the property of approximating continuous functions arbitrarily well is not sufficientforcharacterizing good approximation schemes. More critical is the property of best approximation. The main result of this paper is that multilayer networks, of the type used in backpropagation, are not best approximation. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of best approximation.
Solving illconditioned and singular linear systems: A tutorial on regularization
 SIAM Rev
, 1998
"... Abstract. It is shown that the basic regularization procedures for finding meaningful approximate solutions of illconditioned or singular linear systems can be phrased and analyzed in terms of classical linear algebra that can be taught in any numerical analysis course. Apart from rewriting many kn ..."
Abstract

Cited by 81 (2 self)
 Add to MetaCart
Abstract. It is shown that the basic regularization procedures for finding meaningful approximate solutions of illconditioned or singular linear systems can be phrased and analyzed in terms of classical linear algebra that can be taught in any numerical analysis course. Apart from rewriting many known results in a more elegant form, we also derive a new twoparameter family of merit functions for the determination of the regularization parameter. The traditional merit functions from generalized cross validation (GCV) and generalized maximum likelihood (GML) are recovered as special cases.
Priors, Stabilizers and Basis Functions: from regularization to radial, tensor and additive splines
, 1993
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular we had discussed how standard smoothness functionals lead to a subclass of regularization networks, th ..."
Abstract

Cited by 79 (14 self)
 Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular we had discussed how standard smoothness functionals lead to a subclass of regularization networks, the wellknown Radial Basis Functions approximation schemes. In this paper weshow that regularization networks encompass amuch broader range of approximation schemes, including many of the popular general additivemodels and some of the neural networks. In particular weintroduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same extension that leads from Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additivemodels to ridge approximation models, containing as special cases Breiman's hinge functions and some forms of Projection Pursuit Regression. We propose to use the term GeneralizedRegularization Networks for this broad class of approximation schemes that follow from an extension of regularization. In the probabilistic interpretation of regularization, the different classes of basis functions correspond to different classes of prior probabilities on the approximating function spaces, and therefore to differenttypes of smoothness assumptions. In the final part of the paper, weshow the relation between activation functions of the Gaussian and sigmoidal type by considering the simple case of the kernel G(x)=x. In summary,
Recovering Edges in IllPosed Inverse Problems: Optimality of Curvelet Frames
, 2000
"... We consider a model problem of recovering a function f(x1,x2) from noisy Radon data. The function f to be recovered is assumed smooth apart from a discontinuity along a C2 curve – i.e. an edge. We use the continuum white noise model, with noise level ɛ. Traditional linear methods for solving such in ..."
Abstract

Cited by 50 (14 self)
 Add to MetaCart
We consider a model problem of recovering a function f(x1,x2) from noisy Radon data. The function f to be recovered is assumed smooth apart from a discontinuity along a C2 curve – i.e. an edge. We use the continuum white noise model, with noise level ɛ. Traditional linear methods for solving such inverse problems behave poorly in the presence of edges. Qualitatively, the reconstructions are blurred near the edges; quantitatively, they give in our model Mean Squared Errors (MSEs) that tend to zero with noise level ɛ only as O(ɛ1/2)asɛ → 0. A recent innovation – nonlinear shrinkage in the wavelet domain – visually improves edge sharpness and improves MSE convergence to O(ɛ2/3). However, as we show here, this rate is not optimal. In fact, essentially optimal performance is obtained by deploying the recentlyintroduced tight frames of curvelets in this setting. Curvelets are smooth, highly anisotropic elements ideally suited for detecting and synthesizing curved edges. To deploy them in the Radon setting, we construct a curveletbased biorthogonal decomposition
On the Relationship Between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions
 NEURAL COMPUTATION
, 1996
"... Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the ..."
Abstract

Cited by 47 (6 self)
 Add to MetaCart
Feedforward networks are a class of regression techniques that can be used to learn to perform some task from a set of examples. The question of generalization of network performance from a finite training set to unseen data is clearly of crucial importance. In this article we first show that the generalization error can be decomposed in two terms: the approximation error, due to the insufficient representational capacity of a finite sized network, and the estimation error, due to insufficient information about the target function because of the finite number of samples. We then consider the problem of approximating functions belonging to certain Sobolev spaces with Gaussian Radial Basis Functions. Using the above mentioned decomposition we bound the generalization error in terms of the number of basis functions and number of examples. While the bound that we derive is specific for Radial Basis Functions, a number of observations deriving from it apply to any approximation t...
ON METHODS OF SIEVES AND PENALIZATION
, 1997
"... We develop a general theory which provides a unified treatment for the asymptotic normality and efficiency of the maximum likelihood estimates (MLE’s) in parametric, semiparametric and nonparametric models. We find that the asymptotic behavior of substitution estimates for estimating smooth function ..."
Abstract

Cited by 44 (1 self)
 Add to MetaCart
We develop a general theory which provides a unified treatment for the asymptotic normality and efficiency of the maximum likelihood estimates (MLE’s) in parametric, semiparametric and nonparametric models. We find that the asymptotic behavior of substitution estimates for estimating smooth functionals are essentially governed by two indices: the degree of smoothness of the functional and the local size of the underlying parameter space. We show that when the local size of the parameter space is not very large, the substitution standard (nonsieve), substitution sieve and substitution penalized MLE’s are asymptotically efficient in the Fisher sense, under certain stochastic equicontinuity conditions of the loglikelihood. Moreover, when the convergence rate of the estimate is slow, the degree of smoothness of the functional needs to compensate for the slowness of the rate in order to achieve efficiency. When the size of the parameter space is very large, the standard and penalized maximum likelihood procedures may be inefficient, whereas the method of sieves may be able to overcome this difficulty. This phenomenon is particularly manifested when the functional of interest is very smooth, especially in the semiparametric case.