Results 1 - 10 of 212
Regularization Theory and Neural Networks Architectures
 Neural Computation
, 1995
Cited by 394 (33 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score
, 2000
Cited by 392 (35 self)
We are interested in estimating the average effect of a binary treatment on a scalar outcome. If assignment to the treatment is independent of the potential outcomes given pretreatment variables, biases associated with simple treatment-control average comparisons can be removed by adjusting for differences in the pretreatment variables. Rosenbaum and Rubin (1983, 1984) show that adjusting solely for differences between treated and control units in a scalar function of the pretreatment variables, the propensity score, also removes the entire bias associated with differences in pretreatment variables. Thus it is possible to obtain unbiased estimates of the treatment effect without conditioning on a possibly high-dimensional vector of pretreatment variables. Although adjusting for the propensity score removes all the bias, this can come at the expense of efficiency. We show that weighting with the inverse of a nonparametric estimate of the propensity score, rather than the true propensity scor...
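The weighting idea in this abstract can be illustrated with a minimal sketch. It assumes a single discrete pretreatment covariate, so that a simple nonparametric estimate of the propensity score is the fraction of treated units in each covariate cell; this cell-frequency estimator, the function names, and the toy data are illustrative assumptions, not the estimator used in the paper.

```python
from collections import defaultdict

def estimate_propensity(records):
    """Estimate e(x) = P(T=1 | X=x) as the treated fraction in each cell x.

    records: iterable of (x, t, y) with t in {0, 1}. Hypothetical helper.
    """
    treated = defaultdict(int)
    total = defaultdict(int)
    for x, t, _ in records:
        total[x] += 1
        treated[x] += t
    return {x: treated[x] / total[x] for x in total}

def ipw_ate(records):
    """Inverse-propensity-weighted estimate of the average treatment effect.

    Assumes every cell contains both treated and control units (0 < e < 1).
    """
    e_hat = estimate_propensity(records)
    total = 0.0
    for x, t, y in records:
        e = e_hat[x]
        total += t * y / e - (1 - t) * y / (1 - e)
    return total / len(records)

# Toy data (made up): the effect is 2 in cell x=0 and 3 in cell x=1.
toy = (
    [(0, 1, 3.0)] + [(0, 0, 1.0)] * 3
    + [(1, 1, 5.0)] * 3 + [(1, 0, 2.0)]
)
ate = ipw_ate(toy)
```

On this toy data the two cells are equally large, so the weighted estimate equals the average of the two cell-level effects, about 2.5.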
Regularization networks and support vector machines
 Advances in Computational Mathematics
, 2000
Cited by 365 (38 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
Minimum complexity density estimation
 IEEE TRANS. INF. THEORY
, 1991
Cited by 246 (7 self)
The minimum complexity or minimum description-length criterion developed by Kolmogorov, Rissanen, Wallace, Sorkin, and others leads to consistent probability density estimators. These density estimators are defined to achieve the best compromise between likelihood and simplicity. A related issue is the compromise between accuracy of approximations and complexity relative to the sample size. An index of resolvability is studied which is shown to bound the statistical accuracy of the density estimators, as well as the information-theoretic redundancy.
A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
Cited by 233 (25 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
, 1997
Cited by 212 (16 self)
Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the network. Results in this paper show that if a large neural network is used for a pattern classification problem and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the number of weights. For example, consider a two-layer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error estimate (that is related to squared error on the training set) plus $A^3 \sqrt{(\log n)/m}$ (ignori...
Large Sample Sieve Estimation of Semi-Nonparametric Models
 Handbook of Econometrics
, 2007
Cited by 180 (19 self)
Often researchers find parametric models restrictive and sensitive to deviations from the parametric specifications; semi-nonparametric models are more flexible and robust, but lead to other complications such as introducing infinite dimensional parameter spaces that may not be compact. The method of sieves provides one way to tackle such complexities by optimizing an empirical criterion function over a sequence of approximating parameter spaces, called sieves, which are significantly less complex than the original parameter space. With different choices of criteria and sieves, the method of sieves is very flexible in estimating complicated econometric models. For example, it can simultaneously estimate the parametric and nonparametric components in semi-nonparametric models with or without constraints. It can easily incorporate prior information, often derived from economic theory, such as monotonicity, convexity, additivity, multiplicity, exclusion and nonnegativity. This chapter describes estimation of semi-nonparametric econometric models via the method of sieves. We present some general results on the large sample properties of the sieve estimates, including consistency of the sieve extremum estimates, convergence rates of the sieve M-estimates, pointwise normality of series estimates of regression functions, root-n asymptotic normality and efficiency of sieve estimates of smooth functionals of infinite dimensional parameters. Examples are used to illustrate the general results.
High-Order Collocation Methods for Differential Equations with Random Inputs
 SIAM Journal on Scientific Computing
Cited by 176 (9 self)
Abstract. Recently there has been a growing interest in designing efficient methods for the solution of ordinary/partial differential equations with random inputs. To this end, stochastic Galerkin methods appear to be superior to other nonsampling methods and, in many cases, to several sampling methods. However, when the governing equations take complicated forms, numerical implementations of stochastic Galerkin methods can become nontrivial and care is needed to design robust and efficient solvers for the resulting equations. On the other hand, the traditional sampling methods, e.g., Monte Carlo methods, are straightforward to implement, but they do not offer convergence as fast as stochastic Galerkin methods. In this paper, a high-order stochastic collocation approach is proposed. Similar to stochastic Galerkin methods, the collocation methods take advantage of an assumption of smoothness of the solution in random space to achieve fast convergence. However, the numerical implementation of stochastic collocation is trivial, as it requires only repetitive runs of an existing deterministic solver, similar to Monte Carlo methods. The computational cost of the collocation methods depends on the choice of the collocation points, and we present several feasible constructions. One particular choice, based on sparse grids, depends weakly on the dimensionality of the random space and is more suitable for highly accurate computations of practical applications with large dimensional random inputs. Numerical examples are presented to demonstrate the accuracy and efficiency of the stochastic collocation methods. Key words. collocation methods, stochastic inputs, differential equations, uncertainty quantification
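The "repetitive runs of an existing deterministic solver" idea can be sketched in one random dimension. The example below is an assumption-laden illustration, not the paper's construction: the model problem u' = -k u with k uniform on [a, b], the RK4 solver, and the choice of three Gauss-Legendre collocation points are all chosen here for brevity.

```python
import math

def solve_decay(k, t_end=1.0, steps=100):
    """Deterministic solver: RK4 for u' = -k*u, u(0) = 1; returns u(t_end)."""
    u = 1.0
    h = t_end / steps
    for _ in range(steps):
        k1 = -k * u
        k2 = -k * (u + 0.5 * h * k1)
        k3 = -k * (u + 0.5 * h * k2)
        k4 = -k * (u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return u

def collocation_mean(solver, a, b):
    """Approximate E[solver(k)] for k ~ Uniform(a, b).

    Uses the 3-point Gauss-Legendre rule on [-1, 1], mapped to [a, b].
    The solver is run once per collocation point and never modified.
    """
    nodes = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
    weights = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]
    mean = 0.0
    for x, w in zip(nodes, weights):
        k = 0.5 * (a + b) + 0.5 * (b - a) * x   # map node to [a, b]
        mean += 0.5 * w * solver(k)             # 0.5 = uniform density factor
    return mean

estimate = collocation_mean(solve_decay, 0.0, 1.0)
exact = 1.0 - math.exp(-1.0)  # E[exp(-k)] for k ~ Uniform(0, 1)
```

Because the solution is smooth in k, three solver runs already agree with the exact mean to roughly six decimal places here, whereas plain Monte Carlo would need many more runs for comparable accuracy.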
On the Gibbs phenomenon and its resolution
 SIAM Rev
, 1997
Cited by 146 (3 self)
Abstract. The nonuniform convergence of the Fourier series for discontinuous functions, and in particular the oscillatory behavior of the finite sum, was already analyzed by Wilbraham in 1848. This was later named the Gibbs phenomenon. This article is a review of the Gibbs phenomenon from a different perspective. The Gibbs phenomenon, as we view it, deals with the issue of recovering point values of a function from its expansion coefficients. Alternatively it can be viewed as the possibility of the recovery of local information from global information. The main theme here is not the structure of the Gibbs oscillations but the understanding and resolution of the phenomenon in a general setting. The purpose of this article is to review the Gibbs phenomenon and to show that the knowledge of the expansion coefficients is sufficient for obtaining the point values of a piecewise smooth function, with the same order of accuracy as in the smooth case. This is done by using the finite expansion series to construct a different, rapidly convergent, approximation.
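The oscillatory behavior of the finite sum can be seen numerically in a small sketch. It illustrates only the phenomenon itself, not the resolution techniques the article reviews; the square wave test function and the function names are assumptions made for illustration.

```python
import math

def square_wave_partial_sum(x, n_terms):
    """Partial Fourier sum of the square wave sign(x) on (-pi, pi).

    Uses the first n_terms odd harmonics: (4/pi) * sum sin((2j-1)x)/(2j-1).
    """
    return (4.0 / math.pi) * sum(
        math.sin((2 * j - 1) * x) / (2 * j - 1)
        for j in range(1, n_terms + 1)
    )

def gibbs_peak(n_terms):
    """Value of the partial sum at its first maximum right of the jump.

    The derivative of the partial sum is proportional to sin(2*n*x)/sin(x),
    so the first maximum is at x = pi / (2 * n_terms).
    """
    return square_wave_partial_sum(math.pi / (2 * n_terms), n_terms)

peaks = {n: gibbs_peak(n) for n in (10, 50, 200)}
```

Adding more terms moves the peak closer to the discontinuity but does not shrink it: the values in `peaks` stay near 1.179, an overshoot of about 9% of the jump of size 2, while away from the jump the partial sum converges to 1.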
Newey (2002): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,” Technical Working Paper 285, National Bureau of Economic Research
Cited by 90 (9 self)
Chamberlain, Jim Heckman, Aviv Nevo, Ariel Pakes, Jim Powell and participants at seminars at Stanford