Extracting Comprehensible Models from Trained Neural Networks
1996
Cited by 65 (4 self)
To Mom, Dad, and Susan, for their support and encouragement.
On Learning the Derivatives of an Unknown Mapping with Multilayer Feedforward Networks
1989
Cited by 49 (5 self)
Recently, multiple input, single output, single hidden layer, feedforward neural networks have been shown to be capable of approximating a nonlinear map and its partial derivatives. Specifically, neural nets have been shown to be dense in various Sobolev spaces (Hornik, Stinchcombe and White, 1989). Building upon this result, we show that a net can be trained so that the map and its derivatives are learned. Specifically, we use a result of Gallant (1987b) to show that least squares and similar estimates are strongly consistent in Sobolev norm provided the number of hidden units and the size of the training set increase together. We illustrate these results by an application to the inverse problem of chaotic dynamics: recovery of a nonlinear map from a time series of iterates. These results extend automatically to nets that embed the single hidden layer, feedforward network as a special case.
Consistent Specification Testing With Nuisance Parameters Present Only Under The Alternative
1995
Cited by 34 (8 self)
The nonparametric and the nuisance parameter approaches to consistently testing statistical models are both attempts to estimate topological measures of distance between a parametric and a nonparametric fit, and neither dominates in experiments. This topological unification allows us to greatly extend the nuisance parameter approach. How and why the nuisance parameter approach works and how it can be extended bear closely on recent developments in artificial neural networks. Statistical content is provided by viewing specification tests with nuisance parameters as tests of hypotheses about Banach-valued random elements and applying the Banach Central Limit Theorem and Law of Iterated Logarithm, leading to simple procedures that can be used as a guide to when computationally more elaborate procedures may be warranted. 1. Introduction In testing whether or not a parametric statistical model is correctly specified, there are a number of apparently distinct approaches one might take. T...
A Single-Blind Controlled Competition Among Tests for Nonlinearity and Chaos
Journal of Econometrics, 1997
Cited by 32 (5 self)
Interest has been growing in testing for nonlinearity or chaos in economic data, but much controversy has arisen about the available results. This paper explores the reasons for these empirical difficulties. We designed and ran a single-blind controlled competition among five highly regarded tests for nonlinearity or chaos with ten simulated data series. The data generating mechanisms include linear processes, chaotic recursions, and nonchaotic stochastic processes; and both large and small samples were included in the experiment. The data series were produced in a single-blind manner by the competition manager and sent by e-mail, without identifying information, to the experiment participants. Each such participant is an acknowledged expert in one of the tests and has a possible vested interest in producing the best possible results with that one test. The results of this competition provide much surprising information about the power functions of some of the best regarded tests for nonlinearity or noisy chaos.
Economic Choices
American Economic Review, 2001
Cited by 28 (2 self)
... some detail more recent developments in the economic theory of choice, and modifications to this theory that are being forced by experimental evidence from cognitive psychology. I will close with a survey of statistical methods that have developed as part of the research program on economic choice behavior. Science is a cooperative enterprise, and my work on choice behavior reflects not only my own ideas, but the results of exchange and collaboration with many other scholars. First, of course, is my co-laureate James Heckman, who among his many contributions pioneered the important area of dynamic discrete choice analysis. Nine other individuals who played a major role in channeling microeconometrics and choice theory toward their modern forms, and had a particularly important influence on my own work, are Zvi Griliches, L.L. Thurstone, Jacob Marschak, Duncan Luce, Danny Kahneman, Amos Tversky, Moshe Ben-Akiva, Charles Manski, and Kenneth Train. A gallery of their p
Serial And Parallel Backpropagation Convergence Via Nonmonotone Perturbed Minimization
Optimization Methods and Software, 1994
Cited by 25 (11 self)
A general convergence theorem is proposed for a family of serial and parallel nonmonotone unconstrained minimization methods with perturbations. A principal application of the theorem is to establish convergence of backpropagation (BP), the classical algorithm for training artificial neural networks. Under certain natural assumptions, such as divergence of the sum of the learning rates and convergence of the sum of their squares, it is shown that every accumulation point of the BP iterates is a stationary point of the error function associated with the given set of training examples. The results presented cover serial and parallel BP, as well as modified BP with a momentum term.
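The learning-rate conditions named in this abstract (the sum of the rates diverges while the sum of their squares converges) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy data, the scalar model `w * x`, the schedule `c / (t + 1)`, and the function name `incremental_descent`. The sketch processes one training example per update, as serial backpropagation does, and checks that the iterate approaches a stationary point of the total squared error.

```python
# Hypothetical toy data: fit a scalar weight w so that w * x is close to y.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.5]


def incremental_descent(xs, ys, c=0.5, epochs=2000, w=0.0):
    """Take one gradient step per example with stepsize c / (t + 1)."""
    t = 0  # global update counter
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            # The harmonic schedule has a divergent sum and a convergent
            # sum of squares, matching the conditions in the abstract.
            step = c / (t + 1)
            # Gradient of the single-example error 0.5 * (x * w - y) ** 2.
            w -= step * x * (x * w - y)
            t += 1
    return w


# Closed-form stationary point of the total error, used only as a reference.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
w_final = incremental_descent(xs, ys)
total_gradient = sum(x * (x * w_final - y) for x, y in zip(xs, ys))
print(w_final, w_star, total_gradient)
```

With a constant stepsize instead of the diminishing one, the same loop would generally settle near, but not at, the stationary point, which is why the schedule matters in this sketch.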
A New Class Of Incremental Gradient Methods For Least Squares Problems
SIAM J. Optim., 1996
Cited by 21 (2 self)
The LMS method for linear least squares problems differs from the steepest descent method in that it processes data blocks one-by-one, with intermediate adjustment of the parameter vector under optimization. This mode of operation often leads to faster convergence when far from the eventual limit, and to slower (sublinear) convergence when close to the optimal solution. We embed both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and we propose a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent. These methods are well-suited for neural network training problems with large data sets. Furthermore, these methods allow the effective use of scaling based, for example, on diagonal or other approximations of the Hessian matrix. 1 Research supported by NSF under Grant 9300494-DMI. 2 Dept. of Electrical Engineering and Computer Science, M...
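The contrast between LMS and steepest descent drawn in this abstract can be seen on a toy problem. The following is only a sketch under stated assumptions: the data, the constant stepsize `STEP`, and the helper names `lms_pass`, `steepest_descent_pass`, and `run` are hypothetical, and the paper's one-parameter class and hybrid methods are not implemented here. It shows only the two extremes, one adjusting the parameter after every data point and the other after a full pass.

```python
# Hypothetical toy data: fit a scalar weight w so that w * x is close to y.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.5]
STEP = 0.1  # constant stepsize, chosen for illustration only


def steepest_descent_pass(w, step=STEP):
    """One batch step: sum the gradients over all data, then update once."""
    grad = sum(x * (x * w - y) for x, y in zip(xs, ys))
    return w - step * grad


def lms_pass(w, step=STEP):
    """One LMS pass: update w after each data point in turn."""
    for x, y in zip(xs, ys):
        w -= step * x * (x * w - y)
    return w


def run(pass_fn, passes, w=0.0):
    """Apply pass_fn the given number of times, starting from w."""
    for _ in range(passes):
        w = pass_fn(w)
    return w


# Least squares solution, used only to measure the error of each method.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def err(w):
    return abs(w - w_star)


early = (err(run(lms_pass, 1)), err(run(steepest_descent_pass, 1)))
late = (err(run(lms_pass, 50)), err(run(steepest_descent_pass, 50)))
print("after 1 pass   (LMS, steepest descent):", early)
print("after 50 passes (LMS, steepest descent):", late)
```

On this data and stepsize, LMS is much closer to the solution after the first pass, while after 50 passes steepest descent has converged and LMS stalls roughly 0.05 away from the least squares solution. This is one toy instance, not evidence for the general claim.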
Reinforcement Learning Through Gradient Descent
1999
Cited by 19 (0 self)
Reinforcement learning is often done using parameterized function approximators to store value functions. Algorithms are typically developed for lookup tables, and then applied to function approximators by using backpropagation. This can lead to algorithms diverging on very small, simple MDPs and Markov chains, even with linear function approximators and epoch-wise training. These algorithms are also very difficult to analyze, and difficult to combine with other algorithms. A series of new families of algorithms are derived based on stochastic gradient descent. Since they are derived from first principles with function approximators in mind, they have guaranteed convergence to local minima, even on general nonlinear function approximators. For both residual algorithms and VAPS algorithms, it is possible to take any of the standard algorithms in the field, such as Q-learning or SARSA or value iteration, and rederive a new form of it with provable convergence. In addition to better conve...
Incremental Least Squares Methods And The Extended Kalman Filter
1995
Cited by 18 (2 self)
In this paper we propose and analyze nonlinear least squares methods, which process the data incrementally, one data block at a time. Such methods are well suited for large data sets and real time operation, and have received much attention in the context of neural network training problems. We focus on the Extended Kalman Filter, which may be viewed as an incremental version of the Gauss-Newton method. We provide a nonstochastic analysis of its convergence properties, and we discuss variants aimed at accelerating its convergence. 1 Research supported by NSF under Grant 9300494-DMI. 2 Dept. of Electrical Engineering and Computer Science, M.I.T., Cambridge, Mass., 02139. 1. Introduction We consider least squares problems of the form
minimize $f(x) = \|g(x)\|^2 = \sum_{i=1}^{m} \|g_i(x)\|^2$ subject to $x \in \Re^n$, (1)
where $g$ is a continuously differentiable function with component functions $g_1, \ldots, g_m$, where $g_i : \Re^n \to \Re^{r_i}$. Here we write $\|z\|$ for t...
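The view of the Extended Kalman Filter as an incremental Gauss-Newton method can be sketched in the simplest setting. The assumptions here are mine and not the paper's: a scalar parameter, linear residuals `y - x * w`, no forgetting factor, and the hypothetical name `incremental_gauss_newton`. In this linear case the recursion is the familiar recursive least squares update, so a single pass over the data reproduces the batch least squares solution.

```python
# Hypothetical toy data: fit a scalar weight w so that w * x is close to y.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.5]


def incremental_gauss_newton(xs, ys, w=0.0):
    """Process one data point at a time, keeping a running curvature term.

    Assumes the first x is nonzero so that h is positive before dividing.
    """
    h = 0.0  # running sum of squared residual derivatives
    for x, y in zip(xs, ys):
        h += x * x
        # Gauss-Newton style correction using all curvature seen so far.
        w += x * (y - x * w) / h
    return w


# Batch least squares solution for comparison.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
w_one_pass = incremental_gauss_newton(xs, ys)
print(w_one_pass, w_star)
```

For nonlinear residuals the derivative would have to be re-evaluated at the current iterate for each data block, and one pass would no longer be exact. That case is what the convergence analysis in the paper addresses.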
An Incremental Gradient(-Projection) Method With Momentum Term And Adaptive Stepsize Rule
SIAM J. on Optimization, 1998
Cited by 17 (1 self)
We consider an incremental gradient method with momentum term for minimizing the sum of continuously differentiable functions. This method uses a new adaptive stepsize rule that decreases the stepsize whenever sufficient progress is not made. We show that if the gradients of the functions are bounded and Lipschitz continuous over a certain level set, then every cluster point of the iterates generated by the method is a stationary point. In addition, if the gradients of the functions have a certain growth property, then the method is either linearly convergent in some sense or the stepsizes are bounded away from zero. The new stepsize rule is much in the spirit of heuristic learning rules used in practice for training neural networks via backpropagation. As such, the new stepsize rule may suggest improvements on existing learning rules. Finally, extension of the method and the convergence results to constrained minimization is discussed, as are some implementation issues and numerical exp...

