Results 1–10 of 78
Learning Machines, 1965
Abstract
Cited by 164 (0 self)
This book is about machines that learn to discover hidden relationships in data. A constant stream of data bombards our senses and millions of sensory channels carry information into our brains. Brains are also learning machines that condition, ...
Economic Choices, American Economic Review, 2001
Abstract
Cited by 120 (2 self)
... ome detail more recent developments in the economic theory of choice, and modifications to this theory that are being forced by experimental evidence from cognitive psychology. I will close with a survey of statistical methods that have developed as part of the research program on economic choice behavior. Science is a cooperative enterprise, and my work on choice behavior reflects not only my own ideas, but the results of exchange and collaboration with many other scholars. First, of course, is my co-laureate James Heckman, who among his many contributions pioneered the important area of dynamic discrete choice analysis. Nine other individuals who played a major role in channeling microeconometrics and choice theory toward their modern forms, and had a particularly important influence on my own work, are Zvi Griliches, L.L. Thurstone, Jacob Marschak, Duncan Luce, Danny Kahneman, Amos Tversky, Moshe Ben-Akiva, Charles Manski, and Kenneth Train. A gallery of their p...
Consistent Specification Testing With Nuisance Parameters Present Only Under The Alternative, 1995
Abstract
Cited by 84 (13 self)
The nonparametric and the nuisance parameter approaches to consistently testing statistical models are both attempts to estimate topological measures of distance between a parametric and a nonparametric fit, and neither dominates in experiments. This topological unification allows us to greatly extend the nuisance parameter approach. How and why the nuisance parameter approach works, and how it can be extended, bears closely on recent developments in artificial neural networks. Statistical content is provided by viewing specification tests with nuisance parameters as tests of hypotheses about Banach-valued random elements and applying the Banach Central Limit Theorem and Law of the Iterated Logarithm, leading to simple procedures that can be used as a guide to when computationally more elaborate procedures may be warranted.

1. Introduction

In testing whether or not a parametric statistical model is correctly specified, there are a number of apparently distinct approaches one might take. ...
Extracting Comprehensible Models from Trained Neural Networks, 1996
Abstract
Cited by 80 (3 self)
To Mom, Dad, and Susan, for their support and encouragement.
On Learning the Derivatives of an Unknown Mapping with Multilayer Feedforward Networks, 1989
Abstract
Cited by 75 (8 self)
... Daniel F. McCaffrey, and Douglas W. Nychka for helpful discussions relating to ...

Recently, multiple input, single output, single hidden layer feedforward neural networks have been shown to be capable of approximating a nonlinear map and its partial derivatives. Specifically, neural nets have been shown to be dense in various Sobolev spaces (Hornik, Stinchcombe and White, 1989). Building upon this result, we show that a net can be trained so that the map and its derivatives are learned. Specifically, we use a result of Gallant (1987b) to show that least squares and similar estimates are strongly consistent in Sobolev norm provided the number of hidden units and the size of the training set increase together. We illustrate these results by an application to the inverse problem of chaotic dynamics: recovery of a nonlinear map from a time series of iterates. These results extend automatically to nets that embed the single hidden layer feedforward network as a special case.
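The Sobolev-space results above concern approximating a map and its derivatives jointly, which is possible precisely because a single-hidden-layer net has a derivative in closed form. A minimal sketch of that closed form (with arbitrary illustrative weights, not trained ones, and not the paper's estimation procedure):

```python
import numpy as np

# Hypothetical single-hidden-layer net: f(x) = sum_j beta_j * tanh(a_j * x + b_j).
# Its derivative is available analytically, which is what makes approximation in
# Sobolev norm (function plus derivatives) a meaningful target. The weights
# below are arbitrary illustrative values, not the result of training.
rng = np.random.default_rng(0)
a = rng.normal(size=5)      # input-to-hidden weights
b = rng.normal(size=5)      # hidden-unit biases
beta = rng.normal(size=5)   # hidden-to-output weights

def f(x):
    return np.sum(beta * np.tanh(a * x + b))

def df(x):
    # d/dx tanh(u) = (1 - tanh(u)^2) * du/dx, with u = a*x + b
    u = a * x + b
    return np.sum(beta * (1.0 - np.tanh(u) ** 2) * a)

# Check the closed-form derivative against a central finite difference.
x0, h = 0.3, 1e-6
fd = (f(x0 + h) - f(x0 - h)) / (2 * h)
print(abs(df(x0) - fd))  # small: the two derivatives agree
```

If such a net is trained to fit data from an unknown map, the same closed form yields an estimate of the map's derivatives, which is the quantity whose consistency the abstract addresses.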
A New Class Of Incremental Gradient Methods For Least Squares Problems, SIAM Journal on Optimization, 1996
Abstract
Cited by 65 (2 self)
The LMS method for linear least squares problems differs from the steepest descent method in that it processes data blocks one by one, with intermediate adjustment of the parameter vector under optimization. This mode of operation often leads to faster convergence when far from the eventual limit, and to slower (sublinear) convergence when close to the optimal solution. We embed both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and we propose a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent. These methods are well-suited for neural network training problems with large data sets. Furthermore, these methods allow the effective use of scaling based, for example, on diagonal or other approximations of the Hessian matrix.
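The two endpoints of the one-parameter family described above can be sketched on a toy linear least-squares problem. This is only an illustration of the LMS-style (per-sample) and steepest-descent (full-gradient) extremes, not the paper's hybrid algorithm; the problem data and stepsizes are illustrative choices:

```python
import numpy as np

# Toy noiseless least-squares problem: minimize ||A w - y||^2 over w.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = A @ w_true

def lms(epochs=200, eta=0.05):
    """Process the data one sample at a time, adjusting w after each (LMS-style)."""
    w = np.zeros(3)
    for _ in range(epochs):
        for i in range(A.shape[0]):
            r = A[i] @ w - y[i]
            w -= eta * r * A[i]   # gradient of the i-th squared residual only
    return w

def steepest_descent(steps=200, eta=0.5):
    """Use the full gradient of the summed objective at every step."""
    w = np.zeros(3)
    for _ in range(steps):
        w -= eta * A.T @ (A @ w - y) / A.shape[0]
    return w

print(lms(), steepest_descent())  # both approach w_true on this noiseless problem
```

On noisy or inconsistent data the contrast in the abstract appears: the per-sample method moves quickly at first but stalls in a neighborhood of the solution, while the full-gradient method converges linearly near the limit.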
A Single-Blind Controlled Competition Among Tests for Nonlinearity and Chaos, Journal of Econometrics, 1997
Abstract
Cited by 55 (9 self)
Interest has been growing in testing for nonlinearity or chaos in economic data, but much controversy has arisen about the available results. This paper explores the reasons for these empirical difficulties. We designed and ran a single-blind controlled competition among five highly regarded tests for nonlinearity or chaos with ten simulated data series. The data generating mechanisms include linear processes, chaotic recursions, and non-chaotic stochastic processes; both large and small samples were included in the experiment. The data series were produced in a single-blind manner by the competition manager and sent by e-mail, without identifying information, to the experiment participants. Each participant is an acknowledged expert in one of the tests and has a possible vested interest in producing the best possible results with that one test. The results of this competition provide much surprising information about the power functions of some of the best regarded tests for nonlinearity or noisy chaos.
An Incremental Gradient (Projection) Method with Momentum Term and Adaptive Stepsize Rule, SIAM Journal on Optimization, 1998
Serial and Parallel Backpropagation Convergence via Nonmonotone Perturbed Minimization, Optimization Methods and Software, 1994
Abstract
Cited by 37 (11 self)
A general convergence theorem is proposed for a family of serial and parallel nonmonotone unconstrained minimization methods with perturbations. A principal application of the theorem is to establish convergence of backpropagation (BP), the classical algorithm for training artificial neural networks. Under certain natural assumptions, such as divergence of the sum of the learning rates and convergence of the sum of their squares, it is shown that every accumulation point of the BP iterates is a stationary point of the error function associated with the given set of training examples. The results presented cover serial and parallel BP, as well as modified BP with a momentum term.
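The stepsize assumptions quoted above (learning rates whose sum diverges while the sum of their squares converges) are satisfied by the classical schedule eta_k = c/(k+1), and violated by a constant rate (squares diverge) or by c/(k+1)^2 (sum converges). A quick numerical check of the classical choice, with an illustrative constant c:

```python
import math

# Stepsize schedule eta_k = c / (k + 1): the partial sums grow like c*log(n)
# (divergent), while the sum of squares is bounded by c^2 * pi^2 / 6.
# The constant c = 0.1 is an arbitrary illustrative value.
c = 0.1
etas = [c / (k + 1) for k in range(100_000)]

partial_sum = sum(etas)                     # grows without bound as n increases
sum_of_squares = sum(e * e for e in etas)   # stays below c^2 * pi^2 / 6

print(partial_sum, sum_of_squares)
```

These are exactly the conditions under which the theorem guarantees that every accumulation point of the BP iterates is stationary.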
Incremental Gradient Algorithms with Stepsizes Bounded Away From Zero, Computational Optimization and Applications, 1998
Abstract
Cited by 33 (2 self)
We consider the class of incremental gradient methods for minimizing a sum of continuously differentiable functions. An important novel feature of our analysis is that the stepsizes are kept bounded away from zero. We derive the first convergence results of any kind for this computationally important case. In particular, we show that a certain ε-approximate solution can be obtained and establish the linear dependence of ε on the stepsize limit. Incremental gradient methods are particularly well-suited for large neural network training problems where obtaining an approximate solution is typically sufficient and is often preferable to computing an exact solution. Thus, in the context of neural networks, the approach presented here is related to the principle of tolerant training. Our results justify numerous stepsize rules that were derived on the basis of extensive numerical experimentation but for which no theoretical analysis was previously available. In addition, convergence to (exact) stationary points is established when the gradient satisfies a certain growth property.
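The linear dependence of the approximation error ε on the stepsize limit can be seen on a minimal example: incremental gradient with a constant stepsize on a sum of one-dimensional quadratics. This is a sketch of the phenomenon, not the paper's analysis; the component functions and stepsizes are illustrative choices:

```python
import numpy as np

# Sum of quadratics f_i(w) = (w - t_i)^2 / 2; the minimizer of the sum is the
# mean of the t_i. With a constant stepsize the incremental iterates settle
# into a limit cycle around the minimizer whose distance from it shrinks
# roughly linearly with the stepsize. The t_i below are illustrative data.
t = np.array([-2.0, 0.0, 1.0, 5.0])
w_star = t.mean()

def final_error(eta, epochs=500):
    w = 0.0
    for _ in range(epochs):
        for ti in t:
            w -= eta * (w - ti)   # gradient step on a single component f_i
    return abs(w - w_star)

for eta in (0.1, 0.05, 0.025):
    print(eta, final_error(eta))  # the error roughly halves as eta halves
```

Halving the stepsize roughly halves the residual error, the linear ε-versus-stepsize dependence that the abstract establishes in general.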