Results 1 - 10
of
33
Why there are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory
, 1994
"... The influence of prior experience on some forms of behavior and cognition is drastically affected by damage to the hippocampal system. However, if the hippocampal system is left intact both during the experience and for a period of time thereafter, subsequent damage can have much less or even no eff ..."
Abstract
-
Cited by 288 (34 self)
- Add to MetaCart
The influence of prior experience on some forms of behavior and cognition is drastically affected by damage to the hippocampal system. However, if the hippocampal system is left intact both during the experience and for a period of time thereafter, subsequent damage can have much less or even no effect. Such findings suggest that memory traces change over time in a way that makes them less dependent on the hippocampal system. This process of change has often been called consolidation. Consolidation is a very gradual process; in humans, it appears to span up to 15 years. This article asks what consolidation is and why it occurs. We take as our point of departure the view that the initial memory trace that results from a relevant experience consists of changes to the strengths of the connections among neurons in the hippocampal system. Bidirectional connections between the neocortex and the hippocampus allow these initial traces to mediate the reinstatement of representations of events o...
Efficient Back Prop
, 1996
"... HINE Parameters X0, X1, ....Xp Output E0, E1,....Ep Error Desired Output D0, D1,...Dp Y0, Y1,...Yp Input w w0 w1 AT&T Laboratories (c) COST FUNCTION Output E0, E1,....Ep Error Desired Output D0, D1,...Dp Y0, Y1,...Yp X0, X1, ....Xp Input Parameters w B R A COMPUTING THE GRADIENT WITH BACKPROPAGATIO ..."
Abstract
-
Cited by 93 (16 self)
- Add to MetaCart
HINE Parameters X0, X1, ....Xp Output E0, E1,....Ep Error Desired Output D0, D1,...Dp Y0, Y1,...Yp Input w w0 w1 AT&T Laboratories (c) COST FUNCTION Output E0, E1,....Ep Error Desired Output D0, D1,...Dp Y0, Y1,...Yp X0, X1, ....Xp Input Parameters w B R A COMPUTING THE GRADIENT WITH BACKPROPAGATION O = A(I1, I2) dI1 = dO ¶ A ¶ I1 dI2 = dO ¶ A ¶ I2 - The learning machine is composed of modules (e.g. layers) - Each module can do two things: 1- compute its outputs from its inputs (FPROP) 2- compute gradient vectors at its inputs from gradient vectors at its outputs (BPROP) A O, dO I1, dI1 I2, dI2 AT&T Laboratories (c) AN INTERESTING SPECIAL CASE: MULTILAYER NETWORKS X0, X1, ....Xp Output Desired Output D0, D1,...Dp Y0, Y1,...Yp Input || D - Y || 2 2 1 WX F() WX F() Mean Square Error Parameters (weights + biases) w Weight matrix E0, E1,....Ep Sigmoids + Biase
Dimension Reduction by Local Principal Component Analysis
, 1997
"... Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural networ ..."
Abstract
-
Cited by 84 (0 self)
- Add to MetaCart
Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
Convergence Properties of the K-Means Algorithms
"... This paper studies the convergence properties of the well known K-Means clustering algorithm. The K-Means algorithm can be described either as a gradient descent algorithm or by slightly extending the mathematics of the EM algorithm to this hard threshold case. We show that the K-Means algorithm act ..."
Abstract
-
Cited by 66 (2 self)
- Add to MetaCart
This paper studies the convergence properties of the well known K-Means clustering algorithm. The K-Means algorithm can be described either as a gradient descent algorithm or by slightly extending the mathematics of the EM algorithm to this hard threshold case. We show that the K-Means algorithm actually minimizes the quantization error using the very fast Newton algorithm.
Noisy Time Series Prediction using a Recurrent Neural Network and Grammatical Inference
- Machine Learning
, 2001
"... Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent ..."
Abstract
-
Cited by 40 (0 self)
- Add to MetaCart
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a selforganizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for th...
Comparison of Optimized Backpropagation Algorithms
- Proc. of ESANN'93, Brussels
, 1993
"... Backpropagation is one of the most famous training algorithms for multilayer perceptrons. Unfortunately it can be very slow for practical applications. Over the last years many improvement strategies have been developed to speed up backpropagation. It's very difficult to compare these different tech ..."
Abstract
-
Cited by 36 (1 self)
- Add to MetaCart
Backpropagation is one of the most famous training algorithms for multilayer perceptrons. Unfortunately it can be very slow for practical applications. Over the last years many improvement strategies have been developed to speed up backpropagation. It's very difficult to compare these different techniques, because most of them have been tested on various specific data sets. Most of the reported results are based on some kind of tiny and artificial training sets like XOR, encoder or decoder. It's very doubtful if these results hold for more complicate practical application. In this report an overview of many different speedup techniques is given. All of them were assessed by a very hard practical classification task, which consists of a big medical data set. As you will see many of these optimized algorithms fail in learning the data set. 1 Introduction This report is intended to summarize our experience using many different speedup techniques for the backpropagation algorithm. We have...
Learning Rate Schedules For Faster Stochastic Gradient Search
, 1992
"... . Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate j (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed cl ..."
Abstract
-
Cited by 36 (0 self)
- Add to MetaCart
. Stochastic gradient descent is a general algorithm that includes LMS, on-line backpropagation, and adaptive k-means clustering as special cases. The standard choices of the learning rate j (both adaptive and fixed functions of time) often perform quite poorly. In contrast, our recently proposed class of "search then converge" (STC) learning rate schedules (Darken and Moody, 1990b, 1991) display the theoretically optimal asymptotic convergence rate and a superior ability to escape from poor local minima However, the user is responsible for setting a key parameter. We propose here a new methodology for creating the first automatically adapting learning rates that achieve the optimal rate of convergence. INTRODUCTION The stochastic gradient descent algorithm is \DeltaW (t) = \Gammajr W f [W (t); X(t)]; (1) where j is the learning rate, t is the "time", and X(t) is the independent random exemplar chosen at time t. The purpose of the algorithm is to find a parameter vector W that minim...
Incremental Dynamic Programming for On-Line Adaptive Optimal Control
, 1994
"... Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended ..."
Abstract
-
Cited by 20 (2 self)
- Add to MetaCart
Reinforcement learning algorithms based on the principles of Dynamic Programming (DP) have enjoyed a great deal of recent attention both empirically and theoretically. These algorithms have been referred to generically as Incremental Dynamic Programming (IDP) algorithms. IDP algorithms are intended for use in situations where the information or computational resources needed by traditional dynamic programming algorithms are not available. IDP algorithms attempt to find a global solution to a DP problem by incrementally improving local constraint satisfaction properties as experience is gained through interaction with the environment. This class of algorithms is not new, going back at least as far as Samuel's adaptive checkers-playing programs,...
Competitive Learning Algorithms for Robust Vector Quantization
, 1998
"... The efficient representation and encoding of signals with limited resources, e.g., finite storage capacity and restricted transmission bandwidth, is a fundamental problem in technical as well as biological information processing systems. Typically, under realistic circumstances, the encoding and com ..."
Abstract
-
Cited by 15 (0 self)
- Add to MetaCart
The efficient representation and encoding of signals with limited resources, e.g., finite storage capacity and restricted transmission bandwidth, is a fundamental problem in technical as well as biological information processing systems. Typically, under realistic circumstances, the encoding and communication of messages has to deal with different sources of noise and disturbances. In this paper, we propose a unifying approach to data compression by robust vector quantization, which explicitly deals with channel noise, bandwidth limitations, and random elimination of prototypes. The resulting algorithm is able to limit the detrimental effect of noise in a very general communication scenario. In addition, the presented model allows us to derive a novel competitive neural networks algorithm, which covers topology preserving feature maps, the so-called neural-gas algorithm, and the maximum entropy soft-max rule as special cases. Furthermore, continuation methods based on these noise models impr...

