Results 1 - 10
of
97
Why there are Complementary Learning Systems in the Hippocampus and Neocortex: Insights from the Successes and Failures of Connectionist Models of Learning and Memory
, 1994
"... The influence of prior experience on some forms of behavior and cognition is drastically affected by damage to the hippocampal system. However, if the hippocampal system is left intact both during the experience and for a period of time thereafter, subsequent damage can have much less or even no eff ..."
Abstract
-
Cited by 288 (34 self)
- Add to MetaCart
The influence of prior experience on some forms of behavior and cognition is drastically affected by damage to the hippocampal system. However, if the hippocampal system is left intact both during the experience and for a period of time thereafter, subsequent damage can have much less or even no effect. Such findings suggest that memory traces change over time in a way that makes them less dependent on the hippocampal system. This process of change has often been called consolidation. Consolidation is a very gradual process; in humans, it appears to span up to 15 years. This article asks what consolidation is and why it occurs. We take as our point of departure the view that the initial memory trace that results from a relevant experience consists of changes to the strengths of the connections among neurons in the hippocampal system. Bidirectional connections between the neocortex and the hippocampus allow these initial traces to mediate the reinstatement of representations of events o...
Regularization Theory and Neural Networks Architectures
- Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract
-
Cited by 257 (30 self)
- Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Efficient Distribution-free Learning of Probabilistic Concepts
- Journal of Computer and System Sciences
, 1993
"... In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior---thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic c ..."
Abstract
-
Cited by 182 (8 self)
- Add to MetaCart
In this paper we investigate a new formal model of machine learning in which the concept (boolean function) to be learned may exhibit uncertain or probabilistic behavior---thus, the same input may sometimes be classified as a positive example and sometimes as a negative example. Such probabilistic concepts (or p-concepts) may arise in situations such as weather prediction, where the measured variables and their accuracy are insufficient to determine the outcome with certainty. We adopt from the Valiant model of learning [27] the demands that learning algorithms be efficient and general in the sense that they perform well for a wide class of p-concepts and for any distribution over the domain. In addition to giving many efficient algorithms for learning natural classes of p-concepts, we study and develop in detail an underlying theory of learning p-concepts. 1 Introduction Consider the following scenarios: A meteorologist is attempting to predict tomorrow's weather as accurately as pos...
Toward efficient agnostic learning
- In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory
, 1992
"... Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtua ..."
Abstract
-
Cited by 169 (7 self)
- Add to MetaCart
Abstract. In this paper we initiate an investigation of generalizations of the Probably Approximately Correct (PAC) learning model that attempt to significantly weaken the target function assumptions. The ultimate goal in this direction is informally termed agnostic learning, in which we make virtually no assumptions on the target function. The name derives from the fact that as designers of learning algorithms, we give up the belief that Nature (as represented by the target function) has a simple or succinct explanation. We give a number of positive and negative results that provide an initial outline of the possibilities for agnostic learning. Our results include hardness results for the most obvious generalization of the PAC model to an agnostic setting, an efficient and general agnostic learning method based on dynamic programming, relationships between loss functions for agnostic learning, and an algorithm for a learning problem that involves hidden variables.
Network Information Criterion - Determining the Number of Hidden Units for an Artificial Neural Network Model
- IEEE Transactions on Neural Networks
, 1994
"... The problem of model selection, or determination of the number of hidden units, can be approached statistically, by generalizing Akaike's information criterion (AIC) to be applicable to unfaithful (i.e., unrealizable) models with general loss criteria including regularization terms. The relation bet ..."
Abstract
-
Cited by 120 (6 self)
- Add to MetaCart
The problem of model selection, or determination of the number of hidden units, can be approached statistically, by generalizing Akaike's information criterion (AIC) to be applicable to unfaithful (i.e., unrealizable) models with general loss criteria including regularization terms. The relation between the training error and the generalization error is studied in terms of the number of the training examples and the complexity of a network which reduces to the number of parameters in the ordinary statistical theory of the AIC. This relation leads to a new Network Information Criterion (NIC) which is useful for selecting the optimal network model based on a given training set. 3 IEEE Transactions on Neural Networks, Vol. 5, No. 6, pp. 865--872, November 1994 y Department of Mathematical Engineering and Information Physics, Faculty of Engineering, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan. 1 Introduction In engineering fields, one of the most important applicati...
First and Second-Order Methods for Learning: between Steepest Descent and Newton's Method
- Neural Computation
, 1992
"... On-line first order backpropagation is sufficiently fast and effective for many large-scale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neura ..."
Abstract
-
Cited by 108 (6 self)
- Add to MetaCart
On-line first order backpropagation is sufficiently fast and effective for many large-scale classification problems but for very high precision mappings, batch processing may be the method of choice. This paper reviews first- and second-order optimization methods for learning in feedforward neural networks. The viewpoint is that of optimization: many methods can be cast in the language of optimization techniques, allowing the transfer to neural nets of detailed results about computational complexity and safety procedures to ensure convergence and to avoid numerical problems. The review is not intended to deliver detailed prescriptions for the most appropriate methods in specific applications, but to illustrate the main characteristics of the different methods and their mutual relations.
Extracting Comprehensible Models from Trained Neural Networks
, 1996
"... To Mom, Dad, and Susan, for their support and encouragement. ..."
Abstract
-
Cited by 65 (4 self)
- Add to MetaCart
To Mom, Dad, and Susan, for their support and encouragement.
Simultaneous Training of Negatively Correlated Neural Networks in an Ensemble
- IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
, 1999
"... This paper presents a new cooperative ensemble learning system (CELS) for designing neural network ensembles. The idea behind CELS is to encourage different individual networks in an ensemble to learn different parts or aspects of a training data so that the ensemble can learn the whole training dat ..."
Abstract
-
Cited by 34 (16 self)
- Add to MetaCart
This paper presents a new cooperative ensemble learning system (CELS) for designing neural network ensembles. The idea behind CELS is to encourage different individual networks in an ensemble to learn different parts or aspects of a training data so that the ensemble can learn the whole training data better. In CELS, the individual networks are trained simultaneously rather than independently or sequentially. This provides an opportunity for the individual networks to interact with each other and to specialize. CELS can create negatively correlated neural networks using a correlation penalty term in the error function to encourage such specialization. This paper analyzes CELS in terms of bias-variance-covariance tradeoff. CELS has also been tested on the Mackey--Glass time series prediction problem and the Australian credit card assessment problem. The experimental results show that CELS can produce neural network ensembles with good generalization ability.

