Six Principles for Biologically-Based Computational Models of Cortical Cognition
- TRENDS IN COGNITIVE SCIENCES
, 1998
Cited by 43 (14 self)
This paper describes and motivates six principles for computational cognitive neuroscience models: biological realism, distributed representations, inhibitory competition, bidirectional activation propagation, error-driven task learning, and Hebbian model learning. Although these principles are supported by a number of cognitive, computational, and biological motivations, the prototypical neural network model (a feedforward backpropagation network) incorporates only two of them, and no widely used model incorporates all of them. This paper argues that these principles should be integrated into a coherent overall framework, and discusses some potential synergies and conflicts in doing so.
Efficient learning in Boltzmann Machines using linear response theory
- Neural Computation
, 1997
Cited by 37 (5 self)
The learning process in Boltzmann Machines is computationally very expensive. The computational complexity of the exact algorithm is exponential in the number of neurons. We present a new approximate learning algorithm for Boltzmann Machines, which is based on mean field theory and the linear response theorem. The computational complexity of the algorithm is cubic in the number of neurons. In the absence of hidden units, we show how the weights can be directly computed from the fixed point equation of the learning rules. Thus, in this case we do not need to use a gradient descent procedure for the learning process. We show that the solutions of this method are close to the optimal solutions and give a significant improvement when correlations play a significant role. Finally, we apply the method to a pattern completion task and show good performance for networks up to 100 neurons. 1 Introduction Boltzmann Machines (BMs) (Ackley et al., 1985), are networks of binary neurons with a stoc...
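The cost contrast described in this abstract can be illustrated with a minimal sketch. It assumes +/-1 units, a symmetric weight matrix with zero diagonal, and the energy convention noted in the comments; the function names are hypothetical and not taken from the paper. The first function enumerates all 2^n states (the exponential exact computation), while the second iterates the naive mean field equations. The linear response correction for correlations that the paper adds on top of mean field theory is not shown here.

```python
import itertools
import math


def exact_means(w, theta):
    """Exact <s_i> for +/-1 units, by enumerating all 2^n states."""
    n = len(theta)
    z = 0.0
    sums = [0.0] * n
    for s in itertools.product((-1, 1), repeat=n):
        # Assumed convention: -E(s) = 1/2 sum_ij w_ij s_i s_j + sum_i theta_i s_i
        neg_e = sum(theta[i] * s[i] for i in range(n))
        neg_e += 0.5 * sum(
            w[i][j] * s[i] * s[j] for i in range(n) for j in range(n)
        )
        p = math.exp(neg_e)
        z += p
        for i in range(n):
            sums[i] += p * s[i]
    return [x / z for x in sums]


def mean_field_means(w, theta, iters=200):
    """Naive mean field: iterate m_i = tanh(sum_j w_ij m_j + theta_i)."""
    n = len(theta)
    m = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            field = sum(w[i][j] * m[j] for j in range(n)) + theta[i]
            m[i] = math.tanh(field)
    return m
```

For weak couplings the two results should agree closely; the enumeration becomes impractical well before the 100-neuron networks mentioned in the abstract.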
Improving the Mean Field Approximation via the Use of Mixture Distributions
, 1998
Cited by 33 (0 self)
Introduction Graphical models provide a formalism in which to express and manipulate conditional independence statements. Inference algorithms for graphical models exploit these independence statements, using them to compute conditional probabilities while avoiding brute force marginalization over the joint probability table. Many inference algorithms, in particular the clustering algorithms, make explicit their usage of conditional independence by constructing a data structure that captures the essential Markov properties underlying the graph. That is, the algorithm groups interacting variables into clusters, such that the hypergraph of clusters has Markov properties that allow simple local algorithms to be employed for inference. In the best case, in which the original graph is sparse and without long cycles, the clusters are small and inference is efficient. In the worst case, such as the case of a dense graph, the clusters are large and inference is inefficient (complexity
Recursive Algorithms for Approximating Probabilities in Graphical Models
Cited by 28 (10 self)
We develop a recursive node-elimination formalism for efficiently approximating large probabilistic networks. No constraints are set on the network topologies. Yet the formalism can be straightforwardly integrated with exact methods whenever they are/become applicable. The approximations we use are controlled: they maintain consistently upper and lower bounds on the desired quantities at all times. We show that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework. The accuracy of the methods is verified experimentally. 1 Introduction Graphical models (see, e.g., Lauritzen 1996) provide a medium for rigorously embedding domain knowledge into network models. The structure in these graphical models embodies the qualitative assumptions about the independence relationships in the domain while the probability model attached to the graph permits a consistent computation of belief (or uncertainty) about the values of t...
Learning continuous probability distributions with symmetric diffusion networks
- Cognitive Science
, 1993
Cited by 24 (4 self)
In this article we present symmetric diffusion networks, a family of networks that instantiate the principles of continuous, stochastic, adaptive and interactive propagation of information. Using methods of Markovian diffusion theory, we formalize the activation dynamics of these networks and then show that they can be trained to reproduce entire multivariate probability distributions on their outputs using the contrastive Hebbian learning rule (CHL). We show that CHL performs gradient descent on an error function that captures differences between desired and obtained continuous multivariate probability distributions. This allows the learning algorithm to go beyond expected values of output units and to approximate complete probability distributions on continuous multivariate activation spaces. We argue that learning continuous distributions is an important task underlying a variety of real-life situations that were beyond the scope of previous connectionist networks. Deterministic networks, like back propagation, cannot learn this task because they are limited to learning average values of independent output units. Previous stochastic connectionist networks could learn probability distributions but they were limited to discrete variables. Simulations show that symmetric diffusion networks can be trained with the CHL rule to approximate discrete and continuous probability distributions of various types.
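The contrastive Hebbian learning rule named in this abstract is commonly written as a weight change proportional to the difference between pairwise activation correlations in a clamped phase and in a free-running phase. The sketch below assumes that generic form; the function name, the learning rate, and the representation of each phase as a list of activation vectors are illustrative assumptions, and the diffusion dynamics that would generate those activations are not modelled.

```python
def chl_update(w, clamped, free, lr=0.1):
    """One contrastive Hebbian step on a symmetric weight matrix.

    clamped, free: lists of activation vectors collected in each phase
    (hypothetical inputs; how they are sampled is outside this sketch).
    """
    n = len(w)

    def corr(samples):
        # Average pairwise product <a_i a_j> over the given samples.
        return [
            [sum(x[i] * x[j] for x in samples) / len(samples) for j in range(n)]
            for i in range(n)
        ]

    cp, cf = corr(clamped), corr(free)
    # Self-connections are kept at zero (an assumption of this sketch).
    return [
        [0.0 if i == j else w[i][j] + lr * (cp[i][j] - cf[i][j]) for j in range(n)]
        for i in range(n)
    ]
```

When the clamped and free statistics coincide, the update leaves the weights unchanged, which is the fixed point the rule aims for.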
Transferring Previously Learned Back-Propagation Neural Networks To New Learning Tasks
, 1993
A New Learning Algorithm for Mean Field Boltzmann Machines
, 2002
Cited by 23 (4 self)
We present a new learning algorithm for Mean Field Boltzmann Machines based on the contrastive divergence optimization criterion. In addition to minimizing the divergence between the data distribution and the equilibrium distribution that the network believes in, we maximize the divergence between one-step reconstructions of the data and the equilibrium distribution. This eliminates the need to estimate equilibrium statistics, so we do not need to approximate the multimodal probability distribution of the free network with the unimodal mean field distribution. We test the learning algorithm on the classification of digits. Max Welling, G.E. Hinton, Gatsby Unit. 1 Boltzmann Machines The stochastic Boltzmann machine (BM) is a probabilistic neural network of symmetrically connected binary units taking values {0, 1} (Ackley, Hinton & Sejnowski, 1985). The variant used for unsupervised learning consists of a set of visi...
Unsupervised Neural Network Learning Procedures . . .
, 1996
Cited by 21 (1 self)
In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: information-preserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to real-world problems.
Dynamic Recurrent Neural Networks
, 1990
Cited by 21 (2 self)
We survey learning algorithms for recurrent neural networks with hidden units and attempt to put the various techniques into a common framework. We discuss fixpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixpoint algorithms, namely backpropagation through time, Elman's history cutoff nets, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, is also discussed. In many cases, the unified presentation leads to generalizations of various sorts. Some simulations are presented, and at the end, issues of computational complexity are addressed. This research was sponsored in part by The Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title "Research on Parallel Computing", ARPA Order No. 7330, issued by DARPA/CMO under Contract MDA972-90-C-0035 and in part by the National Science Foundation under grant number EET-8716324 and i...
Learning in Boltzmann Trees
- Neural Computation
, 1995
Cited by 20 (3 self)
We introduce a large family of Boltzmann machines that can be trained using standard gradient descent. The networks can have one or more layers of hidden units, with tree-like connectivity. We show how to implement the supervised learning algorithm for these Boltzmann machines exactly, without resort to simulated or mean-field annealing. The stochastic averages that yield the gradients in weight space are computed by the technique of decimation. We present results on the problems of N-bit parity and the detection of hidden symmetries. 1 Introduction Boltzmann machines (Ackley, Hinton, & Sejnowski, 1985) have several compelling virtues. Unlike simple perceptrons, they can solve problems that are not linearly separable. The learning rule, simple and locally based, lends itself to massive parallelism. The theory of Boltzmann learning, moreover, has a solid foundation in statistical mechanics. Unfortunately, Boltzmann machines, as originally conceived, also have some serious drawbacks...
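As a rough illustration of what decimation means, the sketch below uses one standard identity from statistical mechanics: summing out a +/-1 unit that connects two neighbours in series (with no bias on the eliminated unit) leaves an effective coupling J with tanh(J) = tanh(J1) tanh(J2). Whether the paper uses exactly this parametrization is an assumption, and the function names are hypothetical. The second function checks the identity by brute-force enumeration of the three-unit chain.

```python
import math


def decimate_series(j1, j2):
    """Effective coupling between s1 and s2 after summing out the middle unit."""
    return math.atanh(math.tanh(j1) * math.tanh(j2))


def chain_correlation(j1, j2):
    """<s1 s2> for the chain s1 - s - s2 (+/-1 units, no biases), by enumeration."""
    z = 0.0
    acc = 0.0
    for s1 in (-1, 1):
        for s in (-1, 1):
            for s2 in (-1, 1):
                p = math.exp(j1 * s1 * s + j2 * s * s2)
                z += p
                acc += p * s1 * s2
    return acc / z
```

For two +/-1 units joined by a single coupling J and no biases, the correlation is tanh(J), so `chain_correlation(j1, j2)` should match `math.tanh(decimate_series(j1, j2))`. Applying rules of this kind repeatedly is what lets averages on tree-like networks be computed without sampling.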

