Six Principles for Biologically Based Computational Models of Cortical Cognition
Trends in Cognitive Sciences, 1998
Cited by 52 (14 self)
This paper describes and motivates six principles for computational cognitive neuroscience models: biological realism, distributed representations, inhibitory competition, bidirectional activation propagation, error-driven task learning, and Hebbian model learning. Although these principles are supported by a number of cognitive, computational, and biological motivations, the prototypical neural network model (a feedforward backpropagation network) incorporates only two of them, and no widely used model incorporates all of them. This paper argues that these principles should be integrated into a coherent overall framework, and discusses some potential synergies and conflicts in doing so.
Divergence Measures and Message Passing
2005
Cited by 48 (2 self)
This paper presents a unifying view of message-passing algorithms, as methods to approximate a complex Bayesian network by a simpler network with minimum information divergence. In this view, the difference between mean-field methods and belief propagation is not the amount of structure they model, but only the measure of loss they minimize (‘exclusive’ versus ‘inclusive’ Kullback-Leibler divergence). In each case, message-passing arises by minimizing a localized version of the divergence, local to each factor. By examining these divergence measures, we can intuit the types of solution they prefer (symmetry-breaking, for example) and their suitability for different tasks. Furthermore, by considering a wider variety of divergence measures (such as alpha-divergences), we can achieve different complexity and performance goals.
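The contrast between the two Kullback-Leibler directions named in this abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the target distribution `p` and the two candidates `q_mode` and `q_cover` are hypothetical numbers chosen for illustration, and the sketch assumes the usual convention that "exclusive" means KL(q || p) and "inclusive" means KL(p || q), where q is the approximation.

```python
import math

def kl(a, b):
    """Kullback-Leibler divergence KL(a || b) for discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

# Hypothetical bimodal target over four states (illustrative numbers only).
p = [0.49, 0.01, 0.01, 0.49]

# Two hypothetical candidate approximations.
q_mode = [0.97, 0.01, 0.01, 0.01]   # commits to a single mode
q_cover = [0.25, 0.25, 0.25, 0.25]  # spreads mass over both modes

# Exclusive divergence KL(q || p) versus inclusive divergence KL(p || q).
exclusive = {"mode": kl(q_mode, p), "cover": kl(q_cover, p)}
inclusive = {"mode": kl(p, q_mode), "cover": kl(p, q_cover)}

best_exclusive = min(exclusive, key=exclusive.get)
best_inclusive = min(inclusive, key=inclusive.get)
print(best_exclusive, best_inclusive)
```

With these numbers the exclusive divergence favors the candidate that commits to one mode (a symmetry-breaking solution), while the inclusive divergence favors the candidate that covers both modes.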
Efficient learning in Boltzmann Machines using linear response theory
Neural Computation, 1997
Cited by 44 (5 self)
The learning process in Boltzmann Machines is computationally very expensive. The computational complexity of the exact algorithm is exponential in the number of neurons. We present a new approximate learning algorithm for Boltzmann Machines, which is based on mean field theory and the linear response theorem. The computational complexity of the algorithm is cubic in the number of neurons. In the absence of hidden units, we show how the weights can be directly computed from the fixed point equation of the learning rules. Thus, in this case we do not need to use a gradient descent procedure for the learning process. We show that the solutions of this method are close to the optimal solutions and give a significant improvement when correlations play a significant role. Finally, we apply the method to a pattern completion task and show good performance for networks up to 100 neurons. 1 Introduction Boltzmann Machines (BMs) (Ackley et al., 1985), are networks of binary neurons with a stoc...
Improving the Mean Field Approximation via the Use of Mixture Distributions
1998
Cited by 38 (0 self)
Graphical models provide a formalism in which to express and manipulate conditional independence statements. Inference algorithms for graphical models exploit these independence statements, using them to compute conditional probabilities while avoiding brute force marginalization over the joint probability table. Many inference algorithms, in particular the clustering algorithms, make explicit their usage of conditional independence by constructing a data structure that captures the essential Markov properties underlying the graph. That is, the algorithm groups interacting variables into clusters, such that the hypergraph of clusters has Markov properties that allow simple local algorithms to be employed for inference. In the best case, in which the original graph is sparse and without long cycles, the clusters are small and inference is efficient. In the worst case, such as the case of a dense graph, the clusters are large and inference is inefficient (complexity
Mining associated text and images with dual-wing Harmoniums
In Conference on Uncertainty in Artificial Intelligence, 2005
Cited by 34 (9 self)
We propose a multi-wing harmonium model for mining multimedia data that extends and improves on earlier models based on two-layer random fields, which capture bidirectional dependencies between hidden topic aspects and observed inputs. This model can be viewed as an undirected counterpart of the two-layer directed models such as LDA for similar tasks, but bears significant difference in inference/learning cost tradeoffs, latent topic representations, and topic mixing mechanisms. In particular, our model facilitates efficient inference and robust topic mixing, and potentially provides high flexibilities in modeling the latent topic spaces. A contrastive divergence and a variational algorithm are derived for learning. We specialized our model to a dual-wing harmonium for captioned images, incorporating a multivariate Poisson for word counts and a multivariate Gaussian for color histogram. We present empirical results on the applications of this model to classification, retrieval and image annotation on news video collections, and we report an extensive comparison with various extant models.
A New Learning Algorithm for Mean Field Boltzmann Machines
2002
Cited by 33 (8 self)
We present a new learning algorithm for Mean Field Boltzmann Machines based on the contrastive divergence optimization criterion. In addition to minimizing the divergence between the data distribution and the equilibrium distribution that the network believes in, we maximize the divergence between one-step reconstructions of the data and the equilibrium distribution. This eliminates the need to estimate equilibrium statistics, so we do not need to approximate the multimodal probability distribution of the free network with the unimodal mean field distribution. We test the learning algorithm on the classification of digits. A New Learning Algorithm for Mean Field Boltzmann Machines Max Welling G.E. Hinton Gatsby Unit 1 Boltzmann Machines The stochastic Boltzmann machine (BM) is a probabilistic neural network of symmetrically connected binary units taking values {0, 1} (Ackley, Hinton & Sejnowski, 1985). The variant used for unsupervised learning consists of a set of visi...
Learning continuous probability distributions with symmetric diffusion networks
Cognitive Science, 1993
Cited by 32 (7 self)
In this article we present symmetric diffusion networks, a family of networks that instantiate the principles of continuous, stochastic, adaptive and interactive propagation of information. Using methods of Markovian diffusion theory, we formalize the activation dynamics of these networks and then show that they can be trained to reproduce entire multivariate probability distributions on their outputs using the contrastive Hebbian learning rule (CHL). We show that CHL performs gradient descent on an error function that captures differences between desired and obtained continuous multivariate probability distributions. This allows the learning algorithm to go beyond expected values of output units and to approximate complete probability distributions on continuous multivariate activation spaces. We argue that learning continuous distributions is an important task underlying a variety of real-life situations that were beyond the scope of previous connectionist networks. Deterministic networks, like back propagation, cannot learn this task because they are limited to learning average values of independent output units. Previous stochastic connectionist networks could learn probability distributions but they were limited to discrete variables. Simulations show that symmetric diffusion networks can be trained with the CHL rule to approximate discrete and continuous probability distributions of various types.
Long-term semantic priming: A computational account and empirical evidence
Journal of Experimental Psychology: Learning, Memory, and Cognition, 1997
Cited by 30 (0 self)
Semantic priming is traditionally viewed as an effect that rapidly decays. A new view of long-term word priming in attractor neural networks is proposed. The model predicts long-term semantic priming under certain conditions. That is, the task must engage semantic-level processing to a sufficient degree. The predictions were confirmed in computer simulations and in 3 experiments. Experiment 1 showed that when target words are each preceded by multiple semantically related primes, there is long-lag priming on a semantic-decision task but not on a lexical-decision task. Experiment 2 replicated the long-term semantic priming effect for semantic decisions with only one prime per target. Experiment 3 demonstrated semantic priming with much longer word lists at lags of 0, 4, and 8 items. These are the first experiments to demonstrate a semantic priming effect spanning many intervening items and lasting much longer than a few seconds. Many forms of priming have been studied (for reviews, see Monsell, 1985; Richardson-Klavehn & Bjork, 1988; Schacter, 1987). Whereas in repetition priming the priming stimulus is identical to the target, in similarity-based priming tests (e.g., form priming, morphological priming, and semantic priming), the prime and target are different words sharing some surface features, semantic features, or both. Repetition priming and form priming have been found to produce long-lasting effects ranging from hours to weeks or even
Recursive Algorithms for Approximating Probabilities in Graphical Models
Cited by 29 (10 self)
We develop a recursive node-elimination formalism for efficiently approximating large probabilistic networks. No constraints are set on the network topologies. Yet the formalism can be straightforwardly integrated with exact methods whenever they are/become applicable. The approximations we use are controlled: they maintain consistently upper and lower bounds on the desired quantities at all times. We show that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework. The accuracy of the methods is verified experimentally. 1 Introduction Graphical models (see, e.g., Lauritzen 1996) provide a medium for rigorously embedding domain knowledge into network models. The structure in these graphical models embodies the qualitative assumptions about the independence relationships in the domain while the probability model attached to the graph permits a consistent computation of belief (or uncertainty) about the values of t...
Dynamic Recurrent Neural Networks
1990
Cited by 27 (3 self)
We survey learning algorithms for recurrent neural networks with hidden units and attempt to put the various techniques into a common framework. We discuss fixpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixpoint algorithms, namely backpropagation through time, Elman's history cutoff nets, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, is also discussed. In many cases, the unified presentation leads to generalizations of various sorts. Some simulations are presented, and at the end, issues of computational complexity are addressed. This research was sponsored in part by The Defense Advanced Research Projects Agency, Information Science and Technology Office, under the title "Research on Parallel Computing", ARPA Order No. 7330, issued by DARPA/CMO under Contract MDA97290C0035 and in part by the National Science Foundation under grant number EET8716324 and i...