Results 1 – 8 of 8
Gradient calculation for dynamic recurrent neural networks: a survey
 IEEE Transactions on Neural Networks
, 1995
"... Abstract  We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non xedpoint algorithms, namely backp ..."
Abstract

Cited by 132 (3 self)
We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss fixed-point learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non-fixed-point algorithms, namely backpropagation through time, Elman's history cutoff, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the unified presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, and continue with some "tricks of the trade" for training, using, and simulating continuous-time and recurrent neural networks. We present some simulations, and at the end, address issues of computational complexity and learning speed.
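One of the non-fixed-point algorithms the survey covers, backpropagation through time, can be made concrete with a short sketch. The following minimal gradient routine for a vanilla recurrent network (function and variable names are our own illustration, not the paper's notation) unrolls the forward dynamics and sweeps the error backward:

```python
import numpy as np

def bptt_grad(W, U, xs, ys, h0):
    """Gradient of the summed squared error w.r.t. W and U for a
    vanilla RNN h_t = tanh(W h_{t-1} + U x_t), target y_t compared
    directly against h_t. Backpropagation through time: run forward
    over the whole sequence, then sweep the error backward."""
    hs = [h0]
    for x in xs:                          # forward pass through time
        hs.append(np.tanh(W @ hs[-1] + U @ x))
    dW = np.zeros_like(W)
    dU = np.zeros_like(U)
    dh = np.zeros_like(h0)                # error carried backward in time
    for t in reversed(range(len(xs))):
        dh = dh + (hs[t + 1] - ys[t])     # direct output error at step t
        dz = dh * (1.0 - hs[t + 1] ** 2)  # back through the tanh
        dW += np.outer(dz, hs[t])
        dU += np.outer(dz, xs[t])
        dh = W.T @ dz                     # propagate to the previous step
    return dW, dU
```

Checking such a routine against a finite-difference estimate of the loss is the standard sanity test.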
Mean Field Theory for Sigmoid Belief Networks
 Journal of Artificial Intelligence Research
, 1996
"... We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. ..."
Abstract

Cited by 116 (12 self)
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics.
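To illustrate the kind of fixed-point equations a mean field theory produces, here is a naive mean-field iteration for a network of sigmoidal units with symmetric weights. This is a simplification: the paper's sigmoid belief network equations carry additional variational correction terms, and all names here are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(W, b, n_iter=200, damping=0.5):
    """Naive mean-field fixed point m_i = sigmoid(sum_j W_ij m_j + b_i)
    for binary units with symmetric weights W (zero diagonal) and
    biases b. Damped updates improve convergence. Simplified sketch;
    the paper's sigmoid belief network version has extra terms."""
    m = np.full(len(b), 0.5)                      # start means at 1/2
    for _ in range(n_iter):
        m_new = sigmoid(W @ m + b)
        m = damping * m + (1 - damping) * m_new   # damped update
    return m
```

At convergence the returned means satisfy the mean-field equations to numerical precision, which is easy to verify directly.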
Unsupervised Texture Segmentation in a Deterministic Annealing Framework
, 1998
"... We present a novel optimization framework for unsupervised texture segmentation that relies on statistical tests as a measure of homogeneity. Texture segmentation is formulated as a data clustering problem based on sparse proximity data. Dissimilarities of pairs of textured regions are computed from ..."
Abstract

Cited by 91 (9 self)
We present a novel optimization framework for unsupervised texture segmentation that relies on statistical tests as a measure of homogeneity. Texture segmentation is formulated as a data clustering problem based on sparse proximity data. Dissimilarities of pairs of textured regions are computed from a multiscale Gabor filter image representation. We discuss and compare a class of clustering objective functions which is systematically derived from invariance principles. As a general optimization framework we propose deterministic annealing based on a mean-field approximation. The canonical way to derive clustering algorithms within this framework, as well as an efficient implementation of mean-field annealing and the closely related Gibbs sampler, are presented. We apply both annealing variants to Brodatz-like microtexture mixtures and real-world images.
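The deterministic-annealing recipe (compute Gibbs posteriors at temperature T, re-estimate parameters, lower T) can be sketched for the simpler central-clustering case with vector data. The paper itself works with pairwise dissimilarities, so the following is only an illustrative analogue, with all names our own:

```python
import numpy as np

def deterministic_annealing(X, k, T0=5.0, Tmin=0.05, cool=0.9):
    """Deterministic annealing for central clustering: soft assignments
    p(c|x) ~ exp(-||x - mu_c||^2 / T), centroid re-estimation, then
    cooling. The paper applies the same recipe to pairwise-dissimilarity
    data; this vector-data version is a simplified illustration."""
    idx = [0]                                     # farthest-point initialization
    for _ in range(1, k):
        d2 = ((X[:, None] - X[idx][None]) ** 2).sum(-1).min(axis=1)
        idx.append(int(d2.argmax()))
    mu = X[idx].astype(float)
    T = T0
    while T > Tmin:
        for _ in range(20):                       # mean-field fixed point at this T
            d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
            logits = -d2 / T
            logits -= logits.max(axis=1, keepdims=True)
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)     # Gibbs posteriors over clusters
            mu = (p.T @ X) / p.sum(axis=0)[:, None]
        T *= cool                                 # anneal the temperature
    return mu, p
```

As the temperature drops, the soft posteriors harden toward a conventional clustering.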
Unsupervised Neural Network Learning Procedures . . .
, 1996
"... In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: informationpreserving methods, densi ..."
Abstract

Cited by 23 (1 self)
In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: information-preserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to real-world problems.
Model-Independent Mean Field Theory as a Local Method for Approximate Propagation of Information
 Computation in Neural Systems
, 2002
"... We present a systematic approach to mean field theory (MFT) in a general probabilistic setting without assuming a particular model. The mean field equations derived here may serve as a local and thus very simple method for approximate inference in probabilistic models such as Boltzmann machines or B ..."
Abstract

Cited by 16 (1 self)
We present a systematic approach to mean field theory (MFT) in a general probabilistic setting without assuming a particular model. The mean field equations derived here may serve as a local and thus very simple method for approximate inference in probabilistic models such as Boltzmann machines or Bayesian networks. "Model-independent" means that we do not assume a particular type of dependencies; in a Bayesian network, for example, we allow arbitrary tables to specify conditional dependencies. In general, there are multiple solutions to the mean field equations. We show that improved estimates can be obtained by forming a weighted mixture of the multiple mean field solutions. Simple approximate expressions for the mixture weights are given. The general formalism derived so far is evaluated for the special case of Bayesian networks. The benefits of taking into account multiple solutions are demonstrated by using MFT for inference in a small and in a very large Bayesian network. The results are compared to the exact results.
Equivalence of backpropagation and contrastive Hebbian learning in a layered network
"... BackprP&&mL666 and contrm866 e Hebbianlearnm6 ar two methods oftrPfi5W networ8 with hidden neur&; BackprWPmL6WW computes anerW signalfor the output neurtm andspr&j it over the hidden neurW5 Contrm85 e Hebbian learnm; involves clamping the outputneurm8 at desirP values, and letting the ..."
Abstract

Cited by 9 (0 self)
Backpropagation and contrastive Hebbian learning are two methods of training networks with hidden neurons. Backpropagation computes an error signal for the output neurons and spreads it over the hidden neurons. Contrastive Hebbian learning involves clamping the output neurons at desired values, and letting the effect spread through feedback connections over the entire network. To investigate the relationship between these two forms of learning, we consider a special case in which they are identical: a multilayer perceptron with linear output units, to which weak feedback connections have been added. In this case, the change in network state caused by clamping the output neurons turns out to be the same as the error signal spread by backpropagation, except for a scalar prefactor. This suggests that the functionality of backpropagation can be realized alternatively by a Hebbian-type learning algorithm, which is suitable for implementation in biological networks.
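The paper's central claim can be checked numerically in a few lines. Below is a sketch in a simplified setting of our own choosing (one hidden layer, linear output units, feedback strength gamma), in which the state change produced by clamping the outputs reproduces the backpropagated error signal up to the scalar prefactor gamma:

```python
import numpy as np

def free_and_clamped(W1, W2, x, y, gamma, n_iter=50):
    """One-hidden-layer network with linear output units and weak
    feedback of strength gamma (a simplified instance of the paper's
    setting; names are ours). Free phase: the hidden state settles
    under feedback from the network's own output. Clamped phase: the
    output is held fixed at the target y."""
    h = np.tanh(W1 @ x)
    for _ in range(n_iter):                            # settle the free phase
        h = np.tanh(W1 @ x + gamma * (W2.T @ (W2 @ h)))
    h_free, y_free = h, W2 @ h
    h_clamped = np.tanh(W1 @ x + gamma * (W2.T @ y))   # clamped phase
    return h_free, y_free, h_clamped
```

For small gamma, (h_clamped - h_free) / gamma agrees to first order with the backpropagation delta (1 - h_free**2) * (W2.T @ (y - y_free)), which is the equivalence the abstract describes.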
The minimum description length principle applied to feature learning and analogical mapping
 MCC Tech. Rep
, 1990
"... This paper describes an algorithm for orthogonal clustering. That is, it nds multiple partitions of a domain. The Minimum Description Length (MDL) Principle is used to de ne a parameterfree evaluation function over all possible sets of partitions. In contrast, conventional clustering algorithms can ..."
Abstract

Cited by 5 (1 self)
This paper describes an algorithm for orthogonal clustering. That is, it finds multiple partitions of a domain. The Minimum Description Length (MDL) Principle is used to define a parameter-free evaluation function over all possible sets of partitions. In contrast, conventional clustering algorithms can only find a single partition of a set of data. While they can be applied iteratively to create hierarchies, these are limited to tree structures. Orthogonal clustering, on the other hand, cannot form hierarchies deeper than one layer. Ideally one would want an algorithm which does both. However, there are important problems for which orthogonal clustering is desirable. In particular, orthogonal clusters correspond to feature vectors, which are widely used throughout cognitive science. Hopefully, orthogonal clusters will also be useful for finding analogies. A side effect which deserves more exploration is the induction of domain axioms in which the features ...
Geometrical View On The Effectiveness Of Naive Mean-Field Approximation To Optimization Problems
, 1999
"... When one wishes to solve optimization problems by simulated annealing, the naive meaneld approximation provides a practical way of doing it. Extensions of the naive approximation by including higherorder terms have been proposed in order to improve accuracy of approximation. It has been reported, ..."
Abstract
When one wishes to solve optimization problems by simulated annealing, the naive mean-field approximation provides a practical way of doing it. Extensions of the naive approximation by including higher-order terms have been proposed in order to improve accuracy of approximation. It has been reported, however, that higher-order approximations do not work well in low temperature regimes. We present an analytical argument and a geometrical view on this contradictory observation based on information geometry, and give an intuitive explanation as to why the naive approximation does work well when it is applied to solving optimization problems.

1. INTRODUCTION
Simulated annealing [1] has been recognized as a tool for solving optimization problems in a stochastic manner. It includes the so-called Markov-chain Monte-Carlo (MCMC) procedure, and therefore, it suffers from the same difficulty as the MCMC: that in general it requires a huge amount of computation for sampling states with re...