Results 1–2 of 2
Learning stochastic feedforward networks
, 1990
Abstract

Cited by 12 (1 self)
Introduction The work reported here began with the desire to find a network architecture that shared with Boltzmann machines [6, 1, 7] the capacity to learn arbitrary probability distributions over binary vectors, but that did not require the negative phase of Boltzmann machine learning. It was hypothesized that eliminating the negative phase would improve learning performance. This goal was achieved by replacing the Boltzmann machine's symmetric connections with feedforward connections. In analogy with Boltzmann machines, the sigmoid function was used to compute the conditional probability of a unit being on from the weighted input from other units. Stochastic simulation of such a network is somewhat more complex than for a Boltzmann machine, but is still possible using local communication. Maximum likelihood, gradient-ascent learning can be done with a local Hebb-type rule.
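The sampling procedure the abstract describes, where each unit turns on with probability given by the sigmoid of its weighted input, amounts to ancestral sampling through the feedforward graph. A minimal sketch, assuming a simple layered network (the layer structure, function names, and single-sample interface are illustrative, not the paper's exact setup):

```python
import numpy as np

def sample_stochastic_feedforward(weights, biases, x, rng):
    """Ancestral sampling through a stochastic sigmoid feedforward net.

    Each binary unit fires (h = 1) with probability sigmoid(W @ h + b),
    computed layer by layer from the visible input x.  This is a sketch
    of the sampling scheme described in the abstract, not the authors'
    exact implementation.
    """
    h = np.asarray(x, dtype=float)
    for W, b in zip(weights, biases):
        p = 1.0 / (1.0 + np.exp(-(W @ h + b)))      # P(unit on | layer below)
        h = (rng.random(p.shape) < p).astype(float)  # stochastic binary units
    return h
```

Because the connections are feedforward, a single bottom-up pass produces an exact sample, which is what removes the need for the negative-phase equilibrium sampling of a Boltzmann machine.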
Weight Space Probability Densities in Stochastic Learning: I. Dynamics and Equilibria
 Advances in Neural Information Processing Systems
, 1993
Abstract

Cited by 9 (5 self)
The ensemble dynamics of stochastic learning algorithms can be studied using theoretical techniques from statistical physics. We develop the equations of motion for the weight space probability densities for stochastic learning algorithms. We discuss equilibria in the diffusion approximation and provide expressions for special cases of the LMS algorithm. The equilibrium densities are not in general thermal (Gibbs) distributions in the objective function being minimized, but rather depend upon an effective potential that includes diffusion effects. Finally we present an exact analytical expression for the time evolution of the density for a learning algorithm with weight updates proportional to the sign of the gradient. 1 Introduction: Theoretical Framework Stochastic learning algorithms involve weight updates of the form ω(n + 1) = ω(n) + μ(n) H[ω(n), x(n)] (1), where ω ∈ ℝ^m is the vector of m weights, μ is the learning rate, H[·] ∈ ℝ^m is the update function, an...
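The generic update rule in equation (1), and the sign-of-gradient variant the abstract analyzes, can be sketched as follows. The squared-error objective and the helper names below are illustrative assumptions, not the paper's exact setting:

```python
import numpy as np

def stochastic_update(w, x, mu, H):
    """One step of the generic rule w(n+1) = w(n) + mu * H[w(n), x(n)]."""
    return w + mu * H(w, x)

def sign_H(w, x):
    """Sign-of-gradient update function (illustrative example).

    For a single-example squared-error objective 0.5 * (t - f @ w)**2,
    H is the elementwise sign of the negative gradient, so each weight
    moves a fixed step mu in the descent direction.
    """
    target, features = x
    grad = -(target - features @ w) * features  # gradient of the squared error
    return -np.sign(grad)
```

Because the step size is decoupled from the gradient magnitude, the resulting weight-space density evolves differently from plain gradient descent, which is what makes the exact time-evolution result for this algorithm notable.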