## A learning algorithm for Boltzmann machines (1985)

Venue: | Cognitive Science |

Citations: | 431 - 14 self |

### BibTeX

@ARTICLE{Ackley85alearning,

author = {David H. Ackley and Geoffrey E. Hinton and Terrence J. Sejnowski},

title = {A learning algorithm for Boltzmann machines},

journal = {Cognitive Science},

year = {1985},

pages = {147--169}

}

### Abstract

The computational power of massively parallel networks of simple processing elements resides in the communication bandwidth provided by the hardware connections between elements. These connections can allow a significant fraction of the knowledge of the system to be applied to an instance of a problem in a very short time. One kind of computation for which massively parallel networks appear to be well suited is large constraint satisfaction searches, but to use the connections efficiently two conditions must be met: First, a search technique that is suitable for parallel networks must be found. Second, there must be some way of choosing internal representations which allow the preexisting hardware connections to be used efficiently for encoding the constraints in the domain being searched. We describe a general parallel search method, based on statistical mechanics, and we show how it leads to a general learning rule for modifying the connection strengths so as to incorporate knowledge about a task domain in an efficient way. We describe some simple examples in which the learning algorithm creates internal representations that are demonstrably the most efficient way of using the preexisting connectivity structure.

### Citations

3720 | Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images - Geman, Geman - 1984 |

3531 | Optimization by simulated annealing
- Kirkpatrick, Gelatt, Vecchi
- 1983
Citation Context ...y Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller (1953) to study average properties of thermodynamic systems (Binder, 1978) and has recently been applied to problems of constraint satisfaction (Kirkpatrick, Gelatt, & Vecchi, 1983). We adopt a form of the Metropolis algorithm that is suitable for parallel computation: If the energy gap between the on and off states of the kth unit is ΔE, then regardless of the previous state se... |

2243 | Equation of state calculations by fast computing machines - Metropolis, Rosenbluth, et al. - 1953 |

1581 |
Neural networks and physical systems with emergent collective computational abilities
- Hopfield
- 1982
Citation Context ...he current states of the other hypotheses. If hardware units make their decisions asynchronously, and if transmission times are negligible, then the system always settles into a local energy minimum (Hopfield, 1982). Because the connections are symmetric, the difference between the energy of the whole system with the kth hypothesis rejected and its energy with the kth hypothesis accepted can be determined local... |

350 | Understanding line drawings of scenes with shadows
- Waltz
- 1975
Citation Context ... Boltzmann Machine is a parallel computational organization that is well suited to constraint satisfaction tasks involving large numbers of “weak” constraints. Constraint-satisfaction searches (e.g., Waltz, 1975; Winston, 1984) normally use “strong” constraints that must be satisfied by any solution. In problem domains such as games and puzzles, for example, the goal criteria often have this character, so s... |

246 |
Connectionist models and their properties, Cognitive Science 6(3
- Feldman, Ballard
- 1982
Citation Context ... helpful discussions. Reprint requests should be addressed to David Ackley, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA 15213. ... systems (Feldman & Ballard, 1982; Hinton & Anderson, 1981) that store their long-term knowledge as the strengths of the connections between simple neuron-like processing elements. These networks are clearly suited to tasks like visi... |

92 | Optimal perceptual inference - Hinton, Sejnowski - 1983 |

16 | Schema selection and stochastic inference in modular environments
- Smolensky
- 1983
Citation Context ...lication of statistical mechanics to constraint satisfaction searches in parallel networks is a promising new area that has been discovered independently by several other groups (Geman & Geman, 1983; Smolensky, 1983). There are many interesting issues that we have only mentioned in passing. Some of these issues are discussed in greater detail elsewhere: Hinton and Sejnowski (1983b) and Geman and Geman (1983) des... |

1 | The QBKG system: Generating explanations from a non-discrete knowledge representation - Berliner, Ackley - 1982 |

1 |
The Monte-Carlo method in statistical physics
- Binder
- 1978
Citation Context ...to configurations of higher energy. An algorithm with this property was introduced by Metropolis, Rosenbluth, Rosenbluth, Teller, & Teller (1953) to study average properties of thermodynamic systems (Binder, 1978) and has recently been applied to problems of constraint satisfaction (Kirkpatrick, Gelatt, & Vecchi, 1983). We adopt a form of the Metropolis algorithm that is suitable for parallel computation: If t... |

1 | The Hashnet Interconnection Scheme - Fahlman - 1980 |

1 | Massively parallel architectures for AI - Fahlman, Hinton, Sejnowski - 1983 |

1 | Dynamic connections in neural networks - Feldman - 1982 |

1 | From images to surfaces - Grimson - 1981 |

1 |
Relaxation and its role in vision. Unpublished doctoral dissertation, University of Edinburgh. Described in
- Hinton
- 1977
Citation Context ...ome problem domains, such as finding the most plausible interpretation of an image, many of the criteria are not all-or-none, and frequently even the best possible solution violates some constraints (Hinton, 1977). A variation that is more appropriate for such domains uses weak constraints that incur a cost when violated. The quality of a solution is then determined by the total cost of all the constraints th... |

1 | Implementing semantic networks in parallel hardware - Hinton - 1981 |

1 |
Parallel models of associative memory
- Hinton, Anderson
- 1981
Citation Context ...rint requests should be addressed to David Ackley, Computer Science Department, Carnegie-Mellon University, Pittsburgh, PA 15213. ... systems (Feldman & Ballard, 1982; Hinton & Anderson, 1981) that store their long-term knowledge as the strengths of the connections between simple neuron-like processing elements. These networks are clearly suited to tasks like vision that can be performed ... |

1 |
Analyzing cooperative computation. Proceedings of the Fifth Annual Conference of the Cognitive Science Society
- Hinton, Sejnowski
- 1983
Citation Context ...likely. Conversely, a negative weight suggests, other things being equal, that the two hypotheses should not both be accepted. Link weights are symmetric, having the same strength in both directions (Hinton & Sejnowski, 1983). [Footnote] But, see (Berliner & Ackley, 1982) for argument that, even in such domains, strong constraints must be used only where absolutely necessary for legal play, and in particular must not propagate i... |

1 |
Boltzmann Machines: Constraint satisfaction networks that...
- Hinton, Sejnowski, et al.
- 1984
Citation Context ...chniques; Fahlman, Hinton, and Sejnowski (1983) compare Boltzmann machines with some alternative parallel schemes, and discuss some knowledge representation issues. An expanded version of this paper (Hinton, Sejnowski, & Ackley, 1984) presents this material in greater depth and discusses a number of related issues such as the relationship to the brain and the problem of sequential behavior. It also shows how the probabilistic dec... |

1 |
Information theory and statistics
- Kullback
- 1959
Citation Context ...corresponding probability when the network is running freely with no environmental input. The G metric, sometimes called the asymmetric divergence or information gain (Kullback, 1959; Renyi, 1962), is a measure of the distance from the distribution given by the P'(V_α) to the distribution given by the P(V_α). G is zero if and only if the distributions are identical; otherwise it is ... |

1 |
Perceptrons
- Minsky, Papert
- 1968
Citation Context ...ch a network does the wrong thing it appears to be impossible to decide which of the many connection strengths is at fault. This “credit-assignment” problem was what led to the demise of perceptrons (Minsky & Papert, 1968; Rosenblatt, 1961). The perceptron convergence theorem guarantees that the weights of a single layer of decision units can be trained, but it could not be generalized to networks of such units when t... |

1 |
Intellectual issues in the history of artificial...
- Newell
- 1982
Citation Context ...in such a way that the whole network develops an internal model which captures the underlying structure of its environment. There has been a long history of failure in the search for such algorithms (Newell, 1982), and many people (particularly in Artificial Intelligence) now believe that no such algorithms exist. The major technical stumbling block which prevented the generalization of simple learning algori... |

1 |
Human problem solving
- Newell, Simon
- 1972
Citation Context ...ardware-oriented connectionist descriptions and the more abstract symbol manipulation models that have proved to be an extremely powerful and pervasive way of describing human information processing (Newell & Simon, 1972). 2. THE BOLTZMANN MACHINE The Boltzmann Machine is a parallel computational organization that is well suited to constraint satisfaction tasks involving large numbers of “weak” constraints. Constrain... |

1 | A theory of memory retrieval - unknown authors - 1978 |

1 |
...of neurodynamics: Perceptrons and the theory of brain mechanisms
- Renyi
- 1962
Citation Context ...obability when the network is running freely with no environmental input. The G metric, sometimes called the asymmetric divergence or information gain (Kullback, 1959; Renyi, 1962), is a measure of the distance from the distribution given by the P'(V_α) to the distribution given by the P(V_α). G is zero if and only if the distributions are identical; otherwise it is positive. The... |

1 |
Multiresolution computation of visible-surface representations. Unpublished doctoral dissertation
- Terzopoulos
- 1984
Citation Context ...efficiently in parallel networks which have physical connections in just the places where processes need to communicate. For problems like surface interpolation from sparse depth data (Grimson, 1981; Terzopoulos, 1984) where the necessary decision units and communication paths can be determined in advance, it is relatively easy to see how to make good use of massive parallelism. The more difficult problem is to di... |