Results 1  10
of
119
Connectionist Learning Procedures
 ARTIFICIAL INTELLIGENCE
, 1989
"... A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way ..."
Abstract

Cited by 339 (6 self)
 Add to MetaCart
A major goal of research on networks of neuronlike processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradientdescent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradientdescent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
The Helmholtz Machine
, 1995
"... Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative model ..."
Abstract

Cited by 193 (21 self)
 Add to MetaCart
Discovering the structure inherent in a set of patterns is a fundamental aim of statistical inference or learning. One fruitful approach is to build a parameterized stochastic generative model, independent draws from which are likely to produce the patterns. For all but the simplest generative models, each pattern can be generated in exponentially many ways. It is thus intractable to adjust the parameters to maximize the probability of the observed patterns. We describe a way of finessing this combinatorial explosion by maximizing an easily computed lower bound on the probability of the observations. Our method can be viewed as a form of hierarchical selfsupervised learning that may relate to the function of bottomup and topdown cortical processing pathways.
Deep Dyslexia: A Case Study of Connectionist Neuropsychology
, 1993
"... Deep dyslexia is an acquired reading disorder marked by the occurrence of semantic errors (e.g., reading RIVER as "ocean"). In addition, patients exhibit a number of other symptoms, including visual and morphological effects in their errors, a partofspeech effect, and an advantage for concrete ove ..."
Abstract

Cited by 140 (27 self)
 Add to MetaCart
Deep dyslexia is an acquired reading disorder marked by the occurrence of semantic errors (e.g., reading RIVER as "ocean"). In addition, patients exhibit a number of other symptoms, including visual and morphological effects in their errors, a partofspeech effect, and an advantage for concrete over abstract words. Deep dyslexia poses a distinct challenge for cognitive neuropsychology because there is little understanding of why such a variety of symptoms should cooccur in virtually all known patients. Hinton and Shallice (1991) replicated the cooccurrence of visual and semantic errors by lesioning a recurrent connectionist network trained to map from orthography to semantics. While the success of their simulations is encouraging, there is little understanding of what underlying principles are responsible for them. In this paper we evaluate and, where possible, improve on the most important design decisions made by Hinton and Shallice, relating to the task, the network architecture, the training procedure, and the testing procedure. We identify four properties of networks that underly their ability to reproduce the deep dyslexic symptomcomplex: distributed orthographic and semantic representations, gradient descent learning, attractors for word meanings, and greater richness of concrete vs. abstract semantics. The first three of these are general connectionist principles and the last is based on earlier theorizing. Taken together, the results demonstrate the usefulness of a connectionist approach to understanding deep dyslexia in particular, and the viability of connectionist neuropsychology in general.
Gradient calculation for dynamic recurrent neural networks: a survey
 IEEE Transactions on Neural Networks
, 1995
"... Abstract  We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non xedpoint algorithms, namely backp ..."
Abstract

Cited by 136 (3 self)
 Add to MetaCart
Abstract  We survey learning algorithms for recurrent neural networks with hidden units, and put the various techniques into a common framework. We discuss xedpoint learning algorithms, namely recurrent backpropagation and deterministic Boltzmann Machines, and non xedpoint algorithms, namely backpropagation through time, Elman's history cuto, and Jordan's output feedback architecture. Forward propagation, an online technique that uses adjoint equations, and variations thereof, are also discussed. In many cases, the uni ed presentation leads to generalizations of various sorts. We discuss advantages and disadvantages of temporally continuous neural networks in contrast to clocked ones, continue with some \tricks of the trade" for training, using, and simulating continuous time and recurrent neural networks. We present somesimulations, and at the end, address issues of computational complexity and learning speed.
Mean Field Theory for Sigmoid Belief Networks
 Journal of Artificial Intelligence Research
, 1996
"... We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics. ..."
Abstract

Cited by 118 (12 self)
 Add to MetaCart
We develop a mean field theory for sigmoid belief networks based on ideas from statistical mechanics.
Learning from Labeled and Unlabeled Data with Label Propagation
, 2002
"... We investigate the use of unlabeled data to help labeled data in classification. We propose a simple iterative algorithm, label propagation, to propagate labels through the dataset along high density areas defined by unlabeled data. We give the analysis of the algorithm, show its solution, and its c ..."
Abstract

Cited by 110 (0 self)
 Add to MetaCart
We investigate the use of unlabeled data to help labeled data in classification. We propose a simple iterative algorithm, label propagation, to propagate labels through the dataset along high density areas defined by unlabeled data. We give the analysis of the algorithm, show its solution, and its connection to several other algorithms. We also show how to learn parameters by minimum spanning tree heuristic and entropy minimization, and the algorithm's ability to do feature selection. Experiment results are promising.
Exploiting Tractable Substructures in Intractable Networks
 Advances in Neural Information Processing Systems 8
, 1995
"... We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that a ..."
Abstract

Cited by 98 (11 self)
 Add to MetaCart
We develop a refined mean field approximation for inference and learning in probabilistic neural networks. Our mean field theory, unlike most, does not assume that the units behave as independent degrees of freedom; instead, it exploits in a principled way the existence of large substructures that are computationally tractable. To illustrate the advantages of this framework, we show how to incorporate weak higher order interactions into a firstorder hidden Markov model, treating the corrections (but not the first order structure) within mean field theory. 1 INTRODUCTION Learning the parameters in a probabilistic neural network may be viewed as a problem in statistical estimation. In networks with sparse connectivity (e.g. trees and chains), there exist efficient algorithms for the exact probabilistic calculations that support inference and learning. In general, however, these calculations are intractable, and approximations are required. Mean field theory provides a framework for app...
Biologically Plausible Errordriven Learning using Local Activation Differences: The Generalized Recirculation Algorithm
 NEURAL COMPUTATION
, 1996
"... The error backpropagation learning algorithm (BP) is generally considered biologically implausible because it does not use locally available, activationbased variables. A version of BP that can be computed locally using bidirectional activation recirculation (Hinton & McClelland, 1988) instead of ..."
Abstract

Cited by 95 (11 self)
 Add to MetaCart
The error backpropagation learning algorithm (BP) is generally considered biologically implausible because it does not use locally available, activationbased variables. A version of BP that can be computed locally using bidirectional activation recirculation (Hinton & McClelland, 1988) instead of backpropagated error derivatives is more biologically plausible. This paper presents a generalized version of the recirculation algorithm (GeneRec), which overcomes several limitations of the earlier algorithm by using a generic recurrent network with sigmoidal units that can learn arbitrary input/output mappings. However, the contrastiveHebbian learning algorithm (CHL, a.k.a. DBM or mean field learning) also uses local variables to perform errordriven learning in a sigmoidal recurrent network. CHL was derived in a stochastic framework (the Boltzmann machine), but has been extended to the deterministic case in various ways, all of which rely on problematic approximationsand assumptions, le...
Unsupervised Texture Segmentation in a Deterministic Annealing Framework
, 1998
"... We present a novel optimization framework for unsupervised texture segmentation that relies on statistical tests as a measure of homogeneity. Texture segmentation is formulated as a data clustering problem based on sparse proximity data. Dissimilarities of pairs of textured regions are computed from ..."
Abstract

Cited by 91 (9 self)
 Add to MetaCart
We present a novel optimization framework for unsupervised texture segmentation that relies on statistical tests as a measure of homogeneity. Texture segmentation is formulated as a data clustering problem based on sparse proximity data. Dissimilarities of pairs of textured regions are computed from a multiscale Gabor filter image representation. We discuss and compare a class of clustering objective functions which is systematically derived from invariance principles. As a general optimization framework we propose deterministic annealing based on a meanfield approximation. The canonical way to derive clustering algorithms within this framework as well as an efficient implementation of meanfield annealing and the closely related Gibbs sampler are presented. We apply both annealing variants to Brodatzlike microtexture mixtures and realword images.
Double Dissociation Without Modularity: Evidence from Connectionist Neuropsychology
 Journal of Clinical and Experimental Neuropsychology
, 1995
"... Many theorists assume that the cognitive system is composed of a collection of encapsulated processing components or modules, each dedicated to performing a particular cognitive function. On this view, selective impairments of cognitive tasks following brain damage, as evidenced by double dissociati ..."
Abstract

Cited by 73 (15 self)
 Add to MetaCart
Many theorists assume that the cognitive system is composed of a collection of encapsulated processing components or modules, each dedicated to performing a particular cognitive function. On this view, selective impairments of cognitive tasks following brain damage, as evidenced by double dissociations, are naturally interpreted in terms of the loss of particular processing components. By contrast, the current investigation examines in detail a double dissociation between concrete and abstract word reading after damage to a connectionist network that pronounces words via meaning and yet has no separable components (Plaut & Shallice, 1993). The functional specialization in the network that gives rise to the double dissociation is not transparently related to the network's structure, as modular theories assume. Furthermore, a consideration of the distribution of effects across quantitatively equivalent individual lesions in the network raises specific concerns about the interpretation of...