A Unifying Review of Linear Gaussian Models
1999
Cited by 208 (14 self)
Factor analysis, principal component analysis, mixtures of gaussian clusters, vector quantization, Kalman filter models, and hidden Markov models can all be unified as variations of unsupervised learning under a single basic generative model. This is achieved by collecting together disparate observations and derivations made by many previous authors and introducing a new way of linking discrete and continuous state models using a simple nonlinearity. Through the use of other nonlinearities, we show how independent component analysis is also a variation of the same basic generative model. We show that factor analysis and mixtures of gaussians can be implemented in autoencoder neural networks and learned using squared error plus the same regularization term. We introduce a new model for static data, known as sensible principal component analysis, as well as a novel concept of spatially adaptive observation noise. We also review some of the literature involving global and local mixtures of the basic models and provide pseudocode for inference and learning for all the basic models.
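A minimal sketch of the single generative model, assuming the notation commonly used for it: a hidden state x and an observation y = Cx + v with noise v ~ N(0, R). For static data x ~ N(0, Q), so y is Gaussian with covariance CQC^T + R. It also assumes that the "simple nonlinearity" linking discrete and continuous states is a winner-take-all, which turns a continuous state into a one-hot vector so that the noiseless observation is a single column of C, as in a mixture of Gaussians or vector quantization. All function names are hypothetical, and plain Python lists stand in for a linear-algebra library.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def marginal_covariance(C: Matrix, Q: Matrix, R: Matrix) -> Matrix:
    """Covariance of y = C x + v when x ~ N(0, Q) and v ~ N(0, R): C Q C^T + R."""
    cqct = matmul(matmul(C, Q), transpose(C))
    return [[cqct[i][j] + R[i][j] for j in range(len(R))] for i in range(len(R))]


def wta(x: Vector) -> Vector:
    """Winner-take-all (assumed nonlinearity): one-hot vector marking the largest entry."""
    winner = max(range(len(x)), key=lambda i: x[i])
    return [1.0 if i == winner else 0.0 for i in range(len(x))]


def observe(C: Matrix, x: Vector, v: Vector) -> Vector:
    """Observation y = C x + v for a given state x and a given noise sample v."""
    return [sum(C[i][j] * x[j] for j in range(len(x))) + v[i] for i in range(len(C))]
```

For instance, wta([0.1, 2.0, -1.0]) selects the second unit, so observe(C, wta(...), [0.0, 0.0]) returns the second column of C. Keeping x continuous instead (no nonlinearity) gives the factor analysis and PCA branch of the same model.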
Separating style and content with bilinear models
Neural Computation, 2000
Cited by 119 (3 self)
Perceptual systems routinely separate content from style, classifying familiar words spoken in an unfamiliar accent, identifying a font or handwriting style across letters, or recognizing a familiar face or object seen under unfamiliar viewing conditions. Yet a general and tractable computational model of this ability to untangle the underlying factors of perceptual observations remains elusive. Existing factor models are either insufficiently rich to capture the complex interactions of perceptually meaningful factors such as phoneme and speaker accent or letter and font, or do not allow efficient learning algorithms. Here we show how perceptual systems may learn to solve these crucial tasks using surprisingly simple bilinear models. We report promising results in three realistic perceptual domains: spoken vowel classification with a benchmark multi-speaker database, extrapolation of fonts to unseen letters, and translation of faces to novel illuminants.
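A sketch of one common form, the symmetric bilinear model, assuming a style vector a (for example, a speaker or font) and a content vector b (a vowel or letter) interact through a three-way weight tensor W, so that y_k = sum_i sum_j W[i][j][k] a_i b_j. The function and variable names are illustrative. What it shows is that the output is linear in style when content is fixed and linear in content when style is fixed, so each factor can be solved for with linear methods while the other is held constant.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # indexed W[i][j][k]: style i, content j, output k


def bilinear(W: Tensor3, a_style: List[float], b_content: List[float]) -> List[float]:
    """Symmetric bilinear map: y_k = sum_i sum_j W[i][j][k] * a_style[i] * b_content[j]."""
    K = len(W[0][0])
    y = [0.0] * K
    for i, a_i in enumerate(a_style):
        for j, b_j in enumerate(b_content):
            for k in range(K):
                y[k] += W[i][j][k] * a_i * b_j
    return y
```

With one-hot style and content vectors the model simply reads out the slice W[i][j]; scaling either vector scales the output, which is the bilinearity the abstract relies on.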
A Unifying Information-theoretic Framework for Independent Component Analysis
1999
Cited by 74 (5 self)
We show that different theories recently proposed for Independent Component Analysis (ICA) lead to the same iterative learning algorithm for blind separation of mixed independent sources. We review those theories and suggest that information theory can be used to unify several lines of research. Pearlmutter and Parra (1996) and Cardoso (1997) showed that the infomax approach of Bell and Sejnowski (1995) and the maximum likelihood estimation approach are equivalent. We show that negentropy maximization also has equivalent properties and therefore all three approaches yield the same learning rule for a fixed nonlinearity. Girolami and Fyfe (1997a) have shown that the nonlinear Principal Component Analysis (PCA) algorithm of Karhunen and Joutsensalo (1994) and Oja (1997) can also be viewed from information-theoretic principles since it minimizes the sum of squares of the fourth-order marginal cumulants and therefore approximately minimizes the mutual information (Comon, 1994). Lambert (19...
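The shared learning rule can be sketched in its widely used natural-gradient form, W <- W + lr (I - phi(u) u^T) W with u = W x, for a fixed nonlinearity phi. This is a minimal illustration under stated assumptions: phi = tanh (a choice suited to super-Gaussian sources), a single-sample update, and a hypothetical function name and step size; it is not taken from the paper's own code.

```python
import math
from typing import List

Matrix = List[List[float]]


def ica_step(W: Matrix, x: List[float], lr: float = 0.01) -> Matrix:
    """One natural-gradient ICA update on a single mixed sample x (phi = tanh assumed)."""
    n = len(W)
    u = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(n)]  # unmixed estimate
    phi = [math.tanh(ui) for ui in u]  # fixed nonlinearity (score function)
    # G = I - phi(u) u^T
    G = [[(1.0 if i == j else 0.0) - phi[i] * u[j] for j in range(n)] for i in range(n)]
    # W_new = W + lr * (G @ W)
    return [[W[i][j] + lr * sum(G[i][k] * W[k][j] for k in range(n))
             for j in range(len(W[0]))] for i in range(n)]
```

In practice the update is applied repeatedly over many samples; at a separating solution the expected value of phi(u) u^T equals I and the expected update vanishes.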
Six Principles for Biologically-Based Computational Models of Cortical Cognition
Trends in Cognitive Sciences, 1998
Cited by 43 (14 self)
This paper describes and motivates six principles for computational cognitive neuroscience models: biological realism, distributed representations, inhibitory competition, bidirectional activation propagation, error-driven task learning, and Hebbian model learning. Although these principles are supported by a number of cognitive, computational, and biological motivations, the prototypical neural network model (a feedforward backpropagation network) incorporates only two of them, and no widely used model incorporates all of them. This paper argues that these principles should be integrated into a coherent overall framework, and discusses some potential synergies and conflicts in doing so.
A Multi-Layer Sparse Coding Network Learns Contour Coding From Natural Images
2002
Cited by 41 (8 self)
An important approach in visual neuroscience considers how the function of the early visual system relates to the statistics of its natural input. Previous studies have shown how many basic properties of the primary visual cortex, such as the receptive fields of simple and complex cells and the spatial organization (topography) of the cells, can be understood as efficient coding of natural images. Here we extend the framework by considering how the responses of complex cells could be sparsely represented by a higher-order neural layer. This leads to contour coding and end-stopped receptive fields. In addition, contour integration could be interpreted as top-down inference in the presented model.
Facial Expression Space Learning
In Proceedings of Pacific Graphics, 2002
Cited by 39 (0 self)
... experienced increased attention recently. Most current research focuses on techniques for capturing, synthesizing, and retargeting facial expressions; little attention has been paid to the problem of controlling and modifying the expression itself. We present techniques that separate video data into expressive features and underlying content. This allows, for example, a sequence originally recorded with a happy expression to be modified so that the speaker appears to be speaking with an angry or neutral expression. Although the expression has been modified, the new sequences maintain the same visual speech content as the original sequence. The facial expression space that allows these transformations is learned with the aid of a factorization model.
Bayesian computation in recurrent neural circuits
Neural Computation, 2004
Cited by 33 (2 self)
A large number of human psychophysical results have been successfully explained in recent years using Bayesian models. However, the neural implementation of such models remains largely unclear. In this paper, we show that a network architecture commonly used to model the cerebral cortex can implement Bayesian inference for an arbitrary hidden Markov model. We illustrate the approach using an orientation discrimination task and a visual motion detection task. In the case of orientation discrimination, we show that the model network can infer the posterior distribution over orientations and correctly estimate stimulus orientation in the presence of significant noise. In the case of motion detection, we show that the resulting model network exhibits direction selectivity and correctly computes the posterior probabilities over motion direction and position. When used to solve the well-known random dots motion discrimination task, the model generates responses that mimic the activities of evidence-accumulating neurons in cortical areas LIP and FEF. The framework introduced in the paper posits a new interpretation of cortical activities in terms of log posterior probabilities of stimuli occurring in the natural world.
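The computation such a network has to carry out is the standard forward (filtering) recursion for a hidden Markov model. A minimal log-domain version is sketched below, assuming discrete states and known transition and likelihood tables; it is the textbook algorithm, not the paper's network model, and the function names are hypothetical. Working with log probabilities mirrors the interpretation of activities as log posteriors and avoids numerical underflow over long sequences.

```python
import math
from typing import List


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_filter_step(log_prior: List[float], log_trans: List[List[float]],
                    log_lik: List[float]) -> List[float]:
    """One step of the HMM forward recursion in the log domain.

    log_prior[j]    : log P(s_{t-1} = j | I_1..I_{t-1})
    log_trans[j][i] : log P(s_t = i | s_{t-1} = j)
    log_lik[i]      : log P(I_t | s_t = i)
    Returns the normalized log posterior log P(s_t = i | I_1..I_t).
    """
    unnorm = []
    for i in range(len(log_lik)):
        # Predicted log probability of state i, marginalizing over the previous state.
        pred = logsumexp([log_prior[j] + log_trans[j][i] for j in range(len(log_prior))])
        unnorm.append(log_lik[i] + pred)
    z = logsumexp(unnorm)  # log normalizer
    return [u - z for u in unnorm]
```

With a uniform prior over two states, symmetric "sticky" transitions, and likelihoods 0.8 and 0.2, the posterior after one step is 0.8 and 0.2, because the prediction stays uniform.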
The Rectified Gaussian Distribution
Advances in Neural Information Processing Systems 10, 1998
Cited by 32 (2 self)
A simple but powerful modification of the standard Gaussian distribution is studied. The variables of the rectified Gaussian are constrained to be nonnegative, enabling the use of nonconvex energy functions. Two multimodal examples, the competitive and cooperative distributions, illustrate the representational power of the rectified Gaussian. Since the cooperative distribution can represent the translations of a pattern, it demonstrates the potential of the rectified Gaussian for modeling pattern manifolds.

1 Introduction

The rectified Gaussian distribution is a modification of the standard Gaussian in which the variables are constrained to be nonnegative. This simple modification brings increased representational power, as illustrated by two multimodal examples of the rectified Gaussian, the competitive and the cooperative distributions. The modes of the competitive distribution are well-separated by regions of low probability. The modes of the cooperative distribution are closely sp...
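A toy illustration, assuming the energy form E(x) = 1/2 x^T A x - b^T x with a symmetric matrix A and density proportional to exp(-E(x)) for x >= 0 (zero elsewhere). The 2-unit matrix A = [[1, 2], [2, 1]] and b = [1, 1] are made-up values chosen because A is not positive definite (eigenvalues 3 and -1), so E is nonconvex, yet every entry of A is positive, so E is still bounded below on the nonnegative orthant. The resulting distribution has two modes, (1, 0) and (0, 1), separated by a saddle at (1/3, 1/3): a small winner-take-all analogue of the competitive distribution, not the paper's own construction. Modes are found here by projected gradient descent, clipping each coordinate at zero; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def energy(A: Matrix, b: List[float], x: List[float]) -> float:
    """E(x) = 1/2 x^T A x - b^T x; the rectified density is proportional to exp(-E(x)) on x >= 0."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad - sum(b[i] * x[i] for i in range(n))


def find_mode(A: Matrix, b: List[float], x0: List[float],
              lr: float = 0.1, steps: int = 500) -> List[float]:
    """Projected gradient descent on E (A assumed symmetric, so grad E = A x - b)."""
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        grad = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        x = [max(0.0, x[i] - lr * grad[i]) for i in range(n)]  # rectification constraint
    return x
```

Starting slightly on either side of the diagonal sends the search to a different mode, which is exactly the multimodality that an unconstrained Gaussian with the same quadratic form could not express.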
Generalization in Interactive Networks: The Benefits of Inhibitory Competition and Hebbian Learning
Neural Computation, 2001
Cited by 28 (5 self)
Computational models in cognitive neuroscience should ideally use biological properties and powerful computational principles to produce behavior consistent with psychological findings. Error-driven backpropagation is computationally powerful, and has proven useful for modeling a range of psychological data, but is not biologically plausible. Several approaches to implementing backpropagation in a biologically plausible fashion converge on the idea of using bidirectional activation propagation in interactive networks to convey error signals. This paper demonstrates two main points about these error-driven interactive networks: (a) they generalize poorly due to attractor dynamics that interfere with the network's ability to systematically produce novel combinatorial representations in response to novel inputs; and (b) this generalization problem can be remedied by adding two widely used mechanistic principles, inhibitory competition and Hebbian learning, that can be independent...
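The two mechanisms named in the abstract can be sketched under simplifying assumptions: inhibitory competition reduced to a hard k-winners-take-all that keeps the k most active units and silences the rest, and Hebbian learning written as the normalized rule dw_ij = lr * y_j * (x_i - w_ij), in which each receiving unit's weights move toward the input pattern present when that unit is active. Both are illustrative stand-ins with hypothetical names, not the paper's implementation.

```python
from typing import List


def k_winners_take_all(acts: List[float], k: int) -> List[float]:
    """Simplified inhibitory competition: keep the k largest activations, zero the rest."""
    keep = set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]


def hebbian_update(w: List[List[float]], x: List[float], y: List[float],
                   lr: float = 0.1) -> List[List[float]]:
    """Normalized Hebbian rule; w[i][j] connects sending unit i to receiving unit j."""
    return [[w[i][j] + lr * y[j] * (x[i] - w[i][j]) for j in range(len(y))]
            for i in range(len(x))]
```

Because the weight change is gated by the receiving activity y_j, only units that win the competition learn on a given pattern, which is one way the two mechanisms interact.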
A hierarchical community of experts
Learning in Graphical Models, 1998
Cited by 27 (10 self)
We describe a hierarchical generative model that selects from a large collection of available linear units an appropriate subset to model each observation. The selection mechanism is a corresponding network of binary units each of which gates the output of a linear unit. Inference in the binary network is intractable, but the statistics required to learn maximum-likelihood model parameters can be approximated with Gibbs sampling, even if the sampling is so brief that the Markov chain is far from equilibrium.

1 Multilayer networks of linear-Gaussian units

We consider directed acyclic networks of simple stochastic units, where the units are arranged in layers. The input to a unit is the weighted sum of the activities of units in the layer above, plus a bias. In the generative model, the joint probability of all of the units in the network taking on a particular set of values, or configuration, can be factored into a product of probabilities of individual units, conditioned on the units in the layer above. The simplest unit we will consider is a linear-Gaussian unit. The probability that a linear-Gaussian unit takes on a particular value is given by a Gaussian distribution centered at the top-down prediction of the unit's parents. The top-down prediction for unit i, denoted $\hat{y}_i$, is the weighted sum of its parents' outputs, plus a bias:

$$\hat{y}_i = \sum_{j \in \mathrm{pa}(i)} w_{ji}\, y_j + b_i$$
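The top-down prediction of a linear-Gaussian unit (the weighted sum of its parents' outputs plus a bias) and the Gaussian probability centered on it can be sketched as follows. This assumes a bias b_i, weights w_ji keyed by parent index j, and a per-unit variance, all introduced here for illustration; gating by a binary unit is modelled simply as multiplying the linear unit's output by s in {0, 1}, which is an assumption about the mechanism rather than a detail given in the text. Function names are hypothetical.

```python
import math
from typing import Dict, List


def top_down_prediction(bias: float, weights: Dict[int, float],
                        parent_outputs: List[float]) -> float:
    """y_hat_i = b_i + sum over parents j of w_ji * y_j (weights keyed by parent index j)."""
    return bias + sum(w * parent_outputs[j] for j, w in weights.items())


def log_prob_linear_gaussian(y: float, y_hat: float, variance: float) -> float:
    """log N(y; y_hat, variance): the unit's value is Gaussian around its top-down prediction."""
    return -0.5 * math.log(2 * math.pi * variance) - (y - y_hat) ** 2 / (2 * variance)


def gated_output(y: float, s: int) -> float:
    """Assumed gating: a binary selection unit s in {0, 1} switches the linear unit's output."""
    return s * y
```

Because the joint probability factors over units given the layer above, the log probability of a whole configuration is the sum of one such per-unit term for every unit in the network.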

