Results 1–10 of 17
Hierarchical Bayesian Inference in the Visual Cortex
, 2002
Abstract

Cited by 173 (0 self)
In this paper, we propose a Bayesian theory of hierarchical cortical computation based both on (a) the mathematical and computational ideas of computer vision and pattern theory and on (b) recent neurophysiological experimental evidence. We [2] have proposed that Grenander's pattern theory [3] could potentially model the brain as a generative model in such a way that feedback serves to disambiguate and 'explain away' the earlier representation. The Helmholtz machine [4, 5] was an excellent step towards approximating this proposal, with feedback implementing priors. Its development, however, was rather limited, dealing only with binary images. Moreover, its feedback mechanisms were engaged only during the learning of the feedforward connections but not during perceptual inference, though the Gibbs sampling process for inference can potentially be interpreted as top-down feedback disambiguating low-level representations. Rao and Ballard's predictive coding/Kalman filter model [6] did integrate generative feedback in the perceptual inference process, but it was primarily a linear model and thus severely limited in practical utility. The data-driven Markov chain Monte Carlo approach of Zhu and colleagues [7, 8] might be the most successful recent application of this proposal in solving real and difficult computer vision problems using generative models, though its connection to the visual cortex has not been explored. Here, we bring in a powerful and widely applicable paradigm from artificial intelligence and computer vision to propose some new ideas about the algorithms of visual cortical processing and the nature of representations in the visual cortex. We will review some of our and others' neurophysiological experimental data to lend support to these ideas
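As a toy illustration of the "feedback implementing priors" idea in this abstract, the following sketch (all probabilities are invented for illustration, not taken from the paper) shows a high-level context supplying a top-down prior over a low-level feature, which sharpens the posterior relative to bottom-up evidence alone:

```python
import numpy as np

# Hypothetical two-level model: a high-level context H modulates the prior
# over a low-level feature F, which in turn generates the observation O.
p_h = np.array([0.7, 0.3])               # P(H): e.g. "face" vs "texture" context
p_f_given_h = np.array([[0.9, 0.1],      # P(F | H=face): feature present/absent
                        [0.2, 0.8]])     # P(F | H=texture)
p_o_given_f = np.array([0.6, 0.4])       # P(O=edge | F=present/absent)

# Bottom-up only: posterior over F under a flat prior.
flat = p_o_given_f * 0.5
bottom_up = flat / flat.sum()

# With top-down feedback: the context supplies an empirical prior P(F).
prior_f = p_h @ p_f_given_h              # P(F) = sum_H P(H) P(F|H)
joint = p_o_given_f * prior_f
with_feedback = joint / joint.sum()

print(bottom_up, with_feedback)
```

With a supportive context the posterior on "feature present" is noticeably sharper than the bottom-up estimate, which is the disambiguating role the abstract assigns to feedback.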
The Role of the Primary Visual Cortex in Higher Level Vision
, 1998
Abstract

Cited by 109 (6 self)
In the classical feedforward, modular view of visual processing, the primary visual cortex (area V1) is a module that serves to extract local features such as edges and bars. Representation and recognition of objects are thought to be functions of higher extrastriate cortical areas. This paper presents neurophysiological data showing that the later part of V1 neurons' responses reflects higher-order perceptual computations related to Ullman's (Cognition 1984;18:97–159) visual routines and Marr's (Vision, Freeman, 1982) full primal sketch, 2½-D sketch and 3-D model. Based on theoretical reasoning and the experimental evidence, we propose a possible reinterpretation of the functional role of V1. In this framework, because of V1 neurons' precise encoding of orientation and spatial information, higher-level perceptual computations and representations that involve high-resolution details, fine geometry and spatial precision would necessarily involve V1 and be reflected in the later...
Local Learning in Probabilistic Networks With Hidden Variables
, 1995
Abstract

Cited by 77 (4 self)
Probabilistic networks, which provide compact descriptions of complex stochastic relationships among several random variables, are rapidly becoming the tool of choice for uncertain reasoning in artificial intelligence. We show that networks with fixed structure containing hidden variables can be learned automatically from data using a gradient-descent mechanism similar to that used in neural networks. We also extend the method to networks with intensionally represented distributions, including networks with continuous variables and dynamic probabilistic networks. Because probabilistic networks provide explicit representations of causal structure, human experts can easily contribute prior knowledge to the training process, thereby significantly improving the learning rate. Adaptive probabilistic networks (APNs) may soon compete directly with neural networks as models in computational neuroscience as well as in industrial and financial applications...
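The gradient-descent learning this abstract describes can be sketched on a minimal network with one hidden cause: a hidden binary H generates an observed binary X through a fixed noisy channel, and the prior parameter of H is learned by gradient ascent on the log-likelihood of the observations. The network, parameters, and numbers are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network H -> X with H hidden. True P(H=1) = 0.8;
# X copies H but is flipped with probability 0.1.
true_theta, flip = 0.8, 0.1
h = rng.random(2000) < true_theta
x = np.where(rng.random(2000) < flip, ~h, h).astype(float)

theta = 0.5                                # parameter to learn: P(H=1)
for _ in range(200):
    # Marginal likelihood of each observation under the current theta.
    p_x1 = theta * (1 - flip) + (1 - theta) * flip      # P(X=1)
    p_x = np.where(x == 1, p_x1, 1 - p_x1)
    # d log P(x) / d theta for each case, averaged (gradient ascent).
    dp = np.where(x == 1, (1 - flip) - flip, flip - (1 - flip))
    theta += 0.1 * np.mean(dp / p_x)
    theta = np.clip(theta, 1e-3, 1 - 1e-3)

print(round(theta, 2))
```

The learned theta approaches the maximum-likelihood value near 0.8 even though H is never observed, which is the point of the local-gradient scheme.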
Recognizing handwritten digits using mixtures of linear models
 Advances in Neural Information Processing Systems 7
, 1995
Abstract

Cited by 56 (6 self)
We construct a mixture of locally linear generative models of a collection of pixel-based images of digits, and use them for recognition. Different models of a given digit are used to capture different styles of writing, and new images are classified by evaluating their log-likelihoods under each model. We use an EM-based algorithm in which the M-step is computationally straightforward principal components analysis (PCA). Incorporating tangent-plane information [12] about expected local deformations only requires adding tangent vectors into the sample covariance matrices for the PCA, and it demonstrably improves performance.
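A rough sketch of the classify-by-log-likelihood idea: fit a per-class linear (PCA) model, then score a new point by negative reconstruction error as a crude log-likelihood proxy. The synthetic data generator and all constants below are invented for illustration:

```python
import numpy as np

# Toy stand-in for digit images: two "classes" of 16-d points, each lying
# near its own 2-d linear subspace (loosely mimicking writing styles).
def make_class(n, seed):
    r = np.random.default_rng(seed)
    basis = r.normal(size=(2, 16))
    return r.normal(size=(n, 2)) @ basis + 0.05 * r.normal(size=(n, 16))

train = {0: make_class(200, 10), 1: make_class(200, 20)}

# "M-step as PCA": per class, keep the mean and top principal directions.
models = {}
for c, data in train.items():
    mu = data.mean(axis=0)
    _, _, vt = np.linalg.svd(data - mu, full_matrices=False)
    models[c] = (mu, vt[:2])               # k = 2 components per model

def score(x, model):
    # Negative reconstruction error as a crude log-likelihood proxy.
    mu, comps = model
    centered = x - mu
    proj = centered @ comps.T @ comps
    return -np.sum((centered - proj) ** 2)

x = make_class(1, 10)[0]                   # a fresh sample from class 0
pred = max(models, key=lambda c: score(x, models[c]))
print(pred)
```

A sample drawn near the class-0 subspace reconstructs well under the class-0 model and poorly under the other, so the argmax recovers its class.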
Developments in Probabilistic Modelling with Neural Networks  Ensemble Learning
, 1995
Abstract

Cited by 49 (5 self)
Ensemble learning by variational free energy minimization is a framework for statistical inference in which an ensemble of parameter vectors is optimized rather than a single parameter vector. The ensemble approximates the posterior probability distribution of the parameters. In this paper I give a review of ensemble learning using a simple example. 1 Ensemble Learning by Free Energy Minimization A new tool has recently been introduced into the field of neural networks. In traditional approaches to model fitting, a single parameter vector w is optimized by, say, maximum likelihood or penalized maximum likelihood; in the Bayesian interpretation, these optimized parameters are viewed as defining the mode of a posterior probability distribution P(w | D, H) (given data D and model assumptions H). The new concept introduced by Hinton and van Camp (1993) is to work in terms of an approximating ensemble Q(w; θ), that is, a probability distribution over the parameters, and optimize the ensemb...
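The free-energy minimization described here can be demonstrated on a conjugate Gaussian toy model, where the optimized ensemble Q(w; m, s) should recover the exact posterior. This is an assumption-laden sketch, not the paper's own example:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=50)          # data x_i ~ N(w, 1), true w = 2
n = len(x)

# Gaussian ensemble Q(w; m, s) approximating the posterior of a scalar w
# under a N(0, 1) prior. In this conjugate case the free-energy minimum
# equals the exact posterior N(sum(x)/(n+1), 1/(n+1)), a handy sanity check.
m, log_s = 0.0, 0.0
for _ in range(500):
    s = np.exp(log_s)
    dF_dm = (n + 1) * m - x.sum()          # gradient of free energy in m
    dF_ds = (n + 1) * s - 1.0 / s          # gradient of free energy in s
    m -= 0.01 * dF_dm
    log_s -= 0.01 * dF_ds * s              # chain rule for log-parameterization

print(m, np.exp(log_s))                    # ~ posterior mean and std
```

Gradient descent on the free energy drives m to the posterior mean and s to the posterior standard deviation 1/sqrt(n+1), i.e. the ensemble converges to the true posterior distribution.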
Ensemble Learning and Evidence Maximization
 Proc. NIPS
, 1995
Abstract

Cited by 18 (2 self)
Ensemble learning by variational free energy minimization is a tool introduced to neural networks by Hinton and van Camp in which learning is described in terms of the optimization of an ensemble of parameter vectors. The optimized ensemble is an approximation to the posterior probability distribution of the parameters. This tool has now been applied to a variety of statistical inference problems. In this paper I study a linear regression model with both parameters and hyperparameters. I demonstrate that the evidence approximation for the optimization of regularization constants can be derived in detail from a free energy minimization viewpoint. 1 Ensemble Learning by Free Energy Minimization A new tool has recently been introduced into the field of neural networks and statistical inference. In traditional approaches to neural networks, a single parameter vector w is optimized by maximum likelihood or penalized maximum likelihood. In the Bayesian interpretation, these optimized param...
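A minimal sketch of evidence-driven optimization of a regularization constant in linear regression, in the spirit of MacKay-style evidence maximization; the data, noise level, and initialization below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear regression y = X w + noise; the weight-prior precision alpha
# (a hyperparameter) is re-estimated by maximizing the evidence.
n, d = 100, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

beta = 1.0 / 0.25                          # assumed known noise precision
alpha = 1.0                                # initial prior precision on weights
eig = np.linalg.eigvalsh(beta * X.T @ X)   # fixed across iterations
for _ in range(50):
    A = alpha * np.eye(d) + beta * X.T @ X          # posterior precision
    w_map = beta * np.linalg.solve(A, X.T @ y)      # posterior mean of w
    gamma = np.sum(eig / (eig + alpha))    # effective number of parameters
    alpha = gamma / (w_map @ w_map)        # evidence re-estimation formula

print(round(alpha, 3), round(gamma, 2))
```

The fixed point of this re-estimation is the alpha that maximizes the marginal likelihood (evidence), which the paper derives as a special case of free energy minimization.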
ACh, uncertainty, and cortical inference
 Advances in Neural Information Processing Systems 14:189–196
, 2002
Abstract

Cited by 6 (6 self)
Acetylcholine (ACh) has been implicated in a wide variety of tasks involving attentional processes and plasticity. Following extensive animal studies, it has previously been suggested that ACh reports on uncertainty and controls hippocampal, cortical and cortico-amygdalar plasticity. We extend this view and consider its effects on cortical representational inference, arguing that ACh controls the balance between bottom-up inference, influenced by input stimuli, and top-down inference, influenced by contextual information. We illustrate our proposal using a hierarchical hidden Markov model.
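The proposed balance between bottom-up and top-down inference can be caricatured in a single HMM filtering step, with a hypothetical scalar `ach` standing in for ACh level; the transition and emission probabilities are invented for illustration:

```python
import numpy as np

# One filtering step in a two-state HMM. High "ACh" signals high
# uncertainty about context, so the top-down prediction is discounted
# toward a flat prior and the bottom-up stimulus dominates.
T = np.array([[0.9, 0.1], [0.1, 0.9]])     # transition model (top-down)
E = np.array([[0.6, 0.4], [0.4, 0.6]])     # emission model P(obs | state)

def step(belief, obs, ach):
    predicted = belief @ T                 # top-down prediction
    prior = (1 - ach) * predicted + ach * np.array([0.5, 0.5])
    post = prior * E[:, obs]               # combine with bottom-up evidence
    return post / post.sum()

belief = np.array([0.99, 0.01])            # strong contextual belief in state 0
low_ach = step(belief, obs=1, ach=0.1)     # context dominates
high_ach = step(belief, obs=1, ach=0.9)    # stimulus dominates
print(low_ach, high_ach)
```

With a surprising observation (obs=1 favors state 1), low ACh keeps the posterior anchored to the context, while high ACh lets the stimulus pull the posterior toward state 1.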
Recognition Networks for Approximate Inference in BN2O Networks
 In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence
, 2001
Abstract

Cited by 3 (1 self)
A recognition network is a multilayer perceptron (MLP) trained to predict posterior marginals given observed evidence in a particular Bayesian network. The input to the MLP is a vector of the states of the evidential nodes. The activity of an output unit is interpreted as a prediction of the posterior marginal of the corresponding variable. The MLP is trained using samples generated from the corresponding Bayesian network.
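A toy version of this training scheme, using a tiny noisy-OR-style network and a single logistic unit as a minimal stand-in for the MLP; the network structure and probabilities are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network: one hidden cause d and three findings f. We train
# a logistic unit on samples from the network and read its output as an
# estimate of the posterior marginal P(d=1 | findings).
def sample(n):
    d = (rng.random(n) < 0.3).astype(float)          # prior P(d=1) = 0.3
    p = np.where(d[:, None] == 1, 0.8, 0.1)          # finding prob given d
    f = (rng.random((n, 3)) < p).astype(float)
    return f, d

f, d = sample(5000)
w, b = np.zeros(3), 0.0
for _ in range(300):                       # gradient descent on cross-entropy
    pred = 1 / (1 + np.exp(-(f @ w + b)))
    err = pred - d
    w -= 0.1 * (f.T @ err) / len(d)
    b -= 0.1 * err.mean()

q1 = 1 / (1 + np.exp(-(np.ones(3) @ w + b)))         # all findings present
q0 = 1 / (1 + np.exp(-b))                            # no findings present
print(q1, q0)
```

After training on samples alone, the unit reports a much higher posterior for the hidden cause when all findings are present than when none are, approximating the network's true posterior marginal with a single forward pass.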
Free Energy, Value, and Attractors
, 2012
Abstract

Cited by 3 (1 self)
It has been suggested recently that action and perception can be understood as minimising the free energy of sensory samples. This ensures that agents sample the environment to maximise the evidence for their model of the world, such that exchanges with the environment are predictable and adaptive. However, the free energy account does not invoke reward or cost functions from reinforcement learning and optimal control theory. We therefore ask whether reward is necessary to explain adaptive behaviour. The free energy formulation uses ideas from statistical physics to explain action in terms of minimising sensory surprise. Conversely, reinforcement learning has its roots in behaviourism and engineering and assumes that agents optimise a policy to maximise future reward. This paper tries to connect the two formulations and concludes that optimal policies correspond to empirical priors on the trajectories of hidden environmental states, which compel agents to seek out the (valuable) states they expect to encounter.
Attractors in Song
, 2009
Abstract

Cited by 1 (1 self)
This paper summarizes our recent attempts to integrate action and perception within a single optimization framework. We start with a statistical formulation of Helmholtz’s ideas about neural energy to furnish a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts. Using constructs from statistical physics it can be shown that the problems of inferring the causes of our sensory inputs and learning regularities in the sensorium can be resolved using exactly the same principles. Furthermore, inference and learning can proceed in a biologically plausible fashion. The ensuing scheme rests on Empirical Bayes and hierarchical models of how sensory information is generated. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of the brain’s organization and responses. We will demonstrate the brain-like dynamics that this scheme entails by using models of bird songs that are based on chaotic attractors with autonomous dynamics. This provides a nice example of how nonlinear dynamics can be exploited by the brain to represent and predict dynamics in the environment.