Results 11-20 of 30
When did Bayesian inference become “Bayesian”?
 BAYESIAN ANALYSIS
, 2006
Cited by 10 (1 self)
While Bayes’ theorem has a 250-year history, and the method of inverse probability that flowed from it dominated statistical thinking into the twentieth century, the adjective “Bayesian” was not part of the statistical lexicon until relatively recently. This paper provides an overview of key Bayesian developments, beginning with Bayes’ posthumously published 1763 paper and continuing up through approximately 1970, including the period of time when “Bayesian” emerged as the label of choice for those who advocated Bayesian methods.
Bayesian decoding of brain images
, 2008
Cited by 7 (0 self)
This paper introduces a multivariate Bayesian (MVB) scheme to decode or recognise brain states from neuroimages. It resolves the ill-posed many-to-one mapping, from voxel values or data features to a target variable, using a parametric empirical or hierarchical Bayesian model. This model is inverted using standard variational techniques, in this case expectation maximisation, to furnish the model evidence and the conditional density of the model’s parameters. This allows one to compare different models or hypotheses about the mapping from functional or structural anatomy to perceptual and behavioural consequences (or their deficits). We frame this approach in terms of decoding measured brain states to predict or classify outcomes using the rhetoric established in pattern classification of neuroimaging data. However, the aim of MVB is not to predict (because the outcomes are known) but to enable inference on different models of structure–function mappings, such as distributed and sparse representations. This allows
Free-energy and the brain
, 2007
Cited by 6 (0 self)
If one formulates Helmholtz’s ideas about perception in terms of modern-day theories, one arrives at a model of perceptual inference and learning that can explain a remarkable range of neurobiological facts. Using constructs from statistical physics it can be shown that the problems of inferring what causes our sensory inputs and learning causal regularities in the sensorium can be resolved using exactly the same principles. Furthermore, inference and learning can proceed in a biologically plausible fashion. The ensuing scheme rests on Empirical Bayes and hierarchical models of how sensory information is generated. The use of hierarchical models enables the brain to construct prior expectations in a dynamic and context-sensitive fashion. This scheme provides a principled way to understand many aspects of the brain’s organisation and responses. In this paper, we suggest that these perceptual processes are just one emergent property of systems that conform to a free-energy principle. The free-energy considered here represents a bound on the surprise inherent in any exchange with the environment, under expectations encoded by its state or configuration. A system can minimise free-energy by changing its configuration to change the way it samples the environment, or to change its expectations. These changes correspond to action and perception, respectively, and lead to an adaptive exchange with the environment that is characteristic of biological systems. This treatment implies that the system’s state and structure encode an implicit and probabilistic model of the environment. We will look at models entailed by the brain and how minimisation of free-energy can explain its dynamics and structure.
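The statement that free-energy "represents a bound on the surprise" can be written out in the standard variational form. The notation is an assumption made for illustration and is not taken from the abstract: y denotes sensory data, \vartheta their hidden causes, m the model, and q(\vartheta) the density over causes encoded by the system's state.

```latex
F(y, q)
  \;=\; \underbrace{-\ln p(y \mid m)}_{\text{surprise}}
  \;+\; D_{\mathrm{KL}}\!\big( q(\vartheta) \,\|\, p(\vartheta \mid y, m) \big)
  \;\ge\; -\ln p(y \mid m)
```

Because the Kullback-Leibler divergence is non-negative, F can never fall below the surprise. Under this reading, adjusting q (changing expectations, i.e. perception) tightens the bound, while changing which y is sampled (action) lowers the surprise term itself.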
Variational Bayes Solution of Linear Neural Networks and its Generalization Performance
, 2007
Cited by 6 (5 self)
It is well known that, in unidentifiable models, the Bayes estimation provides much better generalization performance than the maximum likelihood (ML) estimation. However, its accurate approximation by Markov chain Monte Carlo methods requires huge computational costs. As an alternative, a tractable approximation method, called the variational Bayes (VB) approach, has recently been proposed and been attracting people’s attention. Its advantage over the expectation maximization (EM) algorithm, often used for realizing the ML estimation, has been experimentally shown in many applications, nevertheless, has not been theoretically shown yet. In this paper, through the analysis of the simplest unidentifiable models, we theoretically show some properties of the VB approach. We first prove that, in three-layer linear neural networks, the VB approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation. Then, we theoretically clarify its free energy, generalization error, and training error. Comparing them with those of the ML estimation and of the Bayes estimation, we discuss the advantage of the VB approach. We also show that, unlike in the Bayes estimation, the free energy and the generalization error are less simply related with each other, and that, in typical cases, the VB free energy well approximates the Bayes one, while the VB generalization error significantly differs from the Bayes one.
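For readers unfamiliar with the term, the following is a minimal sketch of the generic positive-part James-Stein estimator for a normal mean vector. It is not the paper's result for linear neural networks; the function name, the known noise variance `sigma2`, and shrinkage toward the origin are all illustrative assumptions.

```python
def james_stein_positive_part(x, sigma2=1.0):
    """Shrink the observation vector x toward the origin.

    Assumes x[i] ~ N(theta[i], sigma2) independently, with len(x) >= 3.
    The name and signature are hypothetical, chosen for this sketch.
    """
    d = len(x)
    if d < 3:
        raise ValueError("James-Stein shrinkage needs at least 3 dimensions")
    norm2 = sum(v * v for v in x)
    if norm2 == 0.0:
        return [0.0] * d
    # Clipping the factor at zero is the "positive part": the estimate is
    # set to zero instead of flipping sign when the observation is small.
    factor = max(0.0, 1.0 - (d - 2) * sigma2 / norm2)
    return [factor * v for v in x]


if __name__ == "__main__":
    print(james_stein_positive_part([1.0, 1.0, 1.0]))
    print(james_stein_positive_part([0.1, 0.1, 0.1]))
```

The clipping step is what produces the model-selection-like behaviour of this family of estimators: observations whose squared norm is small relative to the noise level are suppressed entirely.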
Post hoc Bayesian model selection
Cited by 6 (3 self)
On Learning Hierarchical Classifications
 In ResearchIndex; The NECI Scientific Literature Digital Library [Online]. Available from: http:// citerseer. nj.nec /com / 38202. html [ Accessed 25
, 1997
Cited by 5 (0 self)
Many significant real-world classification tasks involve a large number of categories which are arranged in a hierarchical structure; for example, classifying documents into subject categories under the Library of Congress scheme, or classifying world-wide-web documents into topic hierarchies. We investigate the potential benefits of using a given hierarchy over base classes to learn accurate multi-category classifiers for these domains. First, we consider the possibility of exploiting a class hierarchy as prior knowledge that can help one learn a more accurate classifier. We explore the benefits of learning category-discriminants in a "hard" top-down fashion and compare this to a "soft" approach which shares training data among sibling categories. In doing so, we verify that hierarchies have the potential to improve prediction accuracy. But we argue that the reasons for this can be subtle. Sometimes, the improvement is only because using a hierarchy happens to constrain the expressiven...
GENERAL MAXIMUM LIKELIHOOD EMPIRICAL BAYES ESTIMATION OF NORMAL MEANS
, 908
Cited by 4 (0 self)
We propose a general maximum likelihood empirical Bayes (GMLEB) method for the estimation of a mean vector based on observations with i.i.d. normal errors. We prove that under mild moment conditions on the unknown means, the average mean squared error (MSE) of the GMLEB is within an infinitesimal fraction of the minimum average MSE among all separable estimators which use a single deterministic estimating function on individual observations, provided that the risk is of greater order than (log n)^5/n. We also prove that the GMLEB is uniformly approximately minimax in regular and weak ℓp balls when the order of the length-normalized norm of the unknown means is between (log n)^κ1/n
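The general idea of a maximum likelihood empirical Bayes rule for normal means can be sketched as follows: fit a prior over the means by maximum likelihood, then plug it into the posterior-mean rule. This is only a rough illustration under stated assumptions, not the paper's algorithm: the prior is restricted to a fixed grid, its weights are fitted by plain EM, the noise variance is taken as 1, and the names `gmleb_sketch`, `grid_size`, and `iters` are hypothetical.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gmleb_sketch(x, grid_size=50, iters=100):
    """Estimate normal means with a grid-based empirical Bayes rule.

    Assumes x[i] ~ N(theta[i], 1). The grid prior and the EM loop are
    simplifications made for this sketch.
    """
    lo, hi = min(x), max(x)
    if hi == lo:
        return list(x)
    grid = [lo + (hi - lo) * j / (grid_size - 1) for j in range(grid_size)]
    w = [1.0 / grid_size] * grid_size
    # Likelihood of each observation under each candidate mean on the grid.
    lik = [[normal_pdf(xi - u) for u in grid] for xi in x]
    for _ in range(iters):
        new_w = [0.0] * grid_size
        for row in lik:
            denom = sum(wj * lj for wj, lj in zip(w, row))
            for j, lj in enumerate(row):
                # E-step: posterior probability of grid point j for this row.
                new_w[j] += w[j] * lj / denom
        # M-step: new prior weights are the averaged posterior probabilities.
        w = [v / len(x) for v in new_w]
    # Plug the fitted prior into the posterior-mean (Bayes) rule.
    est = []
    for row in lik:
        denom = sum(wj * lj for wj, lj in zip(w, row))
        est.append(sum(wj * lj * u for wj, lj, u in zip(w, row, grid)) / denom)
    return est


if __name__ == "__main__":
    random.seed(0)
    theta = [0.0] * 180 + [5.0] * 20  # sparse means, chosen for the demo
    x = [t + random.gauss(0.0, 1.0) for t in theta]
    est = gmleb_sketch(x)
    print("SSE of raw observations:", sum((a - t) ** 2 for a, t in zip(x, theta)))
    print("SSE of sketch estimator:", sum((e - t) ** 2 for e, t in zip(est, theta)))
```

Every coordinate is passed through the same data-driven function of its own observation, which is the sense in which such a rule competes with the "separable estimators" mentioned in the abstract.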
Implicit Regularization in Variational Bayesian Matrix Factorization
Cited by 2 (1 self)
Matrix factorization into the product of low-rank matrices induces non-identifiability, i.e., the mapping between the target matrix and factorized matrices is not one-to-one. In this paper, we theoretically investigate the influence of non-identifiability on Bayesian matrix factorization. More specifically, we show that a variational Bayesian method involves regularization effect even when the prior is non-informative, which is intrinsically different from the maximum a posteriori approach. We also extend our analysis to empirical Bayes scenarios where hyperparameters are also learned from data.
Sparse additive matrix factorization for robust PCA and its generalization
 In Proceedings of Fourth Asian Conference on Machine Learning
Cited by 2 (2 self)
Principal component analysis (PCA) can be regarded as approximating a data matrix with a low-rank one by imposing sparsity on its singular values, and its robust variant further captures sparse noise. In this paper, we extend such sparse matrix learning methods, and propose a novel unified framework called sparse additive matrix factorization (SAMF). SAMF systematically induces various types of sparsity by the so-called model-induced regularization in the Bayesian framework. We propose an iterative algorithm called the mean update (MU) for the variational Bayesian approximation to SAMF, which gives the global optimal solution for a large subset of parameters in each step. We demonstrate the usefulness of our method on artificial data and the foreground/background video separation.
Generalization Error of Linear Neural Networks in an Empirical Bayes Approach
 In Proc. of IJCAI
, 2005
Cited by 1 (1 self)
It is well known that in unidentifiable models, the Bayes estimation has the advantage of generalization performance to the maximum likelihood estimation. However, accurate approximation of the posterior distribution requires huge computational costs. In this paper, we consider an empirical Bayes approach where a part of the parameters are regarded as hyperparameters, which we call a subspace Bayes approach, and theoretically analyze the generalization error of three-layer linear neural networks. We show that a subspace Bayes approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation, and behaves similarly to the Bayes estimation in typical cases.