Results 1–10 of 36
Information-theoretic asymptotics of Bayes methods
 IEEE Transactions on Information Theory
, 1990
"... AbstractIn the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D,, between the true density and the Bayesian densit ..."
Abstract

Cited by 107 (10 self)
 Add to MetaCart
AbstractIn the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D,, between the true density and the Bayesian density and show that the asymptotic distance is (d/2Xlogn)+ c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D,,/n converges to zero at rate (logn)/n. The constant c, which we explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stockmarket portfolio selection. 1.
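The asymptotic formula in this abstract, D_n = (d/2) log n + c, can be checked numerically in a small sketch. The setup below (a Bernoulli model with a uniform prior, so d = 1) is my own illustrative choice, not taken from the paper; the constant uses the Clarke–Barron form c = (1/2) log(I(θ0)/(2πe)) for a uniform prior.

```python
# Numerical check of D_n = (d/2) log n + c for a Bernoulli(theta0) model
# (d = 1) with a uniform Beta(1,1) prior. D_n is the relative entropy
# between the true joint density and the Bayes mixture density.
import math

def bayes_redundancy(n, theta0):
    """D_n = E[log p_theta0(X^n) / m(X^n)] for i.i.d. Bernoulli(theta0),
    computed exactly by summing over the sufficient statistic k (number
    of ones)."""
    total = 0.0
    for k in range(n + 1):
        # log-probability of one particular sequence with k ones under the truth
        loglik = k * math.log(theta0) + (n - k) * math.log(1 - theta0)
        # probability that the count equals k under the truth
        logchoose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        pmf = math.exp(logchoose + loglik)
        # Bayes mixture probability of that sequence under a uniform prior:
        # integral of theta^k (1-theta)^(n-k) d theta = B(k+1, n-k+1)
        logm = math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)
        total += pmf * (loglik - logm)
    return total

theta0 = 0.3
# Theoretical constant for a uniform prior (w = 1):
# c = (1/2) log(I(theta0) / (2 pi e)), with Fisher information
# I(theta) = 1 / (theta (1 - theta)).
fisher = 1.0 / (theta0 * (1 - theta0))
c_theory = 0.5 * math.log(fisher / (2 * math.pi * math.e))

for n in (500, 1000, 2000):
    c_est = bayes_redundancy(n, theta0) - 0.5 * math.log(n)
    print(n, round(c_est, 4), "theory:", round(c_theory, 4))
```

The printed estimates of D_n - (1/2) log n settle toward the theoretical constant as n grows, matching the claimed (log n)/n rate for D_n/n.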
Semi-supervised learning of mixture models
 In Proc of the 20th Int’l Conf. on Machine Learning
, 2003
"... This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. We present a mathematical analysis of this “degradation ..."
Abstract

Cited by 40 (5 self)
 Add to MetaCart
This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. We present a mathematical analysis of this “degradation ” phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data. We discuss the impact of these theoretical results to practical situations. 1.
Semi-Supervised Learning of Mixture Models and Bayesian Networks
 Proceedings of the Twentieth International Conference on Machine Learning
, 2003
"... This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. This behavior contradicts several empirical results repo ..."
Abstract

Cited by 17 (0 self)
 Add to MetaCart
This paper analyzes the performance of semisupervised learning of mixture models. We show that unlabeled data can lead to an increase in classification error even in situations where additional labeled data would decrease classification error. This behavior contradicts several empirical results reported in the literature. We present a mathematical analysis of this "degradation" phenomenon and show that it is due to the fact that bias may be adversely affected by unlabeled data.
Consistency issues in Bayesian Nonparametrics
 In Asymptotics, Nonparametrics and Time Series: A Tribute
, 1998
"... ..."
Philosophy and the practice of Bayesian statistics
, 2010
"... A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually ..."
Abstract

Cited by 13 (5 self)
 Add to MetaCart
A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics. We argue that the most successful forms of Bayesian statistics do not actually support that particular philosophy but rather accord much better with sophisticated forms of hypotheticodeductivism. We examine the actual role played by prior distributions in Bayesian models, and the crucial aspects of model checking and model revision, which fall outside the scope of Bayesian confirmation theory. We draw on the literature on the consistency of Bayesian updating and also on our experience of applied work in social science. Clarity about these matters should benefit not just philosophy of science, but also statistical practice. At best, the inductivist view has encouraged researchers to fit and compare models without checking them; at worst, theorists have actively discouraged practitioners from performing model checking because it does not fit into their framework.
Dynamics of Bayesian updating with dependent data and misspecified models
, 2009
"... Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data ..."
Abstract

Cited by 9 (2 self)
 Add to MetaCart
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in nonparametric problems even when all of the hypotheses are wrong, and the datagenerating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or “ShannonMcMillanBreiman”) property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the “replicator dynamics” of evolutionary theory.
Fragility of Asymptotic Agreement under Bayesian Learning
, 2009
"... Under the assumption that individuals know the conditional distributions of signals given the payoffrelevant parameters, existing results conclude that as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are frag ..."
Abstract

Cited by 7 (1 self)
 Add to MetaCart
Under the assumption that individuals know the conditional distributions of signals given the payoffrelevant parameters, existing results conclude that as individuals observe infinitely many signals, their beliefs about the parameters will eventually merge. We first show that these results are fragile when individuals are uncertain about the signal distributions: given any such model, vanishingly small individual uncertainty about the signal distributions can lead to substantial (nonvanishing) differences in asymptotic beliefs. Under a uniform convergence assumption, we then characterize the conditions under which a small amount of uncertainty leads to significant asymptotic disagreement.
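The mechanism this abstract describes can be illustrated with a toy construction of my own (not the paper's model): two Bayesian agents update a 50/50 prior over states {A, B} from the same stream of binary signals, but hold slightly different subjective signal likelihoods, and end up with opposite limiting beliefs.

```python
# Two agents see identical signals but assume slightly different
# likelihoods P(signal=1 | state); their posteriors diverge.
import math
import random

def stable_sigmoid(x):
    """Numerically stable logistic function (avoids overflow in exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def posterior_A(ones, zeros, p, q):
    """Posterior P(state=A) from a 50/50 prior for an agent who assumes
    P(signal=1|A) = p and P(signal=1|B) = q."""
    log_odds = ones * math.log(p / q) + zeros * math.log((1 - p) / (1 - q))
    return stable_sigmoid(log_odds)

random.seed(0)
n = 100_000
true_freq = 0.55                       # true frequency of signal 1
ones = sum(random.random() < true_freq for _ in range(n))

# Nearby but not identical signal models for the two agents.
post1 = posterior_A(ones, n - ones, p=0.70, q=0.30)
post2 = posterior_A(ones, n - ones, p=0.75, q=0.35)
print(post1, post2)   # one agent becomes certain of A, the other of B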
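(continued)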
Misspecification in infinite-dimensional Bayesian statistics
 Annals of Statistics
, 2006
"... We consider the asymptotic behavior of posterior distributions if the model is misspecified. Given a prior distribution and a random sample from a distribution P0, which may not be in the support of the prior, we show that the posterior concentrates its mass near the points in the support of the pri ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
We consider the asymptotic behavior of posterior distributions if the model is misspecified. Given a prior distribution and a random sample from a distribution P0, which may not be in the support of the prior, we show that the posterior concentrates its mass near the points in the support of the prior that minimize the Kullback–Leibler divergence with respect to P0. An entropy condition and a priormass condition determine the rate of convergence. The method is applied to several examples, with special interest for infinitedimensional models. These include Gaussian mixtures, nonparametric regression and parametric models.
Comparing Bayes model averaging and stacking when model approximation error cannot be ignored
 Journal of Machine Learning Research
, 2003
"... We compare Bayes Model Averaging, BMA, to a nonBayes form of model averaging called stacking. In stacking, the weights are no longer posterior probabilities of models; they are obtained by a technique based on crossvalidation. When the correct data generating model (DGM) is on the list of models u ..."
Abstract

Cited by 6 (0 self)
 Add to MetaCart
We compare Bayes Model Averaging, BMA, to a nonBayes form of model averaging called stacking. In stacking, the weights are no longer posterior probabilities of models; they are obtained by a technique based on crossvalidation. When the correct data generating model (DGM) is on the list of models under consideration BMA is never worse than stacking and often is demonstrably better, provided that the noise level is of order commensurate with the coefficients and explanatory variables. Here, however, we focus on the case that the correct DGM is not on the model list and may not be well approximated by the elements on the model list. We give a sequence of computed examples by choosing model lists and DGM’s to contrast the risk performance of stacking and BMA. In the first examples, the model lists are chosen to reflect geometric principles that should give good performance. In these cases, stacking typically outperforms BMA, sometimes by a wide margin. In the second set of examples we examine how stacking and BMA perform when the model list includes all subsets of a set of potential predictors. When we standardize the size of terms and coefficients in this setting, we find that BMA outperforms stacking when the deviant terms in the DGM ‘point ’ in directions accommodated by the model list but that when the deviant term points outside the model list stacking seems to do better. Overall, our results suggest the stacking has better robustness properties than BMA in the most important settings.
Semi-supervised Learning of Classifiers with Application to Human-Computer Interaction
, 2003
"... With the growing use of computers and computing objects in the design of many of the day to day tools that humans use, humancomputer intelligent interaction is seen as a necessary step for the ability to make computers better aid the human user. There are many tasks involved in designing good inter ..."
Abstract

Cited by 5 (5 self)
 Add to MetaCart
With the growing use of computers and computing objects in the design of many of the day to day tools that humans use, humancomputer intelligent interaction is seen as a necessary step for the ability to make computers better aid the human user. There are many tasks involved in designing good interaction between humans and machines. One basic task, related to many such applications, is automatic classification by the machine. Designing a classifier can be done by domain experts or by learning from training data. Training data can be labeled to the different classes or unlabeled. In this work I focus on training probabilistic classifiers with labeled and unlabeled data. I show under what conditions unlabeled data can be used to improve classification performance. I also show that it often occurs that if the conditions are violated, using unlabeled data can be detrimental to the classification performance. I discuss the implications of this analysis when learning a specific type of probabilistic classifiers, namely Bayesian networks, and propose structure learning algorithms that can potentially utilize unlabeled data to improve classification. I show how the theory and algorithms are successfully applied in two applications related to humancomputer interaction: facial expression recognition and face detection.