Results 1 
2 of
2
Informationtheoretic asymptotics of Bayes methods
 IEEE Transactions on Information Theory
, 1990
"... AbstractIn the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D,, between the true density and the Bayesian densit ..."
Abstract

Cited by 107 (10 self)
 Add to MetaCart
AbstractIn the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D,, between the true density and the Bayesian density and show that the asymptotic distance is (d/2Xlogn)+ c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D,,/n converges to zero at rate (logn)/n. The constant c, which we explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stockmarket portfolio selection. 1.
Learning graphical models for hypothesis testing
 in Proc. 14th IEEE Statist. Signal Process. Workshop
, 2007
"... Abstract—Sparse graphical models have proven to be a flexible class of multivariate probability models for approximating highdimensional distributions. In this paper, we propose techniques to exploit this modeling ability for binary classification by discriminatively learning such models from label ..."
Abstract

Cited by 9 (3 self)
 Add to MetaCart
Abstract—Sparse graphical models have proven to be a flexible class of multivariate probability models for approximating highdimensional distributions. In this paper, we propose techniques to exploit this modeling ability for binary classification by discriminatively learning such models from labeled training data, i.e., using both positive and negative samples to optimize for the structures of the two models. We motivate why it is difficult to adapt existing generative methods, and propose an alternative method consisting of two parts. First, we develop a novel method to learn treestructured graphical models which optimizes an approximation of the loglikelihood ratio. We also formulate a joint objective to learn a nested sequence of optimal forestsstructured models. Second, we construct a classifier by using ideas from boosting to learn a set of discriminative trees. The final classifier can interpreted as a likelihood ratio test between two models with a larger set of pairwise features. We use crossvalidation to determine the optimal number of edges in the final model. The algorithm presented in this paper also provides a method to identify a subset of the edges that are most salient for discrimination. Experiments show that the proposed procedure outperforms generative methods such as Tree Augmented Naïve Bayes and ChowLiu as well as their boosted counterparts.