Results 1–10 of 20
Information-theoretic asymptotics of Bayes methods
 IEEE Transactions on Information Theory
, 1990
"... AbstractIn the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D,, between the true density and the Bayesian densit ..."
Abstract

Cited by 107 (10 self)
In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D_n/n converges to zero at rate (log n)/n. The constant c, which we explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock-market portfolio selection.
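For reference, the expansion summarized above can be written out explicitly; this is the form stated in the published Clarke–Barron result, with w the prior density and I(θ) the Fisher information matrix at the true parameter value:
\[
D_n \;=\; \frac{d}{2}\log\frac{n}{2\pi e} \;+\; \frac{1}{2}\log\det I(\theta) \;+\; \log\frac{1}{w(\theta)} \;+\; o(1),
\]
so the constant c above is (d/2) log(1/(2πe)) + (1/2) log det I(θ) + log(1/w(θ)).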
Decisionmetrics: a decision-based approach to econometric modelling
 Journal of Econometrics
, 2007
"... In many applications it is necessary to use a simple and therefore highly misspecified econometric model as the basis for decisionmaking. We propose an approach to developing a possibly misspecified econometric model that will be used as the beliefs of an objective expected utility maximiser. A dis ..."
Abstract

Cited by 9 (0 self)
In many applications it is necessary to use a simple and therefore highly misspecified econometric model as the basis for decision-making. We propose an approach to developing a possibly misspecified econometric model that will be used as the beliefs of an objective expected utility maximiser. A discrepancy between model and ‘truth’ is introduced that is interpretable as a measure of the model’s value for this decision-maker. Our decision-based approach utilises this discrepancy in estimation, selection, inference and evaluation of parametric or semiparametric models. The methods proposed nest quasi-likelihood methods as a special case that arises when model value is measured by the Kullback–Leibler information discrepancy, and also provide an econometric approach for developing parametric decision rules (e.g. technical trading rules) with desirable properties. The approach is illustrated and applied in the context of a CARA investor’s decision problem, for which analytical, simulation and empirical results suggest it is very effective.
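To make the idea concrete, here is a minimal sketch of decision-based estimation for a CARA investor in the spirit of the abstract: the belief parameter is chosen by the decision-maker's own utility criterion rather than by (quasi-)likelihood. The linear forecasting model, the simulated return process, the risk-aversion value and the grid-search estimator are illustrative assumptions, not the paper's specification.

import numpy as np

rng = np.random.default_rng(1)

# Simulated data (purely illustrative; not the paper's empirical setting).
T = 2000
x = rng.normal(size=T)                                            # predictor observed before the return
y = 0.05 * x + 0.2 * np.sign(x) + rng.normal(scale=1.0, size=T)   # "true" return process (nonlinear)

gamma = 2.0                                                       # assumed CARA risk-aversion coefficient

def position(theta, sigma2, x):
    # Optimal CARA position implied by the (misspecified) belief y | x ~ N(theta * x, sigma2).
    return theta * x / (gamma * sigma2)

def mean_utility(theta, sigma2, x, y):
    # Average realized CARA utility of the decision rule implied by theta.
    a = position(theta, sigma2, x)
    return np.mean(-np.exp(-gamma * a * y))

# Quasi-likelihood (here: OLS) estimate of the linear belief model.
theta_ols = float(np.dot(x, y) / np.dot(x, x))
sigma2 = float(np.var(y - theta_ols * x))

# Decision-based estimate: pick theta by the decision-maker's own utility criterion.
grid = np.linspace(-1.0, 1.0, 401)
theta_dec = max(grid, key=lambda th: mean_utility(th, sigma2, x, y))

print("OLS theta:           ", theta_ols, " in-sample utility:", mean_utility(theta_ols, sigma2, x, y))
print("Decision-based theta:", theta_dec, " in-sample utility:", mean_utility(theta_dec, sigma2, x, y))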
Applications of Lindley Information Measure to the Design of Clinical Experiments
 Aspects of Uncertainty
, 1994
"... this paper we consider applications of Lindley information measure to the design of clinical experiments. We review the decision theoretic foundations underlying the use of Lindley information, and discuss its role in constructing utility functions suitable for clinical applications. We derive and i ..."
Abstract

Cited by 8 (3 self)
In this paper we consider applications of Lindley's information measure to the design of clinical experiments. We review the decision-theoretic foundations underlying the use of Lindley information, and discuss its role in constructing utility functions suitable for clinical applications. We derive and interpret general first-order conditions for the optimality of a design. We discuss examples: choosing the optimal fixed sample size of a clinical trial, and choosing the optimal follow-up time for patients in a survival analysis. We give special attention to the design of multicenter clinical trials. Research of D. A. Berry supported in part by the US Public Health Service under grant HS 0647501. Research of Giovanni Parmigiani and ISDS computing environment supported in part by NSF under grant DMS9305699. We are thankful to Chengchang Li, Peter Muller, Saurabh Mukhopadhyay and Dalene Stangl for helpful discussions. 1. INTRODUCTION. From the point of view of decision making, information is anything that enables us to make a better decision, that is, a decision with a higher expected utility. For example, an experiment that, irrespective of the outcome, will lead to the same decision that we would make prior to observing it, has no information content. Conversely, experiments able to lead to different decisions are potentially of benefit. The expected change in utility can actually be used as a quantitative measure of the worth of an experiment in any given situation. This idea is about as old as Bayesian statistics (see Ramsey, 1990) and is discussed by Raiffa and Schlaifer (1961) and DeGroot (1984). The well known measure of information proposed by Lindley (1956) is the object of investigation in this paper. It can be seen as a very important special case of this general ap...
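For reference, Lindley's (1956) measure discussed above is the expected gain in Shannon information that an experiment e with outcome X provides about the parameter θ, equivalently the mutual information between θ and X:
\[
I(e) \;=\; \mathbb{E}_{X}\!\left[\int p(\theta \mid x)\,\log\frac{p(\theta \mid x)}{p(\theta)}\,d\theta\right].
\]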
Asymptotic Redundancies for Universal Quantum Coding
"... Clarke and Barron have recently shown that the Jereys' invariant prior of Bayesian theory yields the common asymptotic (minimax and maximin) redundancy of universal data compression in a parametric setting. We seek a possible analogue of this result for the twolevel quantum systems. We restrict ou ..."
Abstract

Cited by 8 (3 self)
Clarke and Barron have recently shown that the Jeffreys' invariant prior of Bayesian theory yields the common asymptotic (minimax and maximin) redundancy of universal data compression in a parametric setting. We seek a possible analogue of this result for the two-level quantum systems. We restrict our considerations to prior probability distributions belonging to a certain one-parameter family, q_u, −1 < u < 1. Within this setting, we are able to compute exact redundancy formulas, for which we find the asymptotic limits. We compare our quantum asymptotic redundancy formulas to those derived by naively applying the (non-quantum) counterparts of Clarke and Barron, and find certain common features. Our results are based on formulas we obtain for the eigenvalues and eigenvectors of the 2^n × 2^n (Bayesian density) matrices. These matrices are the weighted averages (with respect to q_u) of all possible tensor products of n identical 2 × 2 density matrices, representing the two-level quantum systems. We propose a form of universal coding for the situation in which the density matrix describing an ensemble of quantum signal states is unknown. A sequence of n signals would be projected onto the dominant eigenspaces of these Bayesian density matrices.
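As a minimal numerical sketch of the construction described above, the following NumPy code forms a Bayesian density matrix as a prior-weighted average of n-fold tensor products of 2 × 2 density matrices and then eigendecomposes it. The prior used here (uniform over the Bloch ball), the sample sizes, and the variable names are placeholders for illustration only, not the paper's one-parameter family q_u.

import numpy as np

rng = np.random.default_rng(0)

# Pauli matrices for the Bloch-ball parameterization of 2x2 density matrices.
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def density_matrix(bloch):
    # 2x2 density matrix with Bloch vector `bloch` (norm <= 1).
    bx, by, bz = bloch
    return 0.5 * (I2 + bx * SX + by * SY + bz * SZ)

def bayesian_average(n, n_samples=20000):
    # Monte Carlo estimate of the 2^n x 2^n Bayesian density matrix:
    # the prior-weighted average of rho^{tensor n} over 2x2 states rho.
    # The uniform-Bloch-ball prior below is only a placeholder for q_u.
    dim = 2 ** n
    avg = np.zeros((dim, dim), dtype=complex)
    for _ in range(n_samples):
        direction = rng.normal(size=3)
        direction /= np.linalg.norm(direction)
        r = rng.random() ** (1.0 / 3.0)           # uniform radius within the ball
        rho = density_matrix(r * direction)
        prod = rho
        for _ in range(n - 1):
            prod = np.kron(prod, rho)             # tensor product of n identical copies
        avg += prod
    return avg / n_samples

zeta = bayesian_average(n=3)
eigenvalues, eigenvectors = np.linalg.eigh(zeta)  # spectrum used to pick dominant eigenspaces
print(np.sort(eigenvalues.real)[::-1])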
Information geometry, Bayesian inference, ideal estimates and error decomposition
, 1998
"... In statistics it is necessary to study the relation among many probability distributions. Information geometry elucidates the geometric structure on the space of all distributions. When combined with Bayesian decision theory, it leads to the new concept of “ideal estimates”. They uniquely exist in t ..."
Abstract

Cited by 6 (1 self)
In statistics it is necessary to study the relation among many probability distributions. Information geometry elucidates the geometric structure on the space of all distributions. When combined with Bayesian decision theory, it leads to the new concept of “ideal estimates”. They uniquely exist in the space of finite measures, and are generally sufficient statistics. The optimal estimate on any model is given by projecting the ideal estimate onto that model. An error decomposition theorem splits the error of an estimate into the sum of statistical error and approximation error. They can be expanded to yield higher-order asymptotics. Furthermore, the ideal estimates under certain uniform priors, invariantly defined in information geometry, correspond to various optimal non-Bayesian estimates, such as the MLE.
Improved minimax predictive densities under Kullback–Leibler loss
 Ann. Statist
, 2006
"... Let Xµ ∼ Np(µ, vxI)and Y µ ∼ Np(µ, vyI)be independent pdimensional multivariate normal vectors with common unknown mean µ. Based on only observing X = x, we consider the problem of obtaining a predictive density ˆp(yx) for Y that is close to p(yµ) as measured by expected Kullback–Leibler loss. ..."
Abstract

Cited by 5 (0 self)
Let X | µ ∼ N_p(µ, v_x I) and Y | µ ∼ N_p(µ, v_y I) be independent p-dimensional multivariate normal vectors with common unknown mean µ. Based on only observing X = x, we consider the problem of obtaining a predictive density p̂(y | x) for Y that is close to p(y | µ) as measured by expected Kullback–Leibler loss. A natural procedure for this problem is the (formal) Bayes predictive density p̂_U(y | x) under the uniform prior π_U(µ) ≡ 1, which is best invariant and minimax. We show that any Bayes predictive density will be minimax if it is obtained by a prior yielding a marginal that is superharmonic or whose square root is superharmonic. This yields wide classes of minimax procedures that dominate p̂_U(y | x), including Bayes predictive densities under superharmonic priors. Fundamental similarities and differences with the parallel theory of estimating a multivariate normal mean under quadratic loss are described.
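For reference, the objects referred to in this abstract take their standard forms: under a prior π, the Bayes predictive density and the Kullback–Leibler risk are
\[
\hat{p}_{\pi}(y \mid x) \;=\; \int p(y \mid \mu)\,\pi(\mu \mid x)\,d\mu,
\qquad
R_{\mathrm{KL}}(\mu, \hat{p}) \;=\; \mathrm{E}_{X \mid \mu}\!\left[\int p(y \mid \mu)\,\log\frac{p(y \mid \mu)}{\hat{p}(y \mid X)}\,dy\right],
\]
with p̂_U(y | x) the special case obtained from π_U(µ) ≡ 1.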
Combining model selection procedures for online prediction
 Sankhya A
, 2001
"... SUMMARY. Here we give a technique for online prediction that uses different model selection principles (MSP’s) at different times. The central idea is that each MSP is associated with a collection of models for which it is best suited. This means one can use the data to choose an MSP. Then, the MSP ..."
Abstract

Cited by 3 (2 self)
SUMMARY. Here we give a technique for online prediction that uses different model selection principles (MSPs) at different times. The central idea is that each MSP is associated with a collection of models for which it is best suited. This means one can use the data to choose an MSP. Then, the MSP chosen is used with the data to choose a model, and the parameters of the model are estimated so that predictions can be made. Depending on the degree of discrepancy between the predicted values and the actual outcomes, one may update the parameters within a model, reuse the MSP to rechoose the model and estimate its parameters, or start all over again, rechoosing the MSP. Our main formal result is a theorem which gives conditions under which our technique performs better than always using the same MSP. We also discuss circumstances under which dropping data points may lead to better predictions.
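The following is a minimal sketch of the tiered updating scheme the abstract describes. The particular MSPs (AIC and BIC), the polynomial model class, the error thresholds, and the crude rule for rechoosing the MSP are illustrative assumptions, not the paper's procedure.

import numpy as np

def fit_poly(x, y, degree):
    # Least-squares polynomial fit; returns coefficients and residual variance.
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return coefs, max(float(np.mean(resid ** 2)), 1e-12)

def score(x, y, degree, msp):
    # Model-selection score (lower is better) under the chosen principle.
    n, k = len(x), degree + 1
    _, sigma2 = fit_poly(x, y, degree)
    penalty = 2 * k if msp == "AIC" else np.log(n) * k
    return n * np.log(sigma2) + penalty

def choose_model(x, y, msp, degrees=(1, 2, 3)):
    return min(degrees, key=lambda d: score(x, y, d, msp))

def online_predict(x, y, small=1.0, large=3.0):
    # Predict y[t] from data up to t-1; after observing y[t], escalate the amount
    # of re-selection according to the size of the prediction error.
    msp, degree = "AIC", 1
    coefs, _ = fit_poly(x[:5], y[:5], degree)
    preds = []
    for t in range(5, len(x)):
        pred = np.polyval(coefs, x[t])
        preds.append(pred)
        err = abs(pred - y[t])
        xs, ys = x[:t + 1], y[:t + 1]         # data now includes the new observation
        if err < small:                        # small error: re-estimate parameters only
            coefs, _ = fit_poly(xs, ys, degree)
        elif err < large:                      # moderate error: reuse the MSP to rechoose the model
            degree = choose_model(xs, ys, msp)
            coefs, _ = fit_poly(xs, ys, degree)
        else:                                  # large error: start over, rechoosing the MSP itself
            # Placeholder for a data-driven choice of MSP; here we simply switch principles.
            msp = "BIC" if msp == "AIC" else "AIC"
            degree = choose_model(xs, ys, msp)
            coefs, _ = fit_poly(xs, ys, degree)
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = np.linspace(0.0, 4.0, 200)
    ys = 1.5 * xs - 0.3 * xs ** 2 + rng.normal(scale=0.5, size=xs.size)
    print(online_predict(xs, ys)[:5])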
Predictive Inference, Rare Events And Hierarchical Models
, 1997
"... this paper have implicity assumed a single homogeneous sample. However, they are also applicable in multisample problems, in which the parameters of the model are possibly different from one sample to another. Such problems lead to what are usually called empirical Bayes methods of analysis. In rec ..."
Abstract

Cited by 3 (2 self)
The results of this paper have implicitly assumed a single homogeneous sample. However, they are also applicable in multi-sample problems, in which the parameters of the model are possibly different from one sample to another. Such problems lead to what are usually called empirical Bayes methods of analysis. In recent years it has become more common to solve such problems from a fully Bayesian point of view, using a hierarchical model structure to link together the parameters of the different subsamples. This is the point of view taken, for example, in the excellent recent monograph by Carlin and Louis (1996). Despite the very rapid growth of this field, there has been comparatively little study of the frequentist properties of Bayesian procedures in this setting. Berger and Strawderman (1996) established some admissibility results, which have the advantage of not relying on any kind of asymptotics, and which provide guidance on the choice of prior, particularly where improper priors are concerned. On the other hand, the class of models to which their results apply is restrictive, and admissibility results do not necessarily help to pick out a prior distribution which has good properties under particular conditions. In contrast, the results of the present paper are asymptotic (letting the sample size n → ∞ while the number of samples remains fixed), but they do allow explicit computations to be made under a variety of circumstances. In the present section, these ideas are worked out in some detail for the simplest problem in this class: the case of p normal distributions with unknown means and known common variance. In the next section, a more complicated example is considered. Suppose there are p subgroups and the data in the j'th subgroup follow a N(θ_j, 1) distribution. Here the vector ...
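As a minimal sketch of the hierarchical structure alluded to above, one standard two-stage specification for the p-subgroup normal example (the second-stage normal form and its hyperparameters are a common choice, not necessarily the paper's exact one) is
\[
X_{ij} \mid \theta_j \;\sim\; N(\theta_j, 1),
\qquad
\theta_j \mid \mu, \tau^2 \;\sim\; N(\mu, \tau^2), \quad j = 1, \dots, p,
\]
with a (possibly improper) prior placed on the hyperparameters (\mu, \tau^2).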
On Bayesian Learning of Sparse Classifiers
, 2002
"... Figueiredo (2001) and Figueiredo and Jain (2001) described a particular sparsenessinducing Bayesian model for probit regression. For several standard datasets, they reported... ..."
Abstract

Cited by 2 (0 self)
Figueiredo (2001) and Figueiredo and Jain (2001) described a particular sparseness-inducing Bayesian model for probit regression. For several standard datasets, they reported...