Results 1–10 of 11
Game Theory, Maximum Entropy, Minimum Discrepancy And Robust Bayesian Decision Theory
Annals of Statistics, 2004
Competitive online statistics
International Statistical Review, 1999
Abstract
Cited by 63 (10 self)
A radically new approach to statistical modelling, which combines mathematical techniques of Bayesian statistics with the philosophy of the theory of competitive online algorithms, has arisen over the last decade in computer science (to a large degree, under the influence of Dawid’s prequential statistics). In this approach, which we call “competitive online statistics”, it is not assumed that data are generated by some stochastic mechanism; the bounds derived for the performance of competitive online statistical procedures are guaranteed to hold (and not just hold with high probability or on average). This paper reviews some results in this area; the new material in it includes the proofs for the performance of the Aggregating Algorithm in the problem of linear regression with square loss. Keywords: Bayes’s rule, competitive online algorithms, linear regression, prequential statistics, worst-case analysis.
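Under logarithmic loss the Aggregating Algorithm reduces to the Bayes mixture of the experts, and its cumulative loss exceeds that of the best of N experts by at most ln N on every data sequence. A minimal sketch for binary outcomes (the function name and the expert/data values below are illustrative, not taken from the paper):

```python
import math

def aggregate_log_loss(expert_probs, outcomes):
    """Exponential-weights mixture (learning rate eta = 1, i.e. the
    Bayes mixture) for log loss over binary outcomes.
    expert_probs[t][i] = expert i's predicted P(outcome_t = 1)."""
    n_experts = len(expert_probs[0])
    log_w = [0.0] * n_experts          # log weights, start uniform
    total_loss = 0.0
    for probs, y in zip(expert_probs, outcomes):
        # Normalise weights (subtract max for numerical stability)
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        # Mixture prediction for this round
        p = sum(wi * pi for wi, pi in zip(w, probs)) / s
        total_loss += -math.log(p if y == 1 else 1.0 - p)
        # Bayes update: multiply each weight by that expert's likelihood
        for i, pi in enumerate(probs):
            log_w[i] += math.log(pi if y == 1 else 1.0 - pi)
    return total_loss, log_w
```

Because the sequential mixture predictives telescope to the average of the experts’ joint likelihoods, the bound loss ≤ min_i loss_i + ln N holds deterministically, matching the “guaranteed to hold” claim in the abstract.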
Mutual information, Fisher information and population coding
Neural Computation, 1998
Abstract
Cited by 61 (3 self)
In the context of parameter estimation and model selection, it is only quite recently that a direct link between the Fisher information and information-theoretic quantities has been exhibited. We give an interpretation of this link within the standard framework of information theory. We show that in the context of population coding, the mutual information between the activity of a large array of neurons and a stimulus to which the neurons are tuned is naturally related to the Fisher information. In the light of this result we consider the optimization of the tuning curve parameters in the case of neurons responding to a stimulus represented by an angular variable.
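For a population of independent Poisson neurons with tuning curves f_i(θ), the Fisher information is J(θ) = Σ_i f_i′(θ)² / f_i(θ), and for large arrays the mutual information behaves as I(θ; r) ≈ H(θ) + ½ E[ln(J(θ)/(2πe))]. A small numerical sketch with von-Mises-style tuning (the function name and all parameter values are illustrative assumptions, not figures from the paper):

```python
import math

def fisher_info_population(theta, centers, kappa=2.0, rmax=10.0):
    """Fisher information of a Poisson population with von-Mises-like
    tuning curves f_i(theta) = rmax * exp(kappa * (cos(theta - c_i) - 1)).
    For Poisson firing, J(theta) = sum_i f_i'(theta)^2 / f_i(theta)."""
    J = 0.0
    for c in centers:
        f = rmax * math.exp(kappa * (math.cos(theta - c) - 1.0))
        fp = -rmax * kappa * math.sin(theta - c) * math.exp(
            kappa * (math.cos(theta - c) - 1.0))
        J += fp * fp / f
    return J

# Evenly spaced preferred angles make J(theta) nearly constant in theta:
centers = [2 * math.pi * i / 32 for i in range(32)]
```

With evenly spaced centers the stimulus is encoded almost uniformly, which is the regime in which the mutual-information / Fisher-information link above is cleanest.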
An empirical study of minimum description length model selection with infinite parametric complexity
Journal of Mathematical Psychology, 2006
Abstract
Cited by 10 (1 self)
Parametric complexity is a central concept in Minimum Description Length (MDL) model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and Geometric families. In such cases, MDL model selection based on NML and Bayesian inference based on Jeffreys’ prior cannot be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and evaluate their behaviour on small sample sizes. We find interestingly poor behaviour for the plug-in predictive code; a restricted NML model performs quite well, but it is questionable if the results validate its theoretical motivation. A Bayesian marginal distribution with Jeffreys’ prior can still be used if one sacrifices the first observation to make the posterior proper; this approach turns out to be the most dependable.
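As a concrete illustration of the sacrifice-the-first-observation approach: for the Poisson family the Jeffreys prior π(λ) ∝ λ^(−1/2) is improper, but conditioning on the first observation x₁ yields a proper Gamma(x₁ + ½, 1) posterior, after which the remaining data can be scored sequentially with the Gamma-Poisson (negative-binomial) predictive. A sketch under these assumptions (the function name and data are illustrative, not from the paper’s experiments):

```python
import math

def poisson_jeffreys_cond_logml(data):
    """Conditional marginal log-likelihood of data[1:] given data[0]
    for the Poisson family under the improper Jeffreys prior
    pi(lam) ∝ lam^(-1/2). Conditioning on the first observation gives
    a proper Gamma(a, b) posterior with a = x1 + 1/2, b = 1; the rest
    of the data is then scored with the Gamma-Poisson predictive."""
    a, b = data[0] + 0.5, 1.0
    logml = 0.0
    for x in data[1:]:
        # Negative-binomial predictive P(x | Gamma(a, b) posterior)
        logml += (math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
                  + a * math.log(b / (b + 1.0))
                  + x * math.log(1.0 / (b + 1.0)))
        a += x      # posterior shape accumulates the counts
        b += 1.0    # posterior rate accumulates the sample size
    return logml
```

Because the sequential predictives are exact Bayes updates, their product equals the closed-form conditional marginal Γ(S + ½) n^(−(S+½)) / ∏ xᵢ! divided by the first-observation marginal, which makes the code easy to verify.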
An empirical study of MDL model selection with infinite parametric complexity
J. Mathematical Psychology, 2006
Abstract
Cited by 4 (2 self)
Parametric complexity is a central concept in MDL model selection. In practice it often turns out to be infinite, even for quite simple models such as the Poisson and Geometric families. In such cases, MDL model selection based on NML and Bayesian inference based on Jeffreys’ prior cannot be used. Several ways to resolve this problem have been proposed. We conduct experiments to compare and evaluate their behaviour on small sample sizes. We find interestingly poor behaviour for the plug-in predictive code; a restricted NML model performs quite well, but it is questionable if the results validate its theoretical motivation. The Bayesian model with the improper Jeffreys’ prior is the most dependable.
Universal and composite hypothesis testing via mismatched divergence
 IEEE Trans. Inf. Theory
Online Prediction with Experts under a Log-scoring Rule
Abstract
Cited by 2 (1 self)
p_θ(x) = p(x | θ) is a stochastic process: this means that if we write X = X^n = (X_1, ..., X_n) to mean the random variable with outcomes x = x^n = (x_1, ..., x_n), then the density p(x^n | θ) for each n is the result of integrating p(x^{n+1} | θ) over x_{n+1}. Write J(θ) to mean the Jeffreys prior on the parameter space, assuming it exists, and let w be another prior density on the parameter space. We use J(θ) as a dominating measure for other priors unless stated otherwise. Denote by ...
Information Optimality and Bayesian Modeling
Abstract
Cited by 2 (0 self)
The general approach of treating a statistical problem as one of information processing led to the Bayesian method of moments, reference priors, minimal information likelihoods, and stochastic complexity. These techniques rest on quantities that have physical interpretations from information theory. Current work includes: the role of prediction, the emergence of data-dependent priors, the role of information measures in model selection, and the use of conditional mutual information to incorporate partial information. Key words: entropy, Bayesian method of moments, reference priors, stochastic complexity, data-dependent priors
Discussion of the Papers by Rissanen, and by Wallace and Dowe
Abstract
Cited by 1 (0 self)
to Yang and Barron [1], and earlier work due to Barron and Cover [2], Bethel and Shumway [3], are efforts to provide general results for collections of classes that are recognized to have common properties, typically the dependence of the penalty term on n, the sample size. In particular, one may consider an Akaike information criterion or AIC class, see [4], of MSPs which contains the AIC and its equivalent formulations such as Mallows’ C_p and cross-validation, see [5]. Equivalently, the AIC class of MSPs can be defined as the MSPs that satisfy the same optimality criterion for prediction as the AIC itself does, see [6, 7]. Second, one may consider a Bayes information criterion, or BIC class, of MSPs which contains the BIC and other equivalent formulations such as the posterior quantities it approximates, see [8]. Perhaps the optimality criterion satisfied by the BIC, see [9], or the posterior probabilities, can be used to define this class. In some cases, the MDL is similar to ...
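Since the AIC class penalizes 2 per parameter while the BIC class penalizes ln n, the two can disagree exactly when an extra parameter improves the deviance by an amount between 2 and ln n. A toy sketch for nested Gaussian models scored by residual sum of squares (the function name and RSS values are invented for illustration):

```python
import math

def aic_bic_select(rss_by_order, n):
    """Given residual sums of squares for nested Gaussian models
    (key = number of free mean parameters k), score each model with
    AIC = n*ln(RSS/n) + 2k and BIC = n*ln(RSS/n) + k*ln(n),
    and return the argmin k under each criterion."""
    aic = {k: n * math.log(rss / n) + 2 * k
           for k, rss in rss_by_order.items()}
    bic = {k: n * math.log(rss / n) + k * math.log(n)
           for k, rss in rss_by_order.items()}
    return min(aic, key=aic.get), min(bic, key=bic.get)
```

With n = 100 the BIC per-parameter penalty is ln 100 ≈ 4.6, so a third parameter that improves the deviance term by about 3 is accepted by AIC but rejected by BIC, which is the kind of class separation the discussion describes.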
Partial Information Reference Priors, 2000
Abstract
Suppose X_1, ..., X_n are IID p(· | θ, λ), where (θ, λ) ∈ R^d is distributed according to the prior density w(θ, λ). For estimators S_n = S(X^n) and T_n = T(X^n), assumed to be consistent for some function of θ and λ and asymptotically normal, we examine the conditional Shannon mutual information (CSMI) between θ and T_n given λ and S_n, I(θ; T_n | λ, S_n). It is seen that there are several important special cases of this CSMI. We establish an asymptotic formula for it and identify the resulting noninformative reference prior. As a consequence, we develop the notion of data-dependent priors and a calibration for how close an estimator is to sufficiency.