Results 1 - 10
of
32
Universal Prediction
- IEEE Transactions on Information Theory
, 1998
"... This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. ..."
Abstract
-
Cited by 99 (6 self)
- Add to MetaCart
This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression.
Information-Theoretic Determination of Minimax Rates of Convergence
- Ann. Stat
, 1997
"... In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence. ..."
Abstract
-
Cited by 67 (18 self)
- Add to MetaCart
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
Game Theory, Maximum Entropy, Minimum Discrepancy And Robust Bayesian Decision Theory
- Annals of Statistics
, 2004
"... this paper appeared in the Proceedings of the 2002 IEEE Information Theory Workshop [see Grnwald and Dawid (2002)] ..."
Abstract
-
Cited by 53 (3 self)
- Add to MetaCart
this paper appeared in the Proceedings of the 2002 IEEE Information Theory Workshop [see Grnwald and Dawid (2002)]
Learning Bayesian Belief Networks Based on the Minimum Description Length Principle: Basic Properties
, 1996
"... This paper was partially presented at the 9th conference on Uncertainty in Artificial Intelligence, July 1993. ..."
Abstract
-
Cited by 44 (0 self)
- Add to MetaCart
This paper was partially presented at the 9th conference on Uncertainty in Artificial Intelligence, July 1993.
A strong version of the redundancy-capacity theorem of universal coding
- IEEE Trans. Inform. Theory
, 1995
"... Abstract-The capacity of the channel induced by a given class of sources is well known to be an attainable lower bound on the redundancy of universal codes with respect to this class, both in the minimax sense and in the Bayesian (maximin) sense. We show that this capacity is essentially a lower bou ..."
Abstract
-
Cited by 36 (7 self)
- Add to MetaCart
Abstract-The capacity of the channel induced by a given class of sources is well known to be an attainable lower bound on the redundancy of universal codes with respect to this class, both in the minimax sense and in the Bayesian (maximin) sense. We show that this capacity is essentially a lower bound also in a stronger sense, that is, for “most ” sources in the class. This result extends Rissanen’s lower bound for parametric families. We demonstrate the applicability of this result in several examples, e.g., parametric families with growing dimensionality, piecewise-fixed sources, arbitrarily varying sources, and noisy samples of learnable functions. Finally, we discuss implications of our results to statistical inference. Index Terms- Universal coding, minimax redundancy, mini-mum description length, channel capacity, arbitrarily varying sources, random coding; I.
Predicting a Binary Sequence Almost as Well as the Optimal Biased Coin
, 1996
"... We apply the exponential weight algorithm, introduced and Littlestone and Warmuth [17] and by Vovk [24] to the problem of predicting a binary sequence almost as well as the best biased coin. We first show that for the case of the logarithmic loss, the derived algorithm is equivalent to the Bayes alg ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
We apply the exponential weight algorithm, introduced and Littlestone and Warmuth [17] and by Vovk [24] to the problem of predicting a binary sequence almost as well as the best biased coin. We first show that for the case of the logarithmic loss, the derived algorithm is equivalent to the Bayes algorithm with Jeffrey's prior, that was studied by Xie and Barron under probabilistic assumptions [26]. We derive a uniform bound on the regret which holds for any sequence. We also show that if the empirical distribution of the sequence is bounded away from 0 and from 1, then, as the length of the sequence increases to infinity, the difference between this bound and a corresponding bound on the average case regret of the same algorithm (which is asymptotically optimal in that case) is only 1=2. We show that this gap of 1=2 is necessary by calculating the regret of the min-max optimal algorithm for this problem and showing that the asymptotic upper bound is tight. We also study the application...
On predictive distributions and Bayesian networks
- Statistics and Computing
, 2000
"... this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the ..."
Abstract
-
Cited by 33 (24 self)
- Add to MetaCart
this paper we are interested in discrete prediction problems for a decision-theoretic setting, where the
Mutual Information, Metric Entropy, and Cumulative Relative Entropy Risk
- Annals of Statistics
, 1996
"... Assume fP ` : ` 2 \Thetag is a set of probability distributions with a common dominating measure on a complete separable metric space Y . A state ` 2 \Theta is chosen by Nature. A statistician gets n independent observations Y 1 ; : : : ; Y n from Y distributed according to P ` . For each time ..."
Abstract
-
Cited by 30 (2 self)
- Add to MetaCart
Assume fP ` : ` 2 \Thetag is a set of probability distributions with a common dominating measure on a complete separable metric space Y . A state ` 2 \Theta is chosen by Nature. A statistician gets n independent observations Y 1 ; : : : ; Y n from Y distributed according to P ` . For each time t between 1 and n, based on the observations Y 1 ; : : : ; Y t\Gamma1 , the statistician produces an estimated distribution P t for P ` , and suffers a loss L(P ` ; P t ). The cumulative risk for the statistician is the average total loss up to time n. Of special interest in information theory, data compression, mathematical finance, computational learning theory and statistical mechanics is the special case when the loss L(P ` ; P t ) is the relative entropy between the true distribution P ` and the estimated distribution P t . Here the cumulative Bayes risk from time 1 to n is the mutual information between the random parameter \Theta and the observations Y 1 ; : : : ;...
Efficient Bayesian Parameter Estimation in Large Discrete Domains
- Advances in Neural Information Processing Systems
, 1999
"... In this paper we examine the problem of estimating the parameters of a multinomial distribution over a large number of discrete outcomes, most of which do not appear in the training data. We analyze this problem from a Bayesian perspective and develop a hierarchical prior that incorporates the assum ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
In this paper we examine the problem of estimating the parameters of a multinomial distribution over a large number of discrete outcomes, most of which do not appear in the training data. We analyze this problem from a Bayesian perspective and develop a hierarchical prior that incorporates the assumption that the observed outcomes constitute only a small subset of the possible outcomes. We show how to efficiently perform exact inference with this form of hierarchical prior and compare our method to standard approaches and demonstrate its merits. Category: Algorithms and Architectures Presentation preference: none This paper was not submitted elsewhere nor will be submitted during NIPS review period. 1 Introduction One of the most important problems in statistical inference is multinomialestimation: Given a past history of observations independent trials with a discrete set of outcomes, predict the probability of the next trial. Such estimators are the basic building blocks in mor...
A General Minimax Result for Relative Entropy
- IEEE Trans. Inform. Theory
, 1996
"... : Suppose Nature picks a probability measure P ` on a complete separable metric space X at random from a measurable set P \Theta = fP ` : ` 2 \Thetag. Then, without knowing `, a statistician picks a measure Q on X. Finally, the statistician suffers a loss D(P ` jjQ), the relative entropy between P ..."
Abstract
-
Cited by 26 (3 self)
- Add to MetaCart
: Suppose Nature picks a probability measure P ` on a complete separable metric space X at random from a measurable set P \Theta = fP ` : ` 2 \Thetag. Then, without knowing `, a statistician picks a measure Q on X. Finally, the statistician suffers a loss D(P ` jjQ), the relative entropy between P ` and Q. We show that the minimax and maximin values of this game are always equal, and there is always a minimax strategy in the closure of the set of all Bayes strategies. This generalizes previous results of Gallager, and Davisson and Leon-Garcia. Index terms: minimax theorem, minimax redundancy, minimax risk, Bayes risk, relative entropy, KullbackLeibler divergence, density estimation, source coding, channel capacity, computational learning theory 1 Introduction Consider a sequential estimation game in which a statistician is given n independent observations Y 1 ; : : : ; Yn distributed according to an unknown distribution ~ P ` chosen at random by Nature from the set f ~ P ` : ` 2 \...

