Information-Theoretic Determination of Minimax Rates of Convergence
- Ann. Stat., 1997
Cited by 67 (18 self)
Abstract
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
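As a rough illustration of how a metric entropy condition pins down a rate, the sketch below assumes the balance equation usually associated with this result: the critical radius $\varepsilon_n$ solves $M(\varepsilon_n) = n\varepsilon_n^2$, where $M(\varepsilon)$ is the log covering number (metric entropy) of the density class, and the minimax squared risk is of order $\varepsilon_n^2$. The entropy function $M(\varepsilon) = \varepsilon^{-1/\alpha}$ (constants dropped, as for a smoothness-$\alpha$ class) and the helper names are hypothetical choices for the example. With them the balance has the closed form $\varepsilon_n = n^{-\alpha/(2\alpha+1)}$, which the numerical root should reproduce.

```python
import math


def critical_radius(n, log_covering, lo=1e-12, hi=1.0, iters=200):
    """Solve n * eps**2 = log_covering(eps) for eps by bisection.

    Assumes log_covering(eps) is decreasing in eps, so that
    g(eps) = n * eps**2 - log_covering(eps) is increasing and changes
    sign on [lo, hi].
    """
    def g(eps):
        return n * eps ** 2 - log_covering(eps)

    if not (g(lo) < 0.0 < g(hi)):
        raise ValueError("balance equation is not bracketed by [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def smooth_class_entropy(alpha):
    """Hypothetical metric entropy M(eps) = eps**(-1/alpha), constants dropped."""
    return lambda eps: eps ** (-1.0 / alpha)


if __name__ == "__main__":
    n = 10 ** 6
    for alpha in (0.5, 1.0, 2.0):
        eps = critical_radius(n, smooth_class_entropy(alpha))
        closed_form = n ** (-alpha / (2 * alpha + 1))
        print(f"alpha={alpha}: eps_n={eps:.6g}, closed form={closed_form:.6g}, "
              f"squared rate={eps ** 2:.3g}")
```

Bisection is used only because it needs nothing beyond monotonicity of $M$; any entropy function with that property can be plugged in.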
Information Theoretic Methods in Probability and Statistics
- 2001
Cited by 11 (0 self)
Abstract
Ideas of information theory have found fruitful applications not only in various fields of science and engineering but also within mathematics, both pure and applied. This is illustrated by several typical applications of information theory specifically in probability and statistics.
Asymptotic Redundancies for Universal Quantum Coding
Cited by 7 (3 self)
Abstract
Clarke and Barron have recently shown that the Jeffreys' invariant prior of Bayesian theory yields the common asymptotic (minimax and maximin) redundancy of universal data compression in a parametric setting. We seek a possible analogue of this result for two-level quantum systems. We restrict our considerations to prior probability distributions belonging to a certain one-parameter family, $q_u$, $-\infty < u < 1$. Within this setting, we are able to compute exact redundancy formulas, for which we find the asymptotic limits. We compare our quantum asymptotic redundancy formulas to those derived by naively applying the (non-quantum) counterparts of Clarke and Barron, and find certain common features. Our results are based on formulas we obtain for the eigenvalues and eigenvectors of $2^n \times 2^n$ (Bayesian density) matrices, $\zeta_n(u)$. These matrices are the weighted averages (with respect to $q_u$) of all possible tensor products of $n$ identical $2 \times 2$ density matrices, representing the two-level quantum systems. We propose a form of universal coding for the situation in which the density matrix describing an ensemble of quantum signal states is unknown. A sequence of $n$ signals would be projected onto the dominant eigenspaces of $\zeta_n(u)$.
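The classical (non-quantum) result of Clarke and Barron can be checked numerically in the simplest parametric case. This sketch assumes an i.i.d. Bernoulli source with parameter $\theta$, for which the Jeffreys prior is Beta(1/2, 1/2) and $\int_0^1 \sqrt{I(\theta)}\,d\theta = \pi$. It compares the asymptotic redundancy $\tfrac{d}{2}\log\frac{n}{2\pi e} + \log\int\sqrt{\det I(\theta)}\,d\theta$ (in nats, $d = 1$) with the exact expected redundancy of the Jeffreys mixture at one interior $\theta$. It only illustrates the classical formula, not the quantum family $q_u$, and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """log B(a, b) via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def jeffreys_mixture_redundancy(n, theta):
    """Exact expected redundancy E_theta[log p_theta(X^n) - log m(X^n)] in nats.

    m is the Bayes mixture of i.i.d. Bernoulli sources under the Jeffreys prior
    Beta(1/2, 1/2); a sequence with k ones has m(x^n) = B(k + 1/2, n - k + 1/2) / B(1/2, 1/2).
    Both log-probabilities depend on x^n only through k, so we sum over k.
    """
    log_theta, log_1m = math.log(theta), math.log1p(-theta)
    total = 0.0
    for k in range(n + 1):
        log_p_seq = k * log_theta + (n - k) * log_1m
        log_m_seq = log_beta(k + 0.5, n - k + 0.5) - log_beta(0.5, 0.5)
        log_binom = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        # weight = P(K = k) under Binomial(n, theta)
        total += math.exp(log_binom + log_p_seq) * (log_p_seq - log_m_seq)
    return total


def clarke_barron_bernoulli(n):
    """(d/2) log(n / (2 pi e)) + log of the integral of sqrt(I(theta)); d = 1, integral = pi."""
    return 0.5 * math.log(n / (2.0 * math.pi * math.e)) + math.log(math.pi)


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        exact = jeffreys_mixture_redundancy(n, 0.3)
        print(f"n={n}: exact={exact:.4f}  asymptotic={clarke_barron_bernoulli(n):.4f}")
```

Because the Jeffreys prior is proportional to $\sqrt{I(\theta)}$, the $\theta$-dependent terms of the general Clarke and Barron expansion cancel, which is why the asymptotic value above does not depend on $\theta$.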
Asymptotic Normality of the Posterior in Relative Entropy
- IEEE Trans. Inform. Theory, 1999
Cited by 5 (0 self)
Abstract
We show that the relative entropy between a posterior density formed from a smooth likelihood and prior and a limiting normal form tends to zero in the independent and identically distributed case. The mode of convergence is in probability and in mean. Applications to codelengths in stochastic complexity and to sample size selection are briefly discussed. Index Terms: Posterior density, asymptotic normality, relative entropy. Revision submitted to IEEE Trans. Inform. Theory, 22 May 1998. This research was partially supported by NSERC Operating Grant 5-54891. The author is with the Department of Statistics, University of British Columbia, Room 333, 6356 Agricultural Road, Vancouver, BC, Canada V6T 1Z2.
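A small numerical check of this phenomenon, under assumptions that are ours rather than the paper's: i.i.d. Bernoulli data with $k$ successes in $n$ trials and a uniform Beta(1, 1) prior, so that the posterior is Beta$(k+1, n-k+1)$ and its mode is the MLE $\hat\theta = k/n$. The normal form is taken as $N(\hat\theta, \hat\theta(1-\hat\theta)/n)$, centred at the MLE with the inverse Fisher information as variance. The relative entropy is taken in the direction $D(\text{posterior} \,\|\, \text{normal})$ and computed by a midpoint rule on (0, 1). Holding $k/n$ fixed, it should shrink as $n$ grows.

```python
import math


def kl_posterior_vs_normal(n, k, grid=100_000):
    """D(Beta(k+1, n-k+1) || N(k/n, (k/n)(1-k/n)/n)) in nats, by the midpoint rule.

    Hypothetical setup: Bernoulli likelihood, uniform prior, requires 0 < k < n.
    """
    if not 0 < k < n:
        raise ValueError("need 0 < k < n so the MLE is interior")
    a, b = k + 1, n - k + 1
    log_norm = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)  # log B(a, b)
    mle = k / n
    var = mle * (1.0 - mle) / n
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * h  # midpoint of the i-th cell of (0, 1)
        log_p = (a - 1) * math.log(x) + (b - 1) * math.log1p(-x) - log_norm
        log_q = -0.5 * math.log(2.0 * math.pi * var) - (x - mle) ** 2 / (2.0 * var)
        total += math.exp(log_p) * (log_p - log_q) * h
    return total


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(f"n={n}: KL={kl_posterior_vs_normal(n, 3 * n // 10):.3e}")
```

Working with log-densities keeps the integrand finite even where the posterior density underflows to zero far from the mode.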
An Information Criterion for Likelihood Selection
- IEEE Trans. Inform. Theory, 1999
Cited by 4 (1 self)
Abstract
For a given source distribution, we establish properties of the conditional density achieving the rate distortion function lower bound as the distortion parameter varies. In the limit as the distortion tolerated goes to zero, the conditional density achieving the rate distortion function lower bound becomes degenerate in the sense that the channel it defines becomes error-free. As the permitted distortion increases to its limit, the conditional density achieving the rate distortion function lower bound defines a channel which no longer depends on the source distribution. In addition to the data compression motivation, we establish two results (one asymptotic, one non-asymptotic) showing that the conditional densities achieving the rate distortion function lower bound make relatively weak assumptions on the dependence between the source and its representation. This corresponds, in Bayes estimation, to choosing a likelihood which makes relatively weak assumptions on the data generating mechanism if the source is regarded as a prior. Taken together, these results suggest one can use the conditional density obtained from the rate distortion function in data analysis. That is, when it is impossible to identify a 'true' parametric family on the basis of physical modeling, our results provide both data compression and channel coding justification for using the conditional density achieving the rate distortion function lower bound as a likelihood. Index Terms: Mutual information, rate distortion, likelihood selection.
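The two limiting regimes can be seen on a toy discrete problem with the standard Blahut-Arimoto iteration for the rate distortion function. In this sketch (our assumptions, not the paper's setup) the source is a three-letter distribution, the distortion is Hamming, and the trade-off is indexed by a slope parameter $\beta \ge 0$. Large $\beta$ corresponds to small tolerated distortion, and $\beta = 0$ to the maximal-distortion end. The iteration produces channels of the form $Q(y \mid x) \propto q(y)\, e^{-\beta d(x, y)}$ with $q$ the output marginal, so at large $\beta$ the channel is close to the identity (error-free), while at $\beta = 0$ every row equals the same distribution and the channel ignores its input.

```python
import math


def blahut_arimoto_rd(p_x, dist, beta, iters=1000):
    """Blahut-Arimoto iteration for rate distortion at slope parameter beta >= 0.

    p_x[x] is the source distribution and dist[x][y] the distortion measure.
    Returns (Q, q) with Q[x][y] = Q(y | x) the test channel and q its output marginal.
    """
    nx, ny = len(p_x), len(dist[0])
    q = [1.0 / ny] * ny  # start from a uniform output marginal
    Q = []
    for _ in range(iters):
        Q = []
        for x in range(nx):
            w = [q[y] * math.exp(-beta * dist[x][y]) for y in range(ny)]
            z = sum(w)
            Q.append([v / z for v in w])  # Q(y|x) proportional to q(y) exp(-beta d(x, y))
        q = [sum(p_x[x] * Q[x][y] for x in range(nx)) for y in range(ny)]
    return Q, q


def hamming(n):
    """Hamming distortion on an n-letter alphabet."""
    return [[0.0 if x == y else 1.0 for y in range(n)] for x in range(n)]


if __name__ == "__main__":
    p_x = [0.5, 0.3, 0.2]  # hypothetical source
    for beta in (0.0, 1.0, 20.0):
        Q, _ = blahut_arimoto_rd(p_x, hamming(3), beta)
        print(f"beta={beta}:", [[round(v, 3) for v in row] for row in Q])
```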
A Minimally Informative Likelihood for Decision Analysis: Robustness and Illustration
- Canadian Journal of Statistics, 1999
Cited by 3 (2 self)
Abstract
Here we use a class of likelihoods which makes weak assumptions on data generating mechanisms. These likelihoods may be appropriate for data sets where it is difficult to propose physically motivated models. We give some properties of these likelihoods, showing how they can be computed numerically by use of the Blahut-Arimoto algorithm. Then, in the context of a data set for which no plausible physical model is apparent, we show how these likelihoods give useful inferences for the location of a distribution. The plausibility of the inferences is enhanced by the extensive robustness analysis these likelihoods permit.
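The sketch below shows, under stated assumptions, how a channel computed by the Blahut-Arimoto algorithm could serve as a likelihood for a location parameter. Everything concrete here is hypothetical and chosen for illustration: the location grid on [-3, 3], the N(0, 1) prior used as the source, squared-error distortion, the slope $\beta = 2$, and the three data values in the usage line. It is not the paper's data set or necessarily its exact construction. The channel $Q(x \mid \theta)$ returned by the iteration is treated as $p(x \mid \theta)$, observations are snapped to the grid, and the posterior follows from Bayes' rule.

```python
import math


def ba_channel(prior, dist, beta, iters=300):
    """Blahut-Arimoto test channel Q[t][x] = Q(x | theta_t) for source `prior`.

    Each pass sets Q(x | t) proportional to q(x) * exp(-beta * dist[t][x]) and then
    recomputes the output marginal q(x) = sum_t prior[t] * Q(x | t).
    """
    n_t, n_x = len(prior), len(dist[0])
    kernel = [[math.exp(-beta * dist[t][x]) for x in range(n_x)] for t in range(n_t)]
    q = [1.0 / n_x] * n_x
    Q = []
    for _ in range(iters):
        Q = []
        for t in range(n_t):
            w = [q[x] * kernel[t][x] for x in range(n_x)]
            z = sum(w)
            Q.append([v / z for v in w])
        q = [sum(prior[t] * Q[t][x] for t in range(n_t)) for x in range(n_x)]
    return Q


def location_posterior(data, beta=2.0, lo=-3.0, hi=3.0, m=41):
    """Posterior over a grid of locations when Q(x | theta) is used as the likelihood.

    Hypothetical choices: N(0, 1) prior on the grid, squared-error distortion,
    slope `beta`; observations are snapped to the nearest grid point.
    """
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    weights = [math.exp(-0.5 * g * g) for g in grid]
    total = sum(weights)
    prior = [v / total for v in weights]
    dist = [[(t - x) ** 2 for x in grid] for t in grid]
    Q = ba_channel(prior, dist, beta)
    idx = [min(m - 1, max(0, round((d - lo) / step))) for d in data]
    log_post = [math.log(prior[t]) + sum(math.log(Q[t][i]) for i in idx) for t in range(m)]
    top = max(log_post)  # subtract the max before exponentiating
    post = [math.exp(v - top) for v in log_post]
    z = sum(post)
    post = [v / z for v in post]
    mean = sum(g * p for g, p in zip(grid, post))
    return grid, post, mean


if __name__ == "__main__":
    _, _, mean = location_posterior([0.9, 1.1, 1.3])  # illustrative values, not real data
    print(f"posterior mean of the location: {mean:.3f}")
```

Re-running with several values of `beta` is one cheap way to probe the kind of robustness analysis the abstract mentions, since the likelihood itself changes with the distortion level.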
B-Course: A Web Service for Bayesian Data Analysis
Cited by 3 (1 self)
Abstract
B-Course is a free web-based online data analysis tool, which allows users to analyze their data for multivariate probabilistic dependencies. These dependencies are represented as Bayesian network models. In addition, B-Course offers facilities for inferring certain types of causal dependencies from the data. The software uses a novel "tutorial style" user-friendly interface which intertwines the steps in the data analysis with support material that gives an informal introduction to the Bayesian approach adopted. Although the analysis methods, modeling assumptions and restrictions are totally transparent to the user, this transparency is not achieved at the expense of analysis power: with the restrictions stated in the support material, B-Course is a powerful analysis tool exploiting several theoretically elaborate results developed recently in the fields of Bayesian and causal modeling. B-Course can be used with most web browsers (even Lynx), and the facilities include features such as automatic missing data handling and discretization, a flexible graphical interface for probabilistic inference on the constructed Bayesian network models (for Java-enabled browsers), automatic pretty-printed layout for the networks, exportation of the models, and analysis of the importance of the derived dependencies. In this paper we discuss both the theoretical design principles underlying the B-Course tool and the pragmatic methods adopted in the implementation of the software.
Tree Augmented Classification of Binary Data Minimizing Stochastic Complexity
- 2002
Cited by 1 (1 self)
Abstract
We establish the algorithms and procedures that augment by trees the classifiers of binary feature vectors in (Gyllenberg et al. 1993, 1997, Gyllenberg et al. 1999 and Gyllenberg and Koski 2002). The notion of augmenting a classifier by a tree is due to (Chow and Liu 1968) and, in a more extensive form, to (Friedman et al. 1997). These techniques will, in another report, be applied primarily to unsupervised classification of bacterial DNA fingerprints (or electrophoretic patterns), cf. (Gyllenberg and Koski 2001 (a), Rademaker et al. 1999). By classification we mean here both the (unsupervised) procedures of finding the classes in (training) data of items and the actual outcome of the procedure, i.e., a partitioning of the items. By identification we mean the procedures for assigning items to classes pre-established in one way or another. The distinction should be clear, although the algorithms of classification as given in the sequel will also...
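For readers unfamiliar with the Chow and Liu (1968) construction cited here, a minimal sketch of its classical form follows: estimate the pairwise mutual information between binary features and keep a maximum-weight spanning tree (Kruskal's algorithm with union-find). This is the plain maximum-likelihood tree on unlabelled data. It does not implement the stochastic-complexity criterion of the report or the class-conditional mutual information used by Friedman et al. (1997) for tree-augmented classifiers, and the function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (nats) between binary columns i and j."""
    n = len(data)
    counts = {}
    for row in data:
        key = (row[i], row[j])
        counts[key] = counts.get(key, 0) + 1
    marg_i, marg_j = [0, 0], [0, 0]
    for (a, b), c in counts.items():
        marg_i[a] += c
        marg_j[b] += c
    # sum over observed cells of p(a,b) * log(p(a,b) / (p(a) p(b)))
    return sum((c / n) * math.log(c * n / (marg_i[a] * marg_j[b]))
               for (a, b), c in counts.items())


def chow_liu_tree(data):
    """Maximum-weight spanning tree over pairwise mutual information (Kruskal)."""
    d = len(data[0])
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


def toy_binary_data(rows=200):
    """Hypothetical data: column 1 copies column 0, column 2 is a noisy copy of
    column 0, and column 3 is independent of column 0."""
    data = []
    for k in range(rows):
        a = k % 2
        b = (k // 2) % 2
        c = a if k % 5 else 1 - a
        data.append((a, a, c, b))
    return data


if __name__ == "__main__":
    print(chow_liu_tree(toy_binary_data()))
```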
Partial Information Reference Priors
- 2000
Abstract
Suppose $X_1, \ldots, X_n$ are IID $p(\cdot \mid \theta, \psi)$ where $(\theta, \psi) \in \mathbb{R}^d$ is distributed according to the prior density $w(\cdot)$. For estimators $S_n = S(X^n)$ and $T_n = T(X^n)$ assumed to be consistent for some function of $\theta$ and asymptotically normal, we examine the conditional Shannon mutual information (CSMI) between $\Theta$ and $T_n$ given $\Psi$ and $S_n$, $I(\Theta; T_n \mid \Psi, S_n)$. There are several important special cases of this CSMI. We establish an asymptotic formula for it and identify the resulting noninformative reference prior. As a consequence, we develop the notion of data-dependent priors and a calibration for how close an estimator is to sufficiency.
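A toy special case may help fix ideas about how such a mutual information grows with $n$. Assume, purely for illustration and not from the paper, a scalar location parameter with no nuisance component, a normal prior $\Theta \sim N(0, \tau^2)$, and an estimator with $T_n \mid \Theta = \vartheta \sim N(\vartheta, \sigma^2/n)$ exactly. With nothing to condition on, the information reduces to an unconditional one with a closed form that grows like $\tfrac12 \log n$:

```latex
\begin{aligned}
I(\Theta; T_n) &= h(T_n) - h(T_n \mid \Theta) \\
&= \tfrac{1}{2}\log\!\bigl(2\pi e\,(\tau^2 + \sigma^2/n)\bigr) - \tfrac{1}{2}\log\!\bigl(2\pi e\,\sigma^2/n\bigr) \\
&= \tfrac{1}{2}\log\!\Bigl(1 + \frac{n\tau^2}{\sigma^2}\Bigr)
 = \tfrac{1}{2}\log n + \tfrac{1}{2}\log\frac{\tau^2}{\sigma^2} + o(1).
\end{aligned}
```

Here $h$ denotes differential entropy and the first step uses $T_n \sim N(0, \tau^2 + \sigma^2/n)$ marginally.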

