Results 1–10 of 10
Information-Theoretic Determination of Minimax Rates of Convergence
 Ann. Stat.
, 1997
Abstract

Cited by 98 (18 self)
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
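As background for the abstract's claim that metric entropy alone identifies the rate, the standard entropy-versus-sample-size balance can be sketched as follows (an informal summary in our notation, not the paper's precise theorem):

```latex
% Informal sketch (our notation, not the paper's exact statement): if
% H(\epsilon) is the metric entropy (log covering number) of the density
% class, the minimax rate \epsilon_n balances entropy against sample size,
H(\epsilon_n) \asymp n\, \epsilon_n^2 ,
% e.g. a \beta-smooth class with H(\epsilon) \asymp \epsilon^{-1/\beta}
% recovers the familiar nonparametric rate
\epsilon_n \asymp n^{-\beta/(2\beta + 1)} .
```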
Information Theoretic Methods in Probability and Statistics
, 2001
Abstract

Cited by 14 (0 self)
Ideas of information theory have found fruitful applications not only in various fields of science and engineering but also within mathematics, both pure and applied. This is illustrated by several typical applications of information theory specifically in probability and statistics.
Asymptotic Redundancies for Universal Quantum Coding
Abstract

Cited by 8 (3 self)
Clarke and Barron have recently shown that the Jeffreys' invariant prior of Bayesian theory yields the common asymptotic (minimax and maximin) redundancy of universal data compression in a parametric setting. We seek a possible analogue of this result for the two-level quantum systems. We restrict our considerations to prior probability distributions belonging to a certain one-parameter family, q_u, -1 < u < 1. Within this setting, we are able to compute exact redundancy formulas, for which we find the asymptotic limits. We compare our quantum asymptotic redundancy formulas to those derived by naively applying the (non-quantum) counterparts of Clarke and Barron, and find certain common features. Our results are based on formulas we obtain for the eigenvalues and eigenvectors of 2^n x 2^n (Bayesian density) matrices. These matrices are the weighted averages (with respect to q_u) of all possible tensor products of n identical 2 x 2 density matrices, representing the two-level quantum systems. We propose a form of universal coding for the situation in which the density matrix describing an ensemble of quantum signal states is unknown. A sequence of n signals would be projected onto the dominant eigenspaces of these matrices.
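The paper's family q_u and its exact redundancy formulas are specific to it, but the averaged-tensor-product construction is easy to sketch numerically. Below is a small NumPy sketch for n = 3 using a hypothetical discrete prior over diagonal qubit states as a stand-in for the paper's continuous family:

```python
import numpy as np

# Sketch of the averaged-density-matrix construction for tiny n.
# The discrete prior over diagonal 2x2 density matrices below is a
# hypothetical stand-in for the paper's one-parameter family q_u.
def avg_density_matrix(n, rhos, weights):
    """Weighted average of n-fold tensor powers of 2x2 density matrices."""
    dim = 2 ** n
    out = np.zeros((dim, dim), dtype=complex)
    for rho, w in zip(rhos, weights):
        kron = np.array([[1.0]], dtype=complex)
        for _ in range(n):
            kron = np.kron(kron, rho)   # n-fold tensor product
        out += w * kron
    return out

# Hypothetical prior: three diagonal qubit states, equal weights.
rhos = [np.diag([p, 1 - p]) for p in (0.2, 0.5, 0.8)]
weights = [1 / 3] * 3

zeta = avg_density_matrix(3, rhos, weights)   # an 8x8 Bayesian density matrix
evals, evecs = np.linalg.eigh(zeta)           # spectrum used for projection
print(np.trace(zeta).real)                    # trace stays 1: still a density matrix
```

Projecting onto the eigenspaces belonging to the largest entries of `evals` mimics the universal-coding proposal in the abstract.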
Asymptotic Normality of the Posterior in Relative Entropy
 IEEE Trans. Inform. Theory
, 1999
Abstract

Cited by 6 (0 self)
We show that the relative entropy between a posterior density formed from a smooth likelihood and prior and a limiting normal form tends to zero in the independent and identically distributed case. The mode of convergence is in probability and in mean. Applications to code lengths in stochastic complexity and to sample size selection are briefly discussed. Index Terms: Posterior density, asymptotic normality, relative entropy.
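The convergence the abstract describes can be illustrated numerically in the simplest conjugate case. The sketch below is ours, not the paper's argument: for a Bernoulli likelihood with a uniform Beta(1, 1) prior, the posterior after k successes in n trials is Beta(k + 1, n - k + 1), and we watch the relative entropy to its matching normal shrink as n grows:

```python
import numpy as np
from math import lgamma, pi

# Numerical illustration (ours): relative entropy between a Beta
# posterior and the normal with the same mean and variance, computed
# by a simple Riemann sum on a fine grid.
def kl_posterior_vs_normal(n, k, grid_size=20000):
    a, b = k + 1.0, n - k + 1.0
    theta = np.linspace(1e-6, 1 - 1e-6, grid_size)
    log_beta_const = lgamma(a) + lgamma(b) - lgamma(a + b)
    log_p = (a - 1) * np.log(theta) + (b - 1) * np.log1p(-theta) - log_beta_const
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    log_q = -0.5 * np.log(2 * pi * var) - (theta - mean) ** 2 / (2 * var)
    p = np.exp(log_p)
    dtheta = theta[1] - theta[0]
    return float(np.sum(p * (log_p - log_q)) * dtheta)

for n in (10, 100, 1000):
    print(n, kl_posterior_vs_normal(n, n // 2))   # decreases with n
```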
An Information Criterion for Likelihood Selection
 IEEE Trans. Inform. Theory
, 1999
Abstract

Cited by 5 (1 self)
For a given source distribution, we establish properties of the conditional density achieving the rate distortion function lower bound as the distortion parameter varies. In the limit as the distortion tolerated goes to zero, the conditional density achieving the rate distortion function lower bound becomes degenerate in the sense that the channel it defines becomes error free. As the permitted distortion increases to its limit, the conditional density achieving the rate distortion function lower bound defines a channel which no longer depends on the source distribution. In addition to the data compression motivation, we establish two results (one asymptotic, one nonasymptotic) showing that the conditional densities achieving the rate distortion function lower bound make relatively weak assumptions on the dependence between the source and its representation. This corresponds, in Bayes estimation, to choosing a likelihood which makes relatively weak assumptions on the data generating mechanism if the source is regarded as a prior. Taken together, these results suggest one can use the conditional density obtained from the rate distortion function in data analysis. That is, when it is impossible to identify a 'true' parametric family on the basis of physical modeling, our results provide both data compression and channel coding justification for using the conditional density achieving the rate distortion function lower bound as a likelihood. Index Terms: Mutual information, rate distortion, likelihood selection.
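For context, the exponential form the optimizing conditional density takes is a textbook result (notation ours, not the paper's): with source p(x), distortion d(x, y), output marginal q(y), and slope parameter s <= 0,

```latex
% Textbook form of the conditional density achieving the rate distortion
% bound (notation ours, not the paper's):
q(y \mid x) = \frac{q(y)\, e^{\,s\, d(x,y)}}{\int q(y')\, e^{\,s\, d(x,y')}\, dy'}
% As s \to -\infty the channel becomes error free (the zero-distortion
% limit); at s = 0 it reduces to q(y), independent of the source,
% matching the abstract's two limiting regimes.
```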
A Minimally Informative Likelihood for Decision Analysis: Robustness and Illustration
 Canadian Journal of Statistics
, 1999
Abstract

Cited by 3 (2 self)
Here we use a class of likelihoods which makes weak assumptions on data generating mechanisms. These likelihoods may be appropriate for data sets where it is difficult to propose physically motivated models. We give some properties of these likelihoods, showing how they can be computed numerically by use of the Blahut-Arimoto algorithm. Then, in the context of a data set for which no plausible physical model is apparent, we show how these likelihoods give useful inferences for the location of a distribution. The plausibility of the inferences is enhanced by the extensive robustness analysis these likelihoods permit.
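The Blahut-Arimoto computation the abstract mentions can be sketched for a discrete source. The implementation below is ours (a toy binary source with Hamming distortion), not the paper's code; it iterates the two standard updates to obtain the conditional density achieving the rate distortion bound:

```python
import numpy as np

# Minimal discrete Blahut-Arimoto sketch (ours): source distribution p
# over x, distortion matrix d(x, y), slope parameter s < 0.
def blahut_arimoto(p, d, s, iters=500):
    n_x, n_y = d.shape
    q_y = np.full(n_y, 1.0 / n_y)           # output marginal, init uniform
    for _ in range(iters):
        w = q_y[None, :] * np.exp(s * d)    # unnormalized q(y|x)
        q_xy = w / w.sum(axis=1, keepdims=True)
        q_y = p @ q_xy                      # re-estimate the output marginal
    return q_xy, q_y

p = np.array([0.5, 0.5])                    # binary source
d = 1.0 - np.eye(2)                         # Hamming distortion
q_xy, q_y = blahut_arimoto(p, d, s=-2.0)
print(q_xy)                                 # rows sum to 1; diagonal dominant
```

The matrix `q_xy` plays the role of the minimally informative likelihood: larger |s| ties the representation more tightly to the source, smaller |s| weakens the assumed dependence.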
B-Course: A Web Service for Bayesian Data Analysis
Abstract

Cited by 3 (1 self)
B-Course is a free web-based online data analysis tool, which allows users to analyze their data for multivariate probabilistic dependencies. These dependencies are represented as Bayesian network models. In addition, B-Course also offers facilities for inferring certain types of causal dependencies from the data. The software uses a novel "tutorial style" user-friendly interface which intertwines the steps in the data analysis with support material that gives an informal introduction to the Bayesian approach adopted. Although the analysis methods, modeling assumptions and restrictions are totally transparent to the user, this transparency is not achieved at the expense of analysis power: with the restrictions stated in the support material, B-Course is a powerful analysis tool exploiting several theoretically elaborate results developed recently in the fields of Bayesian and causal modeling. B-Course can be used with most web browsers (even Lynx), and the facilities include features such as automatic missing data handling and discretization, a flexible graphical interface for probabilistic inference on the constructed Bayesian network models (for Java-enabled browsers), automatic pretty-printed layout for the networks, exportation of the models, and analysis of the importance of the derived dependencies. In this paper we discuss both the theoretical design principles underlying the B-Course tool, and the pragmatic methods adopted in the implementation of the software.
Tree Augmented Classification of Binary Data Minimizing Stochastic Complexity
, 2002
Abstract

Cited by 1 (1 self)
We establish the algorithms and procedures that augment by trees the classifiers of binary feature vectors in (Gyllenberg et al. 1993, 1997, Gyllenberg et al. 1999 and Gyllenberg and Koski 2002). The notion of augmenting a classifier by a tree is due to (Chow and Liu 1968) and in a more extensive form due to (Friedman et al. 1997). These techniques will in another report be primarily applied to unsupervised classification of bacterial DNA fingerprints (or electrophoretic patterns), cf. (Gyllenberg and Koski 2001 (a), Rademaker et al. 1999). By classification we mean here both the (unsupervised) procedures of finding the classes in (training) data of items as well as the actual outcome of the procedure, i.e., a partitioning of the items. By identification we mean the procedures for finding the assignment of items to classes, pre-established in one way or the other. The distinction should be clear, although the algorithms of classification as given in the sequel will also...
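The Chow and Liu (1968) idea the abstract builds on — keep only the maximum-weight spanning tree of the pairwise mutual information graph — can be sketched compactly. The code below is our illustration on synthetic binary data, not the papers' algorithms:

```python
import numpy as np
from itertools import combinations

# Compact Chow-Liu sketch (ours): empirical pairwise mutual information
# over binary columns, then a maximum-weight spanning tree via Kruskal.
def mutual_info(x, y):
    """Empirical mutual information of two binary columns (nats)."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x == a) & (y == b))
            p_a, p_b = np.mean(x == a), np.mean(y == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi

def chow_liu_tree(data):
    n_vars = data.shape[1]
    edges = sorted(
        ((mutual_info(data[:, i], data[:, j]), i, j)
         for i, j in combinations(range(n_vars), 2)),
        reverse=True)
    parent = list(range(n_vars))          # union-find forest
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    tree = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                      # keep edge if it joins components
            parent[ri] = rj
            tree.append((i, j))
    return tree

rng = np.random.default_rng(0)
a = rng.integers(0, 2, 500)
b = (a ^ (rng.random(500) < 0.1)).astype(int)   # b is a noisy copy of a
c = rng.integers(0, 2, 500)                     # independent noise
data = np.column_stack([a, b, c])
print(chow_liu_tree(data))                      # the strong edge (0, 1) survives
```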
Partial Information Reference Priors
, 2000
Abstract
Suppose X_1, ..., X_n are IID p(.|θ, ψ) where (θ, ψ) ∈ IR^d is distributed according to the prior density w(.). For estimators S_n = S(X^n) and T_n = T(X^n) assumed to be consistent for some function of (θ, ψ) and asymptotically normal, we examine the conditional Shannon mutual information (CSMI) between θ and T_n given ψ and S_n, I(θ; T_n | ψ, S_n). It is seen there are several important special cases of this CSMI. We establish an asymptotic formula for it and identify the resulting noninformative reference prior. As a consequence, we develop the notion of data-dependent priors and a calibration for how close an estimator is to sufficiency.
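The CSMI here is the standard conditional mutual information; spelled out in our notation (θ the parameter of interest, ψ the remaining coordinates — the abstract's original symbols were lost in extraction):

```latex
% Standard definition of the conditional mutual information
% (our reconstruction of the abstract's notation):
I(\theta;\, T_n \mid \psi,\, S_n)
  = \mathbb{E}\!\left[ \log \frac{p(\theta,\, T_n \mid \psi,\, S_n)}
                                 {p(\theta \mid \psi,\, S_n)\; p(T_n \mid \psi,\, S_n)} \right]
```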
Hierarchical Universal Coding
, 1998
Abstract
In an earlier paper, we proved a strong version of the redundancy-capacity converse theorem of universal coding, stating that for 'most' sources in a given class, the universal coding redundancy is essentially lower bounded by the capacity of the channel induced by this class. Since this result holds for general classes of sources, it extends Rissanen's strong converse theorem for parametric families. While our earlier result has established strong optimality only for mixture codes weighted by the capacity-achieving prior, our first result herein extends this finding to a general prior. For some cases our technique also leads to a simplified proof of the above-mentioned strong converse theorem. The major interest in this paper, however, is in extending the theory of universal coding to hierarchical structures of classes, where each class may have a different capacity. In this setting, one wishes to incur redundancy essentially as small as that corresponding to the active class, and not the union of classes. Our main result is that the redundancy of a code based on a two-stage mixture (first, within each class, and then over the classes), is no worse than that of any other code for 'most' sources of 'most' classes. If, in addition, the classes can be efficiently distinguished by a certain decision rule, then the best attainable redundancy is given explicitly by the capacity of the active class plus the normalized negative logarithm of the prior probability assigned to this class. These results suggest some interesting guidelines as for the choice of the prior. We also discuss some examples with a natural hierarchical partition into classes.
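The key property of the two-stage mixture — its code length is never worse than any single class's code length plus the cost of naming that class — can be checked on a toy example. The construction below is ours (class A: i.i.d. Bernoulli sources under the Krichevsky-Trofimov mixture; class B: the single fair-coin source; a hypothetical (1/2, 1/2) prior over classes), not the paper's general theorem:

```python
import numpy as np
from math import lgamma, log

# Toy two-stage mixture code (our construction, illustrating the idea).
def kt_marginal_loglik(k, n):
    """Log KT (Beta(1/2,1/2)) mixture probability of a binary string
    with k ones out of n."""
    return (lgamma(k + 0.5) + lgamma(n - k + 0.5)
            - lgamma(0.5) - lgamma(0.5) - lgamma(n + 1.0))

n, k = 100, 73                       # observed: 73 ones in 100 tosses
log_m_a = kt_marginal_loglik(k, n)   # class A: KT mixture over Bernoullis
log_m_b = n * log(0.5)               # class B: fair coin
# Two-stage code length in nats: mix the classes, then take -log.
two_stage = -np.logaddexp(log(0.5) + log_m_a, log(0.5) + log_m_b)
# Each within-class code length plus the one-time cost of naming the class:
per_class = [-log_m_a + log(2.0), -log_m_b + log(2.0)]
print(two_stage, min(per_class))     # two_stage <= min(per_class)
```

Since the data are heavily biased, class A (with its capacity-like KT overhead) wins, and the two-stage length tracks it to within log 2 nats, mirroring the "capacity of the active class plus -log prior" formula in the abstract.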