Results 1–10 of 39
The minimum description length principle in coding and modeling
IEEE Trans. Inform. Theory, 1998
Cited by 305 (12 self)
Abstract — We review the principles of Minimum Description Length and Stochastic Complexity as used in data compression and statistical modeling. Stochastic complexity is formulated as the solution to optimum universal coding problems extending Shannon’s basic source coding theorem. The normalized maximized likelihood, mixture, and predictive codings are each shown to achieve the stochastic complexity to within asymptotically vanishing terms. We assess the performance of the minimum description length criterion both from the vantage point of quality of data compression and accuracy of statistical inference. Context tree modeling, density estimation, and model selection in Gaussian linear regression serve as examples. Index Terms—Complexity, compression, estimation, inference, universal modeling.
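A quick numerical aside on the stochastic complexity discussed above: for a Bernoulli model the normalizing sum in the normalized maximized likelihood (NML) code can be computed exactly, which makes the parametric complexity easy to illustrate. This is a sketch of ours, not code from the paper; the model choice and function name are our own.

```python
import math

def bernoulli_nml_complexity(n: int) -> float:
    """Parametric complexity log sum_x max_theta p(x; theta) for the
    Bernoulli model on binary sequences of length n, in nats.
    All sequences with k ones share the same maximized likelihood
    (k/n)^k ((n-k)/n)^(n-k), so the 2^n-term sum collapses to n+1 terms."""
    total = 0.0
    for k in range(n + 1):
        # maximized log-likelihood of any sequence containing k ones
        ll = 0.0
        if k > 0:
            ll += k * math.log(k / n)
        if k < n:
            ll += (n - k) * math.log((n - k) / n)
        total += math.comb(n, k) * math.exp(ll)
    return math.log(total)

# The NML code length of a sequence is its maximized negative
# log-likelihood plus this complexity term; asymptotically the term
# grows like (1/2) log n, the familiar (d/2) log n model cost with d = 1.
for n in (100, 400):
    print(n, round(bernoulli_nml_complexity(n), 3))
```

Quadrupling n should raise the complexity by roughly (1/2) log 4 ≈ 0.69 nats, consistent with the (d/2) log n asymptotics reviewed in the paper.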
Information-theoretic asymptotics of Bayes methods
IEEE Transactions on Information Theory, 1990
Cited by 107 (10 self)
Abstract — In the absence of knowledge of the true density function, Bayesian models take the joint density function for a sequence of n random variables to be an average of densities with respect to a prior. We examine the relative entropy distance D_n between the true density and the Bayesian density and show that the asymptotic distance is (d/2) log n + c, where d is the dimension of the parameter vector. Therefore, the relative entropy rate D_n/n converges to zero at rate (log n)/n. The constant c, which we explicitly identify, depends only on the prior density function and the Fisher information matrix evaluated at the true parameter value. Consequences are given for density estimation, universal data compression, composite hypothesis testing, and stock-market portfolio selection.
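The (d/2) log n + c behaviour is easy to check numerically in the simplest case: a Bernoulli(theta) model with a uniform prior, where the Bayes mixture marginal is available in closed form. A sketch of ours for illustration, not code from the paper:

```python
import math

def bayes_redundancy(n: int, theta: float) -> float:
    """Relative entropy D_n = E_theta[log p(X^n; theta) / m(X^n)], in nats,
    between the true Bernoulli(theta) law of n coin flips and the Bayes
    mixture m under a uniform prior on theta.
    Under the uniform prior a sequence with k ones has marginal
    m = k! (n-k)! / (n+1)!  (the Beta integral of the likelihood)."""
    d = 0.0
    for k in range(n + 1):
        log_true = k * math.log(theta) + (n - k) * math.log(1 - theta)
        log_mix = (math.lgamma(k + 1) + math.lgamma(n - k + 1)
                   - math.lgamma(n + 2))
        prob_k = math.comb(n, k) * math.exp(log_true)  # P(exactly k ones)
        d += prob_k * (log_true - log_mix)
    return d

# Clarke and Barron: D_n = (d/2) log n + c + o(1) with d = 1 here, so
# quadrupling n should add about (1/2) log 4 ~ 0.69 nats.
for n in (100, 400):
    print(n, round(bayes_redundancy(n, 0.3), 3))
```

The gap between n = 100 and n = 400 lands near 0.69 nats, and the leftover constant matches the paper's c, which involves the prior density and the Fisher information at the true parameter.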
Asymptotic normality of posterior distributions in high-dimensional linear models
Bernoulli 5, 1999
Cited by 22 (5 self)
We study consistency and asymptotic normality of posterior distributions of the natural parameter for an exponential family when the dimension of the parameter grows with the sample size. Under certain growth restrictions on the dimension, we show that the posterior distributions concentrate in neighbourhoods of the true parameter and can be approximated by an appropriate normal distribution.
Two Estimators of the Mean of a Counting Process with Panel Count Data
1998
Cited by 21 (12 self)
We study two estimators of the mean function of a counting process based on "panel count data". The setting for "panel count data" is one in which n independent subjects, each with a counting process with common mean function, are observed at several possibly different times during a study. Following a model proposed by Schick and Yu (1997), we allow the number of observation times, and the observation times themselves, to be random variables. Our goal is to estimate the mean function of the counting process. We show that the estimator of the mean function proposed by Sun and Kalbfleisch (1995) can be viewed as a pseudo-maximum likelihood estimator when a nonhomogeneous Poisson process model is assumed for the counting process. We establish consistency of both the nonparametric pseudo-maximum likelihood estimator of Sun and Kalbfleisch (1995) and the full maximum likelihood estimator, even if the underlying counting process is not a Poisson process. We also derive the asymptotic distribution of both estimators at a fixed time t, and compare the resulting theoretical relative efficiency with finite-sample relative efficiency by way of a limited Monte Carlo study.
Bayesian Statistics
in WWW', Computing Science and Statistics, 1989
Cited by 20 (0 self)
This dissertation presents two topics from opposite disciplines: one is from a parametric realm and the other is based on nonparametric methods. The first topic is a jackknife maximum likelihood approach to statistical model selection and the second is a convex hull peeling depth approach to nonparametric massive multivariate data analysis. The second topic includes simulations and applications on massive astronomical data. First, we present a model selection criterion, minimizing the Kullback–Leibler distance by using the jackknife method. Various model selection methods have been developed to choose a model of minimum Kullback–Leibler distance to the true model, such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), minimum description length (MDL), and the bootstrap information criterion. Likewise, the jackknife method chooses a model of minimum Kullback–Leibler distance through bias reduction. This bias, which is inevitable in model
An inverse of Sanov's theorem
Statist. Probab. Lett., 1999
Cited by 16 (4 self)
Let X_k be a sequence of i.i.d. random variables taking values in a finite set, and consider the problem of estimating the law of X_1 in a Bayesian framework. We prove that the sequence of posterior distributions satisfies a large deviation principle, and give an explicit expression for the rate function. As an application, we obtain an asymptotic formula for the predictive probability of ruin in the classical gambler's ruin problem.
On asymptotic problems of parameter estimation in stochastic PDE's: the case of discrete time sampling
1997
Cited by 10 (0 self)
The problem of estimating parameters of randomly perturbed PDE's is considered. ML estimators based on discrete time sampling of M observable Fourier coefficients of the random field governed by the stochastic PDE in question are studied. Necessary and sufficient conditions are given for the consistency, asymptotic normality and asymptotic efficiency of the ML estimators as M → ∞. These conditions are given in terms of simple properties of the operators involved in the equation and are easy to check. Key words: parameter estimation, SPDE, ML. Introduction. We consider the problem of estimating the unknown scalar parameter θ from partial observations of the random field u(t, x) governed by the equation ∂u/∂t + (A_0 + θA_1)u = S(t, x) (1), where x ∈ G, G is a bounded region in the d-dimensional Euclidean space R^d, A_k, k = 0, 1, are linear operators and S(t, x) is the Gaussian "white noise" in t. We assume that only the amplitudes u_m(t_n), n = 1, ..., N, m = 1, ..., M (2...
On the uniform consistency of Bayes estimates for multinomial probabilities
1988
Cited by 9 (1 self)
Consistency of Bayes estimates for nonparametric regression: normal theory
Bernoulli, 1998
Cited by 9 (1 self)
Prior Information and Generalized Questions
1996
Cited by 7 (4 self)
In learning problems, available information is usually divided into two categories: examples of function values (or training data) and prior information (e.g. a smoothness constraint).