Results 1 -
7 of
7
The consistency of the BIC Markov order estimator.
"... . The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet A) from observation of a sample path x 1 ; x 2 ; : : : ; x n , as that value k = k that minimizes the sum of the negative logarithm of the k-th order maximum likelihood and the penalty term jAj ..."
Abstract
-
Cited by 42 (3 self)
- Add to MetaCart
. The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet A) from observation of a sample path x 1 ; x 2 ; : : : ; x n , as that value k = k that minimizes the sum of the negative logarithm of the k-th order maximum likelihood and the penalty term jAj k (jAj\Gamma1) 2 log n: We show that k equals the correct order of the chain, eventually almost surely as n ! 1, thereby strengthening earlier consistency results that assumed an apriori bound on the order. A key tool is a strong ratio-typicality result for Markov sample paths. We also show that the Bayesian estimator or minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. AMS 1991 subject classification: Primary 62F12, 62M05; Secondary 62F13, 60J10 Key words and phrases: Bayesian Information Criterion, order estimation, ratiotypicality, Markov chains. 1 Supported in part by a joint N...
Blind construction of optimal nonlinear recursive predictors for discrete sequences
- In “Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference
, 2004
"... We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstru ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its data requirements, and its performance in simulations. Finally, we compare our approach to existing methods using variablelength Markov models and cross-validated hidden Markov models, and show theoretically and experimentally that our method delivers results superior to the former and at least comparable to the latter. 1
The Interactions Between Ergodic Theory and Information Theory
- IEEE Transactions on Information Theory
, 1998
"... Information theorists frequently use the ergodic theorem; likewise entropy concepts are often used in information theory. Recently the two subjects have become partially intertwined as deeper results from each discipline find use in the other. A brief history of this interaction is presented in this ..."
Abstract
-
Cited by 13 (0 self)
- Add to MetaCart
Information theorists frequently use the ergodic theorem; likewise entropy concepts are often used in information theory. Recently the two subjects have become partially intertwined as deeper results from each discipline find use in the other. A brief history of this interaction is presented in this paper, together with a more detailed look at three areas of connection, namely, recurrence theory, blowing-up bounds, and direct sample-path methods.
Dynamics of Bayesian updating with dependent data and misspecified models
, 2009
"... Abstract: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various s ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Abstract: Much is now known about the consistency of Bayesian updating on infinite-dimensional parameter spaces with independent or Markovian data. Necessary conditions for consistency include the prior putting enough weight on the correct neighborhoods of the data-generating distribution; various sufficient conditions further restrict the prior in ways analogous to capacity control in frequentist nonparametrics. The asymptotics of Bayesian updating with mis-specified models or priors, or non-Markovian data, are far less well explored. Here I establish sufficient conditions for posterior convergence when all hypotheses are wrong, and the data have complex dependencies. The main dynamical assumption is the asymptotic equipartition (Shannon-McMillan-Breiman) property of information theory. This, along with Egorov’s Theorem on uniform convergence, lets me build a sieve-like structure for the prior. The main statistical assumption, also a form of capacity control, concerns the compatibility of the prior and the data-generating process, controlling the fluctuations in the loglikelihood when averaged over the sieve-like sets. In addition to posterior convergence, I derive a kind of large deviations principle for the posterior measure, extending in some cases to rates of convergence, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between these results and the replicator dynamics of evolutionary theory.
Consistency Of The Bic Order Estimator
, 1999
"... . We announce two results on the problem of estimating the order of a Markov chain from observation of a sample path. First is that the Bayesian Information Criterion (BIC) leads to an almost surely consistent estimator. Second is that the Bayesian minimum description length estimator, of which the ..."
Abstract
- Add to MetaCart
. We announce two results on the problem of estimating the order of a Markov chain from observation of a sample path. First is that the Bayesian Information Criterion (BIC) leads to an almost surely consistent estimator. Second is that the Bayesian minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. A key tool is a strong ratio-typicality result for empirical k-block distributions. Complete proofs are given in the authors' article to appear in The Annals of Statistics. 1. Introduction Let M k denote the class of Markov chains of order at most k, with values drawn from a finite set A, and let M = S 1 k=0 M k . An important problem is to estimate the order of a Markov chain from observation of a finite sample path. A popular method is the so-called Bayesian Information Criterion (BIC), first introduced by Schwarz, [12], which gives the estimator defined by k BIC = k BIC (x n 1 ) ...
Submitted to the Annals of Statistics arXiv: 0901.1342 DYNAMICS OF BAYESIAN UPDATING WITH DEPENDENT DATA AND MISSPECIFIED MODELS
, 901
"... Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data ..."
Abstract
- Add to MetaCart
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in non-parametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or “Shannon-McMillan-Breiman”) property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the “replicator dynamics ” of evolutionary theory. 1. Introduction. The
Submitted to the Annals of Statistics DYNAMICS OF BAYESIAN UPDATING WITH DEPENDENT DATA AND MISSPECIFIED MODELS
, 901
"... Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter is has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the da ..."
Abstract
- Add to MetaCart
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter is has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data-generating process being independent and identically distributed or a Markov process. Here I establish sufficient conditions for the convergence of the posterior distribution in non-parametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or “Shannon-McMillan-Breiman”) property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the “replicator dynamics ” of evolutionary theory. 1. Introduction. The

