The consistency of the BIC Markov order estimator.
Cited by 42 (3 self)
The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet $A$) from observation of a sample path $x_1, x_2, \ldots, x_n$, as that value $k = \hat{k}$ that minimizes the sum of the negative logarithm of the $k$-th order maximum likelihood and the penalty term $\frac{|A|^k (|A|-1)}{2} \log n$. We show that $\hat{k}$ equals the correct order of the chain, eventually almost surely as $n \to \infty$, thereby strengthening earlier consistency results that assumed an a priori bound on the order. A key tool is a strong ratio-typicality result for Markov sample paths. We also show that the Bayesian estimator or minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. AMS 1991 subject classification: Primary 62F12, 62M05; Secondary 62F13, 60J10. Key words and phrases: Bayesian Information Criterion, order estimation, ratio-typicality, Markov chains. 1 Supported in part by a joint N...
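The criterion described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the function names are hypothetical, the likelihood is taken conditionally on the first k symbols (an assumption), and the search is capped at `max_order` purely for practicality, whereas the paper's consistency result needs no a priori bound on the order.

```python
import math
from collections import Counter

def max_log_likelihood(x, k):
    """Log of the k-th order maximum likelihood of x, conditioning on the first k symbols."""
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(k, len(x)):
        context = tuple(x[i - k:i])
        context_counts[context] += 1
        transition_counts[(context, x[i])] += 1
    # The MLE of each transition probability is count(context, symbol) / count(context).
    return sum(
        count * math.log(count / context_counts[context])
        for (context, _symbol), count in transition_counts.items()
    )

def bic_order(x, alphabet_size, max_order):
    """Return the k in 0..max_order minimizing -log ML_k + |A|^k (|A| - 1) / 2 * log n."""
    n = len(x)

    def score(k):
        penalty = alphabet_size ** k * (alphabet_size - 1) / 2 * math.log(n)
        return -max_log_likelihood(x, k) + penalty

    return min(range(max_order + 1), key=score)

# A strictly alternating binary path is perfectly predicted by one symbol of memory.
print(bic_order([0, 1] * 100, alphabet_size=2, max_order=3))  # → 1
```

Higher orders fit the alternating path equally well, so the penalty term alone decides in favor of order 1.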
Rates of Convergence of Posterior Distributions
, 1998
Cited by 29 (0 self)
We compute the rate at which the posterior distribution concentrates around the true parameter value. The spaces we work in are quite general and include infinite dimensional cases. The rates are driven by two quantities: the size of the space, as measured by metric entropy or bracketing entropy, and the degree to which the prior concentrates in a small ball around the true parameter. We apply the results to several examples. In some cases, natural priors give sub-optimal rates of convergence and better rates can be obtained by using sieve-based priors such as those introduced by Zhao (1993, 1998). AMS 1990 classification: Primary: 62A15; Secondary: 62E20, 62G15. KEYWORDS: Bayesian inference, asymptotic inference, non-parametric models, sieves. 1 Introduction. Nonparametric Bayesian methods have become quite popular lately, largely because of advances in computing; see Dey, Mueller and Sinha (1998) for a recent account. Because of their growing popularity, it is important to understand ...
Consistency issues in Bayesian Nonparametrics
- In Asymptotics, Nonparametrics and Time Series: A Tribute
, 1998
Asymptotic Normality of Posterior Distributions in High Dimensional Linear Models
, 1996
Cited by 13 (3 self)
In this paper, we study the behaviour of the posterior distribution as the sample size $n$ tends to infinity where the dimension of the parameter space $p = p_n$ is also allowed to grow to infinity with $n$. This problem is of significant practical importance since in data analysis, one often uses a delicate model (i.e., with a large number of parameters) if one has enough data. In other words, one allows the dimension of the parameter to grow with the sample size. Moreover, nonparametric models can be approximated by parametric models with increasing dimension as discussed by Shibata (1981) and Diaconis and Freedman (1993). The frequentist version of this problem, namely consistency and asymptotic normality of M-estimates, has been studied by Huber (1973), Yohai and Maronna (1979), Ringland (1983) and Portnoy (1984, 1985, 1986). In this paper we show that, under certain growth restrictions on the dimension depending on the design variables, the posterior distributions concentrate in the neighbourhoods of the true value of the parameter and admit a normal approximation. It seems that the present paper is the first attempt to study Bayesian asymptotic properties in models of increasing dimension. We observe that the condition required on the growth rate of the dimension $p_n$ is more stringent than its frequentist counterparts. Though no claim is made about the necessity of this condition on the growth of $p_n$, we believe that there are at least three reasons to expect some difficulties if $p_n$ grows very fast with $n$. First, there is a long tail area which may substantially contribute to the posterior probabilities although the likelihood is small there. Secondly, our choice of the L
Consistency issues in Bayesian Nonparametrics
- In Asymptotics, Nonparametrics and Time Series: A Tribute
, 1998
Cited by 2 (0 self)
In this paper we are mainly concerned with consistency of the posterior. Informally, the posterior is said to be consistent at a true value $\theta_0$ if the following holds: suppose $X_1, X_2, \ldots, X_n$ indeed arise from $P_{\theta_0}$; then the posterior converges to the degenerate probability $\delta_{\theta_0}$. Alternatively, with $P_{\theta_0}$ probability 1, the posterior probability of any neighborhood $U$ of $\theta_0$ converges to 1. Why would a Bayesian be interested in consistency? Think of an experiment in which an experimenter generates observations from a known (to the experimenter) distribution. The observations are presented to a Bayesian. It would be embarrassing if, even with large data, the Bayesian fails to come close to finding the mechanism used by the experimenter. Consistency can be thought of as a validation of the Bayesian method. It can also be interpreted as requiring that the data, at least eventually, override the prior opinion. Alternatively, two Bayesians, with two different priors, presented with the same data eventually agree. A result of this kind relating "merging of opinions" and posterior consistency is discussed in Diaconis and Freedman [86a]. In fact, Diaconis and Freedman [86a] (henceforth abbreviated as DF) and the ensuing discussions contain a wealth of material pertaining to posterior consistency. An early result in posterior consistency is due to Doob [48], who showed that posterior consistency obtains on a set of prior measure 1. This result does not settle the question of consistency for a particular $\theta_0$ of interest. In smooth finite dimensional problems, different methods show (for example Berk [66]) that consistency obtains at all parameter points. Freedman [63] exhibits a prior and points of inconsistency for the infinite cell multinomial. He also showed that this p...
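The remark that two Bayesians with different priors eventually agree can be illustrated in the simplest smooth finite dimensional case. The sketch below is only an illustration under assumptions not taken from the abstract: Bernoulli observations, conjugate Beta priors, a fixed success rate of 0.3, and hypothetical function names. It compares posterior means only and says nothing about the nonparametric settings where consistency can fail.

```python
def posterior_mean(a, b, successes, n):
    """Posterior mean of a Bernoulli parameter under a Beta(a, b) prior."""
    return (a + successes) / (a + b + n)

def opinion_gap(n, rate=0.3):
    """Gap between the posterior means of two Bayesians who see the same n observations."""
    successes = round(rate * n)
    flat = posterior_mean(1, 1, successes, n)     # uniform prior (assumed)
    skewed = posterior_mean(20, 2, successes, n)  # prior strongly favoring large values (assumed)
    return abs(flat - skewed)

# The gap shrinks as the shared data grow and override both priors.
for n in (10, 100, 10_000):
    print(n, opinion_gap(n))
```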
UNAWARENESS, PRIORS AND POSTERIORS
Cited by 1 (0 self)
Abstract. This note contains first thoughts on awareness of unawareness in a simple dynamic context where a decision situation is repeated over time. The main consequence of increasing awareness is that the model the decision maker uses, and the prior which it contains, becomes richer over time. The decision maker is prepared for this change, and we show that if a projection-consistency axiom is satisfied, unawareness does not affect the value of her estimate of a payoff-relevant conditional probability (although it may weaken confidence in such an estimate). Probability-zero events, however, pose a challenge to this axiom, and if it fails, even estimate values will be different if the decision maker takes unawareness into account. In examining the evolution of knowledge about relevant variables through time, we distinguish between the transition from uncertainty to certainty and the transition from unawareness to certainty directly, and argue that new knowledge may cause posteriors to jump more if it is also new awareness. Some preliminary considerations on convergence of estimates are included.
Asymptotic Properties of Nonparametric Bayesian Procedures
, 1997
This chapter provides a brief review of some large sample frequentist properties of nonparametric Bayesian procedures. The review is not comprehensive, but rather is meant to give a simple, heuristic introduction to some of the main concepts. We mainly focus on consistency but we touch on a few other issues as well. 1 Introduction. Nonparametric Bayesian procedures present a paradox in the Bayesian paradigm. On the one hand, they are most useful when we don't have precise information. On the other hand, they require huge amounts of prior information because nonparametric procedures involve high dimensional if not infinite dimensional parameter spaces. The usual hope that the data will dominate the prior was dashed by Freedman (1963, 1965) and then Diaconis and Freedman (1986), who showed that putting mass in weak neighborhoods of the true distribution does not guarantee that the posterior accumulates in weak neighborhoods. Interest in properties like consistency derives from our desire tha...
Consistency Of The Bic Order Estimator
, 1999
We announce two results on the problem of estimating the order of a Markov chain from observation of a sample path. First is that the Bayesian Information Criterion (BIC) leads to an almost surely consistent estimator. Second is that the Bayesian minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. A key tool is a strong ratio-typicality result for empirical $k$-block distributions. Complete proofs are given in the authors' article to appear in The Annals of Statistics. 1. Introduction. Let $M_k$ denote the class of Markov chains of order at most $k$, with values drawn from a finite set $A$, and let $M = \bigcup_{k=0}^{\infty} M_k$. An important problem is to estimate the order of a Markov chain from observation of a finite sample path. A popular method is the so-called Bayesian Information Criterion (BIC), first introduced by Schwarz [12], which gives the estimator defined by $\hat{k}_{\mathrm{BIC}} = \hat{k}_{\mathrm{BIC}}(x_1^n)$ ...

