Results 1–10 of 76
The consistency of the BIC Markov order estimator.
Abstract

Cited by 64 (3 self)
The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet A) from observation of a sample path x_1, x_2, ..., x_n, as that value k = k̂ that minimizes the sum of the negative logarithm of the k-th order maximum likelihood and the penalty term |A|^k (|A| − 1)/2 · log n. We show that k̂ equals the correct order of the chain, eventually almost surely as n → ∞, thereby strengthening earlier consistency results that assumed an a priori bound on the order. A key tool is a strong ratio-typicality result for Markov sample paths. We also show that the Bayesian estimator or minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. AMS 1991 subject classification: Primary 62F12, 62M05; Secondary 62F13, 60J10. Key words and phrases: Bayesian Information Criterion, order estimation, ratio-typicality, Markov chains.
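The BIC criterion described in this abstract is easy to state in code. The following is an illustrative sketch (not the paper's implementation): for each candidate order k it computes the negative log maximum likelihood of the k-th order model plus the penalty |A|^k (|A| − 1)/2 · log n, and returns the minimizing k. The cap max_order is a practical concession of the sketch; the paper's point is precisely that consistency needs no a priori bound.

```python
import math
from collections import Counter

def bic_order(x, alphabet_size, max_order=4):
    """Illustrative BIC Markov order estimator (a sketch, not the paper's code)."""
    n = len(x)
    best_k, best_score = 0, float("inf")
    for k in range(max_order + 1):
        ctx = Counter()    # counts of length-k contexts
        trans = Counter()  # counts of (context, next symbol) pairs
        for i in range(k, n):
            c = tuple(x[i - k:i])
            ctx[c] += 1
            trans[c, x[i]] += 1
        # negative log maximum likelihood of the k-th order model
        nll = -sum(m * math.log(m / ctx[c]) for (c, _), m in trans.items())
        # BIC penalty: |A|^k (|A| - 1)/2 * log n
        penalty = alphabet_size ** k * (alphabet_size - 1) / 2 * math.log(n)
        if nll + penalty < best_score:
            best_k, best_score = k, nll + penalty
    return best_k
```

On a strictly alternating binary sequence the estimator selects order 1, while on a constant sequence it selects order 0, matching the intuition behind the penalty term.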
Inequalities for the occurrence times of rare events in mixing processes. The state of the art.
 MARKOV PROC. RELAT. FIELDS
, 2000
Abstract

Cited by 30 (4 self)
The first occurrence time of a rare event in a mixing process typically has a distribution which can be well approximated by the exponential law. In this paper we review recent theorems giving upper bounds for the error term of this approximation. We shall focus on papers treating the problem in a general mixing framework. Running title: Rare events in mixing processes.
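The exponential approximation reviewed in this paper can be seen in a small simulation. The setup below is ours, chosen for illustration: the rare event is the pattern 1111 in an i.i.d. fair-coin stream, whose expected waiting time is 30 by the classical pattern-correlation computation, so P(T > E[T]) should sit near exp(−1) ≈ 0.37.

```python
import random

def first_occurrence_time(pattern, rng):
    """Steps until `pattern` first occurs in an i.i.d. fair-coin stream."""
    buf, t = [], 0
    while True:
        buf.append(rng.randrange(2))
        t += 1
        if buf[-len(pattern):] == pattern:
            return t

# Empirical check of the exponential approximation (toy setting, ours):
rng = random.Random(1)
times = [first_occurrence_time([1, 1, 1, 1], rng) for _ in range(2000)]
mean = sum(times) / len(times)          # theory: E[T] = 30
tail = sum(t > 30 for t in times) / len(times)  # roughly exp(-1)
```

The bounds the paper surveys quantify exactly how far such a tail probability can deviate from the exponential law in general mixing processes.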
On The Role of Pattern Matching In Information Theory
 IEEE TRANSACTIONS ON INFORMATION THEORY
Abstract

Cited by 19 (2 self)
In this paper, the role of pattern matching in information theory is motivated and discussed. We describe the relationship between a pattern's recurrence time and its probability under the data-generating stochastic source. We motivate how this relationship has led to great advances in universal data compression. We then describe non-asymptotic uniform bounds on the performance of data compression algorithms in cases where the size of the training data that is available to the encoder is not large enough to yield the asymptotic compression: the Shannon entropy. We then discuss applications of pattern matching and universal compression to universal prediction, classification, and entropy estimation.
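The recurrence-time/probability relationship this abstract refers to underlies simple entropy estimators. The sketch below (ours, not the paper's estimator) uses the match-length asymptotics: the shortest substring starting at position i that has not yet appeared in s[:i] has length about log2(i)/H, so the entropy rate H can be read off from the average match length.

```python
import math

def entropy_via_matches(s):
    """Rough entropy-rate estimate (bits/symbol) from recurrence of patterns.

    Illustrative sketch only; it gives no non-asymptotic guarantees.
    """
    n = len(s)
    num = den = 0.0
    for i in range(1, n):
        l = 1
        # grow the match until s[i:i+l] has not occurred in the past s[:i]
        while i + l <= n and s[i:i + l] in s[:i]:
            l += 1
        num += math.log2(i + 1)   # typical match length is ~ log2(i) / H
        den += l
    return num / den
```

A fair-coin string should score close to 1 bit/symbol, while a constant string scores near 0, since its past matches grow without bound.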
The Interactions Between Ergodic Theory and Information Theory
 IEEE Transactions on Information Theory
, 1998
Abstract

Cited by 17 (0 self)
Information theorists frequently use the ergodic theorem; likewise entropy concepts are often used in information theory. Recently the two subjects have become partially intertwined as deeper results from each discipline find use in the other. A brief history of this interaction is presented in this paper, together with a more detailed look at three areas of connection, namely, recurrence theory, blowing-up bounds, and direct sample-path methods.
DYNAMICS OF BAYESIAN UPDATING WITH DEPENDENT DATA AND MISSPECIFIED MODELS
, 2009
Abstract

Cited by 12 (2 self)
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in nonparametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or "Shannon-McMillan-Breiman") property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the "replicator dynamics" of evolutionary theory.
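A minimal illustration of the "all hypotheses wrong" phenomenon this abstract describes (a toy example of ours, not the paper's construction): data are i.i.d. Bernoulli(0.5), but the two hypotheses on offer are p = 0.3 and p = 0.45. The posterior still concentrates, on the hypothesis closer in KL divergence to the truth.

```python
import math, random

rng = random.Random(0)
data = [rng.randrange(2) for _ in range(500)]   # truth: Bernoulli(0.5)

def log_lik(p, xs):
    """Bernoulli log-likelihood of xs under success probability p."""
    k = sum(xs)
    return k * math.log(p) + (len(xs) - k) * math.log(1 - p)

models = [0.3, 0.45]                             # both misspecified
logpost = [math.log(0.5) + log_lik(p, data) for p in models]  # uniform prior
m = max(logpost)
weights = [math.exp(v - m) for v in logpost]     # stabilized exponentiation
posterior = [w / sum(weights) for w in weights]  # concentrates on p = 0.45
```

The paper's contribution is to make this kind of concentration precise under dependence, via the asymptotic equipartition property, rather than for i.i.d. toy cases like this one.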
The Kolmogorov sampler
, 2002
Abstract

Cited by 11 (1 self)
Given noisy observations X_i = θ_i + Z_i, i = 1, ..., n, with i.i.d. noise Z_i ∼ N(0, σ²), we wish to recover the signal θ with small mean-squared error. We consider the Minimum Kolmogorov Complexity Estimator (MKCE), defined roughly as the n-vector θ̂(X) solving the problem min_Y K(Y) subject to ‖X − Y‖²_{ℓ²_n} ≤ σ² · n, where K(Y) denotes the length of the shortest computer program that can compute the finite-precision n-vector Y. In words, this is the simplest object that fits the data to within the lack-of-fit between θ and X that would be expected on statistical grounds. Suppose that the θ_i are successive samples from a stationary ergodic process obeying
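Since K(·) is uncomputable, the MKCE can only be illustrated with a computable stand-in. The sketch below (ours) scores candidates by zlib compressed length instead of Kolmogorov complexity and brute-forces candidate vectors Y over a small grid, keeping the "simplest" Y satisfying the fit constraint Σ(x_i − y_i)² ≤ σ²n. Exhaustive search is only feasible for tiny n; this is purely illustrative.

```python
import itertools, zlib

def mkce_proxy(x, sigma, grid):
    """Toy stand-in for the Minimum Kolmogorov Complexity Estimator.

    Uses zlib compressed length as a (crude) complexity proxy for K(.).
    """
    n = len(x)
    best, best_len = None, float("inf")
    for y in itertools.product(grid, repeat=n):
        # fit constraint: ||x - y||^2 <= sigma^2 * n
        if sum((a - b) ** 2 for a, b in zip(x, y)) <= sigma ** 2 * n:
            cost = len(zlib.compress(repr(y).encode()))
            if cost < best_len:
                best, best_len = y, cost
    return best
```

On a short signal with values near 0 or 2 and noise level σ = 0.5, the only feasible grid vector is the one matching the signal's pattern, so the proxy recovers it exactly.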
Online Clustering of Processes
Abstract

Cited by 10 (9 self)
The problem of online clustering is considered in the case where each data point is a sequence generated by a stationary ergodic process. Data arrive in an online fashion so that the sample received at every time step is either a continuation of some previously received sequence or a new sequence. The dependence between the sequences can be arbitrary. No parametric or independence assumptions are made; the only assumption is that the marginal distribution of each sequence is stationary and ergodic. A novel, computationally efficient algorithm is proposed and is shown to be asymptotically consistent (under a natural notion of consistency). The performance of the proposed algorithm is evaluated on simulated data, as well as on real datasets (motion classification).
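Clustering of stationary ergodic sequences typically rests on an empirical distributional distance between sequences. The sketch below (ours; the 2^−l weights and the cap max_l are choices made for this sketch) compares empirical frequencies of blocks of increasing length by total variation; sequences from the same process have vanishing distance as they grow, which is what a consistent clustering algorithm exploits.

```python
from collections import Counter

def emp_block_dist(seq, l):
    """Empirical frequencies of length-l blocks of seq."""
    c = Counter(tuple(seq[i:i + l]) for i in range(len(seq) - l + 1))
    total = sum(c.values())
    return {w: v / total for w, v in c.items()}

def d_hat(x, y, max_l=3):
    """Sketch of an empirical distributional distance between two sequences."""
    d = 0.0
    for l in range(1, max_l + 1):
        p, q = emp_block_dist(x, l), emp_block_dist(y, l)
        # total variation distance between the two empirical block laws
        d += 2.0 ** -l * 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0))
                                   for k in set(p) | set(q))
    return d
```

Two copies of the same sequence are at distance 0, while an all-zeros and an all-ones sequence are nearly at the maximal distance.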
SHORTEST SPANNING TREES AND A COUNTEREXAMPLE FOR RANDOM WALKS IN RANDOM ENVIRONMENTS
Abstract

Cited by 9 (2 self)
We construct forests spanning Z^d, d ≥ 2, that are stationary and directed, and whose trees are infinite but as short as possible. For d ≥ 3, two independent copies of such forests, pointing in opposite directions, can be pruned so as to become disjoint. From this, we construct in d ≥ 3 a stationary, polynomially mixing and uniformly elliptic environment of nearest-neighbor transition probabilities on Z^d, for which the corresponding random walk in random environment (RWRE) disobeys a certain zero-one law for directional transience.
Estimating the Entropy of Binary Time Series: Methodology, Some Theory and a Simulation Study
Channel Simulation by Interval Algorithm: A Performance Analysis of Interval Algorithm
 IEEE Trans. Inform. Theory
Abstract

Cited by 7 (2 self)
This paper deals with the problem of simulating a discrete memoryless channel and proposes two algorithms for channel simulation by using the interval algorithm. The first algorithm provides exact channel simulation and the number of fair random bits per input sample approaches the conditional resolvability of the channel with probability one. The second algorithm provides approximate channel simulation and the approximation error measured by the variational distance vanishes exponentially as the block length tends to infinity, when the number of fair random bits per input sample is above the conditional resolvability. Further, some asymptotic properties of these algorithms as well as the original interval algorithm for random number generation are clarified. Keywords: channel simulation, interval algorithm, conditional resolvability, conditional entropy, random number generation.
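The interval algorithm at the core of this paper can be sketched compactly. The version below (ours; floating point stands in for the exact arithmetic of the real algorithm) generates one sample from a distribution p out of fair coin bits by repeatedly halving the current interval until it fits inside a single target interval. For channel simulation, p would be the channel's conditional distribution P(·|x) for the current input symbol x.

```python
def interval_sample(p, bits):
    """Han-Hoshi-style interval algorithm sketch.

    p: list of probabilities summing to 1; bits: iterator of fair 0/1 bits.
    Returns (sampled symbol index, number of fair bits consumed).
    """
    # cumulative partition of [0, 1) into the target intervals
    cum = [0.0]
    for q in p:
        cum.append(cum[-1] + q)
    low, high = 0.0, 1.0
    used = 0
    while True:
        # stop once [low, high) lies inside a single target interval
        for j in range(len(p)):
            if cum[j] <= low and high <= cum[j + 1]:
                return j, used
        # otherwise halve the current interval with one fair bit
        mid = (low + high) / 2
        b = next(bits)
        used += 1
        low, high = (low, mid) if b == 0 else (mid, high)
```

For a fair coin target, a single bit decides the output; skewed targets like (0.25, 0.75) may need more bits for the low-probability symbol, which is the per-sample bit cost whose asymptotics the paper analyzes.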