Results 1-10 of 11
The consistency of the BIC Markov order estimator.
Cited by 55 (3 self)
The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet $A$) from observation of a sample path $x_1, x_2, \ldots, x_n$, as that value $k = \hat{k}$ that minimizes the sum of the negative logarithm of the $k$th order maximum likelihood and the penalty term $\frac{|A|^k (|A|-1)}{2} \log n$. We show that $\hat{k}$ equals the correct order of the chain, eventually almost surely as $n \to \infty$, thereby strengthening earlier consistency results that assumed an a priori bound on the order. A key tool is a strong ratio-typicality result for Markov sample paths. We also show that the Bayesian estimator or minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. AMS 1991 subject classification: Primary 62F12, 62M05; Secondary 62F13, 60J10. Key words and phrases: Bayesian Information Criterion, order estimation, ratio-typicality, Markov chains. 1 Supported in part by a joint N...
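The criterion described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the search is capped at a `max_order` chosen by the caller (the paper's result needs no such bound, but a finite search does), and every likelihood is computed over the same positions, conditioning on the first `max_order` symbols, so that the orders are compared on equal footing.

```python
import math
from collections import Counter


def max_log_likelihood(x, k, start):
    """Log of the k-th order maximum likelihood of x, over positions start..n-1.

    The maximum-likelihood transition probabilities are the empirical
    frequencies N(context, symbol) / N(context).
    """
    ctx_counts = Counter()
    trans_counts = Counter()
    for i in range(start, len(x)):
        ctx = tuple(x[i - k:i])  # the k symbols preceding position i
        ctx_counts[ctx] += 1
        trans_counts[(ctx, x[i])] += 1
    return sum(
        c * math.log(c / ctx_counts[ctx])
        for (ctx, _symbol), c in trans_counts.items()
    )


def bic_order(x, alphabet_size, max_order):
    """Return the k in 0..max_order minimizing -log ML_k + penalty(k).

    penalty(k) = |A|^k (|A| - 1) / 2 * log n, as in the abstract.
    max_order is an assumption of this sketch, not part of the criterion.
    """
    n = len(x)

    def score(k):
        penalty = alphabet_size ** k * (alphabet_size - 1) / 2 * math.log(n)
        return -max_log_likelihood(x, k, max_order) + penalty

    return min(range(max_order + 1), key=score)


# A strictly alternating path is a deterministic first-order chain.
alternating = [0, 1] * 100
print(bic_order(alternating, alphabet_size=2, max_order=3))  # prints 1
```

On the alternating path, orders 1 and above all fit perfectly (log-likelihood 0), so the penalty term picks the smallest of them; order 0 is rejected because its likelihood loss far exceeds the penalty saved.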
Weakly convergent nonparametric forecasting of stationary time series
IEEE Trans. Inf. Theory, 1997
Cited by 28 (5 self)
The conditional distribution of the next outcome given the infinite past of a stationary process can be inferred from finite but growing segments of the past. Several schemes are known for constructing pointwise consistent estimates, but they all demand prohibitive amounts of input data. In this paper we consider real-valued time series and construct conditional distribution estimates that make much more efficient use of the input data. The estimates are consistent in a weak sense, and the question whether they are pointwise consistent is still open. For finite-alphabet processes one may rely on a universal data compression scheme like the Lempel-Ziv algorithm to construct conditional probability mass function estimates that are consistent in expected information divergence. Consistency in this strong sense cannot be attained in a universal sense for all stationary processes with values in an infinite alphabet, but weak consistency can. Some applications of the estimates to online forecasting, regression and classification are discussed.
Blind construction of optimal nonlinear recursive predictors for discrete sequences
In Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference, 2004
Cited by 23 (2 self)
We present a new method for nonlinear prediction of discrete random sequences under minimal structural assumptions. We give a mathematical construction for optimal predictors of such processes, in the form of hidden Markov models. We then describe an algorithm, CSSR (Causal-State Splitting Reconstruction), which approximates the ideal predictor from data. We discuss the reliability of CSSR, its data requirements, and its performance in simulations. Finally, we compare our approach to existing methods using variable-length Markov models and cross-validated hidden Markov models, and show theoretically and experimentally that our method delivers results superior to the former and at least comparable to the latter.
The Interactions Between Ergodic Theory and Information Theory
IEEE Transactions on Information Theory, 1998
Cited by 17 (0 self)
Information theorists frequently use the ergodic theorem; likewise entropy concepts are often used in information theory. Recently the two subjects have become partially intertwined as deeper results from each discipline find use in the other. A brief history of this interaction is presented in this paper, together with a more detailed look at three areas of connection, namely, recurrence theory, blowing-up bounds, and direct sample-path methods.
DYNAMICS OF BAYESIAN UPDATING WITH DEPENDENT DATA AND MISSPECIFIED MODELS
2009
Cited by 9 (2 self)
Recent work on the convergence of posterior distributions under Bayesian updating has established conditions under which the posterior will concentrate on the truth, if the latter has a perfect representation within the support of the prior, and under various dynamical assumptions, such as the data being independent and identically distributed or Markovian. Here I establish sufficient conditions for the convergence of the posterior distribution in nonparametric problems even when all of the hypotheses are wrong, and the data-generating process has a complicated dependence structure. The main dynamical assumption is the generalized asymptotic equipartition (or “Shannon-McMillan-Breiman”) property of information theory. I derive a kind of large deviations principle for the posterior measure, and discuss the advantages of predicting using a combination of models known to be wrong. An appendix sketches connections between the present results and the “replicator dynamics” of evolutionary theory.
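A toy illustration of posterior concentration when every hypothesis is wrong, in a much simpler setting than the abstract covers. The assumptions here are all mine: a finite set of i.i.d. Bernoulli hypotheses, a uniform prior, and binary data whose empirical frequency of ones is 0.5. The function name `posterior` is hypothetical. None of the candidate parameters equals 0.5, and the posterior mass ends up on the one closest in relative entropy.

```python
import math


def posterior(data, hypotheses, prior=None):
    """Posterior over Bernoulli parameters after observing binary data.

    hypotheses: iterable of candidate values for P(x = 1), each in (0, 1).
    prior: optional dict hypothesis -> prior mass (uniform if omitted).
    Normalization is done in log space to avoid underflow.
    """
    hypotheses = list(hypotheses)
    if prior is None:
        prior = {h: 1.0 / len(hypotheses) for h in hypotheses}
    log_post = {}
    for h in hypotheses:
        log_lik = sum(math.log(h if x == 1 else 1.0 - h) for x in data)
        log_post[h] = math.log(prior[h]) + log_lik
    top = max(log_post.values())
    weights = {h: math.exp(v - top) for h, v in log_post.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


# Data with frequency 0.5 of ones; no hypothesis below is correct.
data = [0, 1] * 500
post = posterior(data, [0.1, 0.4, 0.9])
print(max(post, key=post.get))  # prints 0.4
```

The example says nothing about dependent data or nonparametric hypothesis classes, which are the actual subject of the paper; it only shows the basic phenomenon in the simplest case.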
Stochastic chains with memory of variable length
Festschrift for Jorma Rissanen (Grünwald et al., eds), TICSP Series 38:117–133, 2008
Cited by 3 (0 self)
Dedicated to Jorma Rissanen on his 75th birthday. Stochastic chains with memory of variable length constitute an interesting family of stochastic chains of infinite order on a finite alphabet. The idea is that for each past, only a finite suffix of the past, called context, is enough to predict the next symbol. These models were first introduced in the information theory literature by Rissanen (1983) as a universal tool to perform data compression. Recently, they have been used to model scientific data in areas as different as biology, linguistics and music. This paper presents a personal introductory guide to this class of models focusing on the algorithm Context and its rate of convergence.
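The notion of a context can be made concrete with a small sketch. Everything below is a hypothetical example, not taken from the paper: a binary alphabet, a hand-picked set of three contexts with made-up transition probabilities, and helper names `find_context` and `sample_chain`. It illustrates only the model (how a suffix of the past determines the next-symbol distribution), not the estimation algorithm Context.

```python
import random

# Hypothetical context tree over the alphabet {0, 1}.
# Keys are contexts written in time order (most recent symbol last);
# values are P(next symbol = 1 | context). No context is a suffix of
# another, and every past of length >= 2 ends in exactly one of them.
CONTEXT_TREE = {"1": 0.3, "10": 0.8, "00": 0.5}


def find_context(past, tree):
    """Return the suffix of `past` that is a context in `tree`."""
    for length in range(1, len(past) + 1):
        suffix = past[-length:]
        if suffix in tree:
            return suffix
    raise ValueError("past is too short to determine a context")


def sample_chain(n, tree, initial="00", seed=0):
    """Extend `initial` by n symbols drawn from the variable-length chain."""
    rng = random.Random(seed)
    path = initial
    for _ in range(n):
        ctx = find_context(path, tree)
        path += "1" if rng.random() < tree[ctx] else "0"
    return path


print(find_context("0110", CONTEXT_TREE))  # prints 10
print(find_context("0001", CONTEXT_TREE))  # prints 1
```

When the last symbol is 1, a memory of length one suffices; when it is 0, one more symbol of the past is needed. This is the sense in which the memory has variable length.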
Consistency of the BIC Order Estimator
1999
We announce two results on the problem of estimating the order of a Markov chain from observation of a sample path. First is that the Bayesian Information Criterion (BIC) leads to an almost surely consistent estimator. Second is that the Bayesian minimum description length estimator, of which the BIC estimator is an approximation, fails to be consistent for the uniformly distributed i.i.d. process. A key tool is a strong ratio-typicality result for empirical $k$-block distributions. Complete proofs are given in the authors' article to appear in The Annals of Statistics. 1. Introduction. Let $M_k$ denote the class of Markov chains of order at most $k$, with values drawn from a finite set $A$, and let $M = \bigcup_{k=0}^{\infty} M_k$. An important problem is to estimate the order of a Markov chain from observation of a finite sample path. A popular method is the so-called Bayesian Information Criterion (BIC), first introduced by Schwarz [12], which gives the estimator defined by $\hat{k}_{\mathrm{BIC}} = \hat{k}_{\mathrm{BIC}}(x_1^n)$ ...
Markov approximation and consistent estimation of unbounded probabilistic suffix trees
Predictability of User Behavior in Social Media: Bottom-Up v. Top-Down Modeling
Recent work has attempted to capture the behavior of users on social media by modeling them as computational units processing information. We propose to extend this perspective by explicitly examining the predictive power of such a view. We consider a network of fifteen thousand users on Twitter over a seven week period. To evaluate the predictability of the users, we apply two contrasting modeling paradigms: computational mechanics and echo state networks. Computational mechanics seeks to construct the simplest model with the maximal predictive capability, while echo state networks relax from very complicated dynamics until predictive capability is reached. We demonstrate that the behavior of users on Twitter can be well-modeled as processes with self-feedback and compare the performance of models built with both the statistical and neural paradigms.
STOCHASTIC CHAINS WITH MEMORY OF VARIABLE LENGTH
Author manuscript, published in "Festschrift in honour of the 75th birthday of Jorma Rissanen (2008) 329463", 2013
Stochastic chains with memory of variable length constitute an interesting family of stochastic chains of infinite order on a finite alphabet. The idea is that for each past, only a finite suffix of the past, called context, is enough to predict the next symbol. These models were first introduced in the information theory literature by Rissanen (1983) as a universal tool to perform data compression. Recently, they have been used to model scientific data in areas as different as biology, linguistics and music. This paper presents a personal introductory guide to this class of models focusing on the algorithm Context and its rate of convergence.