## Bayesian Variable Order Markov Models

### BibTeX

    @MISC{Dimitrakakis_bayesianvariable,
      author = {Christos Dimitrakakis},
      title  = {Bayesian Variable Order Markov Models},
      year   = {}
    }

### Abstract

We present a simple, effective generalisation of variable order Markov models to full online Bayesian estimation. The mechanism used is close to that employed in context tree weighting. The main contribution is the addition of a prior, conditioned on context, on the Markov order. The resulting construction uses a simple recursion and can be updated efficiently. This allows the model to make predictions using more complex contexts, as more data is acquired, if necessary. In addition, our model can alternatively be seen as a mixture of tree experts. Experimental results show that the predictive model exhibits consistently good performance in a variety of domains.

We consider Bayesian estimation of variable order Markov models (see Begleiter et al., 2004, for an overview). Such models create a tree of partitions, where the disjoint sets of every partition correspond to different contexts. We can associate a sub-model, or expert, with each context in order to make predictions. The main contribution of this paper is a conditional prior on the Markov order, or equivalently the context depth. This is based on a recursive construction that estimates, for each context at a certain depth k, whether it makes better predictions than the contexts at depths smaller than k. This simple model defines a mixture of variable order Markov models, and its parameters can be updated in closed form in time O(D) for trees of depth D with each new observation. For unbounded-length contexts, the complexity of the algorithm is O(T²) for an input sequence of length T. Furthermore, it exhibits robust performance in a variety of tasks. Finally, the model is easily extensible to controlled processes.
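The recursion outlined in the abstract (each context depth mixes its own expert's prediction with that of the next-deeper context, and the mixing weights are updated in closed form per observation, giving O(D) work per symbol) can be sketched roughly as follows. This is a hypothetical illustration, not the paper's reference implementation: the class names, the Dirichlet pseudo-count `alpha=0.5`, and the uniform prior weight `w=0.5` are assumptions made for the sketch.

```python
import numpy as np


class Node:
    """One context node: a Dirichlet-multinomial expert plus a stopping weight."""

    def __init__(self, n_symbols, alpha=0.5, w=0.5):
        self.counts = np.full(n_symbols, alpha)  # Dirichlet pseudo-counts (assumed prior)
        self.w = w            # prior probability that prediction stops at this depth
        self.children = {}    # one child per preceding symbol (deeper context)

    def predict(self):
        # Posterior predictive of the Dirichlet-multinomial expert.
        return self.counts / self.counts.sum()


class BVMM:
    """Sketch of a Bayesian variable order Markov model, following the
    recursion described in the abstract: q_k = w_k * mu_k + (1 - w_k) * q_{k+1},
    with a closed-form Bayes update of each w_k after every observation."""

    def __init__(self, n_symbols, depth):
        self.n_symbols = n_symbols
        self.depth = depth
        self.root = Node(n_symbols)
        self.history = []

    def _path(self):
        """Nodes matching the current context, shallow to deep (at most D+1)."""
        nodes = [self.root]
        node = self.root
        for sym in reversed(self.history[-self.depth:]):
            if sym not in node.children:
                node.children[sym] = Node(self.n_symbols)
            node = node.children[sym]
            nodes.append(node)
        return nodes

    def predict(self):
        """Mixture prediction over all matching context depths."""
        nodes = self._path()
        q = nodes[-1].predict()  # deepest matching context predicts alone
        for node in reversed(nodes[:-1]):
            q = node.w * node.predict() + (1 - node.w) * q
        return q

    def update(self, symbol):
        """Observe one symbol: closed-form update of weights and counts, O(D)."""
        nodes = self._path()
        qs = [None] * len(nodes)
        qs[-1] = nodes[-1].predict()[symbol]
        for k in range(len(nodes) - 2, -1, -1):
            mu = nodes[k].predict()[symbol]
            qs[k] = nodes[k].w * mu + (1 - nodes[k].w) * qs[k + 1]
            # Bayes rule: posterior stopping weight = prior * likelihood / marginal.
            nodes[k].w = nodes[k].w * mu / qs[k]
        for node in nodes:
            node.counts[symbol] += 1
        self.history.append(symbol)
```

On an alternating binary sequence, the deeper contexts quickly dominate: after training on `0,1,0,1,...`, the context ending in `1` has only ever been followed by `0`, so the mixture assigns most mass to symbol `0`. The sketch omits the unbounded-depth case, which the paper states costs O(T²) over a sequence of length T.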

### Citations

514 | Factorial hidden Markov models
- Ghahramani, Jordan
- 1997

Citation Context: ...which will be next visited. This allows closed-form updates, but lacks the additional expressiveness possible with BVMMs. Dirichlet processes are also used in the infinite hidden Markov model (IHMM, Beal et al., 2001) and the infinite Markov model (IMM, Mochihashi and Sumita, 2008). In particular, the IMM uses a similar structure, with the difference that a Beta prior on the stopping variable s is used. Inference...

356 | Data compression using adaptive coding and partial string matching
- Cleary, Witten
- 1984

Citation Context: ...t to the adaptation of the experts µ, when those are Dirichlet-multinomial. While CTW uses a closed-form update, the weights used in CTW are fixed. The prediction by partial matching (PPM) algorithm (Cleary and Witten, 1984) includes a closed-form weight update, which is however ad hoc (Begleiter et al., 2004, p. 392). Other variants are examined in (Begleiter et al., 2004), which in addition supplies an experimental com...

217 | Being Bayesian about network structure - Friedman, Koller - 2003

174 | Prior distributions on spaces of probability measures
- Ferguson
- 1974

Citation Context: ...perts. A low-regret prediction algorithm for such models is given in (Cesa-Bianchi and Lugosi, 2006, ch. 5.3). Dirichlet process models. An important class of priors over distributions are Polya trees (Ferguson, 1974). Just as in BVMMs, a distribution is defined over a partition tree. However, there is only one set of parameters for each node, which relates to the child node which will be next visited. This allow...

117 | Learning with mixtures of trees - Meila, Jordan

96 | Variable length Markov chains
- Bühlmann, Wyner
- 1999

Citation Context: ...2004, p. 392). Other variants are examined in (Begleiter et al., 2004), which in addition supplies an experimental comparison between methods. A final related model is variable length Markov chains (Bühlmann and Wyner, 1999) (henceforth VMC), which however utilises growing and subsequent pruning of the context tree. It is thus a batch (offline) algorithm. Tree experts. A tree expert is a collection of a finite number of...

61 | On prediction using variable order Markov models
- Begleiter, El-Yaniv, Yona
- 2004

Citation Context: ...tree experts. Experimental results show that the predictive model exhibits consistently good performance in a variety of domains. We consider Bayesian estimation of variable order Markov models (see Begleiter et al., 2004, for an overview). Such models create a tree of partitions, where the disjoint sets of every partition correspond to different contexts. We can associate a sub-model or expert with each context in or...

25 | Bayes-adaptive POMDPs
- Ross, Chaib-draa, et al.
- 2008

Citation Context: ...he model can be extended to controlled processes. In particular, it may be an effective Bayesian model for near-optimal decision making in unknown partially observable Markov decision processes (i.e. Ross et al., 2008). Since BVMMs are able to provide good predictions, as well as easily computable closed-form posteriors, they are an excellent candidate for planning under uncertainty in such domains (Dimitrakakis, ...

11 | Lossless compression based on the Sequence Memoizer
- Gasthaus, Wood, et al.
- 2010

Citation Context: ...he Calgary corpus is shown in Table 1, which in addition shows results for CTW and SM³. Both the CTW and SM algorithms enjoy an advantage of 0.15–0.25 bits/symbol on average. ³Results obtained from (Gasthaus et al., 2010). [Figure residue omitted: plots of loss L against context depth D (2–16) comparing BMCM, BVMM and PPM; panel (a): the bib dataset.]

8 | The infinite Markov model
- Mochihashi, Sumita
- 2007

Citation Context: ...dates, but lacks the additional expressiveness possible with BVMMs. Dirichlet processes are also used in the infinite hidden Markov model (IHMM, Beal et al., 2001) and the infinite Markov model (IMM, Mochihashi and Sumita, 2008). In particular, the IMM uses a similar structure, with the difference that a Beta prior on the stopping variable s is used. Inference in both of these models requires sampling instead. Thus, as long...

1 | Variable order Markov decision processes: Exact Bayesian inference with an application to POMDPs. Submitted - Dimitrakakis - 2010

1 | Stationary autoregressive models via a Bayesian nonparametric approach - Mena, Walker - 2005