Results 1 - 10 of 11
Lossless compression based on the Sequence Memoizer
In Data Compression Conference, 2010
"... In this work we describe a sequence compression method based on combining a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of PitmanYor processes of unbounded depth previously proposed by Wood et al. [2009] in the context of language modelling, allows modelling ..."
Cited by 11 (4 self)
Abstract
In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [2009] in the context of language modelling, captures long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, but is particularly effective in compressing data that exhibits power-law properties.
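To make the combination concrete, here is a minimal, hedged sketch (not the authors' implementation) of how any predictive sequence model plugs into an entropy coder: the model supplies P(next symbol | context) at each position, and an ideal entropy coder spends about -log2 of that probability in bits. The predict interface and the toy order-0 model below are assumptions for illustration only.

    import math

    def ideal_code_length_bits(sequence, predict):
        """Total ideal code length in bits when an entropy coder is driven by a
        predictive model. predict(context) must return a dict mapping each
        possible next symbol to its probability given the context seen so far."""
        total = 0.0
        for i, symbol in enumerate(sequence):
            probs = predict(sequence[:i])        # P(. | context)
            total += -math.log2(probs[symbol])   # bits an ideal coder would spend
        return total

    # Toy usage with a fixed order-0 model; a sequence-memoizer-style model would
    # instead condition on an unbounded-length context.
    uniform = lambda context: {"a": 0.5, "b": 0.5}
    print(ideal_code_length_bits("abab", uniform))  # 4.0 bits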
Characterizations of exchangeable partitions and random discrete distributions by deletion properties
2009
"... We prove a longstanding conjecture which characterises the EwensPitman twoparameter family of exchangeable random partitions, plus a short list of limit and exceptional cases, by the following property: for each n = 2,3,..., if one of n individuals is chosen uniformly at random, independently of ..."
Cited by 3 (2 self)
Abstract
We prove a long-standing conjecture which characterises the Ewens-Pitman two-parameter family of exchangeable random partitions, plus a short list of limit and exceptional cases, by the following property: for each n = 2, 3, ..., if one of n individuals is chosen uniformly at random, independently of the random partition π_n of these individuals into various types, and all individuals of the same type as the chosen individual are deleted, then for each r > 0, given that r individuals remain, these individuals are partitioned according to π′_r for some sequence of random partitions (π′_r) that does not depend on n or r. An analogous result characterizes the associated Poisson-Dirichlet family of random discrete distributions by an independence property related to random deletion of a frequency chosen by a size-biased pick. We also survey the regenerative properties of members of the two-parameter family, and settle a question regarding the explicit arrangement of intervals with lengths given by the terms of the Poisson-Dirichlet random sequence into the interval partition induced by the range of a neutral-to-the-right process.
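For reference, the Ewens-Pitman two-parameter family characterised here is the family of exchangeable partitions generated by the (α, θ) Chinese restaurant process; a standard statement of its prediction rule (notation mine, with parameters 0 ≤ α < 1 and θ > −α) is

    \[
    P(\text{individual } n{+}1 \text{ joins block } i \mid \pi_n) = \frac{n_i - \alpha}{n + \theta},
    \qquad
    P(\text{individual } n{+}1 \text{ starts a new block} \mid \pi_n) = \frac{\theta + k\alpha}{n + \theta},
    \]

where π_n has k blocks of sizes n_1, ..., n_k.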
Improvements to the Sequence Memoizer
"... The sequence memoizer is a model for sequence data with stateoftheart performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memoryefficient representation, and inference algori ..."
Cited by 2 (2 self)
Abstract
The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes, which also allow us to provide an elementary proof of the "mysterious" coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements.
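A commonly quoted special case of the coagulation property mentioned above (a sketch under the assumption of zero concentration parameters; see the papers for the general statement) is

    \[
    G_1 \mid G_0 \sim \mathrm{PY}(d_1, 0, G_0), \quad
    G_2 \mid G_1 \sim \mathrm{PY}(d_2, 0, G_1)
    \;\Longrightarrow\;
    G_2 \mid G_0 \sim \mathrm{PY}(d_1 d_2, 0, G_0),
    \]

i.e. marginalizing out the intermediate measure collapses a chain of Pitman-Yor processes into a single one whose discount is the product of the discounts, which is what makes the unbounded-depth hierarchy tractable.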
A Bayesian Review of the Poisson-Dirichlet Process
2010
"... The two parameter PoissonDirichlet process is also known as the PitmanYor Process and related to the ChineseRestaurant Process, is a generalisation of the Dirichlet Process, and is increasingly being used for probabilistic modelling in discrete areas such as language and images. This article revie ..."
Cited by 2 (1 self)
Abstract
The two-parameter Poisson-Dirichlet process, also known as the Pitman-Yor process and related to the Chinese restaurant process, is a generalisation of the Dirichlet process and is increasingly being used for probabilistic modelling in discrete areas such as language and images. This article reviews the theory of the Poisson-Dirichlet process in terms of its consistency for estimation, its convergence rates, and the posteriors of data. This theory has been well developed for continuous distributions (more generally referred to as non-atomic distributions). The article then presents a Bayesian interpretation of the Poisson-Dirichlet process: it is a mixture using an improper and infinite-dimensional Dirichlet distribution. This interpretation requires technicalities of priors, posteriors and Hilbert spaces, but conceptually it means we can understand the process as just another Dirichlet distribution, so all its sampling properties fit naturally. Finally, the article presents results for the discrete case, which is the case now seeing widespread use in computer science but which has received less attention in the literature.
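One concrete construction of the process under review is the stick-breaking (GEM) representation of the two-parameter Poisson-Dirichlet weights; in standard notation (discount 0 ≤ α < 1 and concentration θ > −α, not taken from this article) it reads

    \[
    V_k \sim \mathrm{Beta}(1 - \alpha,\; \theta + k\alpha), \qquad
    w_k = V_k \prod_{j=1}^{k-1} (1 - V_j), \qquad k = 1, 2, \dots
    \]

with the Dirichlet process recovered at α = 0.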
Ensemble Prediction by Partial Matching
"... Prediction by Partial Matching (PPM) is a lossless compression algorithm which consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPMEns which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm ..."
Abstract
Prediction by Partial Matching (PPM) is a lossless compression algorithm which consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPMEns which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm is evaluated on the Calgary corpus. The results indicate that combining multiple contexts improves the compression performance of PPMEns, although it does not outperform state-of-the-art compression techniques.
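To illustrate the general idea of combining predictions from several context lengths (a simplified averaging sketch, not the PPMEns voting scheme itself; the helper names are mine), one can blend the conditional distributions estimated from contexts of different orders:

    from collections import Counter, defaultdict

    def context_counts(text, order):
        """Counts of the symbol following each context of the given length."""
        counts = defaultdict(Counter)
        for i in range(order, len(text)):
            counts[text[i - order:i]][text[i]] += 1
        return counts

    def blended_prediction(history, models, alphabet):
        """Average the per-order conditional distributions, falling back to a
        uniform distribution when a context has never been seen. Equal weights
        are an assumption here, not the paper's voting rule."""
        dists = []
        for order, counts in models.items():
            ctx = history[-order:] if order else ""
            seen = counts.get(ctx)
            if seen:
                total = sum(seen.values())
                dists.append({s: seen[s] / total for s in alphabet})
            else:
                dists.append({s: 1.0 / len(alphabet) for s in alphabet})
        return {s: sum(d[s] for d in dists) / len(dists) for s in alphabet}

    text = "abracadabra"
    alphabet = sorted(set(text))
    models = {order: context_counts(text, order) for order in range(3)}
    print(blended_prediction("br", models, alphabet))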
Probabilistic Deterministic Infinite Automata
"... We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define and develop a sampler for a PDFA with an infinite number of states which we call the probabilistic deterministic infinite automata (PDIA). Posterior predictive inference ..."
Abstract
We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define and develop a sampler for a PDFA with an infinite number of states, which we call the probabilistic deterministic infinite automata (PDIA). Posterior predictive inference in this model, given a finite training sequence, can be interpreted as averaging over multiple PDFAs of varying structure, where each PDFA is biased towards having few states. We suggest that our method for averaging over PDFAs is a novel approach to predictive distribution smoothing. We test PDIA inference on PDFA structure learning and on natural language and DNA data prediction tasks. The results suggest that the PDIA presents an attractive compromise between the computational cost of hidden Markov models and the storage requirements of hierarchically smoothed Markov models.
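For readers unfamiliar with the object being learned: a PDFA scores a string by deterministically following state transitions while multiplying per-state emission probabilities. The following is a minimal sketch with a hypothetical two-state toy automaton (not the PDIA sampler described above):

    def pdfa_probability(string, start, emit, step):
        """Probability of string under a PDFA.
        emit[state][symbol]   -> probability of emitting symbol in that state
        step[(state, symbol)] -> the unique next state (the determinism)"""
        prob, state = 1.0, start
        for symbol in string:
            prob *= emit[state].get(symbol, 0.0)
            if prob == 0.0:
                return 0.0
            state = step[(state, symbol)]
        return prob

    # Toy automaton over {a, b}: state 0 favours 'a', state 1 favours 'b'.
    emit = {0: {"a": 0.8, "b": 0.2}, 1: {"a": 0.3, "b": 0.7}}
    step = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
    print(pdfa_probability("aab", start=0, emit=emit, step=step))  # 0.8 * 0.8 * 0.2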
Gatsby Computational
"... Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spellchecking, and gene identification applications to name but a few. Unfortunately, realworld sequence data often exhibit long range dependencies which can only ..."
Abstract
Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications, to name but a few. Unfortunately, real-world sequence data often exhibit long-range dependencies which can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibit power-law properties, yet common sequence models do not capture such properties. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures long-range dependencies and power-law characteristics while remaining computationally attractive. Its utility as a language model and general-purpose lossless compressor is demonstrated.
Beyond Bayesians and Frequentists
2012
"... If you are a newly initiated student into the field of machine learning, it won’t be long before you start hearing the words “Bayesian ” and “frequentist ” thrown around. Many people around you probably have strong opinions on which is the “right ” way to do statistics, and within a year you’ve prob ..."
Abstract
If you are a student newly initiated into the field of machine learning, it won't be long before you start hearing the words "Bayesian" and "frequentist" thrown around. Many people around you probably have strong opinions on which is the "right" way to do statistics, and within a year you've probably developed your own strong opinions (which are suspiciously similar to those of the people around you, despite there being a much greater variance of opinion between different labs). In fact, now that the year is 2012, the majority of new graduate students are being raised as Bayesians (at least in the U.S.), with frequentists thought of as stodgy emeritus professors stuck in their ways. If you are like me, the preceding set of facts will make you very uneasy. They will make you uneasy because simple pattern-matching (the strength of people's opinions, the reliability with which these opinions split along age boundaries and lab boundaries, and the ridicule that each side levels at the other camp) makes the "Bayesians vs. frequentists" debate look far more like politics than like scholarly discourse. Of course, that alone does not necessarily prove anything; these disconcerting similarities could just be coincidences that I happened to cherry-pick. My next point, then, is that we are right to be uneasy, because such debate makes us less likely to evaluate the strengths and weaknesses of both approaches in good faith. This essay is a push against that: I summarize the justifications for Bayesian methods and where they fall short, show how frequentist approaches can fill in some of their shortcomings, and then present my personal (though probably woefully under-informed) guidelines for choosing which type of approach to use. Before doing any of this, though, a bit of background is in order...
Modeling Streaming Data In the Absence of Sufficiency
"... We interpret results from a study where data was modeled using constant space approximations to the sequence memoizer. The sequence memoizer (SM) is a nonconstantspace, Bayesian nonparametric model in which the data are the sufficient statistic in the streaming setting. We review approximations to ..."
Abstract
We interpret results from a study in which data were modeled using constant-space approximations to the sequence memoizer. The sequence memoizer (SM) is a non-constant-space, Bayesian nonparametric model in which the data are the sufficient statistic in the streaming setting. We review approximations to the probabilistic model underpinning the SM that yield the computational asymptotic complexities necessary for modeling very large (streaming) datasets with fixed computational resources. Results from modeling a benchmark corpus are shown for both the effectively parametric, approximate models and the fully nonparametric SM. We find that the approximations perform nearly as well in terms of predictive likelihood. We argue from this single example that, due to the lack of sufficiency, Bayesian nonparametric models may in general not be suitable as models of streaming data, and propose that non-stationary parametric models, and estimators for them inspired by Bayesian nonparametric models, may be worth investigating more fully.
Structured Penalties for Loglinear Language Models
"... Language models can be formalized as loglinear regression models where the input features represent previously observed contexts up to a certain length m. The complexity of existing algorithms to learn the parameters by maximum likelihood scale linearly in nd, where n is the length of the training c ..."
Abstract
Language models can be formalized as log-linear regression models in which the input features represent previously observed contexts up to a certain length m. The complexity of existing algorithms for learning the parameters by maximum likelihood scales linearly in nd, where n is the length of the training corpus and d is the number of observed features. We present a model that grows logarithmically in d, making it possible to efficiently leverage longer contexts. We account for the sequential structure of natural language using tree-structured penalized objectives to avoid overfitting and achieve better generalization.
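For concreteness, the log-linear formulation referred to above models the probability of the next word w given the previous m words x_{1:m} as (notation mine)

    \[
    P_\theta(w \mid x_{1:m}) =
    \frac{\exp\!\big(\theta^{\top}\phi(x_{1:m}, w)\big)}
         {\sum_{w'} \exp\!\big(\theta^{\top}\phi(x_{1:m}, w')\big)},
    \]

where φ extracts features of the observed contexts; the tree-structured penalties mentioned in the abstract regularize groups of the components of θ, presumably organized along the nested (suffix) structure of those contexts.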