Lossless compression based on the Sequence Memoizer
In Data Compression Conference, 2010
Cited by 13 (4 self)
In this work we describe a sequence compression method based on combining a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [2009] in the context of language modelling, captures long-range dependencies by allowing conditioning contexts of unbounded length. We show that incremental approximate inference can be performed in this model, thereby allowing it to be used in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, but is particularly effective in compressing data that exhibits power-law properties.
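The link between a predictive sequence model and entropy encoding can be illustrated with a minimal sketch. An entropy coder such as an arithmetic coder spends roughly -log2 p bits on a symbol that the model predicted with probability p, so summing these values gives the ideal compressed size. The model below is an assumption made for illustration only: a simple adaptive order-0 model with add-one smoothing, not the hierarchical Pitman-Yor model of the paper. The function name `adaptive_code_length_bits` is hypothetical.

```python
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes, alphabet_size: int = 256) -> float:
    """Ideal code length in bits of `data` under an adaptive order-0 model.

    Illustrative stand-in for a real predictive model: each symbol is
    predicted from the counts seen so far, with add-one smoothing.
    """
    counts = Counter()
    seen = 0
    total_bits = 0.0
    for symbol in data:
        # Predict first, then update, so a decoder could mirror each step.
        p = (counts[symbol] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        counts[symbol] += 1
        seen += 1
    return total_bits
```

A better model assigns higher probability to what actually occurs next and therefore yields a shorter code; here, repetitive input costs fewer bits than input in which every byte is new.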
A Bayesian Review of the Poisson-Dirichlet Process
2010
Cited by 7 (5 self)
The two-parameter Poisson-Dirichlet process, also known as the Pitman-Yor process and related to the Chinese restaurant process, is a generalisation of the Dirichlet process and is increasingly being used for probabilistic modelling in discrete areas such as language and images. This article reviews the theory of the Poisson-Dirichlet process in terms of its consistency for estimation, the convergence rates and the posteriors of data. This theory has been well developed for continuous distributions (more generally referred to as non-atomic distributions). This article then presents a Bayesian interpretation of the Poisson-Dirichlet process: it is a mixture using an improper and infinite-dimensional Dirichlet distribution. This interpretation requires technicalities of priors, posteriors and Hilbert spaces, but conceptually, this means we can understand the process as just another Dirichlet and thus all its sampling properties fit naturally. Finally, this article also presents results for the discrete case, which is the case seeing widespread use now in computer science but which has received less attention in the literature.
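The relation to the Chinese restaurant process mentioned above can be sketched as a seating simulation. This is a minimal illustration of the standard two-parameter scheme, not code from the article: with i customers seated at K tables, the next customer joins table k with weight (size of table k minus the discount) and opens a new table with weight (concentration plus discount times K). The function name `sample_crp_tables` and the parameter names are assumptions.

```python
import random


def sample_crp_tables(n, discount, concentration, rng):
    """Seat n customers by the two-parameter Chinese restaurant process.

    Returns a list of table sizes. Requires 0 <= discount < 1 and
    concentration > -discount.
    """
    if not (0 <= discount < 1) or concentration <= -discount:
        raise ValueError("need 0 <= discount < 1 and concentration > -discount")
    tables = []  # tables[k] = number of customers at table k
    for _ in range(n):
        if not tables:
            tables.append(1)  # the first customer always opens a table
            continue
        # Existing tables first, then the weight for opening a new table.
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * len(tables))
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables
```

Setting the discount to zero recovers the one-parameter (Dirichlet process) seating rule; a positive discount tends to produce more small tables, which is the source of the power-law behaviour often cited for this family.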
Sampling table configurations for the hierarchical Poisson-Dirichlet process
In ECML, 2011
Cited by 7 (4 self)
Hierarchical modeling and reasoning are fundamental in machine intelligence, and for this the two-parameter Poisson-Dirichlet Process (PDP) plays an important role. The most popular MCMC sampling algorithm for the hierarchical PDP and hierarchical Dirichlet Process is to conduct an incremental sampling based on the Chinese restaurant metaphor, which originates from the Chinese restaurant process (CRP). In this paper, with the same metaphor, we propose a new table representation for the hierarchical PDPs by introducing an auxiliary latent variable, called table indicator, to record which customer takes responsibility for starting a new table. In this way, the new representation allows full exchangeability, which is an essential condition for a correct Gibbs sampling algorithm. Based on this representation, we develop a block Gibbs sampling algorithm, which can jointly sample the data item and its table contribution. We test this out on the hierarchical Dirichlet process variant of latent Dirichlet allocation (HDP-LDA) developed by Teh, Jordan, Beal and Blei. Experimental results show that the proposed algorithm outperforms their “posterior sampling by direct assignment” algorithm in both out-of-sample perplexity and convergence speed. The representation can be used with many other hierarchical PDP models.
Characterizations of exchangeable partitions and random discrete distributions by deletion properties
2009
Cited by 4 (2 self)
We prove a long-standing conjecture which characterises the Ewens-Pitman two-parameter family of exchangeable random partitions, plus a short list of limit and exceptional cases, by the following property: for each n = 2, 3, ..., if one of n individuals is chosen uniformly at random, independently of the random partition π_n of these individuals into various types, and all individuals of the same type as the chosen individual are deleted, then for each r > 0, given that r individuals remain, these individuals are partitioned according to π′_r for some sequence of random partitions (π′_r) that does not depend on n. An analogous result characterizes the associated Poisson-Dirichlet family of random discrete distributions by an independence property related to random deletion of a frequency chosen by a size-biased pick. We also survey the regenerative properties of members of the two-parameter family, and settle a question regarding the explicit arrangement of intervals with lengths given by the terms of the Poisson-Dirichlet random sequence into the interval partition induced by the range of a neutral-to-the-right process.
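For orientation, the Ewens-Pitman two-parameter family is usually specified through its exchangeable partition probability function. The formula below is the standard form from the general literature, not quoted from this abstract; the symbols α (discount) and θ (concentration) are the conventional names, assumed here, with 0 ≤ α < 1 and θ > -α. It gives the probability of a particular partition of n individuals into k blocks of sizes n_1, ..., n_k:

```latex
p(n_1, \dots, n_k)
  = \frac{\prod_{i=1}^{k-1} (\theta + i\alpha)}{(\theta + 1)_{n-1}}
    \prod_{j=1}^{k} (1 - \alpha)_{n_j - 1},
\qquad
(x)_m = x (x+1) \cdots (x+m-1), \quad (x)_0 = 1 .
```

As a quick check with n = 2: a single block has probability (1 - α)/(θ + 1) and two singleton blocks have probability (θ + α)/(θ + 1), which sum to 1.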
Structured Penalties for Log-linear Language Models
Cited by 3 (0 self)
Language models can be formalized as log-linear regression models where the input features represent previously observed contexts up to a certain length m. The complexity of existing algorithms to learn the parameters by maximum likelihood scales linearly in nd, where n is the length of the training corpus and d is the number of observed features. We present a model that grows logarithmically in d, making it possible to efficiently leverage longer contexts. We account for the sequential structure of natural language using tree-structured penalized objectives to avoid overfitting and achieve better generalization.
Improvements to the Sequence Memoizer
Cited by 2 (2 self)
The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements.
Probabilistic Deterministic Infinite Automata
Cited by 1 (0 self)
We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define and develop a sampler for a PDFA with an infinite number of states which we call the probabilistic deterministic infinite automata (PDIA). Posterior predictive inference in this model, given a finite training sequence, can be interpreted as averaging over multiple PDFAs of varying structure, where each PDFA is biased towards having few states. We suggest that our method for averaging over PDFAs is a novel approach to predictive distribution smoothing. We test PDIA inference on PDFA structure learning and on natural language and DNA data prediction tasks. The results suggest that the PDIA presents an attractive compromise between the computational cost of hidden Markov models and the storage requirements of hierarchically smoothed Markov models.
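To make the PDFA building block concrete, here is a minimal sketch of how a single, fixed PDFA scores a sequence: each state has a distribution over emitted symbols, and the pair (state, symbol) determines the next state uniquely. It does not implement the PDIA or its sampler. The function name `pdfa_log_prob` and the toy automaton are hypothetical, chosen only for illustration.

```python
import math


def pdfa_log_prob(sequence, start, transitions, emissions):
    """Log-probability of `sequence` under a fixed PDFA.

    transitions[(state, symbol)] -> next state (deterministic).
    emissions[state][symbol]     -> probability of emitting symbol in state.
    """
    state = start
    total = 0.0
    for symbol in sequence:
        p = emissions[state].get(symbol, 0.0)
        if p == 0.0:
            return -math.inf  # the automaton cannot emit this symbol here
        total += math.log(p)
        state = transitions[(state, symbol)]
    return total


# Toy two-state automaton over the alphabet {a, b} (illustrative values).
toy_emissions = {0: {"a": 0.9, "b": 0.1}, 1: {"a": 0.5, "b": 0.5}}
toy_transitions = {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}
```

Because the next state is a deterministic function of the current state and symbol, scoring needs only one pass with no summation over hidden paths, which is the computational advantage over hidden Markov models that the abstract alludes to.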
Gatsby Computational
Cited by 1 (0 self)
Probabilistic models of sequences play a central role in most machine translation, automated speech recognition, lossless compression, spell-checking, and gene identification applications, to name but a few. Unfortunately, real-world sequence data often exhibit long-range dependencies which can only be captured by computationally challenging, complex models. Sequence data arising from natural processes also often exhibit power-law properties, yet common sequence models do not capture such properties. The sequence memoizer is a new hierarchical Bayesian model for discrete sequence data that captures long-range dependencies and power-law characteristics while remaining computationally attractive. Its utility as a language model and general-purpose lossless compressor is demonstrated.
Ensemble Prediction by Partial Matching
Prediction by Partial Matching (PPM) is a lossless compression algorithm which consistently performs well on text compression benchmarks. This paper introduces a new PPM implementation called PPMEns which uses unbounded context lengths and ensemble voting to combine multiple contexts. The algorithm is evaluated on the Calgary corpus. The results indicate that combining multiple contexts leads to an improvement in the compression performance of PPMEns, although it does not outperform state-of-the-art compression techniques.
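The basic PPM idea of backing off from long contexts to shorter ones can be sketched as follows. This is a simplified illustration under stated assumptions, not PPMEns: it uses a bounded maximum order, the method C escape estimate (escape probability d/(n+d) for a context seen n times with d distinct successors), no symbol exclusion, and no ensemble voting. The class name `SimplePPM` is hypothetical.

```python
from collections import Counter, defaultdict


class SimplePPM:
    """Bounded-order PPM predictor with method C escapes, no exclusion."""

    def __init__(self, max_order, alphabet):
        self.max_order = max_order
        self.alphabet = list(alphabet)
        self.counts = defaultdict(Counter)  # context -> next-symbol counts

    def update(self, history, symbol):
        # Record the symbol under every context suffix up to max_order.
        for order in range(min(self.max_order, len(history)) + 1):
            context = history[len(history) - order:]
            self.counts[context][symbol] += 1

    def prob(self, history, symbol):
        escape_mass = 1.0
        # Try the longest context first, escaping to shorter ones.
        for order in range(min(self.max_order, len(history)), -1, -1):
            context = history[len(history) - order:]
            counter = self.counts.get(context)
            if not counter:
                continue  # unseen context: back off at no cost
            n = sum(counter.values())
            d = len(counter)
            if symbol in counter:
                return escape_mass * counter[symbol] / (n + d)
            escape_mass *= d / (n + d)
        # Symbol never seen in any context: uniform over the alphabet.
        return escape_mass / len(self.alphabet)
```

Because exclusion is omitted, the probabilities over the alphabet can sum to less than one, which wastes some code space; practical PPM implementations typically add exclusion to recover it.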
Gatsby Unit
Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman’s random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterparts to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and, remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve competitive predictive performance comparable with existing online random forests and periodically retrained batch random forests, while being more than an order of magnitude faster, thus representing a better computation vs accuracy tradeoff.
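The Mondrian process underlying these forests can be sketched as a recursive random partition of a box. The code below illustrates the basic generative process only, not the Mondrian forest algorithm, which additionally conditions the cuts on the training data. In the standard construction assumed here, the waiting time to the next cut is exponential with rate equal to the sum of the box's side lengths, the cut dimension is chosen in proportion to side length, and the cut position is uniform. The function name `sample_mondrian` is hypothetical.

```python
import random


def sample_mondrian(box, budget, rng):
    """Sample a Mondrian partition of an axis-aligned box.

    box:    list of (low, high) intervals, one per dimension.
    budget: remaining lifetime; larger values give finer partitions.
    Returns the list of leaf boxes.
    """
    lengths = [high - low for low, high in box]
    total = sum(lengths)
    if total <= 0:
        return [box]
    # Time until the next cut: exponential with rate = sum of side lengths.
    cost = rng.expovariate(total)
    if cost > budget:
        return [box]  # budget exhausted: this box is a leaf
    # Choose the dimension in proportion to its length, then cut uniformly.
    dim = rng.choices(range(len(box)), weights=lengths)[0]
    low, high = box[dim]
    cut = rng.uniform(low, high)
    left, right = list(box), list(box)
    left[dim] = (low, cut)
    right[dim] = (cut, high)
    remaining = budget - cost
    return (sample_mondrian(left, remaining, rng)
            + sample_mondrian(right, remaining, rng))
```

Each recursive call sees a smaller box and a smaller remaining budget, so cuts become rarer as the partition refines; the leaves always tile the original box exactly.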