Results 1 - 8 of 8
A Stochastic Memoizer for Sequence Data
Abstract

Cited by 11 (6 self)
We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.
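At each context in such a hierarchy, prediction follows the two-parameter Pitman-Yor (Chinese restaurant) rule: observed counts are discounted and the leftover mass backs off to the parent distribution. A minimal sketch, using hypothetical toy seating statistics rather than the paper's coagulation-based linear-time representation:

```python
def py_predictive(symbol, counts, d, theta, base):
    """P(symbol | context) under one Pitman-Yor restaurant.

    counts maps symbol -> (customers, tables); d is the discount,
    theta the concentration, and base the parent (shorter-context)
    predictive distribution.
    """
    n = sum(c for c, _ in counts.values())    # total customers
    t = sum(tb for _, tb in counts.values())  # total tables
    if n == 0:
        return base(symbol)
    c_w, t_w = counts.get(symbol, (0, 0))
    # Discounted count term plus back-off mass routed to the parent.
    return (c_w - d * t_w) / (theta + n) + (theta + d * t) / (theta + n) * base(symbol)
```

With a uniform base over three symbols and toy counts {'a': (2, 1)}, the probabilities over the alphabet sum to one and observed symbols are boosted relative to the base.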
A VOCABULARY-FREE INFINITY-GRAM MODEL FOR NONPARAMETRIC BAYESIAN CHORD PROGRESSION ANALYSIS
Abstract

Cited by 5 (1 self)
This paper presents probabilistic n-gram models for symbolic chord sequences. To overcome the fundamental limitations in conventional models—that the model optimality is not guaranteed, that the value of n is fixed uniquely, and that a vocabulary of chord types (e.g., major, minor, etc.) is defined in an arbitrary way—we propose a vocabulary-free infinity-gram model based on Bayesian nonparametrics. It accepts any combination of notes as a chord type and allows each chord appearing in a sequence to have an unbounded, variable-length context. All possibilities of n are taken into account when calculating the predictive probability of the next chord given a particular context, and when an unseen chord type emerges we can avoid out-of-vocabulary errors by adaptively evaluating the 0-gram probability, i.e., the combinatorial probability of the note components. Our experiments using Beatles songs showed that the predictive performance of the proposed model is better than that of state-of-the-art models and that we could find stochastically coherent chord patterns by sorting variable-length n-grams according to their generative probabilities.
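The 0-gram fallback can be pictured as scoring a chord by its note components. The sketch below uses an independent-Bernoulli factorization over the 12 pitch classes; both this factorization and the inclusion probabilities are illustrative assumptions, not the paper's exact parameterization:

```python
def zero_gram_prob(chord, p_on):
    """Combinatorial probability of a chord, given as a set of pitch
    classes 0..11; p_on[k] is the inclusion probability of class k."""
    prob = 1.0
    for k in range(12):
        prob *= p_on[k] if k in chord else 1.0 - p_on[k]
    return prob
```

Because every subset of the 12 pitch classes receives positive probability under this factorization, an unseen chord type never gets zero mass.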
Pitman-Yor Process-Based Language Models for Machine Translation
Abstract

Cited by 3 (2 self)
The hierarchical Pitman-Yor process-based smoothing method for language models was proposed by Goldwater and by Teh; its performance is shown to be comparable with the modified Kneser-Ney method in terms of perplexity. Although this method was presented four years ago, no paper has reported that this language model indeed improves translation quality in the context of Machine Translation (MT). This is important for the MT community, since an improvement in perplexity does not always lead to an improvement in BLEU score; for example, success in word alignment as measured by Alignment Error Rate (AER) does not often lead to an improvement in BLEU. This paper reports, in the context of MT, that an improvement in perplexity really does lead to an improvement in BLEU score. It turned out that applying the Hierarchical Pitman-Yor Language Model (HPYLM) requires a minor change to the conventional decoding process. In addition, we propose a new Pitman-Yor process-based statistical smoothing method similar to the Good-Turing method, although its performance is inferior to HPYLM. In our experiments, HPYLM improved BLEU by 1.03 points absolute and 6% relative for 50k EN-JP, which was statistically significant.
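The smoothing referred to here can be sketched as a recursion over context lengths: each context's restaurant interpolates its discounted counts with the prediction of the shortened context, bottoming out in a uniform distribution. The seating statistics below are hypothetical toy values, not trained ones:

```python
def hpylm_prob(word, context, restaurants, d, theta, vocab_size):
    """Predictive probability under a (toy) hierarchical Pitman-Yor LM.

    restaurants maps a context tuple to {word: (customers, tables)};
    d is the discount and theta the concentration at every level.
    """
    parent = (1.0 / vocab_size if not context
              else hpylm_prob(word, context[1:], restaurants, d, theta, vocab_size))
    counts = restaurants.get(context, {})
    n = sum(c for c, _ in counts.values())
    t = sum(tb for _, tb in counts.values())
    if n == 0:
        return parent
    c_w, t_w = counts.get(word, (0, 0))
    return (c_w - d * t_w) / (theta + n) + (theta + d * t) / (theta + n) * parent
```

Backing off by dropping the most distant word (`context[1:]`) is what lets a long-context estimate degrade gracefully toward unigram behaviour when counts are sparse.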
A Bayesian Review of the Poisson-Dirichlet Process
, 2010
Abstract

Cited by 2 (1 self)
The two-parameter Poisson-Dirichlet process, also known as the Pitman-Yor process and related to the Chinese restaurant process, is a generalisation of the Dirichlet process and is increasingly being used for probabilistic modelling in discrete areas such as language and images. This article reviews the theory of the Poisson-Dirichlet process in terms of its consistency for estimation, its convergence rates, and the posteriors of data. This theory has been well developed for continuous distributions (more generally referred to as non-atomic distributions). The article then presents a Bayesian interpretation of the Poisson-Dirichlet process: it is a mixture using an improper and infinite-dimensional Dirichlet distribution. This interpretation requires technicalities of priors, posteriors and Hilbert spaces, but conceptually it means we can understand the process as just another Dirichlet, so all its sampling properties fit naturally. Finally, the article presents results for the discrete case, which is the case now seeing widespread use in computer science but which has received less attention in the literature.
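A constructive view that complements such a review is the stick-breaking representation of the two-parameter process PY(d, θ), in which the k-th stick fraction is drawn as V_k ~ Beta(1 − d, θ + kd). A truncated sampling sketch (the function name is ours):

```python
import random

def pitman_yor_weights(d, theta, n_sticks, rng):
    """Truncated stick-breaking draw of Pitman-Yor atom weights."""
    weights, remaining = [], 1.0
    for k in range(1, n_sticks + 1):
        v = rng.betavariate(1.0 - d, theta + k * d)  # V_k ~ Beta(1-d, theta+k*d)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights
```

With d = 0 this reduces to the Dirichlet process stick-breaking; with d > 0 the weights decay more slowly (a power law), which is part of what makes the process attractive for language and image data.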
Hierarchical Pitman-Yor Language Model for Machine Translation
Abstract

Cited by 1 (1 self)
The hierarchical Pitman-Yor process-based smoothing method for language models was proposed by Goldwater and by Teh; its performance is shown to be comparable with the modified Kneser-Ney method in terms of perplexity. Although this method was presented four years ago, no paper has reported that this language model indeed improves translation quality in the context of Machine Translation (MT). This is important for the MT community, since an improvement in perplexity does not always lead to an improvement in BLEU score; for example, success in word alignment as measured by Alignment Error Rate (AER) does not often lead to an improvement in BLEU. This paper reports, in the context of MT, that an improvement in perplexity really does lead to an improvement in BLEU score. It turned out that applying the Hierarchical Pitman-Yor Language Model (HPYLM) requires a minor change to the conventional decoding process. In addition, we propose a new Pitman-Yor process-based statistical smoothing method similar to the Good-Turing method, although its performance is inferior to HPYLM. In our experiments, HPYLM improved BLEU by 1.03 points absolute and 6% relative for 50k EN–JP, which was statistically significant.
Bayesian Networks on Dirichlet Distributed Vectors
Abstract
Exact Bayesian network inference exists for Gaussian and multinomial distributions. For other kinds of distributions, approximations or restrictions on the kind of inference performed are needed. In this paper we present generalized networks of Dirichlet distributions and show how, using the two-parameter Poisson-Dirichlet distribution and Gibbs sampling, one can do approximate inference over them. This involves integrating out the probability vectors but leaving auxiliary discrete count vectors in their place. We illustrate the technique by extending standard topic models to "structured" documents, where the document structure is given by a Bayesian network of Dirichlets.
Bayesian Variable Order Markov Models
Abstract
We present a simple, effective generalisation of variable order Markov models to full online Bayesian estimation. The mechanism used is close to that employed in context tree weighting. The main contribution is the addition of a prior, conditioned on context, on the Markov order. The resulting construction uses a simple recursion and can be updated efficiently. This allows the model to make predictions using more complex contexts, as more data is acquired, if necessary. In addition, our model can alternatively be seen as a mixture of tree experts. Experimental results show that the predictive model exhibits consistently good performance in a variety of domains. We consider Bayesian estimation of variable order Markov models (see Begleiter et al., 2004, for an overview). Such models create a tree of partitions, where the disjoint sets of every partition correspond to different contexts. We can associate a submodel or expert with each context in order to make predictions. The main contribution of this paper is a conditional prior on the Markov order—or equivalently, the context depth. This is based on a recursive construction that estimates, for each context at a certain depth k, whether it makes better predictions than the contexts at depths smaller than k. This simple model defines a mixture of variable order Markov models, and its parameters can be updated in closed form in time O(D) for trees of depth D with each new observation. For unbounded-length contexts, the complexity of the algorithm is O(T²) for an input sequence of length T. Furthermore, it exhibits robust performance in a variety of tasks. Finally, the model is easily extensible to controlled processes.
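The recursion described here can be sketched for a single context path: each depth keeps a stopping weight that is updated in closed form after every observation, mixing its own expert with the shallower prediction. The add-one counting experts and initial weights below are our illustrative choices, not the paper's exact construction:

```python
class DepthNode:
    """One context depth: a stop weight plus a toy add-one expert."""
    def __init__(self, alphabet):
        self.w = 0.5                            # P(stop at this depth)
        self.counts = {a: 1 for a in alphabet}  # add-one counts

    def expert(self, x):
        return self.counts[x] / sum(self.counts.values())

def predict(nodes, depth, x):
    """Mixture prediction over context depths 0..depth."""
    p = nodes[0].expert(x)
    for k in range(1, depth + 1):
        n = nodes[k]
        p = n.w * n.expert(x) + (1.0 - n.w) * p
    return p

def update(nodes, depth, x):
    """Closed-form posterior update of the stop weights, then counts."""
    p_mix = nodes[0].expert(x)
    for k in range(1, depth + 1):
        n = nodes[k]
        p_here = n.expert(x)
        total = n.w * p_here + (1.0 - n.w) * p_mix
        n.w = n.w * p_here / total              # Bayes update of the weight
        p_mix = total
    for node in nodes[:depth + 1]:
        node.counts[x] += 1
```

Each observation costs O(D) along a path of depth D, matching the per-observation complexity stated above.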
A Unified Probabilistic Model of Note Combinations and Chord Progressions
Abstract
This paper presents a unified simultaneous and sequential model of note combinations and chord progressions. In chord progression analysis, n-gram models have often been used for modeling temporal sequences of chord labels (e.g., C major, D minor, and E# seventh). These models require us to specify the value of n and to define a limited vocabulary of chord labels. Our model, in contrast, is designed to directly model temporal sequences of note combinations without specifying the value of n, because we aim to use the model as a prior distribution on musical notes in polyphonic music transcription. To do this, we extend a nonparametric Bayesian n-gram model that was designed for modeling sequences of words in the field of computational linguistics. More specifically, our model can accept any combination of notes as a chord and allows each chord appearing in a sequence to have an unbounded, variable-length context. All possibilities of n are taken into account when predicting the next chord given the preceding chords. Even when an unseen note combination (chord) emerges, we can estimate its n-gram probability by referring to its 0-gram probability, i.e., the combinatorial probability of the note components. We tested our model using the ground-truth chord annotations and automatic chord recognition results of the Beatles songs.