Results 1–10 of 33
A Bayesian framework for word segmentation: Exploring the effects of context
In 46th Annual Meeting of the ACL, 2009
Cited by 110 (30 self)
Abstract
Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what's that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
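The "words are independent units" learner described in this abstract can be illustrated with the standard Dirichlet-process unigram predictive rule. The base distribution `p0` and the toy counts below are illustrative assumptions, not details taken from the paper:

```python
# Sketch of the Dirichlet-process unigram predictive probability used in
# Bayesian word segmentation models of this kind. Both p0 and the counts
# are toy stand-ins for illustration.

def dp_predictive(word, counts, alpha, p0):
    """P(next word = word | previously segmented words)."""
    total = sum(counts.values())
    return (counts.get(word, 0) + alpha * p0(word)) / (total + alpha)

def p0(word, p_stop=0.5, n_phones=10):
    """Toy base distribution: geometric word length, uniform phone choice."""
    return p_stop * (1 - p_stop) ** (len(word) - 1) * (1 / n_phones) ** len(word)

# Toy lexicon counts; "doyou" shows the undersegmentation failure mode,
# where a frequent two-word sequence is stored as a single lexical item.
counts = {"whats": 2, "that": 5, "doyou": 3}
print(dp_predictive("that", counts, alpha=1.0, p0=p0))
```

Frequently segmented strings dominate the predictive probability, while unseen strings fall back on the base distribution, which is why a frequent collocation like "doyou" can become entrenched as a single word under the independence assumption.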
A unified local and global model for discourse coherence
In Proceedings of HLT-NAACL, 2007
Cited by 19 (1 self)
Abstract
We present a model for discourse coherence which combines the local entity-based approach of Barzilay and Lapata (2005) and the HMM-based content model …
Stream-based joint exploration-exploitation active learning
In CVPR, 2012
Cited by 12 (3 self)
Abstract
[Figure residue: Chinese restaurant process seating labels (1st table, 2nd table, kth table, new table); no abstract text was extracted.]
A Vocabulary-Free Infinity-gram Model for Nonparametric Bayesian Chord Progression Analysis
In Proceedings of the 12th International Conference on Music Information Retrieval, 2011
Cited by 9 (3 self)
Abstract
This paper presents probabilistic n-gram models for symbolic chord sequences. To overcome the fundamental limitations of conventional models (the model optimality is not guaranteed, the value of n is fixed in advance, and a vocabulary of chord types, e.g. major and minor, is defined in an arbitrary way), we propose a vocabulary-free infinity-gram model based on Bayesian nonparametrics. It accepts any combination of notes as a chord type and allows each chord appearing in a sequence to have an unbounded and variable-length context. All possibilities of n are taken into account when calculating the predictive probability of the next chord given a particular context, and when an unseen chord type emerges we can avoid out-of-vocabulary errors by adaptively evaluating the 0-gram probability, i.e., the combinatorial probability of the chord's note components. Our experiments using Beatles songs showed that the predictive performance of the proposed model is better than that of state-of-the-art models, and that we could find stochastically coherent chord patterns by sorting variable-length n-grams according to their generative probabilities.
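The out-of-vocabulary escape described here hinges on a 0-gram probability computed from a chord's note components. A minimal sketch, assuming (purely for illustration; the paper's exact formulation is not given in this snippet) that each of the 12 pitch classes enters a chord independently with a fixed probability:

```python
import math

def zero_gram_chord_prob(chord, p_on=0.25):
    """Combinatorial probability of a chord's note components, assuming
    each of the 12 pitch classes is present independently with prob p_on.
    An illustrative simplification, not the paper's exact 0-gram model."""
    return math.prod(p_on if pc in chord else 1 - p_on for pc in range(12))

c_major = {0, 4, 7}  # pitch classes C, E, G
print(zero_gram_chord_prob(c_major))
```

Because the probabilities over all 2^12 possible note combinations sum to one, any previously unseen chord type still receives nonzero mass.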
A look at parsing and its applications
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), 2006
Cited by 8 (3 self)
Abstract
This paper provides a brief introduction to recent work in statistical parsing and its applications. We highlight successes to date, remaining challenges, and promising future work.
Sampling table configurations for the hierarchical Poisson-Dirichlet process
In ECML, 2011
Cited by 8 (5 self)
Abstract
Hierarchical modeling and reasoning are fundamental in machine intelligence, and for this the two-parameter Poisson-Dirichlet process (PDP) plays an important role. The most popular MCMC sampling algorithm for the hierarchical PDP and the hierarchical Dirichlet process conducts incremental sampling based on the Chinese restaurant metaphor, which originates from the Chinese restaurant process (CRP). In this paper, using the same metaphor, we propose a new table representation for hierarchical PDPs by introducing an auxiliary latent variable, called a table indicator, which records which customer takes responsibility for starting a new table. The new representation allows full exchangeability, an essential condition for a correct Gibbs sampling algorithm. Based on this representation, we develop a block Gibbs sampling algorithm which can jointly sample a data item and its table contribution. We test this on the hierarchical Dirichlet process variant of latent Dirichlet allocation (HDP-LDA) developed by Teh, Jordan, Beal and Blei. Experimental results show that the proposed algorithm outperforms their "posterior sampling by direct assignment" algorithm in both out-of-sample perplexity and convergence speed. The representation can be used with many other hierarchical PDP models.
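The Chinese restaurant metaphor invoked above can be made concrete with the standard two-parameter seating rule; the paper's table-indicator block sampler builds on it. The function below is a generic sketch of that rule, not the authors' algorithm:

```python
def pdp_seating_probs(table_sizes, discount, strength):
    """Seating probabilities for the next customer in a two-parameter
    Poisson-Dirichlet (Pitman-Yor) Chinese restaurant: one entry per
    existing table, plus a final entry for opening a new table."""
    n = sum(table_sizes)  # customers seated so far
    t = len(table_sizes)  # tables opened so far
    probs = [(n_k - discount) / (n + strength) for n_k in table_sizes]
    probs.append((strength + discount * t) / (n + strength))
    return probs

print(pdp_seating_probs([3, 1], discount=0.5, strength=1.0))  # → [0.5, 0.1, 0.4]
```

Note that the discount shifts mass from occupied tables to the new-table option, producing the power-law table growth that distinguishes the PDP from the one-parameter Dirichlet process.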
Topic Segmentation with a Structured Topic Model
Cited by 6 (2 self)
Abstract
We present a new hierarchical Bayesian model for unsupervised topic segmentation. The model integrates a pointwise boundary sampling algorithm used in Bayesian segmentation into a structured topic model that can capture a simple hierarchical topic structure latent in documents. We develop an MCMC inference algorithm to split/merge segments. Experimental results show that our model outperforms previous unsupervised segmentation methods using only lexical information on Choi's datasets and two meeting transcripts, and has performance comparable to those methods on two written datasets.
Hierarchical Bayesian Language Models for Conversational Speech Recognition
Cited by 5 (0 self)
Abstract
Traditional n-gram language models are widely used in state-of-the-art large-vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting in maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation of language modeling based on a nonparametric prior called the Pitman-Yor process. This offers a principled approach to language model smoothing, embedding the power-law distribution of natural language. Experiments on the recognition of conversational speech in multi-party meetings demonstrate that by using hierarchical Bayesian language models we are able to achieve significant reductions in perplexity and word error rate.
Index Terms: AMI corpus, conversational speech recognition, hierarchical Bayesian model, language model (LM), meetings, smoothing.
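The smoothing behaviour described in this abstract follows from the Pitman-Yor predictive rule, which interpolates discounted counts with a shorter-context (backoff) distribution. A minimal single-restaurant sketch; the variable names and the toy data are ours, not from the paper:

```python
def hpy_prob(word, counts, tables, discount, strength, backoff):
    """Predictive probability under one restaurant of a hierarchical
    Pitman-Yor language model. counts[w] = customers eating dish w;
    tables[w] = tables serving w; backoff is the shorter-context model."""
    c = sum(counts.values())
    t = sum(tables.values())
    if c == 0:
        return backoff(word)
    c_w = counts.get(word, 0)
    t_w = tables.get(word, 0)
    # Discounted count term plus interpolation with the backoff model.
    return ((c_w - discount * t_w) / (c + strength)
            + (strength + discount * t) / (c + strength) * backoff(word))

vocab = ["the", "cat", "sat"]
uniform = lambda w: 1 / len(vocab)  # toy stand-in for the parent restaurant
counts, tables = {"the": 4, "cat": 1}, {"the": 2, "cat": 1}
print(hpy_prob("the", counts, tables, discount=0.5, strength=1.0, backoff=uniform))
```

Subtracting `discount * t_w` from each count and giving the reclaimed mass to the backoff distribution is what reproduces the Kneser-Ney-style smoothing and power-law behaviour the abstract refers to.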
Unsupervised Word Discovery from Phonetic Input Using Nested Pitman-Yor Language Modeling
Cited by 4 (4 self)
Abstract
In this paper we consider unsupervised word discovery from phonetic input. We employ a word segmentation algorithm which simultaneously develops a lexicon (the transcription of each word as a phone sequence), learns an n-gram language model describing word and word-sequence probabilities, and carries out the segmentation itself. The underlying statistical model is the Pitman-Yor process, a concept known from Bayesian nonparametrics, which allows for an a priori unknown and unlimited number of different words. Using a hierarchy of Pitman-Yor processes, language models of different order can be employed, and nesting it with another hierarchy of Pitman-Yor processes on the phone level allows unknown word unigrams to be backed off to phone m-grams. We present results on a large-vocabulary task, assuming an error-free phone sequence is given. We finish by discussing options for coping with noisy phone sequences.
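Backing off unknown word unigrams to phone m-grams amounts to giving each candidate word a base probability spelled out phone by phone. A sketch using phone bigrams, where the bigram table is a hypothetical stand-in for the paper's phone-level Pitman-Yor hierarchy:

```python
def word_base_prob(word, phone_bigram, bos="<s>", eos="</s>"):
    """P0(word) as a product of phone bigram probabilities.
    phone_bigram is a hypothetical dict {(prev_phone, phone): prob};
    in the paper this role is played by a phone-level Pitman-Yor model."""
    phones = [bos] + list(word) + [eos]
    p = 1.0
    for prev, cur in zip(phones, phones[1:]):
        p *= phone_bigram.get((prev, cur), 1e-6)  # tiny floor for unseen pairs
    return p

bigrams = {("<s>", "k"): 0.2, ("k", "a"): 0.5, ("a", "</s>"): 0.3}
print(word_base_prob("ka", bigrams))
```

This is what makes the lexicon open-ended: any phone string receives a nonzero prior probability of being a word, so the segmenter never hits an out-of-vocabulary error.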
Differential topic models
In IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014
Cited by 4 (1 self)
Abstract
In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and similarities, using hierarchical Bayesian nonparametric models. We found it important to properly model power-law phenomena in topic-word distributions, and thus used the full Pitman-Yor process rather than just a Dirichlet process. Furthermore, we propose the transformed Pitman-Yor process (TPYP) to incorporate prior knowledge, such as vocabulary variations across collections, into the model. To deal with the non-conjugacy between the model prior and the likelihood in the TPYP, we propose an efficient sampling algorithm using a data augmentation technique based on the multinomial theorem. Experimental results show that the model discovers interesting aspects of different collections, and that the proposed MCMC-based algorithm achieves a dramatically reduced test perplexity compared to some existing topic models. Finally, we show our model outperforms the state of the art for document classification/ideology prediction on a number of text collections.
Index Terms: Differential topic model, transformed Pitman-Yor process, MCMC, data augmentation.