Results 1–10 of 16
A Bayesian framework for word segmentation: Exploring the effects of context
In 46th Annual Meeting of the ACL, 2009
Cited by 50 (11 self)
Abstract: Since the experiments of Saffran et al. (1996a), there has been a great deal of interest in the question of how statistical regularities in the speech stream might be used by infants to begin to identify individual words. In this work, we use computational modeling to explore the effects of different assumptions the learner might make regarding the nature of words – in particular, how these assumptions affect the kinds of words that are segmented from a corpus of transcribed child-directed speech. We develop several models within a Bayesian ideal observer framework, and use them to examine the consequences of assuming either that words are independent units, or units that help to predict other units. We show through empirical and theoretical results that the assumption of independence causes the learner to undersegment the corpus, with many two- and three-word sequences (e.g. what’s that, do you, in the house) misidentified as individual words. In contrast, when the learner assumes that words are predictive, the resulting segmentation is far more accurate. These results indicate that taking context into account is important for a statistical word segmentation strategy to be successful, and raise the possibility that even young infants may be able to exploit more subtle statistical patterns than have usually been considered.
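The Saffran-style statistical-learning idea this abstract builds on can be illustrated with forward transitional probabilities: within-word syllable transitions are consistent, while transitions across word boundaries are more variable. The sketch below is my own toy illustration of that heuristic, with made-up syllables and an arbitrary threshold; it is not the paper's Bayesian ideal-observer model.

```python
from collections import Counter

def transitional_probs(syllables):
    """Estimate forward transitional probabilities P(next | current)
    from a flat stream of syllables."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    unigram_counts = Counter(syllables[:-1])
    return {(a, b): c / unigram_counts[a] for (a, b), c in pair_counts.items()}

def segment(syllables, threshold=0.9):
    """Insert a word boundary wherever the transitional probability
    dips below the threshold (a toy dip-based segmentation rule)."""
    tp = transitional_probs(syllables)
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Hypothetical stream of the "words" pretty / baby / doggy in varying order:
# within-word transitions always co-occur, cross-word transitions vary.
stream = ["pre", "tty", "ba", "by", "do", "ggy", "pre", "tty",
          "do", "ggy", "ba", "by", "pre", "tty", "ba", "by"]
print(segment(stream))
```

With this stream, every within-word transition has probability 1.0 and every cross-word transition is at most 2/3, so the dip rule recovers the word boundaries exactly.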
A unified local and global model for discourse coherence
In Proceedings of HLT-NAACL, 2007
Cited by 14 (1 self)
Abstract: We present a model for discourse coherence which combines the local entity-based approach of Barzilay and Lapata (2005) and the HMM-based content model ...
A Look at Parsing and Its Applications
In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06), 2006
Cited by 5 (2 self)
Abstract: This paper provides a brief introduction to recent work in statistical parsing and its applications. We highlight successes to date, remaining challenges, and promising future work.
A Vocabulary-Free Infinity-Gram Model for Nonparametric Bayesian Chord Progression Analysis
Cited by 5 (1 self)
Abstract: This paper presents probabilistic n-gram models for symbolic chord sequences. To overcome the fundamental limitations in conventional models—that the model optimality is not guaranteed, that the value of n is fixed uniquely, and that a vocabulary of chord types (e.g., major, minor, ...) is defined in an arbitrary way—we propose a vocabulary-free infinity-gram model based on Bayesian nonparametrics. It accepts any combination of notes as a chord type and allows each chord appearing in a sequence to have an unbounded and variable-length context. All possibilities of n are taken into account when calculating the predictive probability of the next chord given a particular context, and when an unseen chord type emerges we can avoid an out-of-vocabulary error by adaptively evaluating the 0-gram probability, i.e., the combinatorial probability of note components. Our experiments using Beatles songs showed that the predictive performance of the proposed model is better than that of state-of-the-art models and that we could find stochastically coherent chord patterns by sorting variable-length n-grams according to their generative probabilities.
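The "all possibilities of n are taken into account" idea can be sketched with a toy model that keeps next-symbol counts for every suffix length of the history and averages the resulting predictions uniformly. This is only my own illustrative simplification: the paper places a Bayesian nonparametric prior over contexts and handles unseen chord types with a 0-gram note-combination probability, neither of which appears here.

```python
from collections import defaultdict, Counter

class InfiniteContextModel:
    """Toy variable-length-context model: counts every suffix of the
    history and uniformly averages the per-suffix predictions.
    (A stand-in for a proper nonparametric mixture over n.)"""

    def __init__(self):
        self.counts = defaultdict(Counter)  # context tuple -> next-symbol counts

    def train(self, seq):
        for i, sym in enumerate(seq):
            for n in range(i + 1):          # every suffix of the history up to i
                self.counts[tuple(seq[i - n:i])][sym] += 1

    def prob(self, history, sym):
        probs = []
        for n in range(len(history) + 1):
            ctx = tuple(history[len(history) - n:])
            c = self.counts.get(ctx)
            if c:
                probs.append(c[sym] / sum(c.values()))
        return sum(probs) / len(probs) if probs else 0.0

model = InfiniteContextModel()
model.train(["C", "F", "G", "C"])       # a hypothetical chord-symbol sequence
print(model.prob(["F"], "G"))
```

Longer matching contexts sharpen the prediction: with the empty context, "G" has probability 1/4, but after "F" it has probability 1.0, and the uniform mixture lands in between.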
Hierarchical Bayesian Language Models for Conversational Speech Recognition
Cited by 2 (0 self)
Abstract: Traditional n-gram language models are widely used in state-of-the-art large-vocabulary speech recognition systems. This simple model suffers from some limitations, such as overfitting of maximum-likelihood estimation and the lack of rich contextual knowledge sources. In this paper, we exploit a hierarchical Bayesian interpretation for language modeling, based on a nonparametric prior called the Pitman–Yor process. This offers a principled approach to language model smoothing, embedding the power-law distribution for natural language. Experiments on the recognition of conversational speech in multiparty meetings demonstrate that by using hierarchical Bayesian language models, we are able to achieve significant reductions in perplexity and word error rate.
Index Terms: AMI corpus, conversational speech recognition, hierarchical Bayesian model, language model (LM), meetings, smoothing.
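The smoothing behaviour the abstract refers to comes from the Pitman–Yor predictive rule: observed counts are discounted, and the freed mass is given to the base distribution. A minimal single-restaurant sketch of that rule is below; the paper's model is hierarchical (each context's base distribution is the lower-order model), which this one-level toy with hypothetical counts omits.

```python
def pitman_yor_predictive(word, customer_counts, table_counts,
                          discount, strength, base_prob):
    """Predictive probability of `word` under a single Pitman-Yor restaurant.
    customer_counts[w]: customers eating dish w; table_counts[w]: tables serving w;
    base_prob: probability of `word` under the base distribution."""
    c = sum(customer_counts.values())   # total customers
    t = sum(table_counts.values())      # total tables
    c_w = customer_counts.get(word, 0)
    t_w = table_counts.get(word, 0)
    # Discounted seated mass plus new-table mass routed to the base distribution.
    return (max(c_w - discount * t_w, 0.0)
            + (strength + discount * t) * base_prob) / (strength + c)
```

With discount 0 this reduces to ordinary Dirichlet smoothing; a positive discount shaves mass from each seen word's count, producing the power-law (heavy-tailed) behaviour the abstract mentions. The probabilities still sum to one whenever the base distribution does.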
Improvements to the Sequence Memoizer
Cited by 2 (2 self)
Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements.
Stream-based joint exploration-exploitation active learning
In CVPR, 2012
Cited by 2 (1 self)
Abstract: [No abstract text was extracted; the preview contains only figure residue from a Chinese restaurant process diagram (1st table, 2nd table, ..., kth table, new table).]
Nonparametric Bayesian methods for extracting structure from data
2008
Cited by 1 (0 self)
Abstract: One desirable property of machine learning algorithms is the ability to balance the number of parameters in a model in accordance with the amount of available data. Incorporating nonparametric Bayesian priors into models is one approach to automatically adjusting model capacity to the amount of available data: with small datasets, models are less complex (require storing fewer parameters in memory), whereas with larger datasets, models are implicitly more complex (require storing more parameters in memory). Thus, nonparametric Bayesian priors satisfy frequentist intuitions about model complexity within a fully Bayesian framework. This thesis presents several novel machine learning models and applications that use nonparametric Bayesian priors. We introduce two novel models that use flat, Dirichlet process priors. The first is an infinite mixture of experts model, which builds a fully generative, joint density model of the input and output space. The second is a Bayesian biclustering model, which simultaneously organizes a data matrix into block-constant biclusters. The model is capable of efficiently processing very large, sparse matrices, enabling cluster analysis on incomplete data matrices. We introduce binary matrix factorization, a novel matrix factorization model that, in contrast to classic factorization methods, such as singular value decomposition, decomposes a matrix using ...
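The "capacity grows with data" property described above is easy to see in the Chinese restaurant process, the canonical Dirichlet process construction: each new customer starts a new table (cluster) with probability proportional to a concentration parameter, so the number of clusters grows slowly and automatically with the dataset size. A small simulation sketch (parameter names are my own):

```python
import random

def crp_num_clusters(n, alpha, seed=0):
    """Simulate n customers in a Chinese restaurant process with
    concentration alpha; return the number of occupied tables."""
    rng = random.Random(seed)
    tables = []                              # customers seated per table
    for i in range(n):
        # New table with probability alpha / (i + alpha),
        # otherwise join an existing table proportionally to its size.
        if rng.random() < alpha / (i + alpha):
            tables.append(1)
        else:
            r = rng.random() * i
            acc = 0
            for k in range(len(tables)):
                acc += tables[k]
                if r < acc:
                    tables[k] += 1
                    break
    return len(tables)

print(crp_num_clusters(10, 1.0), crp_num_clusters(1000, 1.0))
```

The expected number of tables grows only logarithmically in n (roughly alpha * log(1 + n/alpha)), which is the sense in which the prior lets the number of effective parameters track the data.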
Separating Precision and Mean in Dirichlet-enhanced High-order Markov Models
Abstract: Robustly estimating the state-transition probabilities of high-order Markov processes is an essential task in many applications such as natural language modeling or protein sequence modeling. We propose a novel estimation algorithm called Hierarchical Separated Dirichlet Smoothing (HSDS), where Dirichlet distributions are hierarchically assumed to be the prior distributions of the state-transition probabilities. The key idea in HSDS is to separate the parameters of a Dirichlet distribution into the precision and the mean, so that the precision depends on the context while the mean is given by the lower-order distribution. HSDS is designed to outperform Kneser-Ney smoothing especially when the number of states is small; Kneser-Ney smoothing is currently known as the state-of-the-art technique for N-gram natural language models. Our experiments in protein sequence modeling showed the superiority of HSDS in both perplexity evaluation and classification tasks.
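The precision/mean split described in the abstract corresponds to the standard Dirichlet-smoothed predictive form p(s | h) = (c(h, s) + alpha_h * p(s | h')) / (c(h) + alpha_h), where alpha_h is the context-dependent precision and p(. | h') is the lower-order (mean) distribution. A minimal sketch of that one-level update, with hypothetical names (the full HSDS hierarchy and its precision estimation are not shown):

```python
def dirichlet_smoothed_probs(counts, alpha, lower_order_probs):
    """One level of Dirichlet smoothing with separated precision and mean.
    counts: next-state counts observed after context h;
    alpha: precision for context h;
    lower_order_probs: the mean, i.e. the lower-order distribution p(. | h')."""
    total = sum(counts.values())
    return {s: (counts.get(s, 0) + alpha * p) / (total + alpha)
            for s, p in lower_order_probs.items()}

print(dirichlet_smoothed_probs({"a": 2, "b": 1}, 3.0, {"a": 0.5, "b": 0.5}))
```

A large alpha pulls the estimate toward the lower-order mean (heavy smoothing); a small alpha trusts the raw context counts, which is why making alpha context-dependent matters.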
unknown title
Abstract: In Appendix A.5 of [1] it was shown by induction that

\sum_{A \in \mathcal{A}_{ct}} \prod_{a \in A} [1 - d]_1^{a - 1} = S_d(c, t) \quad (1)

where S_d(c, t) is a generalized Stirling number of type (-1, -d, 0) [2]. These can be computed recursively as follows:

S_d(1, 1) = S_d(0, 0) = 1 \quad (2)
S_d(c, 0) = S_d(0, t) = 0 \quad \text{for } c, t > 0 \quad (3)
S_d(c, t) = 0 \quad \text{for } t > c \quad (4)
S_d(c, t) = S_d(c - 1, t - 1) + (c - 1 - d t) \, S_d(c - 1, t) \quad \text{for } 0 < t \le c \quad (5)
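The recurrence (2)-(5) translates directly into code; a minimal memoized sketch (function names are my own):

```python
from functools import lru_cache

def generalized_stirling(d):
    """Return S(c, t) computing the generalized Stirling numbers S_d(c, t)
    of type (-1, -d, 0) via the recurrence (2)-(5)."""
    @lru_cache(maxsize=None)
    def S(c, t):
        if (c, t) in {(0, 0), (1, 1)}:           # base cases (2)
            return 1.0
        if c == 0 or t == 0 or t > c:            # boundary cases (3)-(4)
            return 0.0
        # Main recurrence (5), valid for 0 < t <= c.
        return S(c - 1, t - 1) + (c - 1 - d * t) * S(c - 1, t)
    return S

S = generalized_stirling(0.5)
print(S(3, 1))
```

As a sanity check, at d = 0 the recurrence reduces to that of the unsigned Stirling numbers of the first kind (e.g. S_0(3, 2) = 3).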