Bayesian learning of a tree substitution grammar
- In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL-09), Suntec, 2009
Cited by 14 (4 self)
Tree substitution grammars (TSGs) offer many advantages over context-free grammars (CFGs), but are hard to learn. Past approaches have resorted to heuristics. In this paper, we learn a TSG using Gibbs sampling with a nonparametric prior to control subtree size. The learned grammars perform significantly better than heuristically extracted ones on parsing accuracy.
Modeling Human Performance in Statistical Word Segmentation
Cited by 9 (4 self)
What mechanisms support the ability of human infants, adults, and other primates to identify words from fluent speech using distributional regularities? In order to better characterize this ability, we collected data from adults in an artificial language segmentation task similar to Saffran, Newport, and Aslin (1996) in which the length of sentences was systematically varied between groups of participants. We then compared the fit of a variety of computational models, including simple statistical models of transitional probability and mutual information, a clustering model based on mutual information by Swingley (2005), PARSER (Perruchet & Vinter, 1998), and a Bayesian model. We found that while all models were able to successfully complete the task, fit to the human data varied considerably, with the Bayesian model achieving the highest correlation with our results.
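As a rough illustration of the "simple statistical models of transitional probability" this abstract mentions, the sketch below estimates forward transitional probabilities between adjacent syllables and places a word boundary wherever the probability falls below a cutoff. It is not the model evaluated in the paper: the function names, the toy syllable stream, and the fixed `threshold` are all assumptions made for the example.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate TP(x -> y) = count(x immediately followed by y) / count(x as a predecessor)."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    # Only syllables that have a successor can start a transition.
    first_counts = Counter(syllables[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


def segment(syllables, tps, threshold):
    """Split the stream wherever the transitional probability drops below `threshold`.

    The fixed threshold is an illustrative assumption; other boundary rules
    (for example, local minima of the TP curve) are equally possible.
    """
    if not syllables:
        return []
    words, current = [], [syllables[0]]
    for x, y in zip(syllables, syllables[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


# Toy stream built from two hypothetical words, "bida" and "kupa".
stream = "bi da ku pa bi da bi da ku pa ku pa bi da".split()
tps = transitional_probabilities(stream)
print(segment(stream, tps, threshold=0.9))
```

In this toy stream the within-word transitions (`bi` to `da`, `ku` to `pa`) always occur, so their estimated probability is 1.0, while transitions across word boundaries are less predictable and fall below the cutoff.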
Latent variable models of selectional preference
- In ACL 2010
Cited by 7 (0 self)
This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
Type-Based MCMC
Cited by 3 (0 self)
Most existing algorithms for learning latent-variable models (such as EM and existing Gibbs samplers) are token-based, meaning that they update the variables associated with one sentence at a time. The incremental nature of these methods makes them susceptible to local optima and slow mixing. In this paper, we introduce a type-based sampler, which updates a block of variables, identified by a type, which spans multiple sentences. We show improvements on part-of-speech induction, word segmentation, and learning tree-substitution grammars.
Beyond Transitional Probabilities: Human Learners Impose a Parsimony Bias in Statistical Word Segmentation
Cited by 2 (1 self)
Human infants and adults are able to segment coherent sequences from unsegmented strings of auditory stimuli after only a short exposure, an ability thought to be linked to early language acquisition. Although some research has hypothesized that learners succeed in these tasks by computing transitional probabilities between syllables, current experimental results do not differentiate between a range of models of different computations that learners could perform. We created a set of stimuli that was consistent with two different lexicons (one consisting of two-syllable words and one of three-syllable words) but where transition probabilities would not lead learners to segment sentences consistently according to either lexicon. Participants' responses formed a distribution over possible segmentations that included consistent segmentations into both two- and three-syllable words, suggesting that learners do not use pure transitional probabilities to segment but instead impose a bias towards parsimony on the lexicons they learn.
Language Modeling with Tree Substitution Grammars
Cited by 1 (0 self)
We show that a tree substitution grammar (TSG) induced with a collapsed Gibbs sampler results in lower perplexity on test data than both a standard context-free grammar and other heuristically trained TSGs, suggesting that it is better suited to language modeling. Training a more complicated bilexical parsing model across TSG derivations shows further (though nuanced) improvement. We conduct analysis and point to future areas of research using TSGs as language models.
An Overview of Nonparametric Bayesian Models and Applications to Natural Language Processing
Cited by 1 (0 self)
This paper provides an overview of nonparametric Bayesian models relevant to natural language processing (NLP) tasks. We first introduce Bayesian parametric methods, followed by nonparametric Bayesian modeling based on the most common nonparametric prior, the Dirichlet Process. We give characterizations of the Dirichlet Process via the Polya urn scheme, the related Chinese restaurant metaphor, and the stick-breaking construction. We will also introduce two generalizations
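The Chinese restaurant metaphor named in this abstract can be sketched in a few lines. The code below is a minimal illustration, not taken from the paper: each new customer joins an existing table with probability proportional to the number of customers already seated there, or opens a new table with probability proportional to the concentration parameter. The function name, the `seed` argument, and the parameter values are assumptions chosen for the example.

```python
import random


def chinese_restaurant_process(num_customers, alpha, seed=0):
    """Sample table assignments from a Chinese restaurant process.

    With n customers already seated, the next customer joins table k with
    probability n_k / (n + alpha) and opens a new table with probability
    alpha / (n + alpha). `alpha` must be positive.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(num_customers):
        # Unnormalised weights: one per existing table, plus one for a new table.
        weights = table_sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[table] += 1    # join an existing table
        assignments.append(table)
    return assignments, table_sizes


assignments, table_sizes = chinese_restaurant_process(20, alpha=1.0)
print(table_sizes)
```

Larger values of `alpha` tend to produce more tables, while the proportional-to-size rule gives the "rich get richer" clustering behaviour associated with the Dirichlet Process.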
Modeling Infant Word Segmentation
Cited by 1 (1 self)
While many computational models have been created to explore how children might learn to segment words, the focus has largely been on achieving higher levels of performance and exploring cues suggested by artificial learning experiments. We propose a broader focus that includes designing models that display properties of infants' performance as they begin to segment words. We develop an efficient bootstrapping online learner with this focus in mind, and evaluate it on child-directed speech. In addition to attaining a high level of performance, this model predicts the error patterns seen in infants learning to segment words.
A Probabilistic Model of Syntactic and Semantic Acquisition from Child-Directed Utterances and their Meanings
Cited by 1 (0 self)
This paper presents an incremental probabilistic learner that models the acquisition of syntax and semantics from a corpus of child-directed utterances paired with possible representations of their meanings. These meaning representations approximate the contextual input available to the child; they do not specify the meanings of individual words or syntactic derivations. The learner then has to infer the meanings and syntactic properties of the words in the input along with a parsing model. We use the CCG grammatical framework and train a non-parametric Bayesian model of parse structure with online variational Bayesian expectation maximization. When tested on utterances from the CHILDES corpus, our learner outperforms a state-of-the-art semantic parser. In addition, it models such aspects of child acquisition as “fast mapping,” while also countering previous criticisms of statistical syntactic learners.
U N I V E R S
Recent years have seen the emergence of multiple stochastic language and grammar models that use Pitman-Yor processes as Bayesian priors. Thus far, those models have proved very effective for NLP tasks that involve unsupervised inference. The aim of this project is to investigate semi-supervised learning methods and to test their applicability and effectiveness on this class of language models. The original methods have to be adapted because the usual semi-supervised inference with Expectation-Maximization (EM) is not applicable. The alternative is to fall back on Gibbs sampling. The two major contributions are novel variants of the Stable Mixing method (Corduneanu and Jaakkola, 2002) and of Active Learning (Cohn et al., 1996). Unlike their originals, the new methods are applicable to Gibbs-based inference methods. For an exemplary word separation task on the Bernstein Ratner corpus, the new Stable Mixing variant improves the word F0-score by 9% in comparison to the baseline approach.
Acknowledgements: I want to express my gratitude to my project supervisor, Miles Osborne, for his support

