Improving Unsupervised Dependency Parsing with Richer Contexts and Smoothing
Cited by 36 (1 self)
Unsupervised grammar induction models tend to employ relatively simple models of syntax when compared to their supervised counterparts. Traditionally, the unsupervised models have been kept simple due to tractability and data sparsity concerns. In this paper, we introduce basic valence frames and lexical information into an unsupervised dependency grammar inducer and show how this additional information can be leveraged via smoothing. Our model produces state-of-the-art results on the task of unsupervised grammar induction, improving over the best previous work by almost 10 percentage points.
A comparison of Bayesian estimators for unsupervised Hidden Markov Model POS taggers
Cited by 33 (3 self)
There is growing interest in applying Bayesian techniques to NLP problems. There are a number of different estimators for Bayesian models, and it is useful to know what kinds of tasks each does well on. This paper compares a variety of different Bayesian estimators for Hidden Markov Model POS taggers with various numbers of hidden states on data sets of different sizes. Recent papers have given contradictory results when comparing Bayesian estimators to Expectation Maximization (EM) for unsupervised HMM POS tagging, and we show that the difference in reported results is largely due to differences in the size of the training data and the number of states in the HMM. We investigate a variety of samplers for HMMs, including some that these earlier papers did not study. We find that all of the Gibbs samplers do well with small data sets and few states, and that Variational Bayes does well on large data sets and is competitive with the Gibbs samplers. In terms of convergence time, we find that Variational Bayes was the fastest of all the estimators, especially on large data sets, and that the explicit Gibbs samplers (both pointwise and sentence-blocked) were generally faster than their collapsed counterparts on large data sets.
Improving nonparameteric Bayesian inference: experiments on unsupervised word segmentation with adaptor grammars
Cited by 25 (4 self)
One of the reasons nonparametric Bayesian inference is attracting attention in computational linguistics is that it provides a principled way of learning the units of generalization together with their probabilities. Adaptor grammars are a framework for defining a variety of hierarchical nonparametric Bayesian models. This paper investigates some of the choices that arise in formulating adaptor grammars and associated inference procedures, and shows that they can have a dramatic impact on performance in an unsupervised word segmentation task. With appropriate adaptor grammars and inference procedures we achieve an 87% word token f-score on the standard Brent version of the Bernstein-Ratner corpus, which is an error reduction of over 35% over the best previously reported results for this corpus.
Why doesn’t EM find good HMM POS-taggers
In EMNLP, 2007
Cited by 23 (2 self)
This paper investigates why the HMMs estimated by Expectation-Maximization (EM) produce such poor results as Part-of-Speech (POS) taggers. We find that the HMMs estimated by EM generally assign a roughly equal number of word tokens to each hidden state, while the empirical distribution of tokens to POS tags is highly skewed. This motivates a Bayesian approach using a sparse prior to bias the estimator toward such a skewed distribution. We investigate Gibbs Sampling (GS) and Variational Bayes (VB) estimators and show that VB converges faster than GS for this task and that VB significantly improves 1-to-1 tagging accuracy over EM. We also show that EM does nearly as well as VB when the number of hidden HMM states is dramatically reduced. We also point out the high variance in all of these estimators, and that they require many more iterations to approach convergence than usually thought.
Sampling alignment structure under a Bayesian translation model
In Empirical Methods in Natural Language Processing (EMNLP), 2008
Cited by 21 (3 self)
We describe the first tractable Gibbs sampling procedure for estimating phrase pair frequencies under a probabilistic model of phrase alignment. We propose and evaluate two nonparametric priors that successfully avoid the degenerate behavior noted in previous work, where overly large phrases memorize the training data. Phrase table weights learned under our model yield an increase in BLEU score over the word-alignment-based heuristic estimates used regularly in phrase-based translation systems.
Inducing compact but accurate tree-substitution grammars
In Proc. NAACL, 2009
Cited by 15 (1 self)
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and overfitting. We present a theoretically principled model which solves these problems using a Bayesian nonparametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far outperforms a standard PCFG.
A Bayesian Model for Unsupervised Semantic Parsing
Cited by 13 (4 self)
We propose a nonparametric Bayesian model for unsupervised semantic parsing. Following Poon and Domingos (2009), we consider a semantic parsing setting where the goal is to (1) decompose the syntactic dependency tree of a sentence into fragments, (2) assign each of these fragments to a cluster of semantically equivalent syntactic structures, and (3) predict predicate-argument relations between the fragments. We use hierarchical Pitman-Yor processes to model statistical dependencies between meaning representations of predicates and those of their arguments, as well as the clusters of their syntactic realizations. We develop a modification of the Metropolis-Hastings split-merge sampler, resulting in an efficient inference algorithm for the model. The method is experimentally evaluated by using the induced semantic representation for the question answering task in the biomedical domain.
Structured Generative Models for Unsupervised Named-Entity Clustering
Cited by 7 (1 self)
We describe a generative model for clustering named entities which also models named entity internal structure, clustering related words by role. The model is entirely unsupervised; it uses features from the named entity itself and its syntactic context, and coreference information from an unsupervised pronoun resolver. The model scores 86% on the MUC-7 named-entity dataset. To our knowledge, this is the best reported score for a fully unsupervised model, and the best score for a generative model.
Blocked Inference in Bayesian Tree Substitution Grammars
Cited by 7 (1 self)
Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian nonparametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context-free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
A Particle Filter algorithm for Bayesian Wordsegmentation
Cited by 6 (1 self)
Bayesian models are usually learned using batch algorithms that have to iterate multiple times over the full dataset. This is both computationally expensive and, from a cognitive point of view, highly implausible. We present a novel online algorithm for the word segmentation models of Goldwater et al. (2009) which is, to our knowledge, the first published version of a Particle Filter for this kind of model. Also, in contrast to other proposed algorithms, it comes with a theoretical guarantee of optimality if the number of particles goes to infinity. While this is, of course, a theoretical point, a first experimental evaluation of our algorithm shows that, as predicted, its performance improves with the use of more particles, and that it performs competitively with other online learners proposed in Pearl et al. (2011).