Results 1 - 10
of
17
Learning Semantic Correspondences with Less Supervision
"... A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state. To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text ..."
Abstract
-
Cited by 25 (3 self)
- Add to MetaCart
A central problem in grounded language acquisition is learning the correspondences between a rich world state and a stream of text which references that world state. To deal with the high degree of ambiguity present in this setting, we present a generative model that simultaneously segments the text into utterances and maps each utterance to a meaning representation grounded in the world state. We show that our model generalizes across three domains of increasing difficulty—Robocup sportscasting, weather forecasts (a new domain), and NFL recaps. 1
2009. Bayesian learning of a tree substitution grammar
- In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL-09), Suntec
"... Tree substitution grammars (TSGs) offer many advantages over context-free grammars (CFGs), but are hard to learn. Past approaches have resorted to heuristics. In this paper, we learn a TSG using Gibbs sampling with a nonparametric prior to control subtree size. The learned grammars perform significa ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
Tree substitution grammars (TSGs) offer many advantages over context-free grammars (CFGs), but are hard to learn. Past approaches have resorted to heuristics. In this paper, we learn a TSG using Gibbs sampling with a nonparametric prior to control subtree size. The learned grammars perform significantly better than heuristically extracted ones on parsing accuracy. 1
Re-structuring, Re-labeling, and Re-aligning for Syntax-Based Machine Translation
"... Language Weaver, Inc. This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a stat ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
Language Weaver, Inc. This article shows that the structure of bilingual material from standard parsing and alignment tools is not optimal for training syntax-based statistical machine translation (SMT) systems. We present three modifications to the MT training data to improve the accuracy of a state-of-theart syntax MT system:re-structuring changes the syntactic structure of training parse trees to enable reuse of substructures; re-labeling alters bracket labels to enrich rule application context; and re-aligning unifies word alignment across sentences to remove bad word alignments and refine good ones. Better structures, labels, and word alignments are learned by the EM algorithm. We show that each individual technique leads to improvement as measured by BLEU, and we also show that the greatest improvement is achieved by combining them. We report an overall 1.48 BLEU improvement on the NIST08 evaluation set over a strong baseline in Chinese/English translation. 1. Background Syntactic methods have recently proven useful in statistical machine translation (SMT). In this article, we explore different ways of exploiting the structure of bilingual material for syntax-based SMT. In particular, we ask what kinds of tree structures, tree labels, and word alignments are best suited for improving end-to-end translation accuracy. We begin with structures from standard parsing and alignment tools, then use the EM algorithm to revise these structures in light of the translation task. We report an overall +1.48 BLEU improvement on a standard Chinese-to-English test.
Training Phrase Translation Models with Leaving-One-Out
"... Several attempts have been made to learn phrase translation probabilities for phrasebased statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leavingone-out approach to prevent ov ..."
Abstract
-
Cited by 5 (3 self)
- Add to MetaCart
Several attempts have been made to learn phrase translation probabilities for phrasebased statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with overfitting. We describe a novel leavingone-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work where phrase models were trained separately from other models used in translation, we include all components such as single word lexica and reordering models in training. Using this consistent training of phrase models we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%. 1
Monte Carlo inference and maximization for phrase-based translation
"... Recent advances in statistical machine translation have used beam search for approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for s ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Recent advances in statistical machine translation have used beam search for approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum expected risk training and decoding. 1
Bayesian Synchronous Tree-Substitution Grammar Induction and its Application to Sentence Compression
"... We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar. 1
Blocked Inference in Bayesian Tree Substitution Grammars
"... Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this p ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy. 1
A Note on the Implementation of Hierarchical Dirichlet Processes
"... The implementation of collapsed Gibbs samplers for non-parametric Bayesian models is non-trivial, requiring considerable book-keeping. Goldwater et al. (2006a) presented an approximation which significantly reduces the storage and computation overhead, but we show here that their formulation was inc ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
The implementation of collapsed Gibbs samplers for non-parametric Bayesian models is non-trivial, requiring considerable book-keeping. Goldwater et al. (2006a) presented an approximation which significantly reduces the storage and computation overhead, but we show here that their formulation was incorrect and, even after correction, is grossly inaccurate. We present an alternative formulation which is exact and can be computed easily. However this approach does not work for hierarchical models, for which case we present an efficient data structure which has a better space complexity than the naive approach. 1
An Overview of Nonparametric Bayesian Models and Applications to Natural Language Processing
"... This paper provides an overview of nonparametric Bayesian models relevant to natural language processing (NLP) tasks. We first introduce Bayesian parametric methods, followed by nonparametric Bayesian modeling based on the most common nonparametric prior, the Dirichlet Process. We give characterizat ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This paper provides an overview of nonparametric Bayesian models relevant to natural language processing (NLP) tasks. We first introduce Bayesian parametric methods, followed by nonparametric Bayesian modeling based on the most common nonparametric prior, the Dirichlet Process. We give characterizations of the Dirichlet Process via the Polya urn scheme, the related Chinese restaurant metaphor, and the stick-breaking construction. We will also introduce two generalizations
Inducing Synchronous Grammars with Slice Sampling
"... This paper describes an efficient sampler for synchronous grammar induction under a nonparametric Bayesian prior. Inspired by ideas from slice sampling, our sampler is able to draw samples from the posterior distributions of models for which the standard dynamic programing based sampler proves intra ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
This paper describes an efficient sampler for synchronous grammar induction under a nonparametric Bayesian prior. Inspired by ideas from slice sampling, our sampler is able to draw samples from the posterior distributions of models for which the standard dynamic programing based sampler proves intractable on non-trivial corpora. We compare our sampler to a previously proposed Gibbs sampler and demonstrate strong improvements in terms of both training log-likelihood and performance on an end-to-end translation evaluation. 1

