Results 1-10 of 21
Statistical syntax-directed translation with extended domain of locality
 In Proc. AMTA 2006
, 2006
Cited by 80 (13 self)
A syntax-directed translator first parses the source-language input into a parse-tree, and then recursively converts the tree into a string in the target-language. We model this conversion by an extended tree-to-string transducer that has multi-level trees on the source-side, which gives our system more expressive power and flexibility. We also define a direct probability model and use a linear-time dynamic programming algorithm to search for the best derivation. The model is then extended to the general log-linear framework in order to rescore with other features like n-gram language models. We devise a simple-yet-effective algorithm to generate non-duplicate k-best translations for n-gram rescoring. Initial experimental results on English-to-Chinese translation are presented.
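The recursive tree-to-string conversion described in this abstract can be illustrated with a minimal sketch. This is not the authors' system: it uses only single-level rules (the paper's extended transducer matches multi-level source-side trees, which is not modelled here), it has no probability model, and the rule table `RULES`, the `LEXICON`, and the placeholder target words are all hypothetical.

```python
# Toy tree-to-string conversion (illustrative assumption, single-level rules only).
# A tree is either a source word (str) or a tuple: (label, child, child, ...).

# Hypothetical rules: for each node label, the order in which the
# translations of its children are emitted on the target side.
RULES = {
    "S": [0, 1],   # keep subject before predicate
    "VP": [1, 0],  # hypothetical reordering: object before verb
    "NP": [0],
    "V": [0],
}

# Hypothetical word-level lexicon with placeholder target words.
LEXICON = {"she": "SHE", "reads": "READS", "books": "BOOKS"}


def translate(tree):
    """Recursively convert a source parse tree into a list of target words."""
    if isinstance(tree, str):
        # Leaf: look the word up, falling back to copying it unchanged.
        return [LEXICON.get(tree, tree)]
    label, *children = tree
    output = []
    for child_index in RULES[label]:
        output.extend(translate(children[child_index]))
    return output


tree = ("S", ("NP", "she"), ("VP", ("V", "reads"), ("NP", "books")))
result = " ".join(translate(tree))
```

Here the `VP` rule swaps its two children, so the object is emitted before the verb while the subject stays first.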
Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
Cited by 30 (8 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
Variational Decoding for Statistical Machine Translation
Cited by 21 (1 self)
Statistical models in machine translation exhibit spurious ambiguity. That is, the probability of an output string is split among many distinct derivations (e.g., trees or segmentations). In principle, the goodness of a string is measured by the total probability of its many derivations. However, finding the best string (e.g., during decoding) is then computationally intractable. Therefore, most systems use a simple Viterbi approximation that measures the goodness of a string using only its most probable derivation. Instead, we develop a variational approximation, which considers all the derivations but still allows tractable decoding. Our particular variational distributions are parameterized as n-gram models. We also analytically show that interpolating these n-gram models for different n is similar to minimum-risk decoding for BLEU (Tromble et al., 2008). Experiments show that our approach improves the state of the art.
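The difference between the Viterbi approximation and scoring a string by the total probability of its derivations can be seen on a toy example. The sketch below is only an illustration of spurious ambiguity: the derivation list is invented, and it sums over derivations by brute-force enumeration, which is exactly what becomes intractable at scale. It does not implement the variational approximation the abstract proposes.

```python
from collections import defaultdict

# Hypothetical derivations: (output string, probability of that derivation).
# The string "b a" is reachable through two distinct derivations.
derivations = [
    ("a b", 0.30),
    ("b a", 0.25),
    ("b a", 0.25),
    ("a a", 0.20),
]


def viterbi_string(derivs):
    """Return the output string of the single most probable derivation."""
    best_string, _ = max(derivs, key=lambda d: d[1])
    return best_string


def total_probability_string(derivs):
    """Return the string whose derivations have the largest summed probability."""
    totals = defaultdict(float)
    for string, prob in derivs:
        totals[string] += prob
    return max(totals, key=totals.get)


viterbi_choice = viterbi_string(derivations)
total_choice = total_probability_string(derivations)
```

In this toy case the Viterbi choice is "a b" (one derivation with probability 0.30), while the total-probability choice is "b a" (two derivations summing to 0.50), so the two criteria disagree.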
Bisimulation Minimisation for Weighted Tree Automata
, 2007
Cited by 8 (6 self)
We generalise existing forward and backward bisimulation minimisation algorithms for tree automata to weighted tree automata. The obtained algorithms work for all semirings and retain the time complexity of their unweighted variants for all additively cancellative semirings. On all other semirings the time complexity is slightly higher (linear instead of logarithmic in the number of states). We discuss implementations of these algorithms on a typical task in natural language processing.
Monte Carlo inference and maximization for phrase-based translation
Cited by 7 (4 self)
Recent advances in statistical machine translation have used beam search for approximate NP-complete inference within probabilistic translation models. We present an alternative approach of sampling from the posterior distribution defined by a translation model. We define a novel Gibbs sampler for sampling translations given a source sentence and show that it effectively explores this posterior distribution. In doing so we overcome the limitations of heuristic beam search and obtain theoretically sound solutions to inference problems such as finding the maximum probability translation and minimum expected risk training and decoding.
Minimizing Deterministic Weighted Tree Automata
, 2008
Cited by 5 (4 self)
The problem of efficiently minimizing deterministic weighted tree automata (wta) is investigated. Such automata have found promising applications as language models in Natural Language Processing. A polynomial-time algorithm is presented that given a deterministic wta over a commutative semifield, of which all operations including the computation of the inverses are polynomial, constructs an equivalent minimal (with respect to the number of states) deterministic and total wta. If the semifield operations can be performed in constant time, then the algorithm runs in time O(rmn^4) where r is the maximal rank of the input symbols, m is the number of transitions, and n is the number of states of the input wta.
Regular tree grammars as a formalism for scope underspecification
Cited by 3 (2 self)
We propose the use of regular tree grammars (RTGs) as a formalism for the underspecified processing of scope ambiguities. By applying standard results on RTGs, we obtain a novel algorithm for eliminating equivalent readings and the first efficient algorithm for computing the best reading of a scope ambiguity. We also show how to derive RTGs from more traditional underspecified descriptions.
Determinization of weighted tree automata using factorizations
 PRESENTATION AT 8TH INT. WORKSHOP FINITE-STATE METHODS AND NATURAL LANGUAGE PROCESSING
, 2009
Cited by 3 (2 self)
We present a determinization construction for weighted tree automata using factorizations. Among others, this result subsumes a previous result for determinization of weighted string automata using factorizations (Kirsten and Mäurer, 2005) and two previous results for weighted tree automata, one of them not using factorizations (Borchardt, 2004) and one of them restricted to nonrecursive automata over the nonnegative reals (May and Knight, 2006).
Improving NLP through Marginalization of Hidden Syntactic Structure
Cited by 2 (0 self)
Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.
n-best parsing revisited
 Proc. of Workshop on ATANLP
, 2010
Cited by 2 (2 self)
We derive and implement an algorithm similar to (Huang and Chiang, 2005) for finding the n-best derivations in a weighted hypergraph. We prove the correctness and termination of the algorithm and we show experimental results concerning its runtime. Our work is different from the aforementioned one in the following respects: we consider labeled hypergraphs, allowing for tree-based language models (Maletti and Satta, 2009); we specifically handle the case of cyclic hypergraphs; we admit structured weight domains, allowing for multiple features to be processed; we use the paradigm of functional programming together with lazy evaluation, achieving concise algorithmic descriptions.