Results 1–10 of 89
On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing
In Proc. EMNLP, 2010
Cited by 71 (3 self)

This paper introduces dual decomposition as a framework for deriving inference algorithms for NLP problems. The approach relies on standard dynamic-programming algorithms as oracle solvers for subproblems, together with a simple method for forcing agreement between the different oracles. The approach provably solves a linear programming (LP) relaxation of the global inference problem. It leads to algorithms that are simple, in that they use existing decoding algorithms; efficient, in that they avoid exact algorithms for the full model; and often exact, in that empirically they often recover the correct solution in spite of using an LP relaxation. We give experimental results on two problems: 1) the combination of two lexicalized parsing models; and 2) the combination of a lexicalized parsing model and a trigram part-of-speech tagger.
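The agreement mechanism the abstract describes can be illustrated with a minimal sketch (not the paper's implementation): two "oracles" each maximize their own score plus or minus a Lagrange-multiplier term, and a subgradient step on the multipliers pushes their solutions to agree. Here a trivial per-position argmax stands in for the dynamic-programming oracles (CKY, Viterbi, etc.); the function name and the toy scoring are hypothetical.

```python
import numpy as np

def dual_decomposition(theta_f, theta_g, n_iters=100, step=1.0):
    """Toy dual decomposition: two oracles score binary labels per
    position independently; multipliers u force them to agree.

    theta_f, theta_g: per-position scores, shape [n, 2], for two models.
    """
    n = theta_f.shape[0]
    u = np.zeros(n)  # one multiplier per position, tied to label 1
    y = None
    for t in range(1, n_iters + 1):
        # Oracle 1 maximizes theta_f(y) + u . y
        y = np.argmax(theta_f + np.outer(u, [0.0, 1.0]), axis=1)
        # Oracle 2 maximizes theta_g(z) - u . z
        z = np.argmax(theta_g - np.outer(u, [0.0, 1.0]), axis=1)
        if np.array_equal(y, z):
            return y, True  # agreement certifies an exact solution
        # Subgradient step on the dual, with a decaying step size
        u -= (step / t) * (y - z)
    return y, False
```

When the two models already agree on the argmax, the method returns after one iteration; otherwise the multipliers reweight label 1 at each disagreeing position until the oracles converge or the iteration budget runs out.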
An Augmented Lagrangian Approach to Constrained MAP Inference
Cited by 36 (2 self)

We propose a new algorithm for approximate MAP inference on factor graphs, by combining augmented Lagrangian optimization with the dual decomposition method. Each slave subproblem is given a quadratic penalty, which pushes toward faster consensus than in previous subgradient approaches. Our algorithm is provably convergent, parallelizable, and suitable for fine decompositions of the graph. We show how it can efficiently handle problems with (possibly global) structural constraints via simple sort operations. Experiments on synthetic and real-world data show that our approach compares favorably with the state-of-the-art.
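The quadratic-penalty idea can be sketched in ADMM-style consensus form, a simplified stand-in for the paper's factor-graph slaves: each slave maximizes its own linear score over the box [0, 1]^n, the penalty pulls slaves toward a shared consensus vector, and dual variables accumulate remaining disagreement. All names and the box-shaped slave problems are illustrative assumptions, chosen so every update has a closed form.

```python
import numpy as np

def admm_consensus_map(thetas, rho=1.0, n_iters=200, tol=1e-6):
    """Quadratic-penalty (ADMM-style) consensus between slave subproblems.

    Each slave i maximizes thetas[i] . x over the box [0, 1]^n while the
    augmented-Lagrangian penalty (rho/2) * ||x - z||^2 drives all slaves
    toward the shared consensus vector z.
    """
    thetas = [np.asarray(t, dtype=float) for t in thetas]
    n = thetas[0].shape[0]
    z = np.zeros(n)
    us = [np.zeros(n) for _ in thetas]  # scaled dual variables
    for _ in range(n_iters):
        # Slave updates: closed-form box-constrained quadratic minimum.
        xs = [np.clip(z - u + t / rho, 0.0, 1.0)
              for t, u in zip(thetas, us)]
        # Consensus update: average of slave solutions plus scaled duals.
        z_new = np.mean([x + u for x, u in zip(xs, us)], axis=0)
        # Dual updates penalize whatever disagreement remains.
        us = [u + x - z_new for u, x in zip(us, xs)]
        if np.abs(z_new - z).max() < tol:
            return z_new
        z = z_new
    return z
```

Compared with the subgradient scheme above it, each slave here solves a strongly convex problem, which is what gives the faster consensus the abstract refers to.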
A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
Cited by 32 (1 self)

Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, leading to improved state-of-the-art results for all languages.
A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
Cited by 26 (1 self)

Via an oracle experiment, we show that the upper bound on accuracy of a CCG parser is significantly lowered when its search space is pruned using a supertagger, though the supertagger also prunes many bad parses. Inspired by this analysis, we design a single model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting increase in complexity, we experiment with both belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. On CCGbank we achieve a labelled dependency F-measure of 88.8% on gold POS tags, and 86.7% on automatic part-of-speech tags, the best reported results for this task.
Fast and Robust Joint Models for Biomedical Event Extraction
Cited by 23 (3 self)

Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches is the cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP …
Approximate Inference in Graphical Models using LP Relaxations
2010

Cited by 22 (1 self)

Graphical models such as Markov random fields have been successfully applied to a wide variety of fields, from computer vision and natural language processing to computational biology. Exact probabilistic inference is generally intractable in complex models having many dependencies between the variables. We present new approaches to approximate inference based on linear programming (LP) relaxations. Our algorithms optimize over the cycle relaxation of the marginal polytope, which we show to be closely related to the first lifting of the Sherali-Adams hierarchy, and which is significantly tighter than the pairwise LP relaxation. We show how to efficiently optimize over the cycle relaxation using a cutting-plane algorithm that iteratively introduces constraints into the relaxation. We provide a criterion to determine which constraints would be most helpful in tightening the relaxation, and give efficient algorithms for solving the search problem of finding the best cycle constraint to add according to this criterion.
Exact Decoding of Phrase-based Translation Models through Lagrangian Relaxation
In Proc. EMNLP, 2011

Cited by 22 (1 self)

This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
ModelBased Aligner Combination Using Dual Decomposition
In Proc. ACL, 2011

Cited by 19 (2 self)

Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.
Low-Rank Tensors for Scoring Dependency Structures
Cited by 15 (5 self)

Accurate scoring of syntactic structures such as head-modifier arcs in dependency parsing typically requires rich, high-dimensional feature representations. A small subset of such features is often selected manually. This is problematic when features lack clear linguistic meaning, as in embeddings, or when the information is blended across features. In this paper, we use tensors to map high-dimensional feature vectors into low-dimensional representations. We explicitly maintain the parameters as a low-rank tensor to obtain low-dimensional representations of words in their syntactic roles, and to leverage modularity in the tensor for easy training with online algorithms. Our parser consistently outperforms the Turbo and MST parsers across 14 different languages. We also obtain the best published UAS results on 5 languages.
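The low-rank trick can be sketched as follows (a toy illustration, not the paper's model): instead of a dense d x d x d parameter tensor over head, modifier, and arc feature vectors (d**3 weights), keep three rank-r projection matrices (3*r*d weights) and score an arc as the sum of an elementwise product of the three projections. The dimensions, matrices, and function name are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 4  # feature dimension and tensor rank (hypothetical sizes)
U, V, W = (rng.normal(size=(r, d)) for _ in range(3))

def arc_score(phi_head, phi_mod, phi_arc):
    """Low-rank tensor score for a head -> modifier arc.

    Equivalent to contracting the rank-r tensor
    T[i, j, l] = sum_k U[k, i] * V[k, j] * W[k, l]
    with the three feature vectors, but never materializes T:
    score = sum_k (U phi_h)_k * (V phi_m)_k * (W phi_a)_k.
    """
    return float(np.sum((U @ phi_head) * (V @ phi_mod) * (W @ phi_arc)))
```

The projections U @ phi give exactly the low-dimensional word-in-role representations the abstract mentions, and each of U, V, W can be updated independently by an online learner.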