## A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing


### BibTeX

@MISC{Rush_atutorial,

author = {Alexander M. Rush and Michael Collins and Pack Kaelbling},

title = {A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing},

year = {}

}

### Abstract

Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP). This tutorial gives an overview of the technique. We describe example algorithms, describe formal guarantees for the method, and describe practical issues in implementing the algorithms. While our examples are predominantly drawn from the NLP literature, the material should be of general relevance to inference problems in machine learning. A central theme of this tutorial is that Lagrangian relaxation is naturally applied in conjunction with a broad class of combinatorial algorithms, allowing inference in models that go significantly beyond previous work on Lagrangian relaxation for inference in graphical models.

### Citations

2306 | Conditional random fields: probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001
Citation Context ...ndividual context-free rules are defined. As one example, in a probabilistic context-free grammar, we would define θ(α → β) = log p(α → β|α). As a second example, in a conditional random field (CRF) (Lafferty et al., 2001) we would define θ(α → β) = w · φ(α → β) where w ∈ R^q is a parameter vector, and φ(α → β) ∈ R^q is a feature vector representing the rule α → β. 2. l(y) is a function that maps a parse tree y to the ... |

633 | Statistical phrase-based translation
- Koehn, Och, et al.
- 2003
Citation Context ...he traveling salesman problem. 6.4 Phrase-based Translation We next consider a Lagrangian relaxation algorithm, described by Chang and Collins (2011), for decoding of phrase-based translation models (Koehn et al., 2003). The input to a phrase-based translation model is a source-language sentence with n words, x = x_1 ... x_n. The output is a sentence in the target language. The examples in this section will use Ger... |

503 | Three generative, lexicalised models for statistical parsing - Collins - 1997 |

342 | Feature-rich part-of-speech tagging with a cyclic dependency network - Klein, Manning, et al. - 2003 |

273 | Minimization Methods for Nondifferentiable Functions - Shor - 1985 |

226 | The Lagrangian relaxation method for solving integer programming problems - Fisher - 1981 |

201 | Fast Exact Inference with a Factored Model for Natural Language Parsing - Klein, Manning - 2002 |

186 | Combinatorial Optimization: Theory and Algorithms. Algorithms and Combinatorics - Korte, Vygen - 2005 |

119 | The traveling-salesman problem and minimum spanning trees - Held, Karp - 1970 |

117 | On formal properties of simple phrase structure grammars
- Bar-Hillel, Perles, et al.
- 1964
Citation Context ...n to using f(y) alone. 4 Under this definition of h(y), the conventional approach to finding y* in Eq. 8 is to construct a new context-free grammar that introduces sensitivity to surface bigrams (Bar-Hillel et al., 1964). Roughly speaking, in this approach (assuming a first-order tagging model) rules such as S → NP VP are replaced with rules such as S_{D,A} → NP_{D,N} VP_{V,A} (10) 3. We define z_i for i ≤ 0 to be a speci... |

110 | Simple semi-supervised dependency parsing - Koo, Carreras, et al. - 2008 |

74 | MRF optimization via dual decomposition: Message-passing revisited
- Komodakis, Paragios, et al.
- 2007
Citation Context ...elaxation algorithm for the traveling salesman problem. Initial work on Lagrangian relaxation/dual decomposition for decoding in statistical models focused on the MAP problem in Markov random fields (Komodakis et al., 2007, 2010). More recently, decoding algorithms have been derived for several models in statistical NLP, including models that combine a weighted context-free grammar (WCFG) with a finite-state tagger (Ru... |

65 | Dependency parsing by belief propagation
- Smith, Eisner
- 2008
Citation Context .... This idea is closely related to earlier work on the use of combinatorial algorithms within belief propagation, either for the MAP inference problem (Duchi et al., 2007), or for computing marginals (Smith and Eisner, 2008). These methods generalize loopy BP in a way that allows the use of combinatorial algorithms. Again, we argue that methods based on Lagrangian relaxation are preferable to variants of loopy BP, as th... |

51 | Dual decomposition for parsing with non-projective head automata - Koo, Rush, et al. - 2010 |

48 | On dual decomposition and linear programming relaxations for natural language processing
- Rush, Sontag, et al.
Citation Context ...07, 2010). More recently, decoding algorithms have been derived for several models in statistical NLP, including models that combine a weighted context-free grammar (WCFG) with a finite-state tagger (Rush et al., 2010); models that combine a lexicalized WCFG with a discriminative dependency parsing model (Rush et al., 2010); head-automata models for non-projective dependency parsing (Koo et al., 2010); alignment m... |

39 | Using combinatorial optimization within max-product belief propagation
- Duchi, Tarlow, et al.
- 2007
Citation Context ...oader class of models than those captured by MRFs. This idea is closely related to earlier work on the use of combinatorial algorithms within belief propagation, either for the MAP inference problem (Duchi et al., 2007), or for computing marginals (Smith and Eisner, 2008). These methods generalize loopy BP in a way that allows the use of combinatorial algorithms. Again, we argue that methods based on Lagrangian rel... |

39 | Optimal Trees - Magnanti, Wolsey - 1995 |

37 | Concise integer linear programming formulations for dependency parsing - Martins, Smith, et al. |

33 | MRF Energy Minimization and Beyond via Dual Decomposition - Komodakis, Paragios, et al. |

26 | Approximate primal solutions and rate analysis for dual subgradient methods - Nedic, Ozdaglar - 1978 |

25 | Introduction to dual decomposition for inference - Sontag, Globerson, et al. - 2011 |

22 | Lagrangian relaxation - Lemaréchal - 2001 |

20 | Lagrangian relaxation for MAP estimation in graphical models - Johnson, Malioutov, et al. - 2007 |

17 | A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
- Auli, Lopez
- 2011
Citation Context ...l., 2010); alignment models for statistical machine translation (DeNero and Macherey, 2011); models for event extraction (Riedel and McCallum, 2011); models for combined CCG parsing and supertagging (Auli and Lopez, 2011); phrase-based models for statistical machine translation (Chang and Collins, 2011); and syntax-based models for statistical machine translation (Rush and Collins, 2011). We will give an overview of ... |

16 | Exact decoding of phrase-based translation models through Lagrangian relaxation
- Chang, Collins
- 2011
Citation Context ...Macherey, 2011); models for event extraction (Riedel & McCallum, 2011); models for combined CCG parsing and supertagging (Auli & Lopez, 2011); phrase-based models for statistical machine translation (Chang & Collins, 2011); syntax-based models for statistical machine translation (Rush & Collins, 2011); and models based on the intersection of weighted automata (Paul & Eisner, 2012). We will give an overview of several o... |

15 | Lagrangian relaxation, in "Computational Combinatorial Optimization" - Lemaréchal |

15 | Polyhedral characterization of discrete dynamic programming - Martin, Rardin, et al. - 1990 |

15 | Fast and robust joint models for biomedical event extraction
- Riedel, McCallum
- 2011
Citation Context ...dels for non-projective dependency parsing (Koo, Rush, Collins, Jaakkola, & Sontag, 2010); alignment models for statistical machine translation (DeNero & Macherey, 2011); models for event extraction (Riedel & McCallum, 2011); models for combined CCG parsing and supertagging (Auli & Lopez, 2011); phrase-based models for statistical machine translation (Chang & Collins, 2011); syntax-based models for statistical machine tr... |

12 | Discriminative Training and Spanning Tree Algorithms for Dependency Parsing - McDonald - 2006 |

10 | Exact decoding of syntactic translation models through Lagrangian relaxation
- Rush, Collins
- 2011
Citation Context ...or combined CCG parsing and supertagging (Auli & Lopez, 2011); phrase-based models for statistical machine translation (Chang & Collins, 2011); syntax-based models for statistical machine translation (Rush & Collins, 2011); and models based on the intersection of weighted automata (Paul & Eisner, 2012). We will give an overview of several of these algorithms in this paper. While our focus is on examples from natural l... |

9 | Model-based aligner combination using dual decomposition
- DeNero, Macherey
- 2011
Citation Context ...ith a discriminative dependency parsing model (Rush et al., 2010); head-automata models for non-projective dependency parsing (Koo et al., 2010); alignment models for statistical machine translation (DeNero and Macherey, 2011); models for event extraction (Riedel and McCallum, 2011); models for combined CCG parsing and supertagging (Auli and Lopez, 2011); phrase-based models for statistical machine translation (Chang and ... |

8 | Fast and smooth: Accelerated dual decomposition for MAP inference
- Jojic, Gould, et al.
- 2010
Citation Context ...l guarantees. This tutorial will concentrate on subgradient optimization methods for inference (i.e., for minimization of the dual objective that results from the Lagrangian relaxation). Recent work (Jojic et al., 2010; Martins et al., 2011) has considered alternative optimization methods; see also Sontag et al. (2010) for a description of alternative algorithms, in particular dual coordinate descent. 3. Lagrangian... |

7 | Dual decomposition with many overlapping components
- Martins, Smith, et al.
Citation Context ...utorial will concentrate on subgradient optimization methods for inference (i.e., for minimization of the dual objective that results from the Lagrangian relaxation). Recent work (Jojic et al., 2010; Martins et al., 2011) has considered alternative optimization methods; see also Sontag et al. (2010) for a description of alternative algorithms, in particular dual coordinate descent. 3. Lagrangian Relaxation This secti... |

1 | Approximate primal solutions and rate analysis for dual subgradient methods - Nedić, Ozdaglar |

1 | Subgradient Methods. Course Notes for EE364b - Boyd, Mutapcic - 2007 |

1 | Dual decomposition for parsing with non-projective head automata
- Koo, Rush, et al.
- 2010
Citation Context ...tag, Collins, & Jaakkola, 2010); models that combine a lexicalized WCFG with a discriminative dependency parsing model (Rush et al., 2010); head-automata models for non-projective dependency parsing (Koo, Rush, Collins, Jaakkola, & Sontag, 2010); alignment models for statistical machine translation (DeNero & Macherey, 2011); models for event extraction (Riedel & McCallum, 2011); models for combined CCG parsing and supertagging (Auli & Lopez... |

1 | Implicitly intersecting weighted automata using dual decomposition
- Paul, Eisner
- 2012
Citation Context ... for statistical machine translation (Chang & Collins, 2011); syntax-based models for statistical machine translation (Rush & Collins, 2011); and models based on the intersection of weighted automata (Paul & Eisner, 2012). We will give an overview of several of these algorithms in this paper. While our focus is on examples from natural language processing, the material in this tutorial should be of general relevance ... |