## Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

Citations: 10 (2 self)

### BibTeX

@MISC{Rush_exactdecoding,

author = {Alexander M. Rush and Michael Collins},

title = {Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation},

year = {}

}

### Abstract

We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.

### Citations

369 | Algorithm 97: Shortest path
- Floyd
- 1962
Citation Context ...α ∗ v3(p) . The first step involves finding the highest scoring incoming trigram path for each leaf v. This step can be performed efficiently using the Floyd-Warshall all-pairs shortest path algorithm (Floyd, 1962) over the graph (S,T); the details are given in the appendix. The second step involves simple dynamic programming over the hypergraph (V,E) (it is simple to integrate the βs terms into this algorithm)... |
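The context above relies on the Floyd-Warshall all-pairs shortest path algorithm. As a rough illustration only, the following sketch runs the textbook version of that algorithm on a small made-up graph. The function name `floyd_warshall`, the edge dictionary, and the toy weights are assumptions for this example; the paper's actual construction over the graph (S,T) is in its appendix and is not reproduced here.

```python
import math


def floyd_warshall(n, edges):
    """Return all-pairs shortest path distances for nodes 0..n-1.

    `edges` maps a (u, v) pair to the weight of the directed edge u -> v.
    Unreachable pairs keep a distance of infinity.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
    # Allow each node k in turn to serve as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Hypothetical toy graph: going 0 -> 2 -> 1 is cheaper than the direct edge 0 -> 1.
toy_edges = {(0, 1): 4.0, (0, 2): 1.0, (2, 1): 2.0, (1, 3): 5.0}
toy_dist = floyd_warshall(4, toy_edges)
```

A highest-scoring path search, as described in the context, could in principle reuse the same triple loop with negated scores, provided the graph has no cycles that make the negated weights unbounded; that adaptation is an assumption here, not something the excerpt states.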

197 | Combinatorial Optimization: Theory and Algorithms
- Korte, Vygen
- 2007
Citation Context ...rs. The FST algorithms are shown to produce higher scoring solutions than cube-pruning on a large proportion of examples. Lagrangian relaxation is a classical technique in combinatorial optimization (Korte and Vygen, 2008). Lagrange multipliers are used to add linear constraints to an existing problem that can be solved using a combinatorial algorithm; the resulting dual function is then minimized, for example using s... |
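The idea of minimizing a dual function over Lagrange multipliers can be sketched on a toy problem. The example below is an assumption-laden illustration, not the paper's decoder: two hypothetical score vectors `f` and `g` each pick one index, the constraint is that both pick the same index, and the multipliers `u` are updated with a simple subgradient step of size 1/t (the step rule is also an assumption). When the two subproblems agree, the shared choice is provably optimal for `f[i] + g[i]`, which is the kind of certificate of optimality the abstract mentions.

```python
def dual_decomposition(f, g, max_iters=50):
    """Toy Lagrangian relaxation: maximize f[y] + g[z] subject to y == z.

    Returns (index, certified). `certified` is True when both subproblems
    agreed, in which case the index is an exact maximizer of f[i] + g[i].
    """
    n = len(f)
    u = [0.0] * n  # one Lagrange multiplier per candidate index
    y = 0
    for t in range(1, max_iters + 1):
        # Each subproblem is solved independently with adjusted scores.
        y = max(range(n), key=lambda i: f[i] + u[i])
        z = max(range(n), key=lambda i: g[i] - u[i])
        if y == z:
            return y, True
        # Subgradient step on the dual: penalize y's choice, reward z's.
        step = 1.0 / t
        u[y] -= step
        u[z] += step
    return y, False


# Hypothetical scores: f prefers index 0, g prefers index 2, the sum prefers index 1.
toy_result = dual_decomposition([3.0, 2.0, 0.0], [0.0, 2.0, 3.0])
```

On this toy input the two argmax calls disagree for the first three iterations and then both select index 1, so the function returns with the certificate flag set.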

140 | MAP estimation via agreement on trees: message-passing and linear programming
- Wainwright, Jaakkola, et al.
- 2005
Citation Context ...cent work, dual decomposition, a special case of Lagrangian relaxation where the linear constraints enforce agreement between two or more models, has been applied to inference in Markov random fields (Wainwright et al., 2005; Komodakis et al., 2007; Sontag et al., 2008), and also to inference problems in NLP (Rush et al., 2010; Koo et al., 2010). There are close connections between dual decomposition and work on belief p... |

121 | On Formal Properties of Simple Phrase Structure Grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung
- Bar-Hillel, Perles, et al.
- 1961
Citation Context ...ing with these models is challenging, largely because of the cost of integrating an n-gram language model into the search process. Exact dynamic programming algorithms for the problem are well known (Bar-Hillel et al., 1964), but are too expensive to be used in practice. Previous work on decoding for syntax-based SMT has therefore been focused primarily on approximate search methods. This paper describes an efficient ... |

98 | A New String-to-Dependency Machine Translation Algorithm with a Target Dependency Language Model
- Shen, Xu, et al.
- 2008
Citation Context ...s seen widespread use of synchronous probabilistic grammars in statistical machine translation (SMT). The decoding problem for a broad range of these systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model. The hypergraph represents a large set of possible translations, and is created by applying a synchronous ... |

86 | SPMT: Statistical Machine Translation with Syntactified Target Language Phrases
- Marcu, Wang, et al.
- 2006
Citation Context ...ction Recent work has seen widespread use of synchronous probabilistic grammars in statistical machine translation (SMT). The decoding problem for a broad range of these systems (e.g., (Chiang, 2005; Marcu et al., 2006; Shen et al., 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model. The hypergraph represents a large set of possible translations, and is created by appl... |

79 | MRF Optimization via Dual Decomposition: Message-Passing Revisited
- Komodakis, Paragios, et al.
- 2007
Citation Context ...tion, a special case of Lagrangian relaxation where the linear constraints enforce agreement between two or more models, has been applied to inference in Markov random fields (Wainwright et al., 2005; Komodakis et al., 2007; Sontag et al., 2008), and also to inference problems in NLP (Rush et al., 2010; Koo et al., 2010). There are close connections between dual decomposition and work on belief propagation (Smith and Ei... |

70 | Tightening LP relaxations for MAP using message passing
- Sontag, Meltzer, et al.
- 2008
Citation Context ...agrangian relaxation, where the linear constraints enforce agreement between two or more models, has been applied to inference in Markov random fields (Wainwright et al., 2005; Komodakis et al., 2007; Sontag et al., 2008), and also to inference problems in NLP (Rush et al., 2010; Koo et al., 2010). There are close connections between dual decomposition and work on belief propagation (Smith and Eisner, 2008). 3 Backgr... |

67 | Dependency Parsing by Belief Propagation - Smith, Eisner - 2008 |

58 | Forest Rescoring: Faster Decoding with Integrated Language Models - Huang, Chiang - 2007 |

58 | Dual decomposition for parsing with non-projective head automata
- Koo, Rush, et al.
- 2010
Citation Context ... or more models, has been applied to inference in Markov random fields (Wainwright et al., 2005; Komodakis et al., 2007; Sontag et al., 2008), and also to inference problems in NLP (Rush et al., 2010; Koo et al., 2010). There are close connections between dual decomposition and work on belief propagation (Smith and Eisner, 2008). 3 Background: Hypergraphs Translation with many syntax-based systems (e.g., (Chiang, ... |

28 | Hierarchical phrase-based translation with weighted finite-state transducers and shallow-n grammars - Gispert, Iglesias, et al. - 2010 |

27 | Hierarchical phrase-based translation. Computational Linguistics
- Chiang
- 2007
Citation Context ...xamples. The method is comparable in speed to state-of-the-art decoding algorithms; for example, over 70% of the test examples are decoded in 2 seconds or less. We compare our method to cube pruning (Chiang, 2007), and find that our method gives improved model scores on a significant number of examples. One consequence of our work is that we give accurate estimates of the number of search errors for cube prun... |

19 | Coarse-to-fine syntactic machine translation using language projections
- Petrov, Haghighi, et al.
- 2008
Citation Context ...ed translation systems, including cube-pruning (Chiang, 2007; Huang and Chiang, 2007), left-to-right decoding with beam search (Watanabe et al., 2006; Huang and Mi, 2010), and coarse-to-fine methods (Petrov et al., 2008). Recent work has developed decoding algorithms based on finite state transducers (FSTs). Iglesias et al. (2009) show that exact FST decoding is feasible for a phrase-based system with limited reorde... |

15 | Polyhedral characterization of discrete dynamic programming
- Martin, Rardin, et al.
- 1990
Citation Context ...an edge e, and t(e) to refer to the tail. We will assume that the hypergraph is acyclic: intuitively this will mean that no derivation (as defined below) contains the same vertex more than once (see (Martin et al., 1990) for a formal definition). Each vertex v ∈ V is either a non-terminal in the hypergraph, or a leaf. The set of non-terminals is VN = {v ∈ V : ∃e ∈ E such that h(e) = v} Conversely, the set of leaves ... |
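The vertex partition described in the context above can be illustrated with a small sketch. Here a hyperedge is represented as a hypothetical `(head, tail)` pair, with the tail a tuple of vertices, and the non-terminals are the vertices that appear as the head h(e) of some edge. The excerpt is cut off before the definition of leaves, so treating the leaves as all remaining vertices is an assumption of this example, as are the function name and the toy hypergraph.

```python
def split_vertices(vertices, edges):
    """Partition hypergraph vertices into non-terminals and leaves.

    `edges` is a list of (head, tail) pairs, where `tail` is a tuple of
    vertices. A vertex is a non-terminal if it is the head of some edge.
    Assumption: every other vertex is a leaf.
    """
    non_terminals = {head for head, _tail in edges}
    leaves = set(vertices) - non_terminals
    return non_terminals, leaves


# Hypothetical toy hypergraph: S is built from A and "w3"; A is built from "w1" and "w2".
toy_vertices = ["S", "A", "w1", "w2", "w3"]
toy_edges = [("S", ("A", "w3")), ("A", ("w1", "w2"))]
toy_non_terminals, toy_leaves = split_vertices(toy_vertices, toy_edges)
```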