## Translation as weighted deduction (2009)

Venue: | In Proc. of EACL |

Citations: | 11 - 3 self |

### BibTeX

    @INPROCEEDINGS{Lopez09translationas,
        author = {Adam Lopez},
        title = {Translation as weighted deduction},
        booktitle = {In Proc. of EACL},
        year = {2009},
        pages = {532--540}
    }

### Abstract

We present a unified view of many translation algorithms that synthesizes work on deductive parsing, semiring parsing, and efficient approximate search algorithms. This gives rise to clean analyses and compact descriptions that can serve as the basis for modular implementations. We illustrate this with several examples, showing how to build search spaces for several disparate phrase-based search strategies, integrate non-local features, and devise novel models. Although the framework is drawn from parsing and applied to translation, it is applicable to many dynamic programming problems arising in natural language processing and other areas.
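To make the idea of semiring-weighted deduction concrete, here is a minimal Python sketch. It is not the paper's implementation: it assumes a CKY-style logic with items `[A, i, j]`, one unary and one binary rule schema, and an interchangeable semiring. The names `Semiring`, `VITERBI`, `INSIDE`, `cky`, and the toy grammar are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from typing import Any, Callable, NamedTuple


class Semiring(NamedTuple):
    """Hypothetical container for the two operations and their identities."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]
    zero: Any
    one: Any


# Max-product: weight of the single best derivation.
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)
# Sum-product: total weight of all derivations.
INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)


def cky(words, unary, binary, sr, start="S"):
    """Run the same CKY logic under any semiring `sr`.

    unary:  {(A, word): weight}  for rules A -> word
    binary: {(A, B, C): weight}  for rules A -> B C
    Returns the semiring value of the goal item [start, 0, len(words)].
    """
    n = len(words)
    chart = defaultdict(lambda: sr.zero)  # item (A, i, j) -> value

    # Axioms: a rule A -> word proves the item [A, k, k + 1].
    for k, word in enumerate(words):
        for (a, w), weight in unary.items():
            if w == word:
                chart[a, k, k + 1] = sr.plus(chart[a, k, k + 1], weight)

    # Inference: [B, i, m] and [C, m, j] with rule A -> B C prove [A, i, j].
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for m in range(i + 1, j):
                for (a, b, c), weight in binary.items():
                    value = sr.times(
                        weight, sr.times(chart[b, i, m], chart[c, m, j])
                    )
                    chart[a, i, j] = sr.plus(chart[a, i, j], value)

    return chart[start, 0, n]


# Toy ambiguous grammar (an assumption for this example): S -> S S | a
unary = {("S", "a"): 0.5}
binary = {("S", "S", "S"): 0.5}
sentence = ["a", "a", "a"]

best = cky(sentence, unary, binary, VITERBI)    # 0.03125
total = cky(sentence, unary, binary, INSIDE)    # 0.0625
# With every rule weight set to 1, sum-product counts derivations.
count = cky(sentence, {("S", "a"): 1}, {("S", "S", "S"): 1}, INSIDE)  # 2
```

The logic (which items exist and how they combine) is written once; only the `Semiring` argument changes between finding the best derivation, summing over all derivations, and counting them.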

### Citations

8557 | Introduction to Algorithms - Cormen, Leiserson, et al. - 2001 |
Citation Context: ...rove f due to the language model dependency. This means that f is nonmonotonic; it does not display the optimal substructure property on partial derivations, which is required for dynamic programming (Cormen et al., 2001). The logics still work for some semirings (e.g. boolean), but not others. Therefore, non-local parameterizations break semiring-weighted deduction, because we can no longer use [footnote 7] General weighted ded...

1177 | The mathematics of statistical machine translation: Parameter estimation - Brown, Pietra, et al. |
Citation Context: ...he latter would suffer from more model errors since its space of possible reorderings is smaller. We emphasize that many other translation models can be described this way. Logics for the IBM Models (Brown et al., 1993) would be similar to our logics for phrase-based models. Syntax-based translation logics are similar to parsing logics; a few examples already appear in the literature (Chiang, 2007; Venugopal et al....

883 | Moses: Open source toolkit for statistical machine translation - Koehn, Hoang, et al. - 2007 |
Citation Context: ...egies would be identical (except for degenerate cases [footnote 3] Moore and Quirk (2007) give a nice description of MDd. [footnote 4] We do not know if WLd is documented anywhere, but from inspection it is used in Moses (Koehn et al., 2007). This was confirmed by Philipp Koehn and Hieu Hoang (p.c.). [footnote 5] When a phrase covers the first uncovered word in the source sentence, the new first uncovered word may be further along in the sentence ...

653 | An efficient context-free parsing algorithm - Earley - 1983 |

376 | Hierarchical phrase-based translation - Chiang |
Citation Context: ...extended with semirings (Goodman, 1999), is an established formalism used in parsing. It is occasionally used to describe formally syntactic translation models, but these treatments tend to be brief (Chiang, 2007; Venugopal et al., 2007; Dyer et al., 2008; Melamed, 2004). We apply weighted deduction much more thoroughly, first extending it to phrase-based models and showing that the set of search strategies us...

165 | Principles and implementation of deductive parsing - Shieber, Schabes, et al. - 1995 |

138 | Parsing as deduction - Pereira, Warren - 1983 |
Citation Context: ...hms do the same thing: dynamic programming search over a space of weighted rules (§2). Fortunately, we need not search far for modular descriptions of dynamic programming algorithms. Deductive logic (Pereira and Warren, 1983), extended with semirings (Goodman, 1999), is an established formalism used in parsing. It is occasionally used to describe formally syntactic translation models, but these treatments tend to be brie...

128 | Algorithm schemata and data structures in syntactic processing - Kay - 1986 |
Citation Context: ...ched the surface, and we believe it is possible to unify a wide variety of translation algorithms. For example, we believe that cube pruning can be described as an agenda discipline in chart parsing (Kay, 1986). Although the work presented here is abstract, our motivation is practical. Isolating the errors in translation systems is a difficult task which can be made easier by describing and analyzing model...

121 | A smorgasbord of features for statistical machine translation - Och, Gildea, et al. - 2004 |

120 | An end-to-end discriminative approach to machine translation - Liang, Bouchard, et al. - 2006 |
Citation Context: ...the translation model to generate only derivations that produce that sentence. Alignment is often used in training both generative and discriminative models (Brown et al., 1993; Blunsom et al., 2008; Liang et al., 2006). Our approach to alignment is similar to the one for language modeling. First, we implement a logic requiring an input to be identical to the reference. item form: [j] goal: [J] rule: [j] [j + 1] ej...

100 | Directed hypergraphs and applications - Gallo, Longo, et al. - 1993 |
Citation Context: ...oding is a search heuristic that simplifies the complexity of searching a minimal logic. Each item is associated with a stack whose signa[...] [footnote 12] Specifically a B-hypergraph, equivalent to an and-or graph (Gallo et al., 1993) or context-free grammar (Nederhof, 2003). In the degenerate case, this is simply a graph, as is the case with most phrase-based models. item forms: $[i, u_e \bullet v_e]$, $[A, i, u_e \bullet v_e, i', u'_e \bullet v'_e]$...

84 | A Polynomial-Time Algorithm for Statistical Machine Translation - Wu - 1996 |
Citation Context: ...we consider the CKY algorithm for context-free parsing, a common example that we will revisit in §6.2. It is also relevant since it can form the basis of a decoder for inversion transduction grammar (Wu, 1996). In the discussion that follows, we use A, B, and C to denote arbitrary nonterminal symbols, S to denote the start nonterminal symbol, and a to denote a terminal symbol. CKY works on grammars in Cho...

65 | On the complexity analysis of static analyses - McAllester - 2002 |
Citation Context: ... rules: $\dfrac{R(A \to a_k)}{[A, k-1, k]}$ and $\dfrac{R(A \to BC) \quad [B, k, k''] \quad [C, k'', k']}{[A, k, k']}$ (Logic CKY). A benefit of this declarative description is that complexity can be determined by inspection (McAllester, 1999). We elaborate on complexity in §7, but for now it suffices to point out that the number of possible items and possible deductions depends on the product of the domains of the free variables. For exa...

64 | Semiring parsing - Goodman - 1999 |
Citation Context: ...er a space of weighted rules (§2). Fortunately, we need not search far for modular descriptions of dynamic programming algorithms. Deductive logic (Pereira and Warren, 1983), extended with semirings (Goodman, 1999), is an established formalism used in parsing. It is occasionally used to describe formally syntactic translation models, but these treatments tend to be brief (Chiang, 2007; Venugopal et al., 2007; ...

64 | Statistical machine translation by parsing - Melamed - 2004 |
Citation Context: ...d formalism used in parsing. It is occasionally used to describe formally syntactic translation models, but these treatments tend to be brief (Chiang, 2007; Venugopal et al., 2007; Dyer et al., 2008; Melamed, 2004). We apply weighted deduction much more thoroughly, first extending it to phrase-based models and showing that the set of search strategies used by these models have surprisingly different implication...

56 | Parsing and hypergraphs - Klein, Manning - 2001 |

52 | Coping with syntactic ambiguity or how to put the block in the box on the table - Church, Patil - 1982 |
Citation Context: ...m: all rules are either binary as in A → BC, or unary as in A → a. The number of possible binary-branching parses of a sentence is defined by the Catalan number, an exponential combinatoric function (Church and Patil, 1982), so dynamic programming is crucial for efficiency. CKY computes all parses in cubic time by reusing subparses. To parse a sentence $a_1 \ldots a_K$, we compute a set of items in the [footnote 1] The true noisy channel ...

51 | Forest rescoring: Faster decoding with integrated language models - Huang, Chiang - 2007 |

46 | A discriminative latent variable model for statistical machine translation - Blunsom, Cohn, et al. - 2008 |
Citation Context: ...ver all possible derivations D) of the algebraic expression in Equation 1. We might also want to calculate the total probability of all possible derivations, which is useful for parameter estimation (Blunsom et al., 2008). We can do this using the following equation: $p(C) = p(C) + (p(A_1) \times \cdots \times p(A_L))$ (4). Equations 3 and 4 are quite similar. This suggests a useful generalization: semiring-weighted deduction (Goodman...

44 | Generalizing word lattice translation - Dyer, Muresan, et al. - 2008 |
Citation Context: ...), is an established formalism used in parsing. It is occasionally used to describe formally syntactic translation models, but these treatments tend to be brief (Chiang, 2007; Venugopal et al., 2007; Dyer et al., 2008; Melamed, 2004). We apply weighted deduction much more thoroughly, first extending it to phrase-based models and showing that the set of search strategies used by these models have surprisingly differ...

41 | Syntax-based language models for statistical machine translation - Charniak, Knight, et al. - 2003 |
Citation Context: ...ive an alignment logic for ITG from the product of two CKY logics. 6.2 Translation Model Design A motivation for many syntax-based translation models is to use target-side syntax as a language model (Charniak et al., 2003). Och et al. (2004) showed that simply parsing the N-best outputs of a phrase-based model did not work; to obtain the full power of a language model, we need to integrate it into the search process. ...

41 | Local phrase reordering models for statistical machine translation - Kumar, Byrne - 2005 |

24 | Compiling Comp Ling: Weighted Dynamic Programming and the Dyna Language - Eisner, Goldlust, et al. - 2005 |
Citation Context: ...(e.g. boolean), but not others. Therefore, non-local parameterizations break semiring-weighted deduction, because we can no longer use [footnote 7] General weighted deduction subsumes semiring-weighted deduction (Eisner et al., 2005; Eisner and Blatz, 2006; Nederhof, 2003), but semiring-weighted deduction covers all translation models we are aware of, so it is a good first step in applying weighted deduction to translation. [footnote 8] Se...

21 | Faster beam-search decoding for phrasal statistical machine translation - Moore, Quirk - 2007 |

21 | An efficient two-pass approach to synchronous-CFG driven statistical MT - Venugopal, Zollmann, et al. - 2007 |
Citation Context: ...semirings (Goodman, 1999), is an established formalism used in parsing. It is occasionally used to describe formally syntactic translation models, but these treatments tend to be brief (Chiang, 2007; Venugopal et al., 2007; Dyer et al., 2008; Melamed, 2004). We apply weighted deduction much more thoroughly, first extending it to phrase-based models and showing that the set of search strategies used by these models have ...

16 | Statistical machine reordering - Costa-jussà, Fonollosa - 2006 |

11 | Program transformations for optimization of parsing algorithms and other weighted logic programs - Eisner, Blatz - 2007 |

10 | A systematic analysis of translation model search spaces - Auli, Lopez, et al. - 2009 |
Citation Context: ... presented here is abstract, our motivation is practical. Isolating the errors in translation systems is a difficult task which can be made easier by describing and analyzing models in a modular way (Auli et al., 2009). Furthermore, building large-scale translation systems from scratch should be unnecessary if existing systems were built using modular logics and algorithms. We aim to build such systems. Acknowledg...

4 | Dynamic programming algorithms as products of weighted logic programs - Cohen, Simmons, et al. |
Citation Context: ...atz (2006) give an alternate strategy for the best derivation. the same logic under all semirings. We need new logics; for this we will use a logic programming transform called the PRODUCT transform (Cohen et al., 2008). We first define a logic for the non-local parameterization. The logic for an n-gram language model generates sequence $e_1 \ldots e_Q$ by generating each new word given the past n − 1 words. 10 item form: [...

4 | Word reordering and a dynamic programming beam search algorithm for statistical machine translation - Tillman, Ney - 2003 |
Citation Context: ...nts) depending on how much the projection reduces the search space. In many phrase-based implementations the stack signature is just the number of words translated, but other strategies are possible (Tillman and Ney, 2003). It is worth noting that logic FdUW (§3.2) depends on stack pruning for speed. Because the number of stacks is linear in the length of the input, so is the number of unpruned nodes in the search gr...

1 | Approximation semirings: Dynamic programming with non-local features - Gimpel, Smith - 2009 |