## Binarization of Synchronous Context-Free Grammars

Citations: 24 (5 self)

### BibTeX

@MISC{Huang_binarizationof,

author = {Liang Huang and Hao Zhang and Daniel Gildea and Kevin Knight},

title = {Binarization of Synchronous Context-Free Grammars},

year = {}

}

### Abstract

Systems based on synchronous grammars and tree transducers promise to improve the quality of statistical machine translation output, but are often very computationally intensive. The complexity is exponential in the size of individual grammar rules due to arbitrary re-orderings between the two languages. We develop a theory of binarization for synchronous context-free grammars and present a linear-time algorithm for binarizing synchronous rules when possible. In our large-scale experiments, we found that almost all rules are binarizable and the resulting binarized rule set significantly improves the speed and accuracy of a state-of-the-art syntax-based machine translation system. We also discuss the more general, and computationally more difficult, problem of finding good parsing strategies for non-binarizable rules, and present an approximate polynomial-time algorithm for this problem.
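The linear-time binarization mentioned in the abstract can be illustrated with a small shift-reduce sketch over the permutation of a rule's nonterminals. This is a minimal sketch, not the authors' implementation: the function name `binarize`, the string rendering of trees (`[...]` for straight and `<...>` for inverted combinations), and the choice to return `None` on failure are all assumptions made for illustration. Each element is shifted onto a stack once, and the top two spans are merged whenever they cover adjacent ranges, so there are n shifts and at most n − 1 reductions.

```python
def binarize(perm):
    """Try to binarize a permutation of 1..n with one left-to-right pass.

    Returns a string rendering of a binarization tree, or None if the
    permutation cannot be reduced to a single span (hypothetical API).
    """
    stack = []  # each entry: (lowest value, highest value, tree string)
    for a in perm:
        stack.append((a, a, str(a)))  # shift
        # Reduce while the top two spans cover adjacent value ranges.
        while len(stack) >= 2:
            lo2, hi2, t2 = stack[-1]
            lo1, hi1, t1 = stack[-2]
            if lo2 == hi1 + 1:        # straight: left span precedes right span
                merged = (lo1, hi2, f"[{t1}, {t2}]")
            elif hi2 + 1 == lo1:      # inverted: right span precedes left span
                merged = (lo2, hi1, f"<{t1}, {t2}>")
            else:
                break
            stack[-2:] = [merged]     # replace the two spans with their union
    return stack[0][2] if len(stack) == 1 else None


print(binarize([1, 2, 4, 3]))  # a binarizable permutation
print(binarize([2, 4, 1, 3]))  # no two adjacent spans ever merge
```

Under these assumptions, `[1, 2, 4, 3]` reduces to a single tree, while `[2, 4, 1, 3]` leaves four unmerged spans on the stack and is reported as non-binarizable.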

### Citations

8843 | Introduction to Algorithms - Cormen, Leiserson, et al. |
Citation Context: ...g the canonical binarization tree for a, which is either [bi(b), bi(c)] or 〈bi(b), bi(c)〉. 3. The running time of Algorithm 1 (regardless of success or failure) is linear in n: By amortized analysis (Cormen, Leiserson, and Rivest, 1990), there are exactly n shifts and at most n − 1 reductions, and each shift or reduction takes O(1) time. So the total time complexity is O(n). [Section 4.2: Binarizing tree transducers] Without loss of generalit...

655 | Synchronous Tree Adjoining Grammars - Shieber, Schabes - 1990 |

462 | Stochastic inversion transduction grammars and bilingual parsing of parallel corpora - Wu - 1997 |
Citation Context: ...difficulty of binarization for efficient synchronous parsing. One way around this difficulty is to stipulate that all rules must be binary from the outset, as in inversion-transduction grammar (ITG) (Wu, 1997) and the binary synchronous context-free grammar (SCFG) employed by the Hiero system (Chiang, 2005) to model the hierarchical phrases. In contrast, the rule extraction method of (Galley et al., 2004)...

373 | The alignment template approach to statistical machine translation - Och, Ney - 2004 |
Citation Context: [Table residue: ... 36.25; synchronous binarization 38.44; alignment-template system 37.00] We also compare the top result of our synchronous binarization system with the state-of-the-art alignment-template approach (ATS) (Och and Ney, 2004). The results are shown in Table 1. Our system has a promising improvement over the ATS system which is trained on a larger data-set but tuned independently. A larger-scale system based on our best r...

332 | Complexity of finding embeddings in a k-tree - Arnborg, Corneil, et al. - 1987 |
Citation Context: ...elate the problem of finding the optimal parsing strategy for a rule to computing the treewidth of a graph derived from the rule’s permutation. Computing treewidth of arbitrary graphs is NP-complete (Arnborg, Corneil, and Proskurowski, 1987), but the graphs derived from SCFG permutations have a restricted structure that it might be possible to exploit. In particular, the graphs have degree no greater than six. While computing treewidth ...

255 | Efficient Algorithm for Determining the Convex Hull of a Finite Planar - Graham - 1972 |
Citation Context: ...adjacent subsequences whenever possible. This procedure produces canonical binarization trees and runs in O(n^2) time since we need n passes in the worst case. Inspired by the Graham Scan Algorithm (Graham, 1972) for computing Convex-Hulls from Computational Geometry, we modify this procedure and improve it into a linear-time algorithm that only needs one pass through the sequence. The skeleton binarization ...

238 | What’s in a translation rule - Galley, Hopkins, et al. - 2004 |
Citation Context: ... strategies for non-binarizable rules, and present an approximate polynomial-time algorithm for this problem. [Section 1: Introduction] Several recent syntax-based models for machine translation (Chiang, 2005; Galley et al., 2004) can be seen as instances of the general framework of synchronous grammars and tree transducers. In this framework, both alignment (synchronous parsing) and decoding can be thought of as parsing prob...

206 | The Theory of Parsing - Aho, Ullman - 1972 |
Citation Context: ...-ary SCFG. However, not every SCFG can be binarized. In fact, the binarizability of an n-ary rule is determined by the structure of its permutation, which can sometimes be resistant to factorization (Aho and Ullman, 1972). We now turn to rigorously defining the binarizability of permutations. [Figure 4, panels (a) to (d): binarization trees, including [[1, 2], 〈4, 3〉] and [1, [2, 〈4, 3〉]]] ...

153 | Better k-best parsing - Huang, Chiang - 2005 |
Citation Context: ...ecoding is rescoring, where one first computes the k-best translations according to the TM only, and then rerank the k-best list with the language model costs. This method runs very fast in practice (Huang and Chiang, 2005), but often produces a considerable number of search errors since the true best translation is often outside of the k-best list, especially for longer sentences. [Figure residue: S x1:NP x2:PP x3:VP → S x1 VP x3 x2...]

121 | Mappings and grammars on trees - Rounds - 1970 |
Citation Context: ...em of Chiang (2005), which restricts the hierarchical phrases to form binary-branching SCFG rules. The same reasoning applies to tree transducer rules. Suppose we have the following transducer rules (Rounds, 1970; Galley et al., 2004): (5) S(x1:NP x2:PP x3:VP) → S(x1 VP(x3 x2)) NP(Baoweier) → NP(NNP(Powell)) VP(juxing le huitan) → VP(VBD(held) NP(DT(a) NPS(meeting))) PP(yu Shalong) → PP(TO(with) NP(NNP(Sharon...

89 | A polynomial-time algorithm for statistical machine translation - Wu - 1996 |
Citation Context: ...oding, we need to augment each chart item (X, i, j) with two target-language boundary words u and v to produce a bigram-item that pairs the boundary words u ··· v with (X, i, j), following the dynamic programming algorithm of (Wu, 1996). Now the two binarizations have very different effects. In the first case, we first combine NP with PP: [deduction rule residue: NP over span (1, 2) with boundary words Powell ··· Powell and weight p; PP over span (2, 4) with boundary words with ··· Sharon and weight q; result with boundary words Powell ··· Powell ··· with...]

64 | A Generalization of Dijkstra’s Algorithm - Knuth - 1977 |
Citation Context: ...thm is O(3^n). While this is exponential in n, it is a significant improvement over considering all recursive partitions. The algorithm can be improved by adopting a best-first exploration strategy (Knuth, 1977), in which dynamic programming items are placed on a priority queue sorted according to their complexity, and only used to build further items after all items of lower complexity have been exhausted....

43 | Log-linear models for word alignment - Liu, Liu, et al. - 2005 |

40 | Independent parallelism in finite copying parallel rewriting systems - Rambow, Satta - 1999 |

33 | Empirical lower bounds on the complexity of translational equivalence - Wellington, Waxmonsky, et al. - 2006 |

26 | Machine translation as lexicalized parsing with hooks - Huang, Zhang, et al. - 2005 |
Citation Context: ...m integrated decoding, we have to maintain 2(m − 1) boundary words for each child nonterminal, which leads to a prohibitive overall complexity of O(|w|^(3+2n(m−1))), which is exponential in rule size (Huang, Zhang, and Gildea, 2005). Aggressive pruning must be used to make it tractable in practice, which in general introduces many search errors and adversely affects translation quality. In the second case, however: with ···...

23 | Synchronous grammars as tree transducers - Shieber - 2004 |

19 | Treewidth for graphs with small chordality - Bodlaender, Thilikos - 1997 |

18 | Vier combinatorische Probleme. Zeitschrift für Mathematik und Physik - Schröder |

18 | Bootstrap percolation, the Schröder numbers, and the n-kings problem - Shapiro, Stephens - 1991 |

15 | Weighted deductive parsing and Knuth’s algorithm - Nederhof - 2003 |

10 | synchronous binarization, and target-side binarization - Binarization |

9 | Factorization of synchronous context-free grammars in linear time - Zhang, Gildea - 2007 |

2 | Worst-case synchronous grammar rules - Gildea, Štefankovič - 2007 |

1 | Scalable inference and training of context-rich syntactic translation models - Galley, Graehl, Knight, et al. - 2006 |