## Hierarchical Phrase-Based Translation Representations

Citations: 6 (2 self)

### BibTeX

@MISC{Iglesias_hierarchicalphrase-based,
    author = {Gonzalo Iglesias and Cyril Allauzen and William Byrne and Adrià de Gispert and Michael Riley},
    title = {Hierarchical Phrase-Based Translation Representations},
    year = {}
}

### Abstract

This paper compares several translation representations for a synchronous context-free grammar parse, including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The choice of representation is shown to determine the form and complexity of the target LM intersection and shortest-path algorithms that follow. Intersection, shortest-path, FSA expansion, and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA- and PDA-based decoders respectively, are presented using admissible (or exact) search, which is possible for HiFST with compact SCFG rulesets and for HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy, which we then analyze in terms of search errors and translation performance.

### Citations

404 | Hierarchical phrase-based translation
- Chiang
- 2007
Citation Context ...on Hierarchical phrase-based translation, using a synchronous context-free translation grammar (SCFG) together with an n-gram target language model (LM), is a popular approach in machine translation (Chiang, 2007). Given a SCFG G and an n-gram language model M, this paper focuses on how to decode with them, i.e. how to apply them to the source text to generate a target translation. Decoding has three basic ste... |

124 | Large Language Models in Machine Translation
- Brants, Popat, et al.
- 2007
Citation Context ...LDC2009T13). This is a total of 1.3B words. We will call this language model M1. For large language model rescoring we also use the LM M2 obtained by interpolating M1 with a zero-cutoff stupid-backoff (Brants et al., 2007) 5-gram estimated using 6.6B words of English newswire text. We next describe how we build translation systems using entropy-pruned language models. 1. We build a baseline HiFST system that uses M1 an... |
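
The stupid-backoff scoring named in the context above can be illustrated with a minimal sketch. It assumes the standard formulation from Brants et al. (2007): relative frequency when the n-gram was observed, otherwise a fixed penalty times the score under a shorter context. The function names, the toy corpus, and the backoff factor of 0.4 are illustrative assumptions, not details taken from this paper's system.

```python
from collections import Counter


def ngram_counts(tokens, max_order):
    """Count all n-grams of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total, alpha=0.4):
    """Unnormalized stupid-backoff score of `word` given `context`.

    alpha is the fixed backoff penalty (0.4 is assumed here).
    """
    context = tuple(context)
    factor = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Observed n-gram: plain relative frequency, scaled by penalties so far.
            return factor * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest context word
        factor *= alpha
    # Unigram base case: relative frequency over the corpus size.
    return factor * counts[(word,)] / total


# Hypothetical toy corpus, for illustration only.
tokens = "a b a b c".split()
counts = ngram_counts(tokens, max_order=3)
seen_score = stupid_backoff("b", ["a"], counts, len(tokens))
backed_off_score = stupid_backoff("c", ["a"], counts, len(tokens))
```

Because the scores are not normalized, they are not probabilities. No discounting is involved, which is part of what makes the method cheap to estimate on very large corpora.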

121 | On Formal Properties of Simple Phrase Structure Grammars. Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung
- Bar-Hillel, Perles, et al.
- 1961
Citation Context ...d show the result of the algorithm when applied to the PDA of Figure 1c. 2.3 Intersection Algorithm The class of weighted pushdown automata is closed under intersection with weighted finite automata (Bar-Hillel et al., 1964; Nederhof and Satta, 2003). Considering a pair (T1, T2) where one element is an FSA and the other element a PDA, then there exists a PDA T1 ∩ T2, the intersection of T1 and T2, such that for all x ∈ Σ* ... |
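
As a rough illustration of the intersection property quoted above, the sketch below intersects two weighted acceptors in the finite-state special case (both operands are FSAs, neither is a PDA). It assumes epsilon-free automata over the tropical semiring, so weights add along a path and the best path is the minimum. The dictionary layout and the names `intersect` and `weight` are assumptions made for this example, not the paper's algorithm or any library's API.

```python
from collections import deque
from math import inf


def intersect(a, b):
    """Intersect two epsilon-free weighted acceptors (tropical semiring).

    An automaton is a dict with 'start', 'finals' {state: weight} and
    'arcs' {state: [(label, weight, next_state), ...]}.
    """
    start = (a["start"], b["start"])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        state = queue.popleft()
        qa, qb = state
        if qa in a["finals"] and qb in b["finals"]:
            finals[state] = a["finals"][qa] + b["finals"][qb]
        for la, wa, na in a["arcs"].get(qa, []):
            for lb, wb, nb in b["arcs"].get(qb, []):
                if la == lb:  # pair up arcs that read the same symbol
                    nxt = (na, nb)
                    arcs.setdefault(state, []).append((la, wa + wb, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "finals": finals, "arcs": arcs}


def weight(fsa, string):
    """Weight of `string`: minimum over accepting paths, inf if rejected."""
    frontier = {fsa["start"]: 0.0}
    for symbol in string:
        nxt = {}
        for q, w in frontier.items():
            for label, arc_w, n in fsa["arcs"].get(q, []):
                if label == symbol:
                    nxt[n] = min(nxt.get(n, inf), w + arc_w)
        frontier = nxt
    return min((w + fsa["finals"][q] for q, w in frontier.items()
                if q in fsa["finals"]), default=inf)


# Hypothetical toy automata, for illustration only.
T1 = {"start": 0, "finals": {2: 0.0},
      "arcs": {0: [("a", 0.5, 1), ("b", 2.0, 2)], 1: [("b", 1.0, 2)]}}
T2 = {"start": 0, "finals": {2: 0.0},
      "arcs": {0: [("a", 0.25, 1)], 1: [("b", 0.25, 2), ("a", 1.0, 2)]}}
T = intersect(T1, T2)
```

For the string `ab`, the weight under `T` equals the sum of its weights under `T1` and `T2`, while a string accepted by only one operand gets infinite weight. The PDA case additionally has to carry the stack operations through the product, which this sketch does not attempt.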

116 | Transductions and Context-Free Languages. Teubner
- Berstel
- 1979
Citation Context ...dding a stack alphabet and labeling each transition with a stack operation (a stack symbol to be pushed onto, popped or read from the stack) in addition to the usual input label (Aho and Ullman, 1972; Berstel, 1979) and weight (Kuich and Salomaa, 1986; Petre and Salomaa, 2009). Our equivalent representation allows a transition to be labeled by a stack operation or a regular input symbol but not both. Stack oper... |
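
The representation described above, where each transition carries either a stack operation or an input symbol but never both, can be sketched as a check over a single path. The sketch assumes that stack operations are written as matched open and close parentheses, which the parenthesis-labeled transitions in a later context on this page suggest. The function name and the label encoding are hypothetical.

```python
def accepted_string(path, pairs):
    """Return the input symbols of `path` if its parentheses balance, else None.

    `path` is a sequence of transition labels; `pairs` maps each open
    parenthesis (push) to its matching close parenthesis (pop).
    """
    closers = {close: opener for opener, close in pairs.items()}
    stack, output = [], []
    for label in path:
        if label in pairs:
            stack.append(label)  # push
        elif label in closers:
            # pop: must match the most recent unmatched open parenthesis
            if not stack or stack.pop() != closers[label]:
                return None
        else:
            output.append(label)  # regular input symbol
    return output if not stack else None


# Hypothetical parenthesis alphabet, for illustration only.
PAIRS = {"(": ")", "[": "]"}
balanced = accepted_string(["(", "a", "[", "b", "]", ")"], PAIRS)
mismatched = accepted_string(["(", "a", "]"], PAIRS)
unclosed = accepted_string(["(", "a"], PAIRS)
```

Only the balanced path contributes a string. The other two paths exist in the underlying graph but are not accepting paths of the PDA under this reading.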

61 | Dual decomposition for parsing with non-projective head automata
- Koo, Rush, et al.
- 2010
Citation Context ...e cube pruning and FSA implementation was presented by Iglesias et al. (2009a) and de Gispert et al. (2010). Relaxation techniques have also recently been shown to find exact solutions in parsing (Koo et al., 2010) and in SMT with tree-to-string translation grammars and trigram language models (Rush and Collins, 2011), much smaller models compared to the work presented in this paper. Although entropy-pruned la... |

58 | Forest Rescoring: Faster Decoding with Integrated Language Models - Huang, Chiang - 2007 |

40 | HMM Word and Phrase Alignment for Statistical Machine Translation
- Deng, Byrne, et al.
Citation Context ...ained with different θ values (top row). tems, standard MERT (Och, 2003) iterative parameter estimation under IBM BLEU4 is performed on the development set. The parallel corpus is aligned using MTTK (Deng and Byrne, 2008) in both source-to-target and target-to-source directions. We then follow standard heuristics (Chiang, 2007) and filtering strategies (Iglesias et al., 2009b) to extract hierarchical phrases from the... |

31 | OpenFst: A general and efficient weighted finite-state transducer library - Allauzen, Riley, et al. - 2007 |

29 | Hierarchical phrase-based translation with weighted finite state transducers - Iglesias, Gispert, et al. - 2009 |

26 | Machine translation as lexicalized parsing with hooks - Huang, Zhang, et al. - 2005 |

20 | Probabilistic parsing as intersection
- Nederhof, Satta
- 2003
Citation Context ...algorithm when applied to the PDA of Figure 1c. 2.3 Intersection Algorithm The class of weighted pushdown automata is closed under intersection with weighted finite automata (Bar-Hillel et al., 1964; Nederhof and Satta, 2003). Considering a pair (T1, T2) where one element is an FSA and the other element a PDA, then there exists a PDA T1 ∩ T2, the intersection of T1 and T2, such that for all x ∈ Σ*: (T1 ∩ T2)(x) = T1(x) + T2(x)... |

19 | Weighted automata algorithms
- Mohri
- 2009
Citation Context ...automaton representations Tf and Lf. In this case, weighted finite-state intersection and single-source shortest path algorithms (using negative log probabilities) can be used to solve Steps 2 and 3 (Mohri, 2009). This is the approach taken in (Iglesias et al., 2009a; de Gispert et al., 2010). Instead T and L can be represented by hypergraphs Th and Lh (or very similarly context-free rules, and-or trees, or ... |
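
The single-source shortest-path step over negative log probabilities mentioned above can be sketched with a textbook Dijkstra search. This is a stand-in for illustration and not the generic algorithm of Mohri (2009). It assumes every arc probability lies in (0, 1], so all costs are non-negative, and that final states carry no extra weight. The lattice and the name `shortest_path` are hypothetical.

```python
import heapq
import math


def shortest_path(arcs, start, finals):
    """Best path from `start` to any final state under -log(prob) costs.

    arcs: {state: [(label, prob, next_state), ...]} with 0 < prob <= 1.
    Returns (cost, labels); (inf, []) if no final state is reachable.
    """
    counter = 0  # tie-breaker so the heap never compares label tuples
    heap = [(0.0, counter, start, ())]
    done = set()
    while heap:
        cost, _, state, labels = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state in finals:
            return cost, list(labels)
        for label, prob, nxt in arcs.get(state, []):
            if nxt not in done:
                counter += 1
                heapq.heappush(
                    heap, (cost - math.log(prob), counter, nxt, labels + (label,)))
    return math.inf, []


# Hypothetical two-word lattice, for illustration only.
LATTICE = {0: [("the", 0.6, 1), ("a", 0.4, 1)],
           1: [("cat", 0.5, 2), ("dog", 0.3, 2)]}
best_cost, best_labels = shortest_path(LATTICE, start=0, finals={2})
```

Minimizing the summed negative log probabilities is equivalent to maximizing the product of probabilities, so the returned label sequence is the most probable path through the lattice.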

10 | Study on interaction between entropy pruning and Kneser-Ney smoothing
- Chelba, Brants, et al.
- 2010
Citation Context ...d system 4 fails in 95 cases. Interestingly, some of these sentences would require impractically huge beams. This might be due to the Kneser-Ney smoothing, which interacts badly with entropy pruning (Chelba et al., 2010). 6 Hiero with PDAs and FSAs In this section we contrast HiFST with HiPDT under the same translation grammar and entropy-pruned language models. Under the constrained grammar G1 their performance is ... |

10 | Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
- Rush, Collins
- 2011
Citation Context .... (2010). Relaxation techniques have also recently been shown to find exact solutions in parsing (Koo et al., 2010) and in SMT with tree-to-string translation grammars and trigram language models (Rush and Collins, 2011), much smaller models compared to the work presented in this paper. Although entropy-pruned language models have been used to produce real-time translation systems (Prasad et al., 2007), we believe o... |

4 | Hierarchical phrase-based translation with weighted finite state transducers
- 2009a
Citation Context ...he queue Ss, initially containing s. While the queue is not empty, a state is dequeued and its outgoing transitions examined (line 5-9). Transitions labeled by non-parenthesis are treated as in Mohri (2009) (line 9-10). When the considered transition e is labeled by a close parenthesis, it is remembered that it balances all incoming open parentheses in s labeled by i[e] by adding e to B[s,i[e]] (line 11-... |

3 | Rule filtering by pattern for efficient hierarchical translation
- 2009b

3 | Real-time speech-to-speech translation for PDAs
- Prasad, Krstovski, et al.
- 2007
Citation Context ...age models (Rush and Collins, 2011), much smaller models compared to the work presented in this paper. Although entropy-pruned language models have been used to produce real-time translation systems (Prasad et al., 2007), we believe our use of entropy-pruned language models in two-pass translation to be novel. This is an approach that is widely used in automatic speech recognition (Ljolje et al., 1999) and we note th... |

1 | A weighted finite state transducer translation template model for statistical machine translation - Ljolje, Pereira, et al. - 2006 |

1 | Drosde et al. (Drosde et al - In |

1 | Algebraic systems and pushdown automata - Petre, Salomaa - 2009 |