## Getting the structure right for word alignment: LEAF (2007)

Venue: Proc. of EMNLP

Citations: 13 (1 self)

### BibTeX

    @INPROCEEDINGS{Fraser07gettingthe,
      author    = {Alexander Fraser and Daniel Marcu},
      title     = {Getting the structure right for word alignment: {LEAF}},
      booktitle = {Proceedings of EMNLP},
      year      = {2007},
      pages     = {51--60}
    }

### Abstract

Word alignment is the problem of annotating parallel text with translational correspondence. Previous generative word alignment models have made structural assumptions such as the 1-to-1, 1-to-N, or phrase-based consecutive word assumptions, while previous discriminative models have either made such an assumption directly or used features derived from a generative model making one of these assumptions. We present a new generative alignment model which avoids these structural limitations, and show that it is effective when trained using both unsupervised and semi-supervised training methods.
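The structural assumptions the abstract contrasts can be made concrete with a small sketch. This is illustrative only: the helper names and the link representation (a set of (source, target) index pairs) are not from the paper.

```python
def is_one_to_one(links):
    """1-to-1: each source and each target position is in at most one link."""
    src = [i for i, _ in links]
    tgt = [j for _, j in links]
    return len(src) == len(set(src)) and len(tgt) == len(set(tgt))

def is_one_to_n(links):
    """1-to-N (IBM-style): each target position links to at most one source."""
    tgt = [j for _, j in links]
    return len(tgt) == len(set(tgt))

# A non-consecutive M-to-N alignment of the kind LEAF permits:
links = {(0, 0), (0, 2), (1, 0), (3, 1)}
# Source 0 links to targets 0 and 2; target 0 links to sources 0 and 1,
# so this alignment violates both restricted structures:
# is_one_to_one(links) and is_one_to_n(links) are both False.
```

Models built on the restricted structures can only approximate such an alignment, e.g. by running a 1-to-N model in both directions and heuristically combining the results.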

### Citations

1253 | A systematic comparison of various statistical alignment models
- Och, Ney
- 2003
Citation Context: ...result in infeasible alignment structures. Our model has deficiency in the non-spurious target word placement, just as Model 4 does. It has additional deficiency in the source word linking decisions. (Och and Ney, 2003) presented results suggesting that the additional parameters required to ensure that a model is not deficient result in inferior performance, but we plan to study whether this is the case for our gen...

1173 | The mathematics of statistical machine translation: Parameter estimation
- Brown, Pietra, et al.
- 1994
Citation Context: ...is at the cost of additional deficiency. 2.2 Unsupervised Parameter Estimation We can perform maximum likelihood estimation of the parameters of this model in a similar fashion to that of Model 4 (Brown et al., 1993), described thoroughly in (Och and Ney, 2003). We use Viterbi training (Brown et al., 1993) but neighborhood estimation (Al-Onaizan et al., 1999; Och and Ney, 2003) or “pegging” (Brown et al., 1993) ...
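The Viterbi training mentioned in this context — hard EM that alternates between choosing the single best alignment under the current parameters and re-estimating from it — can be sketched on a toy lexical model. This is a generic illustration, not LEAF's actual generative story; the function name, uniform initialization, and tie-breaking are all assumptions.

```python
from collections import defaultdict

def viterbi_train(corpus, iterations=3):
    """Hard (Viterbi) EM sketch for a toy lexical model t(f | e):
    the E-step picks each target word's single best source link under
    the current parameters; the M-step re-estimates t from those links."""
    t = defaultdict(lambda: 1.0)  # uniform start: every pair equally likely
    for _ in range(iterations):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for e_sent, f_sent in corpus:
            for f in f_sent:
                best_e = max(e_sent, key=lambda e: t[(f, e)])  # Viterbi link
                counts[(f, best_e)] += 1.0
                totals[best_e] += 1.0
        t = defaultdict(float,
                        {(f, e): c / totals[e] for (f, e), c in counts.items()})
    return t

corpus = [(["the", "house"], ["la", "maison"]),
          (["the", "book"], ["le", "livre"])]
t = viterbi_train(corpus)  # conditional estimates after three hard-EM rounds
```

Real aligners initialize from a simpler model (e.g. Model 1) rather than uniformly, and may sum over a neighborhood of the best alignment rather than a single one, as the context above notes.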

634 | Statistical Phrase-Based Translation
- Koehn, Marcu
- 2003
Citation Context: ...alignment shown to the left in Figure 2 to the alignment shown to the right. This operation does not change the collection of phrases or rules extracted from a hypothesized alignment, see, for instance, (Koehn et al., 2003). Working with this fully interlinked representation we found that the best settings of α were α = 0.1 for the Arabic/English task and α = 0.4 for the French/English task. 4 Experiments 4.1 Data Sets...
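The invariance claimed in this context — two alignments yielding the same extracted phrases — can be checked with the standard consistency criterion for phrase extraction. The sketch below follows the usual (Koehn et al., 2003)-style definition rather than anything specific to LEAF, and the function name is illustrative.

```python
def extract_phrases(links, src_len, max_len=4):
    """Enumerate phrase pairs consistent with an alignment: every link
    touching the source span must fall inside the induced target span,
    i.e. no link may cross a phrase boundary."""
    phrases = []
    for i1 in range(src_len):
        for i2 in range(i1, min(src_len, i1 + max_len)):
            tgt = [j for (i, j) in links if i1 <= i <= i2]
            if not tgt:
                continue  # unaligned source spans yield no pair here
            j1, j2 = min(tgt), max(tgt)
            # consistency: links inside the target span must stay in [i1, i2]
            consistent = all(i1 <= i <= i2 for (i, j) in links if j1 <= j <= j2)
            if consistent and j2 - j1 < max_len:
                phrases.append(((i1, i2), (j1, j2)))
    return phrases

# Crossing links: source 0↔target 0, source 1↔target 2, source 2↔target 1
pairs = extract_phrases({(0, 0), (1, 2), (2, 1)}, src_len=3)
```

Two hypothesized alignments are equivalent for downstream phrase-based SMT exactly when this enumeration returns the same set of span pairs for both.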

427 | Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora
- Wu
- 1997
Citation Context: ...head word link structure, which is both symmetric and a robust structure for modeling of nonconsecutive M-to-N alignments. In designing LEAF, we were also inspired by dependency-based alignment models (Wu, 1997; Alshawi et al., 2000; Yamada and Knight, 2001; Cherry and Lin, 2003; Zhang and Gildea, 2004). In contrast with their approaches, we have a very flat, one-level notion of dependency, which is bilingu...

367 | Discriminative training and maximum entropy models for statistical machine translation
- Och, Ney
- 2002
Citation Context: ...ive model. Our work is most similar to work using discriminative log-linear models for alignment, which is similar to discriminative log-linear models used for the SMT decoding (translation) problem (Och and Ney, 2002; Och, 2003). (Liu et al., 2005) presented a log-linear model combining IBM Model 3 trained in both directions with heuristic features which resulted in a 1-to-1 alignment. (Fraser and Marcu, 2006b) d...

363 | A Hierarchical Phrase-Based Model for Statistical Machine Translation
- Chiang
- 2005
Citation Context: ...LEAF semi-supervised system (line 4), with a gain of 5.4 F-Measure over the baseline semi-supervised system. For Arabic/English translation we train a state of the art hierarchical model similar to (Chiang, 2005) using our Viterbi alignments. The translation test data used is described in Table 2. We use two trigram language models, one built using the English portion of the training data and the other built...

349 | The Alignment Template Approach to Statistical Machine Translation - Och - 2004

255 | A syntax-based statistical translation model
- Yamada, Knight
- 2001
Citation Context: ...both symmetric and a robust structure for modeling of nonconsecutive M-to-N alignments. In designing LEAF, we were also inspired by dependency-based alignment models (Wu, 1997; Alshawi et al., 2000; Yamada and Knight, 2001; Cherry and Lin, 2003; Zhang and Gildea, 2004). In contrast with their approaches, we have a very flat, one-level notion of dependency, which is bilingually motivated and learned automatically from t...

147 | Alignment by agreement
- Liang, Taskar, et al.
- 2006
Citation Context: ...ended in (Koehn et al., 2003). We have used insights from these works to help determine the structure of our generative model. (Zens et al., 2004) introduced a model featuring a symmetrized lexicon. (Liang et al., 2006) showed how to train two HMM models, a 1-to-N model and a M-to-1 model, to agree in predicting all of the links generated, resulting in a 1-to-1 alignment with occasional rare 1-to-N or M-to-1 links. ...
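The two-directional setup described here is also the basis of the common symmetrization heuristics: run a 1-to-N model and an M-to-1 model, then intersect their links (high precision) or take their union (high recall). A minimal sketch — the representation and names are mine, not from these papers:

```python
def symmetrize(e2f_links, f2e_links):
    """Combine two directional alignments, both given as sets of
    (source, target) index pairs on the same axes: the intersection keeps
    only links both models agree on; the union keeps links from either."""
    return e2f_links & f2e_links, e2f_links | f2e_links

e2f = {(0, 0), (1, 1), (1, 2)}  # 1-to-N direction
f2e = {(0, 0), (1, 1), (2, 2)}  # M-to-1 direction, mapped to the same axes
inter, union = symmetrize(e2f, f2e)  # inter == {(0, 0), (1, 1)}
```

Heuristics such as grow-diag-final start from the intersection and selectively add links from the union; (Liang et al., 2006) instead push the two models to agree during training.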

114 | Decoding complexity in word replacement translation models
- Knight
- 1999
Citation Context: ...training we search for the Viterbi solution for millions of sentences. Evidence that inference over the space of all possible alignments is intractable has been presented, for a similar problem, in (Knight, 1999). Unlike phrase-based SMT, left-to-right hypothesis extension using a beam decoder is unlikely to be effective because in word alignment reordering is not limited to a small local window and so the ne...

112 | Fast decoding and optimal decoding for machine translation
- Germann, Jahr, et al.
- 2001
Citation Context: ...English head word links of two French head words, link English word to French word making new head words, unlink English and French head words. We use multiple restarts to try to reduce search errors. (Germann et al., 2004; Marcu and Wong, 2002) have some similar operations without the head word distinction. 3 Semi-supervised parameter estimation Equation 6 defines a log-linear model. Each feature function hm has an as...
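The search strategy this context describes — greedily applying link-modifying operations, with multiple restarts to escape local maxima — is an instance of random-restart hill climbing. A schematic sketch with hypothetical names, using integers as a stand-in for alignments and their operators:

```python
def hill_climb(score, start, neighbors):
    """Greedy local search: move to the best-scoring neighbor (here, the
    result of one modifying operation) until no operation improves."""
    current = start
    while True:
        best = max(neighbors(current), key=score)
        if score(best) <= score(current):
            return current
        current = best

def search_with_restarts(score, starts, neighbors):
    """Run hill climbing from several start points and keep the best
    result; restarts reduce search errors caused by local maxima."""
    return max((hill_climb(score, s, neighbors) for s in starts), key=score)

# Toy stand-in: states are integers, operations are +1/-1, score peaks at 3.
best = search_with_restarts(lambda x: -(x - 3) ** 2,
                            starts=[0, 10],
                            neighbors=lambda x: [x - 1, x + 1])  # best == 3
```

In the paper's setting the neighbor function would enumerate the head-word link operations listed above, and the score would be the model probability of the hypothesized alignment.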

57 | A maximum entropy word aligner for arabic-english machine translation
- Ittycheriah, Roukos
- 2005
Citation Context: ...de the prediction of some type of generative model, such as the HMM model or Model 4. A discriminatively trained 1-to-N model with feature functions specifically designed for Arabic was presented in (Ittycheriah and Roukos, 2005). (Lacoste-Julien et al., 2006) created a discriminative model able to model 1-to-1, 1-to-2 and 2-to-1 alignments for which the best results were obtained using features based on symmetric HMMs traine...

45 | A probability model to improve word alignment
- Cherry, Lin
- 2003
Citation Context: ...robust structure for modeling of nonconsecutive M-to-N alignments. In designing LEAF, we were also inspired by dependency-based alignment models (Wu, 1997; Alshawi et al., 2000; Yamada and Knight, 2001; Cherry and Lin, 2003; Zhang and Gildea, 2004). In contrast with their approaches, we have a very flat, one-level notion of dependency, which is bilingually motivated and learned automatically from the parallel corpus. Th...

40 | Log-linear models for word alignment
- Liu, Liu, et al.
- 2005
Citation Context: ...similar to work using discriminative log-linear models for alignment, which is similar to discriminative log-linear models used for the SMT decoding (translation) problem (Och and Ney, 2002; Och, 2003). (Liu et al., 2005) presented a log-linear model combining IBM Model 3 trained in both directions with heuristic features which resulted in a 1-to-1 alignment. (Fraser and Marcu, 2006b) described symmetrized training o...

37 | HMM word and phrase alignment for statistical machine translation
- Deng, Byrne
- 2005
Citation Context: ...based on the HMM model (Vogel et al., 1996). (Toutanova et al., 2002) and (Lopez and Resnik, 2005) presented a variety of refinements of the HMM model particularly effective for low data conditions. (Deng and Byrne, 2005) described work on extending the HMM model using a bigram formulation to generate 1-to-N alignment structure. The common thread connecting these works is their reliance on the 1-to-N approximation, w...

34 | Word alignment via quadratic assignment
- Lacoste-Julien, Taskar, et al.
- 2006
Citation Context: ...of generative model, such as the HMM model or Model 4. A discriminatively trained 1-to-N model with feature functions specifically designed for Arabic was presented in (Ittycheriah and Roukos, 2005). (Lacoste-Julien et al., 2006) created a discriminative model able to model 1-to-1, 1-to-2 and 2-to-1 alignments for which the best results were obtained using features based on symmetric HMMs trained to agree, (Liang et al., 2006...

19 | Modeling with structures in statistical machine translation
- Wang, Waibel
- 1998
Citation Context: ...ined a generative model which does not require use of this approximation, at the cost of having to rely on local search. There has also been work on generative models for other alignment structures. (Wang and Waibel, 1998) introduced a generative story based on extension of the generative story of Model 4. The alignment structure modeled was “consecutive M to non-consecutive N”. (Marcu and Wong, 2002) defined the Join...

10 | A maximum entropy approach to combining word alignments
- Ayan, Dorr
- 2006
Citation Context: ...and 2-to-1 alignments for which the best results were obtained using features based on symmetric HMMs trained to agree (Liang et al., 2006), and intersected Model 4. (Ayan and Dorr, 2006) defined a discriminative model which learns how to combine the predictions of several alignment algorithms. The experiments performed included Model 4 and the HMM extensions of (Lopez and Resnik, 20...

6 | Improved discriminative bilingual word alignment

1 | Measuring word alignment quality for statistical machine translation - 2006a

1 | Semi-supervised training for statistical word alignment - 2006b

1 | Symmetric word alignments for statistical machine translation - Moore, Yih, et al. - 2004

1 | Improved word alignment using a symmetric lexicon model