## Disambiguation Strategies for Data-Oriented Translation (2006)

Venue: Proceedings of the 11th Conference of the European Association for Machine Translation

Citations: 15 (6 self)

### BibTeX

@INPROCEEDINGS{Hearne06disambiguationstrategies,
    author = {Mary Hearne and Andy Way},
    title = {Disambiguation Strategies for Data-Oriented Translation},
    booktitle = {Proceedings of the 11th Conference of the European Association for Machine Translation},
    year = {2006},
    pages = {59--68}
}

### Abstract

The Data-Oriented Translation (DOT) model – originally proposed in (Poutsma, 1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. (Bod, Scha, & Sima’an, 2003)) – is best described as a hybrid model of translation as it combines examples, linguistic information and a statistical translation model. Although theoretically interesting, it inherits the computational complexity associated with DOP. In this paper, we focus on one computational challenge for this model: efficiently selecting the ‘best’ translation to output. We present four different disambiguation strategies in terms of how they are implemented in our DOT system, along with experiments which investigate how they compare in terms of accuracy and efficiency.

### Citations

1468 | A Systematic Comparison of Various Statistical Alignment Models
- Och, Ney
Citation context: ...ing into English rather than into French because boundary friction problems are less prevalent. The BLEU and F-score measures indicate – with the exception of [footnote 8] Training was carried out using Giza++ (Och & Ney, 2003) downloaded from http://www.fjoch.com/GIZA++.html. Translations were generated using the ISI ReWrite Decoder (Germann, Jahr, Knight, Marcu, & Yamada, 2001) downloaded from http://www.isi.edu/licensedsw...

158 | Beyond Grammar: An Experience-Based Theory of Language
- Bod
- 1998
Citation context: ... the MPT by ranking the possible translations according to how often each one occurs in a reduced random sample of the possible derivations. This approach to disambiguation was introduced for DOP in (Bod, 1998) and further expanded on and refined in (Chappelier & Rajman, 2003); application of the algorithms proposed by Chappelier and Rajman (op. cit.) to translation was presented in (Hearne, 2005). The sam...

131 | Fast decoding and optimal decoding for machine translation
- Germann, Jahr, et al.
- 2001
Citation context: ...indicate – with the exception of [footnote 8] Training was carried out using Giza++ (Och & Ney, 2003) downloaded from http://www.fjoch.com/GIZA++.html. Translations were generated using the ISI ReWrite Decoder (Germann, Jahr, Knight, Marcu, & Yamada, 2001) downloaded from http://www.isi.edu/licensedsw/rewrite-decoder/ and the CMU-Cambridge Statistical Language Modeling toolkit (Clarkson & Rosenfeld, 1997) downloaded from http://mi.eng.cam.ac.uk/~prc14...

105 | Computational complexity of probabilistic disambiguation
- Sima’an

36 | Parsing with the shortest derivation
- Bod
- 2000
Citation context: ... the fragment starting the highest-scoring sub-derivation is retained. Although the search for SDER does not involve actually estimating probabilities, the Viterbi algorithm can nevertheless be used (Bod, 2000). Derivation lengths are computed by assigning each fragment equal probability, meaning that the shortest derivation can be computed as the most probable one using Viterbi: if each fragment has proba...

32 | An optimized algorithm for Data Oriented Parsing
- Sima’an
- 1996

31 | An efficient implementation of a new DOP model
- Bod
Citation context: ...anslation accuracy than searching for the shortest derivation if parameter estimation is improved. We also intend to experiment with combining probabilities with SDER ranking, as proposed for DOP in (Bod, 2003). We are currently carrying out empirical evaluation of DOT on much larger datasets than heretofore, and for different language pairs. Finally, DOT models can also be defined for representations corr...

28 | Seeing the Wood for the Trees: Data-Oriented Translation
- Hearne, Way
- 2003
Citation context: ...e each test sentence using the four ranking strategies – MPT, MPP, [footnote 5] MPD and SDER – as described in section 3. We prune the fragment base extracted from each training set with respect to link depth (Hearne & Way, 2003), namely the greatest number of steps taken which depart from a linked node to get from the root node to any frontier node. [footnote 6] This yields fragment bases comprising fragments of link depth 1, link dep...

26 | Data-oriented translation
- Poutsma
- 2000
Citation context: ...Way, National Centre for Language Technology, School of Computing, DCU, Dublin, Ireland, {mhearne | away}@computing.dcu.ie. Abstract: The Data-Oriented Translation (DOT) model – originally proposed in (Poutsma, 1998, 2003) and based on Data-Oriented Parsing (DOP) (e.g. (Bod, Scha, & Sima’an, 2003)) – is best described as a hybrid model of translation as it combines examples, linguistic information and a statisti...

15 | Backoff Parameter Estimation for the DOP Model
- Sima’an, Buratto
- 2003

6 | Machine Translation with Tree-DOP
- Poutsma
- 2003
Citation context: ... representations, how extracted fragments are to be recombined when analysing and translating new input strings, and how the resulting translations are to be ranked. The model described here follows (Poutsma, 2003). Representations: Many different linguistic formalisms can be used to annotate the example base which underpins any DOT model; here, we assume context-free phrase structure tree representations. Repr...

4 | LFG-based syntactic transfer from English to French with the Xerox Translation Environment
- Frank
- 1999
Citation context: ...which was translated by professional translators and sentence-aligned and annotated at Xerox Parc. As one would expect, the translations it contains are of extremely high quality. As observed in (Frank, 1999), the corpus provides a rich source of both linguistic and translational complexity. While English and French are syntactically quite similar, they often differ significantly in the surface styles us...

2 | Parsing DOP with Monte-Carlo Techniques
- Chappelier, Rajman
- 2003
Citation context: ...ding to how often each one occurs in a reduced random sample of the possible derivations. This approach to disambiguation was introduced for DOP in (Bod, 1998) and further expanded on and refined in (Chappelier & Rajman, 2003); application of the algorithms proposed by Chappelier and Rajman (op. cit.) to translation was presented in (Hearne, 2005). The sampling methodology itself is very simple: in order to sample a deriv...

1 | Data-Oriented Models of Parsing and Translation. Unpublished doctoral dissertation
- Hearne
- 2005
Citation context: ...r of source and target trees) is the sum of the probabilities of the derivations which [footnote 2] Estimating fragment probabilities according to their relative frequencies is known to be undesirable for DOP. (Hearne, 2005) discusses the ramifications of using this method for DOT: the negative impact on accuracy is less than for DOP, but improved estimation methods (e.g. (Sima’an & Buratto, 2003)) are likely to improve...

1 | Extending DOP1 with the insertion operation. Unpublished master’s thesis
- Hoogweg
- 2000
Citation context: ...ge to give the DOT probability for each translation. The correct values must instead be obtained by rescoring the relative frequencies of the translations in the sample set when sampling is complete (Hoogweg, 2000; Chappelier & Rajman, 2003). Here, we apply exact sampling (Chappelier & Rajman, 2003), the purpose of which is to ensure that the sampling probability of each translation is directly equal (without ...