## Learning Dependency Translation Models as Collections of Finite State Head Transducers (2000)

Venue: Computational Linguistics

Citations: 70 (3 self-citations)

### BibTeX

    @ARTICLE{Alshawi00learningdependency,
      author = {Hiyan Alshawi and Shona Douglas and Srinivas Bangalore},
      title = {Learning Dependency Translation Models as Collections of Finite State Head Transducers},
      journal = {Computational Linguistics},
      year = {2000}
    }

### Abstract

The paper defines weighted head transducers, finite-state machines that perform middle-out string transduction. These transducers are strictly more expressive than the special case of standard left-to-right finite-state transducers. Dependency transduction models are then defined as collections of weighted head transducers that are applied hierarchically. A dynamic programming search algorithm is described for finding the optimal transduction of an input string with respect to a dependency transduction model. A method for automatically training a dependency transduction model from a set of input-output example strings is presented. The method first searches for hierarchical alignments of the training examples guided by correlation statistics, and then constructs the transitions of head transducers that are consistent with these alignments. Experimental results are given for applying the training method to translation from English to Spanish and Japanese.
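The hierarchical application of head transducers can be pictured with a minimal Python sketch. It is not the authors' implementation: it ignores states, costs and search, and assumes each dependent records only where its parent's head transducer placed it on the target side (-1 for left of the head, +1 for right). The `Node` class and `translate` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A source head word, its chosen translation, and its dependent subtrees.

    Hypothetical structure: `target_pos` records where the parent's head
    transducer placed this subtree's output (-1 = left of the head, +1 = right).
    The root's value is never consulted.
    """
    source: str
    target: str
    dependents: List["Node"] = field(default_factory=list)
    target_pos: int = 1


def translate(node: Node) -> List[str]:
    """Build the target string middle-out: the head first, dependents around it."""
    left: List[str] = []
    right: List[str] = []
    for dep in node.dependents:
        # Each dependent subtree is produced by its own lower-level transducer.
        sub = translate(dep)
        if dep.target_pos == -1:
            left = sub + left  # later left dependents end up further from the head
        else:
            right = right + sub  # later right dependents end up further from the head
    return left + [node.target] + right


# English "the red car sleeps" -> Spanish "el coche rojo duerme"
car = Node("car", "coche",
           [Node("the", "el", target_pos=-1), Node("red", "rojo", target_pos=1)],
           target_pos=-1)
sentence = Node("sleeps", "duerme", [car])
print(" ".join(translate(sentence)))  # el coche rojo duerme
```

Because each head places its dependents on either side of itself, "red" can precede its head on the source side while "rojo" follows "coche" on the target side.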

### Citations

673 | An efficient context-free parsing algorithm - Earley - 1970

Citation Context: ...l derivation. This algorithm can take as input either word strings, or word lattices produced by a speech recognizer. The algorithm is similar to those for context-free parsing such as chart parsing (Earley 1970) and the CKY algorithm (Younger 1967). Since word string input is a special case of word lattice input, we need only describe the case of lattices. We now present a sketch of the transduction algorit...
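The excerpt's point that a word string is a special case of a word lattice can be made concrete with a hedged Python sketch. It assumes a lattice is a list of arcs between integer nodes, each carrying a word and a cost; a chart algorithm whose items are indexed by node pairs (i, j) rather than string positions then accepts both kinds of input. `Arc`, `string_to_lattice` and `paths` are illustrative names, and the sketch covers only the input encoding, not the transduction search itself.

```python
from typing import Dict, Iterator, List, NamedTuple, Tuple


class Arc(NamedTuple):
    start: int    # lattice node where the word begins
    end: int      # lattice node where the word ends
    word: str
    cost: float   # e.g. a recognizer score; 0.0 for typed text


def string_to_lattice(words: List[str]) -> List[Arc]:
    """A word string is a linear lattice: one arc from node i to node i + 1 per word."""
    return [Arc(i, i + 1, w, 0.0) for i, w in enumerate(words)]


def paths(arcs: List[Arc], start: int, final: int) -> Iterator[Tuple[List[str], float]]:
    """Enumerate every (word sequence, total cost) path; only sensible for tiny acyclic lattices."""
    outgoing: Dict[int, List[Arc]] = {}
    for arc in arcs:
        outgoing.setdefault(arc.start, []).append(arc)

    def walk(node: int, words: List[str], cost: float):
        if node == final:
            yield words, cost
            return
        for arc in outgoing.get(node, []):
            yield from walk(arc.end, words + [arc.word], cost + arc.cost)

    yield from walk(start, [], 0.0)


# A plain string has exactly one path through its lattice.
text = string_to_lattice(["show", "me", "flights"])
print(list(paths(text, 0, 3)))  # [(['show', 'me', 'flights'], 0.0)]

# A recognizer lattice may offer competing words over the same pair of nodes.
asr = text + [Arc(2, 3, "fights", 1.5)]
print(len(list(paths(asr, 0, 3))))  # 2
```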

608 | A statistical approach to machine translation - Brown, Cocke, et al. - 1990

Citation Context: ...hen complete derivations are not available, partial derivations tend to have meaningful headwords. At the same time, we believe our method has advantages over the approach developed initially at IBM (Brown et al. 1990; Brown et al. 1993) for training translation systems automatically. One advantage is that our method attempts to model the natural decomposition of sentences into phrases. Another is that the compila...

457 | Stochastic inversion transduction grammars and bilingual parsing of parallel corpora - Wu - 1997

325 | An Introduction to Machine Translation - Hutchins, Somers - 1992

214 | Word Grammar - Hudson - 1984

158 | Identifying word correspondences in parallel texts - Gale, Church - 1991

Citation Context: ...between source and target words, which we assume is indicative of carrying the same semantic content. Our preferred choice of statistical measure for assigning the costs is the φ correlation measure (Gale and Church 1991). We apply this statistic to co-occurrence of the source word with all its possible translations in the data set examples. We have found that, at least for our data, this measure leads to better perf...
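A minimal sketch of the φ statistic the excerpt refers to, assuming co-occurrence is counted per training pair in a 2x2 contingency table (the cell names `a` to `d` are illustrative). Gale and Church (1991) work with φ², the square of this value; keeping the sign here separates positive from negative association.

```python
import math


def phi(a: int, b: int, c: int, d: int) -> float:
    """phi correlation from a 2x2 contingency table over training pairs.

    a: pairs whose source contains word w and whose target contains word v
    b: source contains w, target lacks v
    c: source lacks w, target contains v
    d: neither
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0.0:
        return 0.0  # a word present in every pair or in none carries no signal
    return (a * d - b * c) / denom


print(phi(10, 0, 0, 90))  # 1.0: w and v always co-occur
print(phi(5, 5, 45, 45))  # 0.0: v is equally likely with or without w
```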

132 | The mathematics of statistical machine translation: Parameter estimation - Brown, Della Pietra, et al. - 1993

Citation Context: ...ranslations in the data set examples. We have found that, at least for our data, this measure leads to better performance than the use of the log probabilities of target words given source words (cf. Brown et al. 1993). In addition to the correlation measure, the cost for a pairing includes a distance measure component that penalizes pairings proportionately to the difference between the (normalized) positions of ...
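The excerpt is cut off before the exact formula, so the following Python sketch only guesses at its shape and should be read that way. It assumes the pairing cost is the negated correlation plus a hypothetical `distance_weight` times the absolute difference between the two words' positions, each scaled to [0, 1] by its sentence length.

```python
def pairing_cost(correlation: float, src_index: int, src_len: int,
                 tgt_index: int, tgt_len: int, distance_weight: float = 1.0) -> float:
    """Hypothetical cost for pairing source word src_index with target word tgt_index.

    Lower is better. A high correlation lowers the cost; the distance term grows
    in proportion to how far apart the two words sit once each position is
    scaled to [0, 1] by its sentence length. The combination is an assumption.
    """
    src_rel = src_index / max(src_len - 1, 1)  # 0.0 = first word, 1.0 = last word
    tgt_rel = tgt_index / max(tgt_len - 1, 1)
    return -correlation + distance_weight * abs(src_rel - tgt_rel)


print(pairing_cost(0.75, 0, 3, 0, 5))                       # -0.75: same relative position
print(pairing_cost(0.75, 2, 3, 0, 5, distance_weight=0.5))  # -0.25: opposite ends of the sentences
```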

104 | Machine translation divergences: a formal description and proposed solution - Dorr - 1994

Citation Context: ...anguages are chosen to force a synchronized alignment (for better or worse) in order to simplify cases involving so-called head-switching. This contrasts with one of the traditional approaches (e.g., Dorr 1994; Watanabe 1995) to posing the translation problem, i.e., the approach in which translation problems are seen in terms of bridging the gap between the most natural monolingual representations underlyi...

73 | Syntax-directed transduction - Lewis, Stearns - 1968

Citation Context: ...suggesting that the learning curve is relatively shallow beyond the current size of corpus. 6. Concluding Remarks Formalisms for finite-state and context-free transduction have a long history (e.g., Lewis and Stearns 1968; Aho and Ullman 1972), and such formalisms have been applied to the machine translation problem, both in the finite-state case (e.g., Vilar et al. 1996) and the context-free case (e.g., Wu 1997). In ...

55 | Dependency theory: A formalism and some observations - Hays - 1964 (Language 40: 511–525; DOI: 10.2307/411934)

21 | automata for speech translation - Alshawi - 1996

Citation Context: ...ite-State Head Transducers In this section we describe the basic structure and operation of a weighted head transducer. In some respects, this description is simpler than earlier presentations (e.g., Alshawi 1996); for example, here final states are simply a subset of the transducer states whereas in other work we have described the more general case in which final states are specified by a probability distri...
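The simplification the excerpt describes, with final states as a plain subset of the states rather than a probability distribution, reduces acceptance to a single membership test. In this Python sketch each transition is assumed to be a 7-tuple (from state, to state, source word, target word, source position, target position, cost); `HeadTransducer` and `run_cost` are hypothetical names, and the code only scores a given run rather than searching for one.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple

# (from_state, to_state, source_word, target_word, source_pos, target_pos, cost)
Transition = Tuple[str, str, str, str, int, int, float]


@dataclass(frozen=True)
class HeadTransducer:
    start: str
    final_states: FrozenSet[str]  # simply a subset of the states
    transitions: FrozenSet[Transition]

    def run_cost(self, path: List[Transition]) -> Optional[float]:
        """Total cost of `path` if it is an accepting run, otherwise None."""
        state, total = self.start, 0.0
        for t in path:
            if t not in self.transitions or t[0] != state:
                return None  # not a transition of this machine, or not from the current state
            state, total = t[1], total + t[6]
        # Acceptance is a plain membership test, with no final-state distribution.
        return total if state in self.final_states else None


head = ("q0", "q1", "car", "coche", 0, 0, 0.25)
dep = ("q1", "q2", "red", "rojo", -1, 1, 0.5)
ht = HeadTransducer("q0", frozenset({"q2"}), frozenset({head, dep}))
print(ht.run_cost([head, dep]))  # 0.75
print(ht.run_cost([head]))       # None: q1 is not a final state
```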

15 | Learning dependency transduction models from unannotated examples - Alshawi, Douglas - 2000

6 | Learning Phrase-based Head Transduction Models for Translation of Spoken Utterances - Alshawi, Bangalore, et al. - 1998

6 | A Model of a Bi-Directional Transfer Mechanism Using Rule Combinations - Watanabe - 1995

Citation Context: ...e chosen to force a synchronized alignment (for better or worse) in order to simplify cases involving so-called head-switching. This contrasts with one of the traditional approaches (e.g., Dorr 1994; Watanabe 1995) to posing the translation problem, i.e., the approach in which translation problems are seen in terms of bridging the gap between the most natural monolingual representations underlying the sentence...

2 | English-to-Mandarin speech translation with head transducers - Alshawi, Xia - 1997

Citation Context: ...ame minimal cost.) In the transducers produced by the training method described in this paper, the source and target positions are in the set {-1,0,1}, though we have also used handcoded transducers (Alshawi and Xia 1997) and automatically trained transducers (Alshawi and Douglas 2000) with a larger range of positions. 2.2 Relationship to Standard FSTs The operation of a traditional left-to-right transducer can be si...
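The excerpt breaks off as it starts relating head transducers to standard left-to-right FSTs. One encoding consistent with the position set {-1, 0, 1}, assumed here rather than quoted from the paper, treats the leftmost symbol as the head at positions (0, 0) and attaches every later symbol at (1, 1). The hypothetical helpers in this Python sketch check that the encoding reproduces the left-to-right output.

```python
from collections import deque
from typing import List, Tuple

Step = Tuple[str, str, int, int]  # (source_word, target_word, source_pos, target_pos)


def left_to_right_as_head_steps(pairs: List[Tuple[str, str]]) -> List[Step]:
    """Relabel a left-to-right FST run as head-transducer steps.

    Assumed encoding: the leftmost symbol acts as the head (positions 0, 0) and
    every later symbol is attached immediately to the right (positions 1, 1).
    """
    return [(s, t, 0, 0) if i == 0 else (s, t, 1, 1) for i, (s, t) in enumerate(pairs)]


def build(steps: List[Step]) -> Tuple[List[str], List[str]]:
    """Grow both strings outward from the head: -1 prepends, 0 or 1 appends."""
    src: deque = deque()
    tgt: deque = deque()
    for s, t, a, b in steps:
        if a == -1:
            src.appendleft(s)
        else:
            src.append(s)
        if b == -1:
            tgt.appendleft(t)
        else:
            tgt.append(t)
    return list(src), list(tgt)


run = [("a", "x"), ("b", "y"), ("c", "z")]
print(build(left_to_right_as_head_steps(run)))  # (['a', 'b', 'c'], ['x', 'y', 'z'])

# Position -1 lets source and target grow in different directions around the head.
print(build([("car", "coche", 0, 0), ("red", "rojo", -1, 1)]))  # (['red', 'car'], ['coche', 'rojo'])
```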

1 | The Theory of Parsing, Translation, and Compiling - Aho, Ullman - 1972

Citation Context: ...rning curve is relatively shallow beyond the current size of corpus. 6. Concluding Remarks Formalisms for finite-state and context-free transduction have a long history (e.g., Lewis and Stearns 1968; Aho and Ullman 1972), and such formalisms have been applied to the machine translation problem, both in the finite-state case (e.g., Vilar et al. 1996) and the context-free case (e.g., Wu 1997). In this paper we have ad...

1 | Evaluation of machine translation system based on a statistical method by using spontaneous speech transcription - Tsukada, Alshawi, et al. - 1999