## Training Tree Transducers (2004)

Venue: | HLT-NAACL |

Citations: | 103 - 10 self |

### BibTeX

@INPROCEEDINGS{Graehl04trainingtree,
  author = {Jonathan Graehl and Kevin Knight},
  title = {Training Tree Transducers},
  booktitle = {HLT-NAACL},
  year = {2004},
  pages = {105--112}
}

### Abstract

Many probabilistic models for natural language are now written in terms of hierarchical tree structure. Tree-based modeling still lacks many of the standard tools taken for granted in (finite-state) string-based modeling. The theory of tree transducer automata provides a possible framework to draw on, as it has been worked out in an extensive literature. We motivate the use of tree transducers for natural language and address the training problem for probabilistic tree-to-tree and tree-to-string transducers.

### Citations

8093 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977 |

3837 | Introduction to Automata Theory, Languages and Computation - Hopcroft - 1979 |
Citation Context: ... end USE(n) ≡ begin A[n] ← true for n′ s.t. ∃(n, t, w) ∈ R : n′ ∈ yieldt(N) do /* for n′ that are in the rhs of rules whose lhs is n */ if ¬A[n′] ∧ B[n′] then USE(n′) end productions from a CFG (Hopcroft and Ullman 1979). We eliminate all the remains of failed subforests, by removing all nonterminals n, and any productions involving n, where Algorithm 2 gives A[n] = false. In the next section, we show how to compu...

1165 | Error bounds for convolutional codes and an asymptotically optimum decoding algorithm - Viterbi - 1967 |

682 | Accurate unlexicalized parsing - Klein, Manning - 2003 |

634 | Statistical Phrase-Based Translation - Koehn, Marcu - 2003 |

632 | Synchronous tree adjoining grammars - Shieber, Schabes - 1990 |
Citation Context: ... because RTGs distinguish states and tree symbols, which are conflated in TSGs at the elementary tree root. Regular tree languages are strictly contained in tree sets of tree adjoining grammars (TAG; Joshi and Schabes 1997), which generate string languages strictly between the context-free and indexed languages. RTGs are essentially TAGs without auxiliary trees and thei...

503 | Three generative, lexicalised models for statistical parsing - Collins - 1997 |
Citation Context: ...ight, and Marcu 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 1979; Lari and Young 1990; Collins 1997; Chelba and Jelinek 2000; Charniak 2001; Klein ∗ Information Sciences Institute, 4676 Admiralty Way, Marina del Rey, CA 90292. E-mail: graehl@isi.edu. ∗∗ Information Sciences Institute, 4676 Admiralt...

427 | Stochastic Inversion Transduction Grammars and Bilingual Parsing of Parallel Corpora - Wu - 1997 |

373 | The estimation of stochastic context-free grammars using the Inside–Outside algorithm. Computer Speech and Language - Lari, Young - 1990 |
Citation Context: ...araphrasing (Pang, Knight, and Marcu 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 1979; Lari and Young 1990; Collins 1997; Chelba and Jelinek 2000; Charniak 2001; Klein ...

268 | Trainable grammars for speech recognition - Baker - 1979 |
Citation Context: ...rcu 2002), paraphrasing (Pang, Knight, and Marcu 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 1979; Lari and Young 1990; Collins 1997; Chelba and Jelinek 2000; Charniak 2001; Klein ...

236 | Tree automata techniques and applications. Available on: http://www.grappa.univ-lille3.fr/tata - Comon, Dauchet, et al. - 1997 |

235 | Tree Automata. Akadémiai Kiadó - Gécseg, Steinby - 1984 |
Citation Context: ...ht FST, except that it works top-down, pursuing subtrees independently, with each subtree transformed depending only on its own passed-down state. This class of transducer, called R in earlier works (Gécseg and Steinby 1984; Graehl and Knight 2004) for “root-to-frontier,” is often nowadays called T, for “top-down”. Rounds uses a mathematics-oriented example of a T transducer, which we repeat in Figure 1. At each point i...

214 | An Inequality with Applications to Statistical Estimation for Probabilistic Functions of a Markov Process and to a Model for Ecology - Baum, Eagon - 1967 |
Citation Context: ...ing acoustic sequences to word sequences is neatly captured by left-to-right stateful substitution. Many conceptual tools exist, such as Viterbi decoding (Viterbi 1967) and forward–backward training (Baum and Eagon 1967), as well as software toolkits like the AT&T FSM Library and USC/ISI’s Carmel. Moreover, a surprising variety of problems are attackable with FSTs, from part-of-speech tagging to letter-to-sound co...

199 | Generation that exploits corpus-based statistical knowledge - Langkilde, Knight - 1998 |
Citation Context: ...ore, and Douglas 2000; Yamada and Knight 2001; Eisner 2003; Gildea 2003), but also for summarization (Knight and Marcu 2002), paraphrasing (Pang, Knight, and Marcu 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 1979; Lari and Young 1990; Collins 1997; Chelba and Jelinek 2000; Charniak 2001; Klein ...

169 | Tree acceptors and some of their applications - Doner - 1970 |

153 | Mathematical and Computational Aspects of Lexicalized Grammars - Schabes - 1990 |

144 | Summarization beyond sentence extraction: A probabilistic approach to sentence compression - Knight, Marcu |
Citation Context: ...ic tree-based models have been proposed not only for machine translation (Wu 1997; Alshawi, Bangalore, and Douglas 2000; Yamada and Knight 2001; Eisner 2003; Gildea 2003), but also for summarization (Knight and Marcu 2002), paraphrasing (Pang, Knight, and Marcu 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 19...

119 | Mappings and grammars on trees - Rounds - 1970 |

114 | Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences - Pang, Knight, et al. - 2003 |

87 | Immediate-head parsing for language models - Charniak |

72 | Exploiting a probabilistic hierarchical model for generation - Bangalore, Rambow - 2000 |

71 | Structured language modeling - Chelba, Jelinek |

66 | Learning Dependency Translation Models as Collections of Finite State Head Transducers - Alshawi, Bangalore, et al. |

65 | Top-down tree transducers with regular look-ahead - Engelfriet - 1977 |

56 | Translations on a context-free grammar - Aho, Ullman - 1971 |
Citation Context: ... 3). 8. Strings We have covered tree-to-tree transducers; we now turn to tree-to-string transducers. In the automata literature, such transductions are called generalized syntax-directed translation (Aho and Ullman 1971), and are used to specify compilers that (deterministically) transform high-level source-language trees into linear target-language code. Tree-to-string transducers have also been applied to the mach...

54 | Forest-based statistical sentence generation - Langkilde - 2000 |

49 | On attributed tree transducers - Fülöp - 1981 |
Citation Context: ...trees given the input trees. As with the forward–backward algorithm, we seek at least a local maximum. Tree transducers with weights have been studied (Kuich 1999; Engelfriet, Fülöp, and Vogler 2004; Fülöp and Vogler 2004) but we know of no existing training procedure. Sections 2–4 of this article define basic concepts and recall the notions of relevant automata and grammars. Sections 5–7 describe a novel tree transdu...

44 | Translation with finite-state devices - Knight, Al-Onaizan - 1998 |

42 | Structured language modeling, Computer Speech and Language 14(4):283–332 - Chelba, Jelinek - 2000 |
Citation Context: ...u 2003), natural language generation (Langkilde and Knight 1998; Bangalore and Rambow 2000; Corston-Oliver et al. 2002), parsing, and language modeling (Baker 1979; Lari and Young 1990; Collins 1997; Chelba and Jelinek 2000; Charniak 2001; Klein ...

39 | Generalized sequential machine maps - Thatcher - 1970 |

37 | Bottom-up and Topdown Tree Series Transformations - Engelfriet, Fülöp, et al. |

32 | An overview of amalgam: A machine-learned generation module - Corston-Oliver, Gamon, et al. - 2002 |

30 | A weighted finite state transducer implementation of the alignment template model for statistical machine translation - Kumar, Byrne - 2003 |

29 | Tree transducers and formal tree series - Kuich - 1999 |
Citation Context: ... that we maximize the probability of the output trees given the input trees. As with the forward–backward algorithm, we seek at least a local maximum. Tree transducers with weights have been studied (Kuich 1999; Engelfriet, Fülöp, and Vogler 2004; Fülöp and Vogler 2004) but we know of no existing training procedure. Sections 2–4 of this article define basic concepts and recall the notions of relevant automa...

26 | Bottom-up and top-down tree transformations—A comparison - Engelfriet - 1975 |

8 | Tree Automata. Akadémiai Kiadó - Gécseg, Steinby - 1984 |

3 | Parsing non-recursive CFGs - Nederhof, Satta - 2002 |

1 | Carmel finite-state toolkit. Available at http://www.isi.edu/licensed-sw/carmel - Graehl - 1997 |
Citation Context: ...ransducers, which is in turn a generalization of the original forward–backward algorithm for Hidden Markov Models. Eisner (2002) describes string-based training under different semirings, and Carmel (Graehl 1997) implements FST string-to-string training. In our tree-based training algorithm, inside–outside weights replace forward–backward, and paths in trees replace positions in strings. Explicit constructio...