## Bayesian inference for finite-state transducers (2010)

Venue: | HLT-NAACL |

Citations: | 7 - 4 self |

### BibTeX

@INPROCEEDINGS{Chiang10bayesianinference,

author = {David Chiang and Jonathan Graehl and Kevin Knight and Adam Pauls and Sujith Ravi},

title = {Bayesian inference for finite-state transducers},

booktitle = {HLT-NAACL},

year = {2010}

}

### Abstract

We describe a Bayesian inference algorithm that can be used to train any cascade of weighted finite-state transducers on end-to-end data. We also investigate the problem of automatically selecting from among multiple training runs. Our experiments on four different tasks demonstrate the genericity of this framework, and, where applicable, large improvements in performance over EM. We also show, for unsupervised part-of-speech tagging, that automatic run selection gives a large improvement over previous Bayesian approaches.

### Citations

3719 |
Stochastic Relaxation, Gibbs Distributions and the Bayesian Restoration of Images
- Geman, Geman
- 1984
Citation Context ...t to new data, obtaining (for example) the k-best output strings for an input string. 3 Generic Bayesian Training Bayesian learning is a wide-ranging field. We focus on training using Gibbs sampling (Geman and Geman, 1984), because it has been popularly applied in the natural language literature, e.g., (Finkel et al., 2005; DeNero et al., 2008; Blunsom et al., 2009). Our overall plan is to give a generic algorithm for... |

383 | Incorporating non-local information into information extraction systems by Gibbs sampling
- Finkel, Grenager, et al.
- 2005
Citation Context ...Training Bayesian learning is a wide-ranging field. We focus on training using Gibbs sampling (Geman and Geman, 1984), because it has been popularly applied in the natural language literature, e.g., (Finkel et al., 2005; DeNero et al., 2008; Blunsom et al., 2009). Our overall plan is to give a generic algorithm for Bayesian training that is a “drop-in replacement” for EM training. That is, we input an FST cascade an... |

244 | Tagging English text with a probabilistic model
- Mérialdo
- 1993
Citation Context ...elect among multiple training runs in order to achieve the best possible task accuracy. The natural language applications we consider in this paper are: (1) unsupervised part-of-speech (POS) tagging (Merialdo, 1994; Goldwater and Griffiths, 2007), (2) letter substitution decipherment (Peleg and Rosenfeld, 1979; Knight et al., 2006; Ravi and Knight, 2008), (3) segmentation of space-free English (Goldwater et al.... |

235 |
Tree Automata. Akadémiai Kiadó
- Gécseg, Steinby
- 1984
Citation Context ...scades of tree transducers? It is straightforward to adapt our methods to train a single tree transducer (Graehl et al., 2008), but as most types of tree transducers are not closed under composition (Gécseg and Steinby, 1984), the compose/de-compose method cannot be directly applied to train cascades. Third, what is the best way to extend the FST formalism to represent non-parametric Bayesian models? Consider the English... |

130 | Machine Transliteration
- Knight, Graehl
- 1997
Citation Context ...tion decipherment (Peleg and Rosenfeld, 1979; Knight et al., 2006; Ravi and Knight, 2008), (3) segmentation of space-free English (Goldwater et al., 2009), and (4) Japanese/English phoneme alignment (Knight and Graehl, 1998; Ravi and Knight, 2009a). Figure 1 shows how each of these problems can be represented as a cascade of finite-state acceptors (FSAs) and finite-state transducers (FSTs). 2 Generic EM Training We firs... |

124 | Speech Recognition by Composition of Weighted Finite Automata - Pereira, Riley - 1997 |

115 | A fully Bayesian approach to unsupervised part-of-speech tagging - Goldwater, Griffiths - 2007 |

103 | Training Tree Transducers
- Graehl, Knight, et al.
Citation Context ...e of tasks? Second, can generic methods similar to the ones described here be developed for cascades of tree transducers? It is straightforward to adapt our methods to train a single tree transducer (Graehl et al., 2008), but as most types of tree transducers are not closed under composition (Gécseg and Steinby, 1984), the compose/de-compose method cannot be directly applied to train cascades. Third, what is the bes... |

63 |
An Overview of Probabilistic Tree Transducers for Natural Language Processing
- Knight, Graehl
- 2005
Citation Context ...rence code. This leads to faster scientific experimentation with fewer bugs. Weighted tree transducers play the same role for problems that involve the creation and transformation of tree structures (Knight and Graehl, 2005). Of course, many problems do not fit either the finite-state string or tree transducer framework, but in this paper, we concentrate on those that do. Bayesian inference schemes have become popular re... |

44 |
Translation with finite-state devices
- Knight, Al-Onaizan
- 1998
Citation Context ...stigate Bayesian inference for weighted finite-state transducers (WFSTs). Many natural language models can be captured by weighted finite-state transducers (Pereira et al., 1994; Sproat et al., 1996; Knight and Al-Onaizan, 1998; Clark, 2002; Kolak et al., 2003; Mathias and Byrne, 2006), which offer several benefits: • WFSTs provide a uniform knowledge representation. • Complex problems can be broken down into a cascade of s... |

34 | Comparison of Bayesian Estimators for unsupervised Hidden Markov Model POS Taggers - Gao, Johnson - 2008 |

27 | Minimized models for unsupervised part-of-speech tagging - Ravi, Knight - 2009 |

25 |
Breaking substitution ciphers using a relaxation algorithm
- Peleg, Rosenfeld
- 1979
Citation Context .... The natural language applications we consider in this paper are: (1) unsupervised part-of-speech (POS) tagging (Merialdo, 1994; Goldwater and Griffiths, 2007), (2) letter substitution decipherment (Peleg and Rosenfeld, 1979; Knight et al., 2006; Ravi and Knight, 2008), (3) segmentation of space-free English (Goldwater et al., 2009), and (4) Japanese/English phoneme alignment (Knight and Graehl, 1998; Ravi and Knight, 20... |

17 |
Statistical phrase-based speech translation
- Mathias, Byrne
- 2006
Citation Context ...rs (WFSTs). Many natural language models can be captured by weighted finite-state transducers (Pereira et al., 1994; Sproat et al., 1996; Knight and Al-Onaizan, 1998; Clark, 2002; Kolak et al., 2003; Mathias and Byrne, 2006), which offer several benefits: • WFSTs provide a uniform knowledge representation. • Complex problems can be broken down into a cascade of simple WFSTs. • Input- and output-epsilon transitions allow... |

15 | A generative probabilistic OCR model for NLP applications
- Kolak, Byrne, et al.
- 2003
Citation Context ...nite-state transducers (WFSTs). Many natural language models can be captured by weighted finite-state transducers (Pereira et al., 1994; Sproat et al., 1996; Knight and Al-Onaizan, 1998; Clark, 2002; Kolak et al., 2003; Mathias and Byrne, 2006), which offer several benefits: • WFSTs provide a uniform knowledge representation. • Complex problems can be broken down into a cascade of simple WFSTs. • Input- and output-... |

14 | Unsupervised analysis for decipherment problems
- Knight, Nair, et al.
- 2006
Citation Context ...ications we consider in this paper are: (1) unsupervised part-of-speech (POS) tagging (Merialdo, 1994; Goldwater and Griffiths, 2007), (2) letter substitution decipherment (Peleg and Rosenfeld, 1979; Knight et al., 2006; Ravi and Knight, 2008), (3) segmentation of space-free English (Goldwater et al., 2009), and (4) Japanese/English phoneme alignment (Knight and Graehl, 1998; Ravi and Knight, 2009a). Figure 1 shows ... |

13 | Memory-based learning of morphology with stochastic transducers - Clark - 2002 |

10 | Attacking decipherment problems optimally with low-order n-gram models
- Ravi, Knight
- 2008
Citation Context ...in this paper are: (1) unsupervised part-of-speech (POS) tagging (Merialdo, 1994; Goldwater and Griffiths, 2007), (2) letter substitution decipherment (Peleg and Rosenfeld, 1979; Knight et al., 2006; Ravi and Knight, 2008), (3) segmentation of space-free English (Goldwater et al., 2009), and (4) Japanese/English phoneme alignment (Knight and Graehl, 1998; Ravi and Knight, 2009a). Figure 1 shows how each of these probl... |

4 | The variational Bayesian EM algorithm for incomplete data: With application to scoring graphical model structures - Beal, Ghahramani - 2003 |

1 | Learning phoneme mappings for transliteration without parallel data - Ravi, Knight - 2009 |