## Maximum Likelihood Estimation of Feature-based Distributions


### BibTeX

@MISC{_maximumlikelihoodestimationoffeature-baseddistributions,

author = {Jeffrey Heinz and Cesar Koirala},

title = {Maximum Likelihood Estimation of Feature-based Distributions},

year = {}

}

### Abstract

Motivated by recent work in phonotactic learning (Hayes and Wilson 2008, Albright 2009), this paper shows how to define feature-based probability distributions whose parameters can be provably efficiently estimated. The main idea is that these distributions are defined as a product of simpler distributions (cf. Ghahramani and Jordan 1997). One advantage of this framework is that it draws attention to what is minimally necessary to describe and learn phonological feature interactions in phonotactic patterns. The “bottom-up” approach adopted here is contrasted with the “top-down” approach of Hayes and Wilson (2008), and it is argued that the bottom-up approach is more analytically transparent.
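
The abstract's central idea, a distribution defined as the normalized product of simpler per-feature distributions, with each factor's parameters estimated by maximum likelihood (relative frequency), can be illustrated with a small sketch. This is not the authors' code: the toy two-feature segment inventory and all function names are invented for illustration.

```python
from collections import Counter

# Hypothetical toy feature system: each segment is a (voice, place) pair.
FEATURES = {
    "p": ("-voice", "labial"),
    "b": ("+voice", "labial"),
    "t": ("-voice", "coronal"),
    "d": ("+voice", "coronal"),
}

def fit_factors(corpus):
    """ML-estimate one unigram distribution per feature dimension
    by relative frequency of each feature value in the corpus."""
    n_dims = len(next(iter(FEATURES.values())))
    factors = [Counter() for _ in range(n_dims)]
    total = 0
    for word in corpus:
        for seg in word:
            for i, val in enumerate(FEATURES[seg]):
                factors[i][val] += 1
            total += 1
    return [{val: c / total for val, c in f.items()} for f in factors]

def segment_distribution(factors):
    """Product of the per-feature factors, renormalized over the inventory."""
    raw = {}
    for seg, feats in FEATURES.items():
        p = 1.0
        for i, val in enumerate(feats):
            p *= factors[i].get(val, 0.0)
        raw[seg] = p
    z = sum(raw.values())  # normalization constant
    return {seg: p / z for seg, p in raw.items()}

corpus = [("p", "t"), ("b", "d"), ("p", "d")]
dist = segment_distribution(fit_factors(corpus))
```

Because each factor is estimated independently, the number of parameters grows with the number of feature values rather than the number of segments, which is what makes estimation efficient with less data.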

### Citations

534 | Factorial hidden Markov models
- Ghahramani, Jordan
- 1997
Citation Context: "...define feature-based probability distributions whose parameters can be provably efficiently estimated. The main idea is that these distributions are defined as a product of simpler distributions (cf. Ghahramani and Jordan 1997). One advantage of this framework is it draws attention to what is minimally necessary to describe and learn phonological feature interactions in phonotactic patterns. The “bottom-up” approach adopte..."

142 | Learning dynamic Bayesian networks
- Ghahramani
- 1998
Citation Context: "...gh,+low]) and were zeroed out. features word-initially. The details of these procedures are left for future research and are likely to draw from the rich literature on Bayesian networks (Pearl, 1989; Ghahramani, 1998). More important, however, is this framework allows researchers to construct the independence assumptions they want into the model in at least two ways. First, universally incompatible features can b..."

138 | Internal organization of Speech Sounds
- Clements, Hume
- 1996

Citation Context: "...prohibit four-feature interactions, or models where only certain features are permitted to interact but not others (perhaps because they belong to the same node in a feature geometry (Clements, 1985; Clements and Hume, 1995). 8 7 Hayes and Wilson (2008) This section introduces the Hayes and Wilson (2008) (henceforth HW) phonotactic learner and shows that the contribution features play in generalization is not as clear as p..."

118 | Preliminaries to Speech Analysis
- Jakobson, Halle
- 1952

Citation Context: "...m-up approach is more analytically transparent. 1 Introduction The hypothesis that the atomic units of phonology are phonological features, and not segments, is one of the tenets of modern phonology (Jakobson et al., 1952; Chomsky and Halle, 1968). According to this hypothesis, segments are essentially epiphenomenal and exist only by virtue of being a shorthand description of a collection of more primitive units—the f..."

93 | A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry 39
- Hayes, Wilson
- 2008
Citation Context: "...hood Estimation of Feature-based Distributions Jeffrey Heinz and Cesar Koirala University of Delaware Newark, Delaware, USA {heinz,koirala}@udel.edu Abstract Motivated by recent work in phonotactic learning (Hayes and Wilson 2008, Albright 2009), this paper shows how to define feature-based probability distributions whose parameters can be provably efficiently estimated. The main idea is that these distributions are defined a..."

67 | Mixed memory Markov models: Decomposing complex stochastic processes as mixtures of simpler ones
- Saul, Jordan
- 1999

Citation Context: "...estimation occurs with less data and, relatedly, the family contains fewer distributions. This idea is not new. It is explicit in Factorial Hidden Markov Models (FHMMs) (Ghahramani and Jordan, 1997; Saul and Jordan, 1999), and more recently underlies approaches to describing and inferring regular string transductions (Dreyer et al., 2008; Dreyer and Eisner, 2009). Although HMMs and probabilistic finite-state automata..."

31 | Learning Bias and Phonological Rule Induction
- Gildea, Jurafsky
- 1996

Citation Context: "...tue of being a shorthand description of a collection of more primitive units—the features. Incorporating this hypothesis into phonological learning models has been the focus of much influential work (Gildea and Jurafsky, 1996; Wilson, 2006; Hayes and Wilson, 2008; Moreton, 2008; Albright, 2009). This paper makes three contributions. The first contribution is a framework within which: 1. researchers can choose which statis..."

19 | Learning locally testable languages in the strict sense
- García, Vidal, et al.
- 1990

Citation Context: "...uences (McNaughton and Papert, 1971; Rogers and Pullum, to appear; Rogers et al., 2009). They are also the categorical counterpart to stochastic languages describable with n-gram models (where n = k) (García et al., 1990; Jurafsky and Martin, 2008). Since stochastic languages are distributions, we refer to strictly k-local stochastic languages as strictly k-local distributions. [footnote 4] Technically, M is neither a simple DFA or PDFA;..."
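
The excerpt above equates strictly k-local stochastic languages with n-gram models (where n = k). A minimal sketch of the k = 2 case, with bigram probabilities estimated by maximum likelihood (relative frequency) over boundary-padded words, might look like the following; the function names and the boundary symbol `#` are assumptions for illustration, not taken from the paper.

```python
from collections import Counter, defaultdict

def fit_bigrams(corpus, boundary="#"):
    """ML-estimate a strictly 2-local (bigram) distribution:
    P(b | a) = count(ab) / count(a·), over boundary-padded words."""
    counts = defaultdict(Counter)
    for word in corpus:
        padded = [boundary] + list(word) + [boundary]
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def word_prob(model, word, boundary="#"):
    """Probability of a word as the product of its bigram factors."""
    padded = [boundary] + list(word) + [boundary]
    p = 1.0
    for a, b in zip(padded, padded[1:]):
        p *= model.get(a, {}).get(b, 0.0)
    return p

model = fit_bigrams(["ab", "ab", "ba"])
```

Summing `word_prob` over all strings yields 1, which is why these models count as stochastic languages (distributions over Σ*).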

14 | On languages piecewise testable in the strict sense
- Rogers, Heinz, et al.
- 2010

Citation Context: "...ing phonological features; 2. feature systems can be fully integrated into strictly local (McNaughton and Papert, 1971) (i.e. n-gram models (Jurafsky and Martin, 2008)) and strictly piecewise models (Rogers et al., 2009; Heinz and Rogers, 2010) in order to define families of provably well-formed, feature-based probability distributions that are provably efficiently estimable. The main idea is to define a family of di..."

8 | Estimating strictly piecewise distributions
- Heinz, Rogers
- 2010

Citation Context: "...ures; 2. feature systems can be fully integrated into strictly local (McNaughton and Papert, 1971) (i.e. n-gram models (Jurafsky and Martin, 2008)) and strictly piecewise models (Rogers et al., 2009; Heinz and Rogers, 2010) in order to define families of provably well-formed, feature-based probability distributions that are provably efficiently estimable. The main idea is to define a family of distributions as the norma..."

5 | Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference
- Pearl
- 1989

Citation Context: "...le (e.g. [+high,+low]) and were zeroed out. features word-initially. The details of these procedures are left for future research and are likely to draw from the rich literature on Bayesian networks (Pearl, 1989; Ghahramani, 1998). More important, however, is this framework allows researchers to construct the independence assumptions they want into the model in at least two ways. First, universally incompati..."

4 | Probabilistic finite-state machines - Part I - Vidal, Thollard, et al. - 2005a

4 | Probabilistic finite-state machines - Part II - Vidal, Thollard, et al. - 2005b

2 | Simplifying subsidiary theory: statistical evidence from Arabic, Muna, Shona, and Wargamay - Wilson, Obdeyn - 2009

1 | The geometry of phonological features. Phonology Yearbook
- Clements
- 1985

Citation Context: "...teract but which prohibit four-feature interactions, or models where only certain features are permitted to interact but not others (perhaps because they belong to the same node in a feature geometry (Clements, 1985; Clements and Hume, 1995). 8 7 Hayes and Wilson (2008) This section introduces the Hayes and Wilson (2008) (henceforth HW) phonotactic learner and shows that the contribution features play in generaliza..."

1 | Grammatical Inference: Learning Automata and Grammars
- de la Higuera
- 2010

Citation Context: "...the states). The maximum likelihood (ML) estimation of regular deterministic distributions is a solved problem when the structure of the PDFA is known (Vidal et al., 2005a; Vidal et al., 2005b; de la Higuera, 2010). Let S be a finite sample of words drawn from a regular deterministic distribution D. The problem is to estimate parameters T and F of... [footnote 2] Note that restricting δ to cases when σ1 = σ2 obtains the sta..."
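
The excerpt above notes that ML estimation of a PDFA's transition parameters T and final parameters F is a solved problem once the automaton's structure is known: each parameter is simply the relative frequency of the corresponding event while parsing the sample S. A hypothetical sketch under that reading (the dict encoding of the transition function δ and all names here are invented for illustration, not the cited authors' API):

```python
from collections import Counter

def ml_estimate(delta, q0, sample):
    """Given a known PDFA structure (delta: (state, symbol) -> next state,
    start state q0) and a sample of words, return ML estimates of the
    transition probabilities T and final probabilities F by relative frequency."""
    trans = Counter()   # how often each (state, symbol) transition fires
    final = Counter()   # how often a word ends in each state
    visits = Counter()  # total decision points observed at each state
    for word in sample:
        q = q0
        for sym in word:
            trans[(q, sym)] += 1
            visits[q] += 1
            q = delta[(q, sym)]
        final[q] += 1
        visits[q] += 1  # ending the word counts as one more decision at q
    T = {e: c / visits[e[0]] for e, c in trans.items()}
    F = {q: c / visits[q] for q, c in final.items()}
    return T, F

delta = {(0, "a"): 0, (0, "b"): 0}  # one-state toy automaton (assumed)
T, F = ml_estimate(delta, 0, ["ab", "a"])
```

At each state the outgoing transition probabilities plus the final probability sum to 1, so the estimated automaton defines a proper distribution over words.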

1 | Graphical models over multiple strings. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Dreyer, Eisner
- 2009

Citation Context: "...idden Markov Models (FHMMs) (Ghahramani and Jordan, 1997; Saul and Jordan, 1999), and more recently underlies approaches to describing and inferring regular string transductions (Dreyer et al., 2008; Dreyer and Eisner, 2009). Although HMMs and probabilistic finite-state automata describe the same class of distributions (Vidal et al., 2005a; Vidal et al., 2005b), this paper presents these ideas in formal language-theoret..."

1 | Speech and Language Processing
- Jurafsky, Martin
- 2008

Citation Context: "...e which statistical independence assumptions to make regarding phonological features; 2. feature systems can be fully integrated into strictly local (McNaughton and Papert, 1971) (i.e. n-gram models (Jurafsky and Martin, 2008)) and strictly piecewise models (Rogers et al., 2009; Heinz and Rogers, 2010) in order to define families of provably well-formed, feature-based probability distributions that are provably efficiently..."

1 | Analytic bias and phonological typology. Phonology
- Moreton
- 2008

Citation Context: "...itive units—the features. Incorporating this hypothesis into phonological learning models has been the focus of much influential work (Gildea and Jurafsky, 1996; Wilson, 2006; Hayes and Wilson, 2008; Moreton, 2008; Albright, 2009). This paper makes three contributions. The first contribution is a framework within which: 1. researchers can choose which statistical independence assumptions to make regarding phon..."
