## A Hybrid MaxEnt/HMM based ASR System (2005)

Citations: 2 (1 self)

### BibTeX

```bibtex
@MISC{Hifny05ahybrid,
  author = {Yasser Hifny and Steve Renals and Neil D. Lawrence},
  title  = {A Hybrid MaxEnt/HMM based ASR System},
  year   = {2005}
}
```

### Abstract

The aim of this work is to develop a practical framework which extends classical Hidden Markov Models (HMMs) for continuous speech recognition based on the Maximum Entropy (MaxEnt) principle. MaxEnt models can estimate posterior probabilities directly, as in hybrid NN/HMM connectionist speech recognition systems. In particular, a new acoustic modelling approach based on discriminative MaxEnt models is formulated and developed to replace the generative Gaussian Mixture Models (GMMs) commonly used to model acoustic variability. Initial experimental results on the TIMIT phone task are reported.

### Citations

8086 | Maximum likelihood from incomplete data
- Dempster, Laird, et al.
Citation Context: ...s is optional: in this work we have used finite GMMs, which are a flexible model with a strong and rich history in speech recognition. The diagonal GMMs are estimated per state using the EM algorithm [15]. The mixture weights are then ignored as they are not related to discrimination. Hence, the resulting Gaussian models will estimate the likelihood score for an observation (Eq. 5), which will take the role of ...
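The per-state diagonal-GMM estimation described in this context can be illustrated with a minimal EM loop; the toy data, dimensionality, K=2, and the deterministic initialization below are assumptions for illustration, not the paper's actual setup.

```python
import numpy as np

# Minimal EM for a diagonal-covariance GMM on toy 2-D "frames".
# Everything here (data, K=2, initialization) is illustrative only.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(4.0, 0.5, size=(200, 2))])

K, (N, D) = 2, X.shape
means = X[[0, N - 1]].copy()          # deterministic init: one frame per cluster
vars_ = np.ones((K, D))
weights = np.full(K, 1.0 / K)

def log_diag_gauss(X, mu, var):
    # log N(x; mu, diag(var)) for every frame
    return -0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)

for _ in range(50):
    # E-step: per-frame component responsibilities
    logp = np.stack([np.log(weights[k]) + log_diag_gauss(X, means[k], vars_[k])
                     for k in range(K)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: update weights, means, diagonal variances
    Nk = r.sum(axis=0)
    weights = Nk / N
    means = (r.T @ X) / Nk[:, None]
    vars_ = (r.T @ X ** 2) / Nk[:, None] - means ** 2 + 1e-6

print(sorted(means[:, 0]))            # component means recover ~0 and ~4
```

As the context notes, the mixture weights could then be discarded, keeping only the per-component Gaussian likelihood scores.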

4823 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ...ian framework. The weight decay regularizer Ω₂(Λ) = Σᵢ λᵢ² is commonly used to control the complexity, and it implies zero-mean Gaussian priors over the model parameters in the Bayesian setting [16]. However, the Gaussian prior does not lead to a sparse solution, as the parameter values do not approach zero after the training procedure. Also, the Gaussian prior implies that the MaxEnt parameters ...

1083 | A maximum entropy approach to natural language processing
- Berger, Pietra, et al.
- 1996
Citation Context: ...um entropy permitted by the information we do have. MaxEnt has been used in the field of Natural Language Processing (NLP) as a principled way to combine multiple sources in a probabilistic framework [4]. In speech recognition, MaxEnt has been applied to language modelling [5], but there has been relatively little work in acoustic modelling: Likhododev and Gao [6] developed a rank based direct model ...

693 | The Elements of Statistical Learning: Data Mining, Inference, and Prediction
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ... = 1, the regularizer Ω₁(Λ) = Σᵢ |λᵢ| is often used to increase the sparseness of the model. This prior implies an independent double exponential (or Laplace) distribution for each parameter, with density (β/2) exp(−β|λᵢ|) [17]. When λᵢ ≥ 0, the double exponential distribution becomes an exponential distribution. Adding the complexity term to the CML criterion will lead to a minor modification to the original update equa...
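The contrast this context draws between the Gaussian (L2) prior and the sparsity-inducing Laplace (L1) prior can be sketched numerically. The quadratic stand-in loss, step size, and prior strength below are illustrative assumptions, with the L1 prior handled by a proximal (soft-thresholding) step rather than the paper's modified update equations.

```python
import numpy as np

# Illustrative comparison (not the paper's training code): the effect of a
# Gaussian (L2) vs. a Laplace (L1) prior on MaxEnt-style parameters lambda_i.
target = np.array([2.0, -1.5, 0.05, -0.02, 0.8])  # "unregularized" optimum
beta = 0.1     # prior strength
lr = 0.5       # step size

def grad_loss(w):
    # gradient of 0.5 * ||w - target||^2, a stand-in for the CML criterion
    return w - target

# L2 (Gaussian prior): gradient steps on loss + 0.5 * beta * ||w||^2
w_l2 = np.zeros_like(target)
for _ in range(500):
    w_l2 -= lr * (grad_loss(w_l2) + beta * w_l2)

# L1 (Laplace prior): proximal gradient, i.e. soft-thresholding each step
def soft_threshold(w, t):
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

w_l1 = np.zeros_like(target)
for _ in range(500):
    w_l1 = soft_threshold(w_l1 - lr * grad_loss(w_l1), lr * beta)

print("L2 solution:", w_l2)   # all entries shrunk, none exactly zero
print("L1 solution:", w_l1)   # small entries driven exactly to zero
```

This reproduces the context's point: the Gaussian prior only shrinks parameters toward zero, while the Laplace prior sets the small ones exactly to zero.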

553 | Inducing features of random fields
- Pietra, Pietra, et al.
- 1997
Citation Context: ...rameter Estimation. The purpose of the parameter estimation algorithm is to estimate the parameters λ₁..λₙ using numerical methods. A modified version of the Improved Iterative Scaling (IIS) algorithm [11] was used to estimate the parameters. It was suggested to us by John Lafferty [12] to support constraints that may take negative values, which was a restriction of the original algorithm. Further deta...

431 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972
Citation Context: ...n method. Hence, it was chosen for parallel computing facilities as it is very simple. Indeed, this equation is a special case of Generalized Iterative Scaling (GIS), developed by Darroch and Ratcliff [14], where maxₜ M(o, s) = 1. 4. Parametric Constraints. The description of the constraining characterizing functions is an optional implementation issue in which the prior knowledge for different applicati...
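A toy version of the GIS algorithm referenced here can be sketched as follows. The data and features are invented for illustration; a slack feature pads the per-(o, s) feature sum to a constant C, which is the nonnegativity/constant-sum requirement GIS imposes (and which the modified IIS in the paper relaxes).

```python
import numpy as np

# Toy sketch of Generalized Iterative Scaling (GIS) for a conditional MaxEnt
# model p(s|o) ∝ exp(sum_i lambda_i g_i(o, s)). Data and features are
# illustrative only, not from the paper.

# Two observation types (0="A", 1="B"), two states; 6 training samples.
obs = np.array([0, 0, 0, 1, 1, 1])
labels = np.array([0, 0, 1, 1, 1, 0])

# G[type, s, i]: g_0 fires for (A, s0), g_1 fires for (B, s1)
G = np.array([[[1., 0.], [0., 0.]],
              [[0., 0.], [0., 1.]]])
C = G.sum(axis=2).max()
slack = C - G.sum(axis=2, keepdims=True)     # slack feature pads the sum to C
G = np.concatenate([G, slack], axis=2)       # feature sums now equal C everywhere

emp = G[obs, labels].mean(axis=0)            # empirical feature expectations

lam = np.zeros(G.shape[2])
for _ in range(200):
    scores = G @ lam                         # sum_i lambda_i g_i(o, s)
    p = np.exp(scores - scores.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)        # normalize by the partition function
    model = np.einsum('ts,tsi->i', p[obs], G[obs]) / len(obs)
    lam += np.log(emp / model) / C           # multiplicative GIS update

print(p)  # p(s0|A) and p(s1|B) both approach the empirical 2/3
```

After convergence the model feature expectations match the empirical ones, which is the fixed point of the GIS update.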

242 | A maximum entropy approach to adaptive statistical learning modeling
- Rosenfeld
- 1996
Citation Context: ...n the field of Natural Language Processing (NLP) as a principled way to combine multiple sources in a probabilistic framework [4]. In speech recognition, MaxEnt has been applied to language modelling [5], but there has been relatively little work in acoustic modelling: Likhododev and Gao [6] developed a rank based direct model for speech recognition whose parameters were estimated by MaxEnt, and Mach...

192 | An application of recurrent nets to phone probability estimation
- Robinson
- 1994
Citation Context: ...to many published results on the TIMIT phone task. However, these results are still lower than those reported by the GMM/HMM HTK system (72.3%) [18] and the Recurrent Neural Network (RNN) phone accuracy (75%) [19]. 8. Conclusions. In this paper we present an approach to model the acoustic space through the MaxEnt modelling framework. The work aims to relax the inaccurate assumptions associated with the sta...

183 | On the rationale of maximum entropy methods
- Jaynes
- 1982
Citation Context: ...ble probability distributions that satisfy the constraints. E. T. Jaynes suggested maximizing Shannon's entropy criterion subject to the given constraints to choose a suitable distribution, as follows [3]: When we make inferences based on incomplete information, we should draw them from that probability distribution that has the maximum entropy permitted by the information we do have. MaxEnt has been ...

169 | Entropy Optimization Principles with Applications
- Kesavan, Kapur
- 1992
Citation Context: ...bilities summation, commonly called the partition function, and given by ZΛ(o) = Σₛ exp(Σᵢ λᵢ gᵢ(o, s)). The entropy is a concave function of the mean values of the characterizing constraints p̃(gᵢ) [9]. Hence, the MaxEnt solution is unique given the empirical mean values of the constraints. Practically this means that the solution is not sensitive to the initial values of the model parameters and t...
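The partition function described in this context can be computed stably with the log-sum-exp trick; the parameter values and feature vectors below are arbitrary illustrations, not the paper's model.

```python
import numpy as np

# Illustrative computation of the partition function
# Z_Lambda(o) = sum_s exp(sum_i lambda_i g_i(o, s)) for one observation o,
# done in the log domain for numerical stability.
lam = np.array([1.5, -0.7, 2.0])          # hypothetical model parameters
g = np.array([[1.0, 0.0, 1.0],            # g_i(o, s) for each state s
              [0.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])

scores = g @ lam                          # sum_i lambda_i g_i(o, s) per state
m = scores.max()
logZ = m + np.log(np.exp(scores - m).sum())   # log Z_Lambda(o), stable
posterior = np.exp(scores - logZ)             # p(s|o) = exp(score) / Z

print(posterior)                          # sums to 1 by construction
```

Dividing by ZΛ(o) is what turns the exponentiated constraint scores into the state posteriors the hybrid system uses in place of GMM likelihoods.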

46 | Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach
- Bourlard, Morgan
- 1995
Citation Context: ...mates of posterior probabilities of the classes given the acoustics. One of the most useful methods to overcome this problem was to replace GMM likelihoods with a Neural Network (NN) acoustic classifier [1, 2]. The maximum entropy (MaxEnt) principle encourages us to choose the most unbiased distribution that is simultaneously consistent with a set of constraints. Typically, the available information about ...

27 | The principle of maximum entropy
- Guiasu, Shenitzer
- 1985
Citation Context: ...ion, and why it has been frequently used in applications of statistical inference, and why it deserves the adjective "normal": this distribution is the most uncertain and maximizes the entropy [10]. The strong assumption that the data is normally distributed for the two constraints µ and σ² is relaxed by introducing the concept of parametric constraints in section 4. 3. MaxEnt Optimization...

20 | A survey of hybrid ANN/HMM models for automatic speech recognition
- Trentin, Gori
Citation Context: ...mates of posterior probabilities of the classes given the acoustics. One of the most useful methods to overcome this problem was to replace GMM likelihoods with a Neural Network (NN) acoustic classifier [1, 2]. The maximum entropy (MaxEnt) principle encourages us to choose the most unbiased distribution that is simultaneously consistent with a set of constraints. Typically, the available information about ...

14 | State clustering in HMM-based continuous speech recognition
- Young, Woodland
- 1994
Citation Context: ...The reported phone accuracy (66.2%) is comparable to many published results on the TIMIT phone task. However, these results are still lower than those reported by the GMM/HMM HTK system (72.3%) [18] and the Recurrent Neural Network (RNN) phone accuracy (75%) [19]. 8. Conclusions. In this paper we present an approach to model the acoustic space through the MaxEnt modelling framework. The work...

7 | A comparative study on maximum entropy and discriminative training for acoustic modeling in automatic speech recognition
- Macherey, Ney
- 2003
Citation Context: ...as been relatively little work in acoustic modelling: Likhododev and Gao [6] developed a rank based direct model for speech recognition whose parameters were estimated by MaxEnt, and Macherey and Ney [7] discriminatively estimated the parameters of a Gaussian model based speech recognizer using MaxEnt. In a previous work, we evaluated the importance of acoustic features using MaxEnt incremental acous...

6 | Direct models for phoneme recognition
- Likhododev, Gao
- 2002
Citation Context: ...sources in a probabilistic framework [4]. In speech recognition, MaxEnt has been applied to language modelling [5], but there has been relatively little work in acoustic modelling: Likhododev and Gao [6] developed a rank based direct model for speech recognition whose parameters were estimated by MaxEnt, and Macherey and Ney [7] discriminatively estimated the parameters of a Gaussian model based spee...

6 | private communication
- Rockmore
Citation Context: ...e the parameters λ₁..λₙ using numerical methods. A modified version of the Improved Iterative Scaling (IIS) algorithm [11] was used to estimate the parameters. It was suggested to us by John Lafferty [12] to support constraints that may take negative values, which was a restriction of the original algorithm. Further details about the mathematical derivation are reported in [13]. The basic idea behind ...

3 | Acoustic space dimensionality selection and combination using the maximum entropy principle
- Abdel-Haleem, Renals, et al.
- 2004
Citation Context: ...ussian model based speech recognizer using MaxEnt. In a previous work, we evaluated the importance of acoustic features using MaxEnt incremental acoustic space dimensionality selection and combination [8]. (∗ Yasser Hifny is sponsored by a Motorola Studentship.) In this paper, a high dimensional acoustic space is constructed by a large number of acoustic constraints. This aims to simplify the acoustic cl...

2 | The use of maximum entropy principle in continuous speech recognition
- Abdel-Haleem
- 2003
Citation Context: ... to us by John Lafferty [12] to support constraints that may take negative values, which was a restriction of the original algorithm. Further details about the mathematical derivation are reported in [13]. The basic idea behind the IIS algorithm is to make use of an auxiliary function, which bounds the change in divergence from below after each iteration. The Generalized Improved Iterative Scaling (GI...