## Optimal Linear Feature Transformations For Semi-Continuous Hidden Markov Models (1995)

Citations: 6 (0 self)

### BibTeX

```bibtex
@MISC{Schukat-Talamazzini95optimallinear,
  author = {Gunter Schukat-Talamazzini and Joachim Hornegger and Heinrich Niemann},
  title  = {Optimal Linear Feature Transformations For Semi-Continuous Hidden Markov Models},
  year   = {1995}
}
```

### Abstract

Linear discriminant or Karhunen-Loève transforms are established techniques for mapping features into a lower-dimensional subspace. This paper introduces a uniform statistical framework in which the computation of the optimal feature reduction is formalized as a Maximum-Likelihood estimation problem. The experimental evaluation of this suggested extension of linear selection methods shows a slight improvement in recognition accuracy.

1. INTRODUCTION. It is the ultimate goal of any probabilistic approach to speech recognition to capture the entire process of word production within one single homogeneous model. The statistical parameters of this model can then be optimized with respect to a large training sample of speech using standard parameter estimation techniques. A first step in this direction was the introduction of the (discrete density) hidden Markov model (HMM, [1]), which constitutes a simple probabilistic description of acoustical word realization down to the level of labe...

### Citations

8089 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...r $(q, k)$ represents that at time $t$ the state $s_{q_t}$ uses the density component $g_{k_t}$ for generating an output vector. The estimation of the FTHMM parameters is performed iteratively using the EM algorithm [7]. For that purpose we have to maximize the Kullback-Leibler statistic $Q(\theta, \hat{\theta}) = \sum_{q} \sum_{k} P(X, q, k \mid \theta) \log P(X, q, k \mid \hat{\theta})$ with respect to $\hat{\theta}$. After expanding the log-likelihood expression of the right... |
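The Kullback-Leibler statistic $Q$ quoted in this context is the standard EM auxiliary function. As an illustration of the E-step/M-step alternation it describes (on a toy one-dimensional Gaussian mixture rather than the paper's FTHMM, with invented data), a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated Gaussian clusters
X = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

K = 2
mu = np.array([-1.0, 1.0])   # component means (deliberately poor start)
var = np.array([1.0, 1.0])   # component variances
pi = np.array([0.5, 0.5])    # mixture weights

for _ in range(50):
    # E-step: posterior responsibilities P(k | x, theta) for each sample
    dens = np.exp(-0.5 * (X[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: maximizing Q(theta, theta_hat) gives closed-form updates
    Nk = resp.sum(axis=0)
    mu = (resp * X[:, None]).sum(axis=0) / Nk
    var = (resp * (X[:, None] - mu) ** 2).sum(axis=0) / Nk
    pi = Nk / len(X)

print(np.sort(mu))  # means should drift toward the true cluster centers
```

Each iteration is guaranteed not to decrease the data likelihood, which is the property the paper relies on when training the feature transform jointly with the HMM parameters.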

2649 | Introduction to Statistical Pattern Recognition, 2nd edition - Fukunaga - 1990 |

757 |
Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis, Mermelstein
- 1980
Citation Context: ...cients of an appropriate number of neighboring speech frames. The second stage is formed by a linear mapping involving standard feature extraction activities like, for instance, the cosine transform [4] and rotations of the coordinate system (Karhunen-Loève or linear discriminant transform, [5, 6]); finally, the features belonging to the d first coordinate axes are selected to be fed into the vector qua... |

275 | Hidden Markov models for speech recognition
- Huang, Ariki, et al.
- 1990
Citation Context: ...the scope of the word models by a phonetic classifier or a vector quantizer. The undesirable exclusion of the feature level from word modelling stopped with the advent of semicontinuous models (SCHMM) [2] which incorporated the formerly external vector quantizer into the speech production model. Thus, in a SCHMM-based speech recognizer the entire processing sequence from the feature vectors to the wor... |
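The defining property of the SCHMM mentioned here is that all states share one codebook of densities, with only the mixture weights being state-specific. A hedged numpy sketch of such a shared-codebook output density (all names and dimensions invented for illustration):

```python
import numpy as np

def schmm_output_prob(x, codebook_means, codebook_vars, state_weights):
    """Semi-continuous output density b_j(x) = sum_k c_jk * g_k(x):
    the diagonal Gaussians g_k form one codebook shared by ALL states;
    only the weights c_jk differ from state to state."""
    diff = x - codebook_means                           # (K, D)
    log_g = -0.5 * (np.sum(diff ** 2 / codebook_vars, axis=1)
                    + np.sum(np.log(2 * np.pi * codebook_vars), axis=1))
    return state_weights @ np.exp(log_g)                # scalar b_j(x)

K, D = 4, 3                                 # toy codebook: 4 densities, 3-dim features
rng = np.random.default_rng(2)
means = rng.normal(size=(K, D))
vars_ = np.ones((K, D))
c_j = np.array([0.4, 0.3, 0.2, 0.1])        # mixture weights for one state j
x = rng.normal(size=D)
p = schmm_output_prob(x, means, vars_, c_j)
print(p)
```

Because the codebook is part of the model, reestimating it during training updates the "vector quantizer" jointly with the HMM parameters, which is the point the context is making.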

181 |
An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition
- Levinson, Rabiner
- 1983
Citation Context: ...with respect to a large training sample of speech using standard parameter estimation techniques. A first step in this direction was the introduction of the (discrete density) hidden Markov model (HMM, [1]), which constitutes a simple probabilistic description of acoustical word realization down to the level of labelled speech frames. Frame labelling was performed outside the scope of the word models b... |

172 |
Speaker-independent isolated word recognition using dynamic features of speech spectrum
- Furui
- 1986
Citation Context: ...age can be thought of as computation of the log power mel-spectral coefficients, followed by the enlargement of the resulting short-time parameter vector using first- and second-order temporal derivatives [3], or differences, or simply the spectral coefficients of an appropriate number of neighboring speech frames. The second stage is formed by a linear mapping involving standard feature extraction activitie... |
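The two-stage pipeline described in this context (static mel coefficients enlarged with temporal derivatives, then a linear rotation and truncation to d axes) can be sketched as follows; this is an illustrative approximation using simple frame differences and a PCA-style Karhunen-Loève rotation, not the paper's exact procedure:

```python
import numpy as np

def add_dynamic_features(S):
    """Append first- and second-order temporal differences to a
    (frames x coeffs) matrix of log mel-spectral coefficients."""
    d1 = np.diff(S, axis=0, prepend=S[:1])    # first-order differences
    d2 = np.diff(d1, axis=0, prepend=d1[:1])  # second-order differences
    return np.hstack([S, d1, d2])

def kl_transform(F, d):
    """Karhunen-Loeve (PCA) rotation; keep the d first coordinate axes."""
    F0 = F - F.mean(axis=0)
    cov = np.cov(F0, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # largest variance first
    C = eigvecs[:, order[:d]]                 # rotation / selection matrix
    return F0 @ C

rng = np.random.default_rng(1)
S = rng.normal(size=(100, 12))   # 100 frames x 12 toy mel coefficients
F = add_dynamic_features(S)      # first stage: enlarge -> 100 x 36
Y = kl_transform(F, d=8)         # second stage: rotate and truncate -> 100 x 8
print(Y.shape)
```

The paper's contribution is to replace the fixed rotation matrix computed here from second-order statistics with one trained by Maximum Likelihood inside the HMM itself.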

58 |
On some invariant criteria for grouping data
- Friedman, Rubin
- 1967
Citation Context: ...a linear mapping involving standard feature extraction activities like, for instance, the cosine transform [4] and rotations of the coordinate system (Karhunen-Loève or linear discriminant transform, [5, 6]); finally, the features belonging to the d first coordinate axes are selected to be fed into the vector quantizer. The basic idea presented in this paper is to incorporate the feature rotation step toget... |

3 |
Some Approaches to Optimum Feature Extraction
- Tou, Heydorn
- 1967
Citation Context: ...a linear mapping involving standard feature extraction activities like, for instance, the cosine transform [4] and rotations of the coordinate system (Karhunen-Loève or linear discriminant transform, [5, 6]); finally, the features belonging to the d first coordinate axes are selected to be fed into the vector quantizer. The basic idea presented in this paper is to incorporate the feature rotation step toget... |

1 |
Continuous Optimization Models. Walter de Gruyter
- Eiselt, Pederzoli, et al.
- 1987
Citation Context: ...al optimization is a questionable strategy when confronted with an objective function as rugged as ℓ. Thus we moved to global minimizers without reference to derivatives such as the simplex algorithm [9] or combinatorial optimization procedures; the above result relates to the great deluge algorithm [10]. 6. CONCLUSION The FTHMM outlined above is an extension of the SCHMM formalism which incorporates... |

1 |
Tolerance Threshold and Great Deluge: New Ideas for Optimization (in German). Spektrum der Wissenschaft
- Dueck, Scheuer, et al.
- 1993
Citation Context: ...ℓ. Thus we moved to global minimizers without reference to derivatives such as the simplex algorithm [9] or combinatorial optimization procedures; the above result relates to the great deluge algorithm [10]. 6. CONCLUSION The FTHMM outlined above is an extension of the SCHMM formalism which incorporates a feature rotation matrix C as a Maximum-Likelihood (ML) trainable component of the probabilistic m... |
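The great deluge algorithm referenced here [10] is a derivative-free global search: any move is accepted as long as the objective stays below a steadily sinking "water level". A minimal minimization-variant sketch on an invented rugged objective (step size, schedule, and objective are illustrative, not from the paper):

```python
import numpy as np

def great_deluge(f, x0, step=0.1, rain_speed=0.002, iters=5000, seed=0):
    """Great deluge search (minimization variant): accept any candidate
    whose objective lies below a 'water level' that sinks every step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    best = x.copy()
    level = f(x) + 1.0                 # start the water level above the start point
    for _ in range(iters):
        cand = x + rng.normal(scale=step, size=x.shape)
        if f(cand) <= level:           # anything under the water line is accepted,
            x = cand                   # including uphill moves below the level
            if f(x) < f(best):
                best = x.copy()
        level -= rain_speed            # the water level keeps sinking
    return best

# A deliberately rugged toy objective: quadratic bowl plus sinusoidal ripples
f = lambda x: np.sum(x ** 2) + 0.5 * np.sum(np.cos(5.0 * x))
x_best = great_deluge(f, x0=[2.0, -2.0])
print(f(x_best))
```

Like the Nelder-Mead simplex method [9], this uses only function evaluations, which is why the authors reach for it when the likelihood surface ℓ is too rugged for gradient-based local optimization.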