
Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition

by Qiang Huo, Chorkin Chan
Results 1 - 10 of 14

Automatic Person Verification Using Speech and Face Information

by Conrad Sanderson, 2003
Cited by 23 (7 self)
Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one prescribed to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone- and Internet-based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems ...

Using Self-Organizing Maps and Learning Vector Quantization for Mixture Density Hidden Markov Models

by Mikko Kurimo, 1997
Cited by 19 (8 self)
This work presents experiments on recognizing pattern sequences using hidden Markov models (HMMs). The pattern sequences in the experiments are computed from speech signals, and the recognition task is to decode the corresponding phoneme sequences. Training the phoneme HMMs on the collected speech samples is difficult because of the natural variation in speech. Two neural computing paradigms, the Self-Organizing Map (SOM) and Learning Vector Quantization (LVQ), are used in the experiments to improve the recognition performance of the models. An HMM consists of sequential states which are trained to model the feature changes in the signal produced during the modeled process. The output densities applied in this work are mixtures of Gaussian density functions. SOMs are applied to initialize and train the mixtures to give a smooth and faithful representation of the feature vector space defined by the corresponding training samples. The SOM maps similar feature vect...
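
As a rough illustration of the LVQ component mentioned in this abstract, the sketch below performs one step of the standard textbook LVQ1 update on plain Python lists. It is a generic version, not Kurimo's procedure for training HMM mixture densities; the function names, the learning rate `alpha` and the toy codebook are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lvq1_step(codebook, labels, x, x_label, alpha=0.1):
    """Apply one LVQ1 update in place and return the index of the winning codevector."""
    # Find the codevector nearest to the input x (the "winner").
    winner = min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], x))
    # Pull the winner toward x if its class label is correct, push it away otherwise.
    sign = 1.0 if labels[winner] == x_label else -1.0
    codebook[winner] = [w + sign * alpha * (xi - w) for w, xi in zip(codebook[winner], x)]
    return winner
```

For example, with `codebook = [[0.0, 0.0], [1.0, 1.0]]` and `labels = ["a", "b"]`, calling `lvq1_step(codebook, labels, [0.2, 0.0], "a", alpha=0.5)` selects codevector 0 and moves it halfway toward the input.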

On adaptive decision rules and decision parameter adaptation for automatic speech recognition

by Chin-hui Lee, Qiang Huo - Proc. IEEE, 2000
Cited by 16 (3 self)
Recent advances in automatic speech recognition have been achieved by designing a plug-in maximum a posteriori decision rule in which the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, due to the problems of unknown speech distributions, sparse training data, high spectral and temporal variability in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine ...

On-Line Adaptive Learning Of The Correlated Continuous Density Hidden Markov Models For Speech Recognition

by Qiang Huo, Chin-hui Lee - IEEE Trans. on Speech and Audio Processing
Cited by 14 (2 self)
We extend our previously proposed quasi-Bayes adaptive learning framework to cope with correlated continuous density hidden Markov models with Gaussian mixture state observation densities, in which all mean vectors are assumed to be correlated and to have a joint prior distribution. A successive approximation algorithm is proposed to implement the updating of the correlated mean vectors. When the method is applied to on-line speaker adaptation, the algorithm is experimentally shown to be asymptotically convergent and to enhance the efficiency and effectiveness of Bayesian learning by taking into account the correlation information between different models. The technique can be used to cope with the time-varying nature of some acoustic and environmental variabilities, including mismatches caused by changing speakers, channels, transducers, environments and so on.

Online Bayesian Estimation of Transition Probabilities for Markovian Jump Systems

by Vesselin P. Jilkov, X. Rong Li - IEEE Transactions on Signal Processing, 2004
Cited by 10 (2 self)
Markovian jump systems (MJSs) evolve in a jump-wise manner by switching among simpler models according to a finite Markov chain whose parameters are commonly assumed known. This paper addresses the problem of state estimation of MJSs with an unknown transition probability matrix (TPM) of the embedded Markov chain governing the jumps. Under the assumption of a time-invariant but random TPM, an approximate recursion for the TPM's posterior probability density function (PDF) within the Bayesian framework is obtained. Based on this recursion, four algorithms for online minimum mean-square error (MMSE) estimation of the TPM are derived. The first algorithm (for the case of a two-state Markov chain) computes the MMSE estimate exactly if the likelihood of the TPM is linear in the transition probabilities. Its computational load, however, grows with the data length. To limit the computational cost, three alternative algorithms are further developed based on different approximation techniques: truncation of high-order moments, quasi-Bayesian approximation, and numerical integration, respectively. The proposed ...
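
To make the idea of a posterior over the TPM concrete, here is the much simpler, fully conjugate case in which the mode sequence is observed directly: with an independent Dirichlet prior on each row, the posterior is again Dirichlet, and its mean (the MMSE estimate) is just the smoothed transition counts. This is not one of the paper's four algorithms, which handle the harder case where the jumps are hidden; the function name and the prior pseudo-count `alpha` are illustrative assumptions.

```python
def tpm_posterior_mean(mode_sequence, num_states, alpha=1.0):
    """Posterior-mean TPM under an independent Dirichlet(alpha, ..., alpha) prior on each row.

    Assumes the Markov chain's modes are directly observed (unlike the paper's setting).
    """
    # Start every row at the prior pseudo-counts.
    counts = [[alpha] * num_states for _ in range(num_states)]
    # Add one count for each observed transition i -> j.
    for i, j in zip(mode_sequence, mode_sequence[1:]):
        counts[i][j] += 1
    # The mean of a Dirichlet distribution is its parameter vector normalised to sum to one.
    return [[c / sum(row) for c in row] for row in counts]
```

Because the counts only ever accumulate, the same estimate can be updated online, one transition at a time; with no data it returns the uniform matrix implied by the prior.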

Enhancements to Transformation-Based Speaker Adaptation: Principal Component and Inter-Class Maximum Likelihood Linear Regression

by Sam-joo Doh, 2000
Cited by 4 (1 self)
In this thesis we improve speech recognition accuracy by obtaining better estimates of linear transformation functions from a small amount of adaptation data in speaker adaptation. The major contributions of this thesis are two new adaptation algorithms that improve maximum likelihood linear regression (MLLR). The first, called principal component MLLR (PC-MLLR), reduces the variance of the estimate of the MLLR matrix using principal component analysis. The second, called inter-class MLLR, uses relationships among different transformation functions to achieve more reliable estimates of MLLR parameters across multiple classes. The main idea of PC-MLLR is that if we estimate the MLLR matrix in the eigendomain, the variances of the components of the estimates are inversely proportional to their eigenvalues. Therefore we can select the more reliable components to reduce the variances of the resulting estimates and to improve speech recognition accuracy. PC-MLLR eliminates highly variable components and chooses the principal components corresponding to the largest eigenvalues. If all the components are used, PC-MLLR becomes the same as conventional MLLR. Choosing fewer principal components increases the bias of the estimates, which can reduce recognition accuracy. To compensate for this problem, we developed weighted principal component MLLR (WPC-MLLR). Instead of eliminating some of the components, WPC-MLLR uses all the components after applying weights that minimize the mean square error. The component corresponding to a larger eigenvalue has a larger weight than the component corresponding to a smaller eigenvalue. As more adaptation data become available, the benefits from these methods may become smaller because ...
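
The eigendomain idea behind PC-MLLR can be illustrated with ordinary principal component regression on a generic least-squares problem: estimate the coefficients in the eigenbasis of the regressor scatter matrix and keep only the components with the largest eigenvalues. This is a simplified stand-in, not the thesis's PC-MLLR estimator (which operates on MLLR transforms with HMM occupancy statistics); the function name and data layout are assumptions.

```python
import numpy as np


def principal_component_regression(X, y, k):
    """Least-squares weights for y ~ X @ w using only the k leading eigencomponents of X.T @ X."""
    scatter = X.T @ X
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalues (most reliable components) first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Project the cross-correlation vector X.T @ y into the eigendomain.
    z = eigvecs.T @ (X.T @ y)
    # Keep the k leading components and zero the rest (trades variance for bias).
    coeffs = np.zeros_like(z)
    coeffs[:k] = z[:k] / eigvals[:k]
    # Map the truncated eigendomain estimate back to the original coordinates.
    return eigvecs @ coeffs
```

With `k` equal to the number of columns of `X`, the result coincides with the ordinary least-squares solution, mirroring the remark that PC-MLLR with all components reduces to conventional MLLR; smaller `k` gives a lower-variance but biased estimate.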

Use of model transformations for distributed speech recognition

by Naveen Srinivasamurthy, Shrikanth Narayanan, Antonio Ortega - In ISCA ITR-Workshop 2001 (Adaptation Methods for Speech Recognition), Sophia-Antipolis, 2001
Cited by 4 (2 self)
Due to bandwidth limitations, the speech recognizer in distributed speech recognition (DSR) applications has to use encoded speech: either traditional speech encoding or speech encoding optimized for recognition. The penalty incurred in reducing the bitrate is degraded speech recognition performance. The diversity of applications using DSR implies that a variety of speech encoders can be used to compress speech. By treating the encoder variability as a mismatch, we propose using model transformation to reduce the degradation in speech recognition performance. The advantage of model transformation is that only a single model set needs to be trained at the server, which can be adapted on the fly to the input speech data. Using MAP adaptation, we were able to reduce the word error rate by 61.9%, 63.3% and 56.3% for MELP-, GSM- and MFCC-encoded data, respectively, which shows the generality of our proposed scheme.

Bayesian Adaptation of Speech Recognizers to Field Speech Data

by Carmelo Giammarco Miglietta, Chafic Mokbel, Denis Jouvet, Jean Monné - In: Proc. ICSLP'96, 1996
Cited by 1 (1 self)
This work studies a Bayesian (maximum a posteriori, MAP) approach to adapting continuous density hidden Markov models (CDHMMs) to a specific condition of a speech recognition application. To improve model robustness, CDHMMs previously trained on laboratory data are adapted using context-dependent field utterances. Two specific problems must be addressed when using the MAP approach: estimating the parameters of the a priori distribution, and the lack of field adaptation data for some distributions of the CDHMM.
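
For a single Gaussian component, the MAP mean update commonly used in this line of work (normal prior on the mean, covariance held fixed) interpolates between the prior mean and the adaptation data, weighted by the soft occupation counts. The sketch below assumes the occupation probabilities `gammas` have already been computed (for example by a forward-backward pass) and treats the prior weight `tau` as a hand-set constant; the names are illustrative, and estimating the prior parameters, the first problem raised in the abstract, is not addressed.

```python
def map_adapt_mean(prior_mean, frames, gammas, tau=10.0):
    """MAP estimate of a Gaussian mean vector from occupancy-weighted adaptation frames."""
    occupancy = sum(gammas)
    dim = len(prior_mean)
    # Occupancy-weighted sum of the adaptation frames, one value per dimension.
    weighted_sum = [sum(g * x[d] for g, x in zip(gammas, frames)) for d in range(dim)]
    # Interpolate between prior and data; with no data (occupancy 0) the prior mean is returned.
    return [(tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy) for d in range(dim)]
```

The fallback to the prior mean when a component receives no frames is how MAP copes with the second problem in the abstract (no field data for some distributions), although such components are then simply left unadapted. As the occupancy grows relative to `tau`, the estimate approaches the sample mean of the frames assigned to the component.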

SPEECH AND

by Zheng-yu Niu, Dong-hong Ji, Chew Lim Tan, 2007
Cited by 1 (0 self)
Abstract not found

Fitting a Conditional Linear Gaussian Distribution

by Kevin P. Murphy, 1998
Cited by 1 (0 self)
We consider the problem of finding the maximum likelihood (ML) estimates of the parameters of a conditional ...
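
Since the abstract is cut off, here is only a generic sketch of the simplest case, assuming fully observed data and a single (non-mixture) component: y given x is Gaussian with mean W x + b and covariance Sigma. The ML estimates are then the least-squares regression coefficients and the average outer product of the residuals. The function name and the one-sample-per-row layout are assumptions, and this is not necessarily the formulation used in the report.

```python
import numpy as np


def fit_linear_gaussian(X, Y):
    """ML fit of Y | X ~ N(W x + b, Sigma); X is (N, p), Y is (N, q), one sample per row."""
    n = X.shape[0]
    # Append a column of ones so the offset b is estimated jointly with W.
    X1 = np.hstack([X, np.ones((n, 1))])
    # Least-squares coefficients maximise the likelihood with respect to W and b.
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)  # shape (p + 1, q)
    W, b = coef[:-1].T, coef[-1]
    # ML covariance: mean outer product of the residuals (divides by N, not N - p - 1).
    resid = Y - X1 @ coef
    Sigma = resid.T @ resid / n
    return W, b, Sigma
```

Dividing by N gives the (slightly biased) ML covariance rather than the unbiased estimate; the difference vanishes as the sample size grows.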

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University