• Documents
  • Authors
  • Tables
  • Other Seers ▼
    RefSeer AckSeer CollabSeer SeerSeer
  • Log in
  • Sign up
  • MetaCart

CiteSeerX logo

Advanced Search Include Citations
Advanced Search Include Citations | Disambiguate

Maximum likelihood linear regression for speaker adaptation of continuous density hidden markov models (1995)

by C J Leggetter, P C Woodland
Venue:Computer, Speech and Language
Add To MetaCart

Tools

Sorted by:
Results 11 - 20 of 347
Next 10 →

Audio-visual automatic speech recognition: An overview

by Gerasimos Potamianos, Chalapathy Neti, Juergen Luettin, Iain Matthews - Issues in Visual and Audio-visual Speech Processing , 2004
"... We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, ASR performance has yet to reach the level required for speech to become a truly per ..."
Abstract - Cited by 41 (0 self) - Add to MetaCart
We have made significant progress in automatic speech recognition (ASR) for well-defined applications like dictation and medium vocabulary transaction processing tasks in relatively controlled environments. However, ASR performance has yet to reach the level required for speech to become a truly pervasive user interface. Indeed, even in “clean ” acoustic environments, and for a variety of tasks, state of the art ASR system

Variance Compensation Within The Mllr Framework For Robust Speech Recognition And Speaker Adaptation

by Gales Pye Woodland , 1996
"... This paper investigates the use of maximum likelihood linear regression (MLLR) for both speaker and environment adaptation. MLLR transforms the mean and variance parameters of a set of HMMs. In this paper a number of different types of linear transformations of the variances are examined including f ..."
Abstract - Cited by 36 (7 self) - Add to MetaCart
This paper investigates the use of maximum likelihood linear regression (MLLR) for both speaker and environment adaptation. MLLR transforms the mean and variance parameters of a set of HMMs. In this paper a number of different types of linear transformations of the variances are examined including full, block diagonal, and diagonal transformation matrices. Experiments on large vocabulary speaker independent data sets are described. On all the data sets examined the use of MLLR mean and variance compensation reduced the error rate compared to mean-only compensation. Furthermore, the use of a block diagonal or full transformation of the variances on the clean data task showed slight improvements over the diagonal case. However, when some environmental mismatch was present there was no difference in performance between using multiple diagonal variance transformations and a more complex single variance transform.

Cluster Adaptive Training Of Hidden Markov Models

by M.J.F. Gales - IEEE Transactions on Speech and Audio Processing , 1999
"... When performing speaker adaptation there are two conicting requirements. First the transform must be powerful enough to represent the speaker. Second the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt ..."
Abstract - Cited by 36 (11 self) - Add to MetaCart
When performing speaker adaptation there are two conicting requirements. First the transform must be powerful enough to represent the speaker. Second the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speakerindependent task CAT reduced the word error rate using very little adaptation data. In addition when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speaker-independent model set. 2 1

Maximum Likelihood and Minimum Classification Error Factor Analysis for Automatic Speech Recognition

by Lawrence Saul, Mazin Rahim - IEEE Transactions on Speech and Audio Processing , 1997
"... Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlatio ..."
Abstract - Cited by 34 (3 self) - Add to MetaCart
Hidden Markov models (HMMs) for automatic speech recognition rely on high dimensional feature vectors to summarize the short-time properties of speech. Correlations between features can arise when the speech signal is non-stationary or corrupted by noise. We investigate how to model these correlations using factor analysis, a statistical method for dimensionality reduction. Factor analysis uses a small number of parameters to model the covariance structure of high dimensional data. These parameters can be chosen in two ways: (i) to maximize the likelihood of observed speech signals, or (ii) to minimize the number of classification errors. We derive an Expectation-Maximization (EM) algorithm for maximum likelihood estimation and a gradient descent algorithm for improved class discrimination. Speech recognizers are evaluated on two tasks, one small-sized vocabulary (connected alpha-digits) and one medium-sized vocabulary (New Jersey town names). We find that modeling feature correlations...

Lightly Supervised and Unsupervised Acoustic Model Training

by Lori Lamel, Jean-luc Gauvain, Gilles Adda - Computer Speech and Language , 2002
"... The last decade has witnessed substantial progress in speech recognition technology, with todays state-of-the-art systems being able to transcribe unrestricted broadcast news audio data with a word error of about 20%. ..."
Abstract - Cited by 34 (2 self) - Add to MetaCart
The last decade has witnessed substantial progress in speech recognition technology, with todays state-of-the-art systems being able to transcribe unrestricted broadcast news audio data with a word error of about 20%.

An overview of text-independent speaker recognition: from features to supervectors

by Tomi Kinnunen, Haizhou Li , 2009
"... This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of ..."
Abstract - Cited by 31 (14 self) - Add to MetaCart
This paper gives an overview of automatic speaker recognition technology, with an emphasis on text-independent recognition. Speaker recognition has been studied actively for several decades. We give an overview of both the classical and the state-of-the-art methods. We start with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling. We elaborate advanced computational techniques to address robustness and session variability. The recent progress from vectors towards supervectors opens up a new area of exploration and represents a technology trend. We also provide an overview of this recent development and discuss the evaluation methodology of speaker recognition systems. We conclude the paper with discussion on future directions.

A comparative study of adaptation methods for speaker verification

by Johnny Mariéthoz, Samy Bengio - in ICSLP, 2002
"... Real-life speaker verification systems are often implemented using client model adaptation methods, since the amount of data available for each client is often too low to consider plain Maximum Likelihood methods. While the Bayesian Maximum A Posteriori (MAP) adaptation method is commonly used in sp ..."
Abstract - Cited by 29 (12 self) - Add to MetaCart
Real-life speaker verification systems are often implemented using client model adaptation methods, since the amount of data available for each client is often too low to consider plain Maximum Likelihood methods. While the Bayesian Maximum A Posteriori (MAP) adaptation method is commonly used in speaker verification, other methods have proven to be successful in related domains such as speech recognition. This paper reports on experimental comparison between three well-known adaptation methods, namely MAP, Maximum Likelihood Linear Regression, and finally Eigen-Voices. All three methods are compared to the more classical Maximum Likelihood method, and results are given for a subset of the 1999 NIST Speaker Recognition Evaluation database. 1.

Connectionist speech recognition of Broadcast News

by A. J. Robinson , G. D. Cook , D. P. W. Ellis , E. Fosler-Lussier , S. J. Renals , D. A. G. Williams , 2002
"... This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to post ..."
Abstract - Cited by 28 (10 self) - Add to MetaCart
This paper describes connectionist techniques for recognition of Broadcast News. The fundamental difference between connectionist systems and more conventional mixture-of-Gaussian systems is that connectionist models directly estimate posterior probabilities as opposed to likelihoods. Access to posterior probabilities has enabled us to develop a number of novel approaches to confidence estimation, pronunciation modelling and search. In addition we have investigated a new feature extraction technique based on the modulation-filtered spectrogram (MSG), and methods for combining multiple information sources. We have incorporated all of these techniques into a system for the transcription

Machine Learning Techniques for the Computer Security Domain of Anomaly Detection

by Terran D. Lane , 2000
"... : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xv 1 ..."
Abstract - Cited by 27 (1 self) - Add to MetaCart
: : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : xv 1

Classification and Regression using Mixtures of Experts

by Steven Richard Waterhouse , 1997
"... ..."
Abstract - Cited by 27 (0 self) - Add to MetaCart
Abstract not found
The National Science Foundation
  • About CiteSeerX
  • Submit Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2010 The Pennsylvania State University