Results 1–10 of 341
Semi-Tied Covariance Matrices For Hidden Markov Models
IEEE Transactions on Speech and Audio Processing, 1999
Abstract

Cited by 181 (27 self)
There is normally a simple choice made in the form of the covariance matrix to be used with continuous-density HMMs. Either a diagonal covariance matrix is used, with the underlying assumption that elements of the feature vector are independent, or a full or block-diagonal matrix is used, where all or some of the correlations are explicitly modelled. Unfortunately, when using full or block-diagonal covariance matrices there tends to be a dramatic increase in the number of parameters per Gaussian component, limiting the number of components which may be robustly estimated. This paper introduces a new form of covariance matrix which allows a few "full" covariance matrices to be shared over many distributions, whilst each distribution maintains its own "diagonal" covariance matrix. In contrast to other schemes which have hypothesised a similar form, this technique fits within the standard maximum-likelihood criterion used for training HMMs. The new form of covariance matrix is evaluated on a large-vocabulary speech-recognition task. In initial experiments the performance of the standard system was achieved using approximately half the number of parameters. Moreover, a 10% reduction in word error rate compared to a standard system can be achieved with less than a 1% increase in the number of parameters and little increase in recognition time.
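The parameter-sharing idea in this abstract can be sketched numerically: each component keeps its own diagonal, while one "full" matrix is shared by all components. This is a minimal sketch with illustrative names (`A` for the shared transform, `lambdas` for the per-component diagonals), not the paper's notation or estimation procedure:

```python
import numpy as np

# Semi-tied sketch: component m has effective covariance
#   Sigma_m = A @ diag(lambda_m) @ A.T
# where the "full" matrix A is shared across all components and only the
# diagonal lambda_m is component-specific. Values below are random toys.

rng = np.random.default_rng(0)
dim, n_components = 4, 3

A = rng.standard_normal((dim, dim))                   # shared full transform
lambdas = rng.uniform(0.5, 2.0, (n_components, dim))  # per-component diagonals

# Effective full covariance of each component
sigmas = [A @ np.diag(lam) @ A.T for lam in lambdas]

# Parameter count: one shared dim*dim matrix plus n_components*dim diagonals,
# versus n_components * dim*(dim+1)/2 for unconstrained full covariances.
semi_tied_params = dim * dim + n_components * dim
full_params = n_components * dim * (dim + 1) // 2
print(semi_tied_params, full_params)
```

The count comparison illustrates the abstract's point: the shared-matrix cost is paid once, so adding components grows the parameter count only linearly in `dim`.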
Parametric Hidden Markov Models for Gesture Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999
Abstract

Cited by 144 (3 self)
A new method for the representation, recognition, and interpretation of parameterized gesture is presented. By parameterized gesture we mean gestures that exhibit a systematic spatial variation; one example is a point gesture where the relevant parameter is the two-dimensional direction. Our approach is to extend the standard hidden Markov model method of gesture recognition by including a global parametric variation in the output probabilities of the HMM states. Using a linear model of dependence, we formulate an expectation-maximization (EM) method for training the parametric HMM. During testing, a similar EM algorithm simultaneously maximizes the output likelihood of the PHMM for the given sequence and estimates the quantifying parameters. Using visually derived and directly measured three-dimensional hand position measurements as input, we present results that demonstrate the recognition superiority of the PHMM over standard HMM techniques, as well as greater robustness in parameter estimation with respect to noise in the input features. Last, we extend the PHMM to handle arbitrary smooth (nonlinear) dependencies. The nonlinear formulation requires the use of a generalized expectation-maximization (GEM) algorithm for both training and the simultaneous recognition of the gesture and estimation of the value of the parameter. We present results on a pointing gesture, where the nonlinear approach permits the natural spherical coordinate parameterization of pointing direction. Index Terms: gesture recognition, hidden Markov models, expectation-maximization algorithm, time-series modeling, computer vision.
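The "linear model of dependence" described above makes each state's output mean a linear function of the global gesture parameter. A minimal sketch, with hypothetical per-state quantities `W_j` and `mu_bar_j` standing in for what the paper's EM training would estimate:

```python
import numpy as np

# PHMM linear dependence sketch: the Gaussian output mean of state j is
#   mu_j(theta) = W_j @ theta + mu_bar_j
# where theta is the global gesture parameter (e.g. pointing direction).
# W_j and mu_bar_j below are illustrative values, not trained quantities.

def state_mean(W_j, mu_bar_j, theta):
    """Output mean of state j for gesture parameter theta."""
    return W_j @ theta + mu_bar_j

W_j = np.array([[1.0, 0.0],
                [0.0, 2.0],
                [0.5, 0.5]])          # 3-D observations, 2-D parameter
mu_bar_j = np.array([0.1, -0.2, 0.0])
theta = np.array([1.0, -1.0])         # e.g. a 2-D pointing direction

print(state_mean(W_j, mu_bar_j, theta))
```

At test time the paper's EM procedure would search over `theta` to jointly maximize sequence likelihood; the function above is only the forward mapping that search would evaluate.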
Maximum Likelihood Modeling With Gaussian Distributions For Classification
Proceedings of ICASSP, 1998
Abstract

Cited by 99 (26 self)
Maximum Likelihood (ML) modeling of multi-class data for classification often suffers from the following problems: a) data insufficiency, implying overtrained or unreliable models; b) large storage requirements; c) large computational requirements; and/or d) ML is not discriminating between classes. Sharing parameters across classes (or constraining the parameters) clearly tends to alleviate the first three problems. In this paper we show that in some cases it can also lead to better discrimination (as evidenced by reduced misclassification error). The parameters considered are the means and variances of the Gaussians and linear transformations of the feature space (or, equivalently, of the Gaussian means). Some constraints on the parameters are shown to lead to Linear Discriminant Analysis (a well-known result) while others are shown to lead to optimal feature spaces (a relatively new result). Applications of some of these ideas to the speech recognition problem are also given.
Channel compensation for SVM speaker recognition
in Proceedings of Odyssey-04, The Speaker and Language Recognition Workshop
Abstract

Cited by 77 (13 self)
One of the major remaining challenges to improving accuracy in state-of-the-art speaker recognition algorithms is reducing the impact of channel and handset variations on system performance. For Gaussian Mixture Model based speaker recognition systems, a variety of channel-adaptation techniques are known and available for adapting models between different channel conditions, but for the much more recent Support Vector Machine (SVM) based approaches to this problem, much less is known about the best way to handle this issue. In this paper we explore techniques that are specific to the SVM framework in order to derive fully nonlinear channel compensations. The result is a system that is less sensitive to specific kinds of labeled channel variations observed in training.
Large Scale Discriminative Training For Speech Recognition
2000
Abstract

Cited by 71 (5 self)
This paper describes, and evaluates on a large scale, the lattice-based framework for discriminative training of large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models (HMMs). The paper concentrates on the maximum mutual information estimation (MMIE) criterion, which has been used to train HMM systems for conversational telephone speech transcription using up to 265 hours of training data. These experiments represent the largest-scale application of discriminative training techniques for speech recognition of which the authors are aware, and have led to significant reductions in word error rate for both triphone and quinphone HMMs compared to our best models trained using maximum likelihood estimation. The MMIE lattice-based implementation used, techniques for ensuring improved generalisation, and interactions with maximum-likelihood-based adaptation are all discussed. Furthermore, several variations to the MMIE training scheme are introduced with the a...
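The MMIE criterion the abstract refers to can be written, in standard textbook notation rather than the paper's own, as the log posterior probability of the correct transcriptions over the training set:

```latex
\mathcal{F}_{\mathrm{MMIE}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_{\lambda}(\mathcal{O}_r \mid \mathcal{M}_{w_r})\, P(w_r)}
         {\sum_{\hat{w}} p_{\lambda}(\mathcal{O}_r \mid \mathcal{M}_{\hat{w}})\, P(\hat{w})}
```

Here $\mathcal{O}_r$ is the $r$-th training utterance, $w_r$ its reference transcription, and the denominator sums over all competing word sequences $\hat{w}$; the lattice-based framework discussed in the paper exists precisely because that denominator sum must be approximated by a recognition lattice in large-vocabulary systems.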
Cluster Adaptive Training Of Hidden Markov Models
IEEE Transactions on Speech and Audio Processing, 1999
Abstract

Cited by 57 (15 self)
When performing speaker adaptation there are two conflicting requirements. First, the transform must be powerful enough to represent the speaker. Second, the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speaker-independent task CAT reduced the word error rate using very little adaptation data. In addition, when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
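The core CAT operation described above, interpolating cluster means rather than picking one cluster, is a one-line computation once the weights are known. A toy sketch with made-up cluster means and weights (the paper estimates the weights by maximum likelihood; that estimation is not reproduced here):

```python
import numpy as np

# CAT sketch: the adapted mean for speaker s is a linear interpolation of
# all cluster means, mu_s = sum_c lambda_c * mu_c. The few per-speaker
# parameters are just the interpolation weights lambda. Toy values below.

cluster_means = np.array([[0.0,  0.0],
                          [1.0,  2.0],
                          [4.0, -2.0]])   # 3 clusters, 2-D means
weights = np.array([0.2, 0.5, 0.3])       # per-speaker interpolation weights

mu_speaker = weights @ cluster_means
print(mu_speaker)
```

This makes concrete why CAT suits rapid adaptation: with 3 clusters only 3 weights per speaker are estimated, however large the model set's mean vectors are.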
Unsupervised language model adaptation
In IEEE Int’l Conf. on Acoustics, Speech, and Signal Processing, 2003
Abstract

Cited by 50 (2 self)
This paper investigates unsupervised language model adaptation from ASR transcripts. N-gram counts from these transcripts can be used either to adapt an existing n-gram model or to build an n-gram model from scratch. Various experimental results are reported on a particular domain adaptation task, namely building a customer care application starting from a general voicemail transcription system. The experiments investigate the effectiveness of various adaptation strategies, including iterative adaptation and self-adaptation on the test data. They show an error rate reduction of 3.9% over the unadapted baseline performance, from 28% to 24.1%, using 17 hours of unsupervised adaptation material. This is 51% of the 7.7% adaptation gain obtained by supervised adaptation. Self-adaptation on the test data resulted in a 1.3% improvement over the baseline.
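One simple strategy consistent with the abstract's description, pooling n-gram counts from ASR transcripts with the counts behind an existing model, can be sketched as follows. The weighting scheme and the toy counts are assumptions for illustration; a real system would also apply smoothing:

```python
from collections import Counter

# Sketch: merge n-gram counts harvested from (errorful) ASR transcripts into
# baseline counts, optionally down-weighting the automatic transcripts.

def adapt_counts(base_counts, asr_counts, weight=1.0):
    """Merge ASR-transcript n-gram counts into baseline counts."""
    merged = Counter(base_counts)
    for ngram, c in asr_counts.items():
        merged[ngram] += weight * c
    return merged

base = Counter({("call", "center"): 2, ("voice", "mail"): 8})
asr = Counter({("call", "center"): 6, ("customer", "care"): 4})

merged = adapt_counts(base, asr, weight=0.5)
print(merged[("call", "center")])   # 2 + 0.5 * 6
```

Iterative adaptation, as evaluated in the paper, would re-decode with the adapted model and repeat this merge on the new transcripts.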
Supervised and unsupervised PCFG adaptation to novel domains
2003
Abstract

Cited by 40 (0 self)
This paper investigates adapting a lexicalized probabilistic context-free grammar (PCFG) to a novel domain, using maximum a posteriori (MAP) estimation. The MAP framework is general enough to include some previous model adaptation approaches, such as corpus mixing in Gildea (2001), for example. Other approaches falling within this framework are more effective. In contrast to the results ...
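MAP estimation with a Dirichlet prior over rule probabilities reduces to a count-mixing form, which the following sketch illustrates. The exact prior parameterization used in the paper may differ; the rule names, counts, and prior strength `tau` here are illustrative:

```python
# MAP sketch for PCFG rule probabilities: in-domain counts are smoothed
# toward out-of-domain probabilities via a Dirichlet prior of strength tau:
#   p_MAP(r) = (count_in(r) + tau * p_out(r)) / (total_in + tau)
# for all rules r sharing one left-hand-side nonterminal.

def map_rule_probs(in_counts, out_probs, tau):
    """MAP-adapted probabilities for rules sharing one left-hand side."""
    total = sum(in_counts.values())
    return {r: (in_counts.get(r, 0) + tau * out_probs[r]) / (total + tau)
            for r in out_probs}

out_probs = {"NP -> DT NN": 0.6, "NP -> NNP": 0.4}   # out-of-domain model
in_counts = {"NP -> DT NN": 3, "NP -> NNP": 1}       # novel-domain counts

probs = map_rule_probs(in_counts, out_probs, tau=2.0)
print(probs)
```

Setting `tau` to 0 recovers pure in-domain relative frequency, while a very large `tau` keeps the out-of-domain model; corpus mixing corresponds to a particular choice of this trade-off.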
Uncertainty decoding for noise robust speech recognition
in Proc. Interspeech, 2004
Abstract

Cited by 36 (12 self)
This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings
Robust speaker-adaptive HMM-based text-to-speech synthesis
IEEE Trans. on Audio, Speech and Language Processing, 2009
Abstract

Cited by 35 (14 self)
This paper describes a speaker-adaptive HMM-based speech synthesis system. The new system, called "HTS-2007," employs speaker adaptation (CSMAPLR+MAP), feature-space adaptive training, mixed-gender modeling, and full-covariance modeling using CSMAPLR transforms, in addition to several other techniques that have proved effective in our previous systems. Subjective evaluation results show that the new system generates significantly better quality synthetic speech than speaker-dependent approaches with realistic amounts of speech data, and that it bears comparison with speaker-dependent approaches even when large amounts of speech data are available. In addition, a comparison study with several speech synthesis techniques shows the new system is very robust: it is able to build voices from less-than-ideal speech data and synthesize good-quality speech even for out-of-domain sentences. Index Terms: average voice, HMM-based speech synthesis, HMM Speech Synthesis System, HTS, speaker adaptation, speech synthesis, voice conversion.