Results 1 - 10 of 20
MMIE training of large vocabulary recognition systems
, 1997
Abstract

Cited by 49 (3 self)
This paper describes a framework for optimising the structure and parameters of a continuous density HMM-based large vocabulary recognition system using the Maximum Mutual Information Estimation (MMIE) criterion. To reduce the computational complexity of the MMIE training algorithm, confusable segments of speech are identified and stored as word lattices of alternative utterance hypotheses. An iterative mixture-splitting procedure is also employed to adjust the number of mixture components in each state during training, so that the optimal balance between the number of parameters and the available training data is achieved. Experiments are presented on various test sets from the Wall Street Journal database using up to 66 hours of acoustic training data. These demonstrate that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets. Furthermore, the experimental results show that MMIE optimisation of system structure and param...
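The MMIE criterion summarised above can be illustrated with a small, hedged sketch: for each utterance it compares the log-likelihood of the correct transcription (numerator) against a log-sum over competing hypotheses (denominator, here a toy stand-in for the word lattice). All names and numbers below are illustrative, not from the paper.

```python
import math

def mmie_criterion(utterances):
    """Sum over utterances of: log p(O|correct) - log sum over hypotheses."""
    total = 0.0
    for num_loglik, lattice_logliks in utterances:
        # Denominator: numerically stable log-sum-exp over the "lattice".
        m = max(lattice_logliks)
        den = m + math.log(sum(math.exp(l - m) for l in lattice_logliks))
        total += num_loglik - den
    return total

# Two toy utterances; each "lattice" lists the correct hypothesis first.
utts = [(-10.0, [-10.0, -12.0, -15.0]),
        (-8.0,  [-8.0, -9.0, -9.5])]
print(mmie_criterion(utts))  # always <= 0; closer to 0 means less confusion
```

Maximising this quantity pushes probability mass away from competing hypotheses, which is why restricting the denominator sum to confusable segments (the lattices of the abstract) is such an effective speed-up.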
Discriminative Training of Hidden Markov Models
, 1998
Abstract

Cited by 20 (0 self)
(Front matter and table of contents) Abbreviations; Notation; 1 Introduction; 2 Hidden Markov Models: 2.1 Definition, 2.2 HMM Modelling Assumptions, 2.3 HMM Topology, 2.4 Finding the Best Transcription, 2.5 Setting the Parameters, 2.6 Summary; 3 Objective Functions: 3.1 Properties of Maximum Likelihood Estimators, 3.2 Maximum Likelihood, 3.3 Maximum Mutual Information, 3.4 Frame Discrimination ...
Hierarchical large-margin Gaussian mixture models for phonetic classification
 IEEE Workshop on ASRU
, 2007
Abstract

Cited by 14 (2 self)
In this paper we present a hierarchical large-margin Gaussian mixture modeling framework and evaluate it on the task of phonetic classification. A two-stage hierarchical classifier is trained by alternately updating parameters at different levels in the tree to maximize the joint margin of the overall classification. Since the loss function required in the training is convex in the parameters, the problem of spurious local minima is avoided. The model achieves good performance with fewer parameters than single-level classifiers. In the TIMIT benchmark task of context-independent phonetic classification, the proposed modeling scheme achieves a state-of-the-art phonetic classification error of 16.7% on the core test set. This is an absolute reduction of 1.6% from the best previously reported result on this task, and 4-5% lower than a variety of classifiers that have been recently examined on this task. Index Terms: hierarchical classifier, committee classifier, large-margin GMM, phonetic classification
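A hedged sketch of the kind of hinge-style margin loss used in large-margin GMM training: penalise an example whenever the correct class's score fails to beat a competing class's score by at least a unit margin. The per-class scores here are toy log-likelihoods and the variable names are our own, not the paper's.

```python
def margin_loss(scores, correct):
    """Hinge margin loss: sum over wrong classes of max(0, 1 + wrong - right).

    scores: dict mapping class label -> log-likelihood score (toy values).
    correct: the true class label.
    """
    right = scores[correct]
    return sum(max(0.0, 1.0 + s - right)
               for c, s in scores.items() if c != correct)

# A confusable example: the closest competitor is only 0.4 nats behind,
# so it falls inside the unit margin and contributes to the loss.
print(margin_loss({"aa": -3.0, "ae": -3.4, "ih": -7.0}, "aa"))  # ~0.6
```

Because this loss is convex in the scores (a sum of maxima of affine functions), alternating its minimisation over levels of a tree avoids the spurious local minima the abstract refers to.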
Selective Training for Hidden Markov Models with Applications to Speech Classification
, 1997
Abstract

Cited by 8 (0 self)
Traditional maximum likelihood estimation of hidden Markov model parameters aims at maximizing the overall probability across the training tokens of a given speech unit. Therefore, it disregards any interaction and biases across the models in the training procedure. Often the resulting model parameters do not yield minimum-error classification on the training set. A new selective training method is proposed which controls the influence of outliers in the training data on the generated models. The resulting models are shown to possess feature statistics which are more clearly separated for confusable patterns. The proposed selective training procedure is used for hidden Markov model training, with application to foreign accent classification, language identification, and speech recognition using the E-set alphabet. The resulting error rates are measurably improved over traditional Forward-Backward training under open test conditions. The proposed method is similar in terms of its go...
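The general idea of selective training can be sketched as follows: down-weight training tokens whose likelihood under the current model marks them as outliers, so they pull less on the re-estimated parameters. The thresholding rule below (zero weight for the bottom quartile) is an illustrative choice of ours, not the paper's method.

```python
def token_weights(logliks, keep_fraction=0.75):
    """Weight each training token: 1.0 if above the cutoff, else 0.0.

    logliks: per-token log-likelihoods under the current model.
    keep_fraction: fraction of tokens retained (illustrative default).
    """
    ranked = sorted(logliks)
    cutoff = ranked[int(len(ranked) * (1.0 - keep_fraction))]
    return [1.0 if l >= cutoff else 0.0 for l in logliks]

# The second token is a gross outlier and is excluded from re-estimation.
print(token_weights([-5.0, -40.0, -6.0, -4.5]))  # [1.0, 0.0, 1.0, 1.0]
```

These weights would then multiply each token's contribution to the Forward-Backward accumulators, which is how outlier control plugs into standard HMM training.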
An overview of discriminative training for speech recognition
Abstract

Cited by 7 (0 self)
This paper gives an overview of discriminative training as it pertains to the speech recognition problem. The basic theory of discriminative training will be discussed and an explanation of maximum mutual information (MMI) given. Common problems inherent to discriminative training will be explored, as well as practicalities associated with implementing discriminative training for large vocabulary recognition. Alternatives to the MMI objective function, such as minimum word error (MWE) and minimum phone error (MPE), will be discussed. The application of discriminative techniques for adaptation will be described. Finally, possible future avenues of research will be given.
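For reference, the MMI objective and the MPE alternative mentioned here are usually written in the following standard forms (notation ours, not the paper's: $\lambda$ the acoustic model parameters, $\kappa$ a likelihood scale, $\mathbf{O}_r$ the observations and $w_r$ the reference transcription of utterance $r$):

```latex
\mathcal{F}_{\mathrm{MMI}}(\lambda) = \sum_{r}\log
  \frac{p_{\lambda}(\mathbf{O}_r \mid \mathcal{M}_{w_r})^{\kappa}\,P(w_r)}
       {\sum_{w} p_{\lambda}(\mathbf{O}_r \mid \mathcal{M}_{w})^{\kappa}\,P(w)}
\qquad
\mathcal{F}_{\mathrm{MPE}}(\lambda) = \sum_{r}\sum_{w}
  P_{\lambda}(w \mid \mathbf{O}_r)\,A(w, w_r)
```

where $A(w, w_r)$ measures the raw phone accuracy of hypothesis $w$ against the reference $w_r$; replacing $A$ with a word-level accuracy gives the MWE criterion.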
Automatic Model Complexity Control Using Marginalized Discriminative Growth Functions
 In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
, 2003
Abstract

Cited by 6 (3 self)
Designing a large vocabulary speech recognition system is a highly complex problem. Many techniques affect both the system complexity and the recognition performance. Automatic complexity control criteria are needed to quickly predict the recognition performance ranking of systems with varying complexity, in order to select an optimal model structure with the minimum word error. In this paper a novel complexity control technique is proposed, based on the marginalization of discriminative growth functions. A two-stage approach is adopted to make the marginalization efficient. First, a lower bound related to the auxiliary function is used to remove the dependence on the latent variables. Second, a Laplace approximation is used for the integration. Experimental results on a spontaneous speech recognition task show that the marginalized MMI growth function outperforms held-out data likelihood and standard Bayesian schemes in terms of both recognition performance ranking error and word error.
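The Laplace approximation used in the second stage can be sketched in one line: approximate an integral of the form $\int e^{f(\theta)}\,d\theta$ by fitting a Gaussian at the mode $\theta^*$, giving $e^{f(\theta^*)}\sqrt{2\pi / (-f''(\theta^*))}$. This is a generic illustration of the technique, not the paper's specific growth-function integrand.

```python
import math

def laplace_integral(f_at_mode, neg_second_deriv):
    """Laplace approximation to integral of exp(f(t)) dt around the mode.

    f_at_mode: f evaluated at its maximiser t*.
    neg_second_deriv: -f''(t*), the curvature at the mode (must be > 0).
    """
    return math.exp(f_at_mode) * math.sqrt(2.0 * math.pi / neg_second_deriv)

# Sanity check: for f(t) = -t^2/2 the integrand is already Gaussian,
# so the approximation is exact and equals sqrt(2*pi).
print(laplace_integral(0.0, 1.0), math.sqrt(2.0 * math.pi))
```

The appeal in complexity control is that this turns an intractable marginalisation over model parameters into one mode-finding step plus a curvature evaluation.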
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
Abstract

Cited by 6 (2 self)
Summary In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, the change of speakers or acoustic environments. The standard approach to handling this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data came from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, these systems account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical ...
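The adaptive-training idea can be sketched as a single canonical Gaussian model plus one affine transform (A, b) per acoustic condition that maps canonical means into that condition. The matrices and numbers below are toy illustrations of the concept, not parameters from any real system.

```python
def adapted_mean(canonical_mean, A, b):
    """Map a canonical mean into one acoustic condition: A @ mean + b."""
    return [sum(a * m for a, m in zip(row, canonical_mean)) + bi
            for row, bi in zip(A, b)]

canonical = [1.0, 2.0]
# One transform per condition: the canonical model stays shared,
# while the (A, b) pair absorbs the non-speech variability.
speaker_A = ([[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])  # mild offset
speaker_B = ([[0.9, 0.0], [0.0, 1.1]], [0.0, 0.0])   # mild scaling
print(adapted_mean(canonical, *speaker_A))  # [1.5, 1.5]
print(adapted_mean(canonical, *speaker_B))
```

Training alternates between re-estimating the shared canonical parameters and the per-condition transforms, so the canonical model captures only the speech variability needed to discriminate between words.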
Broad Phonetic Class Recognition in a Hidden Markov Model Framework using Extended Baum-Welch Transformations
 in Proc. ASRU
, 2007
Abstract

Cited by 5 (3 self)
In many pattern recognition tasks, given some input data and a model, a probabilistic likelihood score is often computed to measure how well the model describes the data. Extended Baum-Welch (EBW) transformations are most commonly used as a discriminative technique for estimating parameters of Gaussian mixtures, though recently they have been used to derive a gradient steepness measurement that evaluates how well the model matches the distribution of the data. In this paper, we explore applying the EBW gradient steepness metric in the context of Hidden Markov Models (HMMs) for recognition of broad phonetic classes and present a detailed analysis and results on the use of this gradient metric on the TIMIT corpus. We find that our gradient metric is able to outperform the baseline likelihood method, and offers improvements in noisy conditions.
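For context, the classic EBW mean update for a single Gaussian has the form mu' = (num_stats - den_stats + D*mu) / (num_occ - den_occ + D), where "numerator" statistics come from the reference alignment, "denominator" statistics from competing hypotheses, and D is a smoothing constant. The sketch below uses toy numbers; it illustrates the standard update, not this paper's gradient metric.

```python
def ebw_mean_update(num_occ, num_sum, den_occ, den_sum, mu, D):
    """One-dimensional EBW mean update for a single Gaussian component.

    num_occ/num_sum: occupancy and first-order stats from the reference.
    den_occ/den_sum: the same stats from competing hypotheses.
    mu: current mean; D: smoothing constant controlling the step size.
    """
    return (num_sum - den_sum + D * mu) / (num_occ - den_occ + D)

# Larger D keeps the update closer to the current mean (here mu = 0).
print(ebw_mean_update(num_occ=10.0, num_sum=5.0,
                      den_occ=8.0, den_sum=-2.0, mu=0.0, D=16.0))
```

The "gradient steepness" view arises because the size of this EBW step for a given D reflects how strongly the data pulls on the model, which is the quantity the paper turns into a scoring metric.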
Discriminative Training with Tied Covariance Matrices
 Proc. of the 8th International Conference on Spoken Language Processing (ICSLP 2004), Jeju Island, Korea
, 2004
Abstract

Cited by 3 (2 self)
Discriminative training techniques have proved to be a powerful method for improving large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models. Typically, the optimization of discriminative objective functions is done using the extended Baum-Welch algorithm. Since no proof of fast and stable convergence is known for continuous distributions, parameter re-estimation depends on setting the iteration constants in the update rules heuristically, ensuring that the new variances remain positive. In the case of density-specific variances this leads to a system of quadratic inequalities. However, if tied variances are used, the inequalities become more complicated and often the resulting constants are too large to be appropriate for discriminative training. In this paper we present an alternative approach to setting the iteration constants to alleviate this problem. First experimental results show that the new method leads to improved convergence speed and test set performance.
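The constraint being negotiated here can be made concrete with a hedged sketch: pick a smoothing constant D large enough that the EBW-updated variance stays positive. Real systems solve the quadratic inequality in D exactly per density; the doubling scan below is only meant to illustrate the constraint, and all statistics are toy numbers.

```python
def ebw_var_update(num_occ, num_x, num_x2, den_occ, den_x, den_x2, mu, var, D):
    """EBW variance update for a 1-D Gaussian with smoothing constant D."""
    occ = num_occ - den_occ + D
    new_mu = (num_x - den_x + D * mu) / occ
    return (num_x2 - den_x2 + D * (var + mu * mu)) / occ - new_mu * new_mu

def find_D(stats, mu, var, D=1.0):
    """Double D until the updated variance is positive (illustrative only)."""
    while ebw_var_update(*stats, mu, var, D) <= 0.0:
        D *= 2.0
    return D

# (num_occ, num_x, num_x2, den_occ, den_x, den_x2): denominator-dominated
# second-order stats make D = 1 yield a negative variance, forcing D = 2.
stats = (10.0, 5.0, 12.0, 9.0, 6.0, 14.0)
print(find_D(stats, mu=0.5, var=1.0))  # 2.0
```

With density-specific variances each component yields its own quadratic in D; with tied variances one constant must satisfy every component sharing the variance at once, which is why the resulting D tends to be too large, the problem this paper addresses.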
Structured Precision Matrix Modelling for Speech Recognition
, 2006
Abstract

Cited by 2 (0 self)
Declaration This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or in part for a degree at any other university. The length of this thesis, including footnotes and appendices, is approximately 53,000 words. Summary The most extensively and successfully applied acoustic model for speech recognition is the Hidden Markov Model (HMM). In particular, a multivariate Gaussian Mixture Model (GMM) is typically used to represent the output density function of each HMM state. For reasons of efficiency, the covariance matrix associated with each Gaussian component is assumed diagonal, and successive observations are assumed independent given the HMM state sequence. Consequently, the spectral (intra-frame) and temporal (inter-frame) correlations are poorly modelled. This thesis investigates ways of improving these aspects by extending the standard HMM. Parameters for these extended models are estimated discriminatively using the ...
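One common family of such extensions models each Gaussian's precision (inverse covariance) matrix as a weighted sum of globally shared basis matrices, so off-diagonal intra-frame correlation is captured with only a few per-component weights. The basis choice below is a toy illustration of the structure, not a construction from the thesis.

```python
def precision(weights, bases):
    """Structured precision matrix: weighted sum of shared basis matrices."""
    d = len(bases[0])
    P = [[0.0] * d for _ in range(d)]
    for w, B in zip(weights, bases):
        for i in range(d):
            for j in range(d):
                P[i][j] += w * B[i][j]
    return P

bases = [[[1.0, 0.0], [0.0, 0.0]],   # diagonal bases ...
         [[0.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]   # ... plus one correlation basis
# Per-component storage: 3 weights instead of 3 free matrix entries here,
# and the saving grows with dimension when bases are shared globally.
print(precision([2.0, 3.0, 0.5], bases))  # [[2.0, 0.5], [0.5, 3.0]]
```

A full-rank diagonal basis set recovers the standard diagonal-covariance model as a special case, which is why this structure is a strict extension of the baseline HMM described above.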