Novel Estimation Methods for Unsupervised Discovery of Latent Structure in Natural Language Text
, 2006
Cited by 20 (7 self)
This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a “neighborhood” of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations)
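The contrastive estimation objective described in this abstract can be illustrated with a minimal sketch. It assumes a fully observed toy model with no hidden structure, a hypothetical bigram scoring function, and a hypothetical neighborhood made of adjacent-word transpositions; none of these specifics are stated in the abstract. The point is only that the observed sentence is normalized against its neighborhood rather than against all possible sentences.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def transpositions(sentence):
    """Hypothetical neighborhood: the sentence plus every adjacent-word swap."""
    words = tuple(sentence)
    yield words
    for i in range(len(words) - 1):
        swapped = list(words)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield tuple(swapped)


def contrastive_log_likelihood(score, observed, neighborhood):
    """log p(observed | neighborhood) for an unnormalized log-score function.

    The neighbors act as implicit negative examples: probability mass is
    shared only among the observed sentence and its perturbations.
    """
    observed = tuple(observed)
    candidates = set(neighborhood(observed)) | {observed}
    return score(observed) - log_sum_exp([score(c) for c in candidates])


# Hypothetical toy parameters: a weight for each "good" bigram.
BIGRAM_WEIGHTS = {("the", "dog"): 1.0, ("dog", "barks"): 1.0}


def bigram_score(sentence):
    """Unnormalized log-score: sum of weights of the bigrams in the sentence."""
    return sum(BIGRAM_WEIGHTS.get(b, 0.0) for b in zip(sentence, sentence[1:]))


if __name__ == "__main__":
    ll = contrastive_log_likelihood(
        bigram_score, ("the", "dog", "barks"), transpositions
    )
    print(ll)
```

Training would adjust the weights to raise this quantity over the whole corpus; a model with latent syntactic structure would additionally sum over hidden derivations in both the numerator and the denominator.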
Discriminative linear transforms for feature normalization and speaker adaptation in HMM estimation
, 2002
Cited by 18 (2 self)
Linear transforms have been used extensively for training and adaptation of HMM-based ASR systems. Recently, procedures have been developed for the estimation of linear transforms under the Maximum Mutual Information (MMI) criterion. In this paper we introduce discriminative training procedures that employ linear transforms for feature normalization and for speaker adaptive training. We integrate these discriminative linear transforms into MMI estimation of HMM parameters for improvement of large vocabulary conversational speech recognition systems.
MPE-Based Discriminative Linear Transform for Speaker Adaptation
- in International Conference on Acoustics, Speech, and Signal Processing
, 2004
Cited by 7 (0 self)
In this paper, we present a discriminative method for speaker adaptation, where the minimum phone error (MPE) criterion is used to estimate the discriminative linear transform (DLT), including mean and diagonal variance transforms. The I-smoothing technique is essential to improve the generalization of DLTs. Experiments on supervised adaptation for non-native speakers on the North American Business (NAB) Spoke 3 task show that the MPE-based DLT outperforms both MLLR and a previously proposed discriminative method for transform estimation. Preliminary experiments on unsupervised DLT estimation are plotted on conversational telephone speech transcription.
Discriminative estimation of subspace precision and mean (SPAM) models
- in Proc. Eurospeech
, 2003
Cited by 6 (2 self)
The SPAM model was recently proposed as a very general method for modeling Gaussians with constrained means and covariances. It has been shown to yield significant error rate improvements over other methods of constraining covariances such as diagonal, semi-tied covariances, and extended maximum likelihood linear transformations. In this paper we address the problem of discriminative estimation of SPAM model parameters, in an attempt to further improve its performance. We present discriminative estimation under two criteria: maximum mutual information (MMI) and an “error-weighted” training. We show that both these methods individually result in over 20% relative reduction in word error rate on a digit task over maximum likelihood (ML) estimated SPAM model parameters. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques. The techniques developed in this paper also apply directly to an extension of SPAM called subspace constrained exponential models.
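For readers unfamiliar with the "relative reduction" figures quoted in this abstract, the following minimal sketch shows how such a percentage is usually computed. The word error rates used here are hypothetical placeholders chosen for the arithmetic, not results from the paper.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative error reduction: the fraction of the baseline error removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    ml_wer = 2.00              # WER (%) of a maximum likelihood baseline
    discriminative_wer = 1.60  # WER (%) after discriminative training
    print(f"{relative_reduction(ml_wer, discriminative_wer):.0%}")  # prints "20%"
```

A relative figure scales with the baseline: a drop from 2.0% to 1.6% WER is a 20% relative reduction but only a 0.4 point absolute reduction.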
Discriminative Cluster Adaptive Training
, 2004
Cited by 6 (5 self)
Multiple-cluster schemes, such as cluster adaptive training (CAT) or eigenvoice systems, are a popular approach for rapid speaker and environment adaptation. Interpolation weights are used to transform a multiple-cluster canonical model to a standard hidden Markov model (HMM) set representative of an individual speaker or acoustic environment. Maximum likelihood training for CAT has previously been investigated. However, in state-of-the-art large vocabulary continuous speech recognition systems, discriminative training is commonly employed. This paper investigates applying discriminative training to multiple-cluster systems. In particular, minimum phone error (MPE) update formulae for CAT systems are derived. In order to use MPE in this case, modifications to the standard MPE smoothing function and the prior distribution associated with MPE training are required. A more complex adaptive training scheme combining both interpolation weights and linear transforms, a structured transform (ST), is also discussed within the MPE training framework. Discriminatively trained CAT and ST systems were evaluated on a state-of-the-art conversational telephone speech task. These multiple-cluster systems were found to outperform both standard and adaptively trained systems. Index Terms: cluster adaptive training (CAT), discriminative training, eigenvoices, minimum phone error (MPE), multiple-cluster HMM.
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
Cited by 6 (2 self)
Summary: In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, the change of speakers or acoustic environments. The standard approach to handle this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data comes from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, these systems account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical
MAXIMUM CONDITIONAL LIKELIHOOD LINEAR REGRESSION AND MAXIMUM A POSTERIORI FOR HIDDEN CONDITIONAL RANDOM FIELDS SPEAKER ADAPTATION
Cited by 2 (1 self)
This paper shows how to improve Hidden Conditional Random Fields (HCRFs) for phone classification by applying various speaker adaptation techniques. These include Maximum A Posteriori (MAP) adaptation as well as a new technique we introduce called Maximum Conditional Likelihood Linear Regression (MCLLR), a discriminative variant of the widely used MLLR algorithm. In previous work, we and others have shown that HCRFs outperform even discriminatively trained HMMs. In this paper we show that HCRFs adapted via MCLLR or via MAP adaptation also work better than similarly adapted HMMs. We also compare MCLLR and MAP adaptation performance with different amounts of adaptation data. MCLLR adaptation performs better when the amount of adaptation data is relatively small, while MAP adaptation outperforms MCLLR with larger amounts of adaptation data.
Discriminative Training with Tied Covariance Matrices
- Proc. of the 8th International Conference on Spoken Language Processing (ICSLP 2004), Jeju Island, Korea
, 2004
Cited by 1 (0 self)
Discriminative training techniques have proved to be a powerful method for improving large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models. Typically, the optimization of discriminative objective functions is done using the extended Baum algorithm. Since no proof of fast and stable convergence is known so far for continuous distributions, parameter re-estimation depends on setting the iteration constants in the update rules heuristically, ensuring that the new variances are positive definite. In the case of density-specific variances, this leads to a system of quadratic inequalities. However, if tied variances are used, the inequalities become more complicated and the resulting constants are often too large to be appropriate for discriminative training. In this paper we present an alternative approach to setting the iteration constants to alleviate this problem. First experimental results show that the new method leads to improved convergence speed and test set performance.
Unsupervised Adaptation With Discriminative Mapping Transforms
Cited by 1 (0 self)
The most commonly used approaches to speaker adaptation are based on linear transforms, as these can be robustly estimated using limited adaptation data. Although significant gains can be obtained using discriminative criteria for training acoustic models, maximum-likelihood (ML) estimated transforms are still used for unsupervised adaptation. This is because discriminatively trained transforms are highly sensitive to errors in the adaptation supervision hypothesis. This paper describes a new framework for estimating transforms that are discriminative in nature, but are less sensitive to this hypothesis issue. A speaker-independent discriminative mapping transformation (DMT) is estimated during training. This transform is obtained after a speaker-specific ML-estimated transform of each training speaker has been applied. During recognition, an ML speaker-specific transform is found for each test-set speaker and the speaker-independent DMT is then applied. This allows a transform which is discriminative in nature to be indirectly estimated, while only requiring an ML speaker-specific transform to be found during recognition. The DMT technique is evaluated on an English conversational telephone speech task. Experiments showed that using DMT in unsupervised adaptation led to significant gains over both standard ML and discriminatively trained transforms. Index Terms: criterion mapping function, discriminative mapping transform, discriminative training, unsupervised adaptation.
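The two-stage recognition procedure described in this abstract (a speaker-specific ML transform followed by the shared DMT) can be pictured as a composition of two maps. The sketch below assumes both transforms are affine maps acting on a single vector; the abstract does not say whether they act on features or on model parameters, and the function names and numbers are hypothetical.

```python
def apply_affine(matrix, bias, vector):
    """Return matrix @ vector + bias using plain Python lists."""
    return [
        sum(a * x for a, x in zip(row, vector)) + b
        for row, b in zip(matrix, bias)
    ]


def adapt(vector, speaker_transform, dmt):
    """Apply the speaker-specific ML transform, then the speaker-independent DMT.

    Both arguments are (matrix, bias) pairs. Only speaker_transform has to be
    estimated per test speaker; dmt is estimated once during training.
    """
    speaker_matrix, speaker_bias = speaker_transform
    dmt_matrix, dmt_bias = dmt
    normalized = apply_affine(speaker_matrix, speaker_bias, vector)
    return apply_affine(dmt_matrix, dmt_bias, normalized)


if __name__ == "__main__":
    # Hypothetical 2-dimensional transforms for illustration only.
    speaker_transform = ([[2.0, 0.0], [0.0, 2.0]], [1.0, 1.0])
    dmt = ([[1.0, 1.0], [0.0, 1.0]], [0.0, -1.0])
    print(adapt([1.0, 2.0], speaker_transform, dmt))  # prints [8.0, 4.0]
```

Because the DMT is shared across speakers, it never has to be re-estimated from an error-prone recognition hypothesis, which is the robustness argument the abstract makes.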
Discriminative Estimation of Subspace Constrained Gaussian Mixture Models for Speech Recognition
Cited by 1 (0 self)
In this paper we study discriminative training of acoustic models for speech recognition under two criteria: maximum mutual information (MMI) and a novel “error weighted” training technique. We present a proof that the standard MMI training technique is valid for a very general class of acoustic models with any kind of parameter tying. We report experimental results for subspace constrained Gaussian mixture models (SCGMMs), where the exponential model weights of all Gaussians are required to belong to a common “tied” subspace, as well as for Subspace Precision and Mean (SPAM) models which impose separate subspace constraints on the precision matrices (i.e. inverse covariance matrices) and means. It has been shown previously that SCGMMs and SPAM models generalize and yield significant error rate improvements over previously considered model classes such as diagonal models, models with semi-tied covariances, and EMLLT (extended maximum likelihood linear transformation) models. We show here that MMI and error weighted training each individually result in over 20% relative reduction in word error rate on a digit task over maximum likelihood (ML) training. We also show that a gain of as much as 28% relative can be achieved by combining these two discriminative estimation techniques.

