Domain adaptation with structural correspondence learning
- In EMNLP
, 2006
Cited by 91 (9 self)
Discriminative learning methods are widely used in natural language processing. These methods work best when their training and test data are drawn from the same distribution. For many NLP tasks, however, we are confronted with new domains in which labeled data is scarce or non-existent. In such cases, we seek to adapt existing models from a resource-rich source domain to a resource-poor target domain. We introduce structural correspondence learning to automatically induce correspondences among features from different domains. We test our technique on part of speech tagging and show performance gains for varying amounts of source and target training data, as well as improvements in target domain parsing accuracy using our improved tagger.
Rapid speaker adaptation in eigenvoice space
- IEEE Transactions on Speech and Audio Processing 8
, 2000
Cited by 65 (6 self)
This paper describes a new model-based speaker adaptation algorithm called the eigenvoice approach. The approach constrains the adapted model to be a linear combination of a small number of basis vectors obtained offline from a set of reference speakers, and thus greatly reduces the number of free parameters to be estimated from adaptation data. These “eigenvoice” basis vectors are orthogonal to each other and guaranteed to represent the most important components of variation between the reference speakers. Experimental results for a small-vocabulary task (letter recognition) given in the paper show that the approach yields major improvements in performance for tiny amounts of adaptation data. For instance, we obtained 16% relative improvement in error rate with one letter of supervised adaptation data, and 26% relative improvement with four letters of supervised adaptation data. After a comparison of the eigenvoice approach with other speaker adaptation algorithms, the paper concludes with a discussion of future work. Index Terms: eigenvoice approach, principal component analysis, speaker adaptation, speaker clustering.
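The idea of constraining the adapted model to a linear combination of a few orthogonal basis vectors can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes each speaker model is flattened into one "supervector", obtains the basis by PCA (via SVD), and estimates the weights by ordinary least squares on the observed dimensions, since the abstract does not state the estimator. The names `train_eigenvoices`, `adapt`, and `mask` are hypothetical.

```python
import numpy as np

def train_eigenvoices(supervectors, k):
    """PCA over reference-speaker supervectors (one speaker per row).

    Returns the mean supervector and k orthonormal basis vectors.
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def adapt(mean, eigenvoices, observed, mask):
    """Estimate k weights from the observed dimensions only.

    `mask` marks which supervector dimensions the adaptation data covers.
    Least squares is an assumption standing in for the paper's estimator.
    """
    basis = eigenvoices[:, mask].T            # shape (n_observed, k)
    target = observed[mask] - mean[mask]
    weights, *_ = np.linalg.lstsq(basis, target, rcond=None)
    # The full adapted supervector is rebuilt from only k free parameters.
    return mean + weights @ eigenvoices, weights
```

Only `k` numbers are estimated per new speaker, which is why a very small amount of adaptation data can suffice in this kind of scheme.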
Cluster Adaptive Training Of Hidden Markov Models
- IEEE Transactions on Speech and Audio Processing
, 1999
Cited by 36 (11 self)
When performing speaker adaptation there are two conflicting requirements. First the transform must be powerful enough to represent the speaker. Second the transform must be quickly and easily estimated for any particular speaker. The most popular adaptation schemes have used many parameters to adapt the models to be representative of an individual speaker. This limits how rapidly the models may be adapted to a new speaker or acoustic environment. This paper examines an adaptation scheme requiring very few parameters, cluster adaptive training (CAT). CAT may be viewed as a simple extension to speaker clustering. Rather than selecting a single cluster as representative of a particular speaker, a linear interpolation of all the cluster means is used as the mean of the particular speaker. This scheme naturally falls into an adaptive training framework. Maximum likelihood estimates of the interpolation weights are given. Furthermore, simple re-estimation formulae for cluster means, represented both explicitly and by sets of transforms of some canonical mean, are given. On a speaker-independent task CAT reduced the word error rate using very little adaptation data. In addition when combined with other adaptation schemes it gave a 5% reduction in word error rate over adapting a speaker-independent model set.
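The interpolation of cluster means can be illustrated with a deliberately reduced case. The sketch below assumes a single Gaussian with a known diagonal covariance and treats every adaptation frame as belonging to it; a real HMM system would accumulate the same statistics over many Gaussians with state occupancies. Under those assumptions the maximum likelihood weights solve a small linear system. The function names are hypothetical and the formulae in the paper itself may differ in detail.

```python
import numpy as np

def cat_weights(cluster_means, variances, frames):
    """ML interpolation weights for one diagonal-covariance Gaussian.

    cluster_means: (D, P) matrix whose columns are the P cluster means.
    variances:     (D,) diagonal of the covariance matrix.
    frames:        (T, D) adaptation frames, all assigned to this Gaussian.
    """
    m_scaled = cluster_means / variances[:, None]        # Sigma^{-1} M
    g = len(frames) * cluster_means.T @ m_scaled         # (P, P) accumulator
    k = m_scaled.T @ frames.sum(axis=0)                  # (P,) accumulator
    return np.linalg.solve(g, k)

def speaker_mean(cluster_means, weights):
    """Speaker-specific mean as a linear interpolation of cluster means."""
    return cluster_means @ weights
```

With P clusters, only P weights are estimated per speaker, which is the "very few parameters" property the abstract emphasises.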
A comparative study of adaptation methods for speaker verification
- in ICSLP, 2002
Cited by 29 (12 self)
Real-life speaker verification systems are often implemented using client model adaptation methods, since the amount of data available for each client is often too low to consider plain Maximum Likelihood methods. While the Bayesian Maximum A Posteriori (MAP) adaptation method is commonly used in speaker verification, other methods have proven to be successful in related domains such as speech recognition. This paper reports an experimental comparison between three well-known adaptation methods, namely MAP, Maximum Likelihood Linear Regression, and finally Eigen-Voices. All three methods are compared to the more classical Maximum Likelihood method, and results are given for a subset of the 1999 NIST Speaker Recognition Evaluation database.
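Why MAP adaptation copes with small amounts of client data better than plain Maximum Likelihood can be seen in the standard update for a single Gaussian mean. The sketch below is a textbook form, not necessarily the exact variant evaluated in the paper; the relevance factor `tau` and the function name are assumptions.

```python
def map_adapt_mean(prior_mean, frames, tau=16.0):
    """MAP estimate of a Gaussian mean with a prior centred on prior_mean.

    prior_mean: list of floats (e.g. the mean from a background model).
    frames:     list of feature vectors assigned to this Gaussian.
    tau:        relevance factor; larger values trust the prior more.
    """
    n = len(frames)
    dim = len(prior_mean)
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Interpolates between the prior mean (n = 0) and the ML mean (n >> tau).
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]
```

With no client data the estimate stays at the prior mean, and as the number of frames grows it approaches the Maximum Likelihood sample mean.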
Maximum likelihood Eigenspace and MLLR for speech recognition in noisy environments
- Eurospeech-99, V. 6
, 1999
Cited by 16 (8 self)
A technique for rapid speaker adaptation, called eigenvoices, was introduced recently. The key idea is to confine models in a very low-dimensional linear vector space. This space summarizes a priori knowledge that we have about speaker models. In many practical systems, however, there is a mismatch between the conditions in which the training data were collected and test conditions: prior knowledge becomes improper. Furthermore, prior statistics or models of this mismatch may not be available. We expose two key results: first, we use a maximum-likelihood estimator of prior information in matched conditions, called MLES, leading to an improvement of adaptation by a relative %, and second, we show how we can apply a blind scheme for learning noise, MLLR, achieving an additional % relative improvement in noisy conditions.
Discriminative Cluster Adaptive Training
, 2004
Cited by 6 (5 self)
Multiple-cluster schemes, such as cluster adaptive training (CAT) or eigenvoice systems, are a popular approach for rapid speaker and environment adaptation. Interpolation weights are used to transform a multiple-cluster, canonical, model to a standard hidden Markov model (HMM) set representative of an individual speaker or acoustic environment. Maximum likelihood training for CAT has previously been investigated. However, in state-of-the-art large vocabulary continuous speech recognition systems, discriminative training is commonly employed. This paper investigates applying discriminative training to multiple-cluster systems. In particular, minimum phone error (MPE) update formulae for CAT systems are derived. In order to use MPE in this case, modifications to the standard MPE smoothing function and the prior distribution associated with MPE training are required. A more complex adaptive training scheme combining both interpolation weights and linear transforms, a structured transform (ST), is also discussed within the MPE training framework. Discriminatively trained CAT and ST systems were evaluated on a state-of-the-art conversational telephone speech task. These multiple-cluster systems were found to outperform both standard and adaptively trained systems. Index Terms: cluster adaptive training (CAT), discriminative training, eigenvoices, minimum phone error (MPE), multiple-cluster HMM.
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
Cited by 6 (2 self)
In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, the change of speakers or acoustic environments. The standard approach to handle this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data comes from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, these systems account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical
Multiple-Cluster Adaptive Training Schemes
- In Proc. ICASSP
, 2001
Cited by 5 (1 self)
This paper examines the training of multiple-cluster systems using adaptive training schemes. Various forms of transformation and canonical model are described in a consistent framework allowing re-estimation formulae for all cases to be simply derived. Initial experiments using these various schemes on a large vocabulary speech recognition task are presented. The initial experiments indicate that achieving the best performance when adapting these multiple-cluster systems requires the use of adaptive training schemes rather than simpler cluster initialisation schemes.
Robust Automatic Speech Recognition With Unreliable Data
, 1999
Cited by 2 (0 self)
Theoretical and practical issues of some of the problems in robust automatic speech recognition (ASR) and some of the techniques that address them are presented in this report. The problem of the robustness of the ASR in real-life (as opposed to laboratory) conditions is paramount to the widespread deployment of speech enabled products. The report reviews techniques used so far for robust ASR, ranging from simple spectrum subtraction to various types of model adaptation. A possible connection of robust ASR with the computational auditory scene analysis (CASA), methods for local Signal-to-Noise Ratio (SNR) estimation and classification/scoring with on-line adapted statistical models is discussed. The main focus is on the techniques that would allow for incorporation of CASA and local SNR estimates (used as methods for speech/non-speech separation) into the present prevailing stochastic pattern matching paradigms: Hidden Markov models (HMM) and artificial neural networks (ANN). Th...
A Gaussian Mixture Model Spectral Representation for Speech Recognition
Cited by 1 (0 self)
Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on using formant features. Formants are the resonant frequencies in the vocal tract which form the characteristic shape of the speech spectrum. However, formants are difficult to reliably and robustly estimate from the speech signal and in some cases may not be clearly present. Rather than estimating the resonant frequencies, formant-like features can be used instead. Formant-like features use the characteristics of the spectral peaks to represent the spectrum. In this work, novel features are developed based on estimating a Gaussian mixture model (GMM) from the speech spectrum. This approach has previously been used successfully as a speech codec. The EM algorithm is used to estimate the parameters of the GMM. The extracted parameters: the means, standard deviations and component weights can be related to the formant locations, bandwidths and magnitudes. As the features directly represent the linear spectrum, it is possible to apply techniques for vocal tract length normalisation and additive noise
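One way to picture fitting a GMM to a spectrum with EM is to treat the normalised magnitude spectrum as a histogram over frequency bins and run weighted EM on it. The sketch below makes that assumption, which the abstract does not spell out; the quantile-based initialisation, the iteration count, and the function name `fit_spectrum_gmm` are also illustrative choices rather than details from the work.

```python
import numpy as np

def fit_spectrum_gmm(freqs, magnitudes, n_components, n_iter=100):
    """Weighted EM for a 1-D GMM over frequency.

    freqs:      (F,) bin centre frequencies.
    magnitudes: (F,) non-negative spectral magnitudes.
    Returns component means, standard deviations and weights.
    """
    p = magnitudes / magnitudes.sum()        # spectrum as a probability mass
    # Initialise means at equal-mass quantiles of the cumulative spectrum.
    quantiles = (np.arange(n_components) + 0.5) / n_components
    means = np.interp(quantiles, np.cumsum(p), freqs)
    stds = np.full(n_components, (freqs.max() - freqs.min()) / (4 * n_components))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frequency bin.
        z = (freqs[None, :] - means[:, None]) / stds[:, None]
        dens = weights[:, None] * np.exp(-0.5 * z**2) / (stds[:, None] * np.sqrt(2 * np.pi))
        resp = dens / np.maximum(dens.sum(axis=0, keepdims=True), 1e-300)
        # M-step: each bin counts in proportion to its spectral mass.
        rp = resp * p
        mass = np.maximum(rp.sum(axis=1), 1e-12)
        means = rp @ freqs / mass
        var = (rp * (freqs[None, :] - means[:, None]) ** 2).sum(axis=1) / mass
        stds = np.sqrt(np.maximum(var, 1e-6))
        weights = mass
    return means, stds, weights
```

In this reading, each component's mean, standard deviation and weight play the roles the abstract describes: peak location, bandwidth and magnitude.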

