Writer adaptation techniques in HMM based Off-Line Cursive Script Recognition
- Pattern Recognition Letters
, 2002
Cited by 9 (0 self)
This work presents the application of HMM adaptation techniques to the problem of Off-Line Cursive Script Recognition. Instead of training a new model for each writer, one first trains a single model on a mixed database and then adapts it to each writer using that writer's own small dataset. Experiments on a publicly available benchmark database show that an adapted system achieves an accuracy higher than 80% even when fewer than 30 word samples are used during adaptation, whereas a system trained only on the data of a single writer needs at least 200 words to achieve the same performance as the adapted models.
Robust Speech Recognition for Multiple Topological Scenarios of the GSM Mobile Phone System
, 1998
Cited by 8 (0 self)
This paper deals with robust speech recognition in the GSM mobile environment. Our focus is on the voice degradation due to the losses in the GSM coding scheme. Thus, we initially propose an experimental framework of network topologies that consists of various coding-decoding systems placed in tandem. After measuring the recognition performance for each of these network scenarios, we try to increase recognition accuracy by using feature compensation and model adaptation algorithms. We first compare the different methods for all the network topologies assuming the topology is known. We then investigate the more realistic case, in which we don't know the network topology the voice has passed through. The results show that robustness can be achieved even in this case.
Speaker Normalization and Speaker Adaptation - a Combination for Conversational Speech Recognition
- Proceedings of Eurospeech Conference
, 1997
Cited by 8 (0 self)
Speaker normalization and speaker adaptation are two strategies to tackle the variations from speaker, channel, and environment. Vocal tract length normalization (VTLN) is an effective speaker normalization approach to compensate for the variations of vocal tract shapes. Maximum Likelihood Linear Regression (MLLR) is a recently proposed method for speaker adaptation. In this paper, we propose a speaker-specific Bark scale VTLN method, investigate the combination of VTLN with MLLR, and present an iterative procedure for decoding the combined system of VTLN and MLLR. The results show that: (1) the new VTLN method is very effective, reducing the word error rate by up to 11%; (2) the combination of VTLN and MLLR can provide up to 15% word error reduction; (3) both VTLN and MLLR are more effective for the push-to-talk data than for the cross-talk data.
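For orientation, the mean update commonly associated with MLLR can be written as below. This is the textbook form, not a formula quoted from this paper; the symbols (A, b, W, the extended mean vector xi_m, and the Gaussian index m) are notation introduced here for illustration.

```latex
% Textbook MLLR mean adaptation (notation assumed, not taken from the abstract):
% each Gaussian mean \mu_m in a regression class shares one affine transform.
\hat{\mu}_m = A\,\mu_m + b = W\,\xi_m,
\qquad
W = \begin{bmatrix} b & A \end{bmatrix},
\qquad
\xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix}
```

Here W is estimated by maximizing the likelihood of the adaptation data, and because many Gaussians share one W, a small amount of speaker data can move the whole model.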
Nonparallel training for voice conversion based on a parameter adaptation approach
- IEEE Trans. Audio, Speech and Language Processing
, 2006
Cited by 8 (0 self)
Unsupervised discriminative adaptation using discriminative mapping transforms
- In Proc. ICASSP, Las Vegas, NV
, 2008
Cited by 7 (5 self)
The most commonly used approaches to speaker adaptation are based on linear transforms, as these can be robustly estimated using limited adaptation data. Although significant gains can be obtained using discriminative criteria for training acoustic models, maximum likelihood (ML) estimated transforms are used for unsupervised adaptation. This is because discriminatively trained transforms are highly sensitive to errors in the adaptation hypothesis. This paper describes a new framework for estimating transforms that are discriminative in nature, but are less sensitive to this hypothesis issue. A discriminative, speaker-independent mapping transform is estimated during training. This transform is obtained after a speaker-specific ML-estimated transform has been applied. During recognition, an ML speaker-specific transform is found and the speaker-independent discriminative mapping transform is then applied. This allows a transform which is discriminative in nature to be indirectly estimated, whilst only requiring an ML speaker-specific transform to be found during recognition. The scheme is evaluated on an English conversational telephone speech task, where it significantly outperforms both standard ML and discriminatively trained transforms.
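The two-stage application described in this abstract (a speaker-specific ML transform followed by a speaker-independent mapping transform) can be illustrated with a minimal sketch. It assumes both transforms are affine maps applied to a Gaussian mean vector; the function names and every number below are hypothetical and are not taken from the paper, and no estimation step is shown.

```python
# Minimal sketch: chaining two affine transforms on a mean vector.
# Assumption: both transforms act as x -> A x + b on model means.
# All matrices and vectors are made-up 2-D values for illustration only.

def affine(A, b, x):
    """Return A @ x + b for a square matrix A (list of rows) and vectors b, x."""
    return [sum(a * xi for a, xi in zip(row, x)) + bi for row, bi in zip(A, b)]

def compose(A2, b2, A1, b1):
    """Collapse x -> A2 (A1 x + b1) + b2 into a single affine map (A, b)."""
    n = len(A1)
    A = [[sum(A2[i][k] * A1[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    b = [sum(A2[i][k] * b1[k] for k in range(n)) + b2[i] for i in range(n)]
    return A, b

# Hypothetical speaker-specific transform (would be ML-estimated per speaker).
A_spk, b_spk = [[1.1, 0.0], [0.2, 0.9]], [0.5, -0.3]
# Hypothetical speaker-independent mapping transform (would be fixed after training).
A_map, b_map = [[0.95, 0.05], [0.0, 1.05]], [0.1, 0.0]

mean = [1.0, 2.0]

# Apply the speaker-specific transform first, then the mapping transform.
two_step = affine(A_map, b_map, affine(A_spk, b_spk, mean))

# Equivalent single transform, useful if the chain is applied to many means.
A_tot, b_tot = compose(A_map, b_map, A_spk, b_spk)
one_step = affine(A_tot, b_tot, mean)
```

The point of the sketch is only that the chained maps collapse into one affine map, so at recognition time only the speaker-specific part has to be re-estimated per speaker.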
Use of Speech Recognition in Computer-assisted Language Learning
, 1999
Cited by 7 (0 self)
... Linear Model Combination and Model Merging. These algorithms are based on the assumption that the mother tongue of a non-native speaker is known. The basic idea underlying most findings of this thesis is that non-native speech can be modeled with a mixture of sounds of a speaker's native language and the target language. The newly developed speaker adaptation algorithms combine the acoustic models of the source and target language of a non-native speaker. The algorithms differ only in the details of how the model sets are combined. A database of non-native English was recorded for the purpose of testing these adaptation algorithms. This database mostly consists of utterances of Japanese and Latin-American Spanish accented English. The recordings were transcribed by trained phoneticians to obtain transcriptions corresponding to the actual phoneme sequence uttered by the student as opposed to canonical transcriptions obtained from a standard
Techniques for modelling Phonological Processes in Automatic Speech Recognition
, 2001
Cited by 6 (0 self)
Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of
Adaptive Training for Large Vocabulary Continuous Speech Recognition
, 2006
Cited by 6 (2 self)
In recent years, there has been a trend towards training large vocabulary continuous speech recognition (LVCSR) systems on a large amount of found data. Found data is recorded from spontaneous speech without careful control of the recording acoustic conditions, for example, conversational telephone speech. Hence, it typically has greater variability in terms of speaker and acoustic conditions than specially collected data. Thus, in addition to the desired speech variability required to discriminate between words, it also includes various non-speech variabilities, for example, the change of speakers or acoustic environments. The standard approach to handle this type of data is to train hidden Markov models (HMMs) on the whole data set as if all data comes from a single acoustic condition. This is referred to as multi-style training, for example speaker-independent training. Effectively, the non-speech variabilities are ignored. Though good performance has been obtained with multi-style systems, these systems account for all variabilities. Improvement may be obtained if the two types of variabilities in the found data are modelled separately. Adaptive training has been proposed for this purpose. In contrast to multi-style training, a set of transforms is used to represent the non-speech variabilities. A canonical
Online Bayesian tree-structured transformation of HMMs with optimal model selection for speaker adaptation
- IEEE Trans. Speech and Audio Proc
, 2001
Cited by 6 (2 self)
This paper presents a new recursive Bayesian learning approach for transformation parameter estimation in speaker adaptation. Our goal is to incrementally transform or adapt a set of hidden Markov model (HMM) parameters for a new speaker and gain a large performance improvement from a small amount of adaptation data. By constructing a clustering tree of HMM Gaussian mixture components, the linear regression (LR) or affine transformation parameters for HMM Gaussian mixture components are dynamically searched. An online Bayesian learning technique is proposed for recursive maximum a posteriori (MAP) estimation of LR and affine transformation parameters. This technique has the advantages of being able to accommodate flexible forms of transformation functions as well as a priori probability density functions (pdfs). To balance between model complexity and goodness of fit to adaptation data, a dynamic programming algorithm is developed for selecting models using a Bayesian variant of the "minimum description length" (MDL) principle. Speaker adaptation experiments with a 26-letter English alphabet vocabulary were conducted, and the results confirmed the effectiveness of the online learning framework. Index Terms: Affine transformation, Bayesian model selection, hidden Markov models (HMMs), linear regression (LR), model
Covariance Modelling for Noise-Robust Speech Recognition
Cited by 5 (5 self)
Model compensation is a standard way of improving speech recognisers’ robustness to noise. Most model compensation techniques produce diagonal covariances. However, this fails to handle any changes in the feature correlations due to the noise. This paper presents a scheme that allows full-covariance matrices to be estimated. One problem is that full covariance matrix estimation will be more sensitive to approximations, and those for the dynamic parameters are known to be crude. In this paper a linear transformation of a window of consecutive frames is used as the basis for dynamic parameter compensation. A second problem is that the resulting full covariance matrices slow down decoding. This is addressed by using predictive linear transforms that decorrelate the feature space, so that the decoder can then use diagonal covariance matrices. On a noise-corrupted Resource Management task, the proposed scheme outperformed the standard VTS compensation scheme.
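To make concrete what "a linear transformation of a window of consecutive frames" can mean for dynamic parameters, here is a minimal sketch. It assumes the widely used regression-style delta formula over a window of 2K+1 frames; that formula, the function names, and the scalar-feature setting are assumptions made for illustration and are not stated in the abstract. It is not the paper's compensation scheme.

```python
# Minimal sketch: delta (dynamic) coefficients as a linear map of a frame window.
# Assumption: regression-style deltas,
#   delta_t = sum_{k=1..K} k * (x[t+k] - x[t-k]) / (2 * sum_{k=1..K} k^2),
# applied here to a single scalar feature for simplicity.

def delta_weights(K=2):
    """Weights w[-K..K] such that delta = sum_k w[k] * x[t+k]."""
    denom = 2 * sum(k * k for k in range(1, K + 1))
    return [k / denom for k in range(-K, K + 1)]

def delta(window, K=2):
    """Delta of the centre frame, given the 2K+1 static values around it."""
    w = delta_weights(K)
    if len(window) != len(w):
        raise ValueError("window must contain 2K+1 frames")
    return sum(wi * xi for wi, xi in zip(w, window))

def delta_variance(window_cov, K=2):
    """Variance of the delta, w^T C w, where C is the covariance matrix of the
    static values in the window (list of rows). Follows from linearity."""
    w = delta_weights(K)
    n = len(w)
    return sum(w[i] * window_cov[i][j] * w[j] for i in range(n) for j in range(n))
```

For example, `delta([0.0, 1.0, 2.0, 3.0, 4.0])` is approximately 1.0, the slope of the ramp. Because the map is linear, any covariance over the static window propagates to the dynamic parameter in closed form, which is the property that makes a window-based treatment attractive.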

