Towards Automatic Corpus Preparation For A German Broadcast News Transcription System
- Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2002
Cited by 2 (2 self)
When setting up a speech recognition system for a new domain, a lot of manual effort is spent on corpus preparation, i.e., data acquisition, cutting and segmentation of the audio material, generation of pronunciation lexica, as well as the definition of suitable training and test sets. In this paper we describe several methods that help to automate and thus speed up this procedure. For this purpose, we assume that only a preliminary, partially incorrect textual transcription is available. The effectiveness of the proposed methods is demonstrated with the development of a transcription system for the recognition of German broadcast news.
Discriminative Training with Tied Covariance Matrices
- Proc. of the 8th International Conference on Spoken Language Processing (ICSLP 2004), Jeju Island, Korea, 2004
Cited by 1 (0 self)
Discriminative training techniques have proved to be a powerful method for improving large vocabulary speech recognition systems based on Gaussian mixture hidden Markov models. Typically, the optimization of discriminative objective functions is done using the extended Baum algorithm. Since no proof of fast and stable convergence is known so far for continuous distributions, parameter re-estimation depends on setting the iteration constants in the update rules heuristically, ensuring that the new variances are positive definite. In the case of density-specific variances this leads to a system of quadratic inequalities. However, if tied variances are used, the inequalities become more complicated and the resulting constants are often too large to be appropriate for discriminative training. In this paper we present an alternative approach to setting the iteration constants to alleviate this problem. First experimental results show that the new method leads to improved convergence speed and test set performance.
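To make the role of the iteration constant concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes the commonly cited extended Baum-Welch style update for a single one-dimensional Gaussian with a density-specific variance. The function names `ebw_update` and `smallest_valid_D`, the doubling search, and the example statistics are all hypothetical and chosen only for illustration.

```python
def ebw_update(num, den, mu, var, D):
    """One extended Baum-Welch style update for a 1-D Gaussian.

    num, den: (occupancy, sum of x, sum of x^2) accumulated for the
    numerator (correct) and denominator (competing) models.
    D: iteration constant; larger values give smaller, safer steps.
    """
    g_n, x_n, x2_n = num
    g_d, x_d, x2_d = den
    denom = g_n - g_d + D
    new_mu = (x_n - x_d + D * mu) / denom
    new_var = (x2_n - x2_d + D * (var + mu * mu)) / denom - new_mu ** 2
    return new_mu, new_var


def smallest_valid_D(num, den, mu, var, start=1.0, factor=2.0, max_iter=60):
    """Heuristic (an assumption, not from the paper): grow D geometrically
    until the normalizer and the updated variance are both positive."""
    D = start
    for _ in range(max_iter):
        if num[0] - den[0] + D > 0:
            _, new_var = ebw_update(num, den, mu, var, D)
            if new_var > 0:
                return D
        D *= factor
    raise ValueError("no valid iteration constant found")


# Made-up statistics, purely for illustration.
num_stats = (10.0, 12.0, 20.0)
den_stats = (8.0, 4.0, 30.0)
D = smallest_valid_D(num_stats, den_stats, mu=1.0, var=1.0)
new_mu, new_var = ebw_update(num_stats, den_stats, 1.0, 1.0, D)
```

Requiring `new_var > 0` is a quadratic inequality in `D` for a density-specific variance, which matches the abstract's description. With tied variances the same constant has to satisfy a condition pooled over all densities sharing the variance, which is why the resulting values can become large.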
Noisy CMLLR for noise-robust speech recognition
Cited by 1 (1 self)
Adaptive training is a widely used technique for building speech recognition systems on non-homogeneous training data. Recently there has been interest in applying these approaches to situations where there are significant levels of background noise. Various schemes for adaptive training are based on noise- or speaker-specific transforms of the observed noise-corrupted speech to yield estimates of the clean speech. However, when there are high levels of background noise, these clean speech estimates may be poor, resulting in degradations in performance. In this work, a new approach for adaptive training on noise-corrupted training data is presented. It extends a popular form of linear transform for model-based adaptation and adaptive training, constrained MLLR (CMLLR), to reflect additional uncertainty from noise-corrupted observations. This new form of transform is called noisy CMLLR (NCMLLR). NCMLLR uses a modified version of the generative model between clean speech and noisy observation, similar to factor analysis (FA). However, in contrast to FA, here the generative model describes a transformation rather than a covariance matrix structure. The use of NCMLLR for adaptation and adaptive training using an expectation-maximisation approach is described. Discriminative adaptive training with NCMLLR is also presented, based on the minimum phone error criterion. Experiments are conducted on a noise-corrupted version of Resource Management and on in-car recorded digit data. In preliminary experiments this new approach achieves improvements in recognition performance over the standard approach in low signal-to-noise ratio conditions. In addition, the need for adaptive training when there is a range of noise conditions in the training data is shown.
Discriminative training of Acoustic Models in a Segment-Based Speech Recognizer
- 2000
Cited by 1 (0 self)
This thesis explores the use of discriminative training to improve acoustic modeling in a segment-based speech recognizer. In contrast with the more commonly used Maximum Likelihood training, discriminative training considers the likelihoods of competing classes when determining the parameters for a given class's model. Thus, discriminative training works directly to minimize the number of errors made in the recognition of the training data.
LANGUAGE IDENTIFICATION AND MULTILINGUAL SPEECH RECOGNITION USING DISCRIMINATIVELY TRAINED ACOUSTIC MODELS
We perform language identification experiments for four prominent South African languages using a multilingual speech recognition system. Specifically, we show how successfully Afrikaans, English, Xhosa and Zulu may be identified using a single set of HMMs and a single recognition pass. We further demonstrate the effect of language identification-specific discriminative acoustic model training on both the per-language recognition accuracy and the accuracy of the language identification process. Experiments indicate that discriminative training leads to a small overall improvement in language identification accuracy while not strongly affecting the speech recognition performance. Furthermore, language identification is found to be more error-prone and discriminative training less effective for code-mixed utterances, indicating that these may require special treatment within a multilingual speech recognition system.
Optimizing Boosting with Discriminative Criteria
We describe the use of discriminative criteria to optimize Boosting-based ensembles. Boosting algorithms may create hundreds of individual classifiers in order to fit the training data. However, this strategy is neither feasible nor necessary for complex classification problems, such as real-time continuous speech recognition, in which only the combination of a few acoustic models is practical. How to improve the classification accuracy of small ensembles is the focus of this paper. Two discriminative criteria that attempt to minimize the true Bayes error rate are investigated. Improvements are observed over a variety of datasets, including image and speech recognition, indicating the prospective utility of these two criteria.
AUTOMATIC LANGUAGE IDENTIFICATION SYSTEM
This paper presents the language identification (LID) system developed in
Telephone Speech Recognition via the Combination of Knowledge Sources in a Segmental Speech Model
The currently dominant speech recognition methodology, Hidden Markov Modeling, treats speech as a stochastic random process with very simple mathematical properties. The simplistic assumptions of the model, and especially that of the independence of the observation vectors, have been criticized by many in the literature, and alternative solutions have been proposed. One such alternative is segmental modeling, and the OASIS recognizer we have been working on in recent years belongs to this category. In this paper we go one step further and suggest that we should consider speech recognition as a knowledge source combination problem. We offer a generalized algorithmic framework for this approach and show that both hidden Markov and segmental modeling are special cases of this decoding scheme. In the second part of the paper we describe the current components of the OASIS system and evaluate its performance on a very difficult recognition task, the phonetically balanced sentences of the MTBA Hungarian Telephone Speech Database. Our results show that OASIS outperforms a traditional HMM system in phoneme classification and achieves practically the same recognition scores at the sentence level.
Kernel Methods for Text-Independent Speaker Verification
, 2010
"... Dissertation submitted to the University of Cambridge ..."

