Results 1 – 10 of 20
Discriminative classifiers with adaptive kernels for noise robust speech recognition
 Comput. Speech Lang., 2010
Abstract

Cited by 24 (18 self)
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as Support Vector Machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum likelihood fashion. A brief discussion of VTS, and of the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS-compensated models. Key words: speech recognition, noise robustness, support vector machines, generative kernels
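The derivative features behind such generative kernels can be sketched in miniature. In the toy example below a single Gaussian stands in for the compensated HMMs of the paper; `score_space_features` and `generative_kernel` are hypothetical names, and the feature vector holds just the sequence log-likelihood plus its derivative with respect to the mean, a minimal analogue of the score-spaces used here.

```python
import math

def score_space_features(obs, mu, var):
    """Score-space feature for a sequence under a single-Gaussian
    generative model: [log-likelihood, d/d(mu) log-likelihood].
    A toy stand-in for the HMM-based generative kernels in the paper."""
    log_lik = sum(-0.5 * math.log(2 * math.pi * var)
                  - 0.5 * (x - mu) ** 2 / var for x in obs)
    d_mu = sum((x - mu) / var for x in obs)   # derivative feature
    return [log_lik, d_mu]

def generative_kernel(seq_a, seq_b, mu, var):
    """Linear kernel in the score-space; the two sequences may
    differ in length, since the feature dimension is fixed."""
    fa = score_space_features(seq_a, mu, var)
    fb = score_space_features(seq_b, mu, var)
    return sum(a * b for a, b in zip(fa, fb))
```

Compensating `mu` and `var` per noise condition, as VTS does for full HMM parameter sets, would yield the noise-specific kernels the abstract describes.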
Discriminative models for speech recognition
 In Information Theory and Applications Workshop, 1997
Abstract

Cited by 22 (8 self)
The vast majority of automatic speech recognition systems use Hidden Markov Models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training criteria, such as maximum mutual information and minimum phone error. However, the underlying acoustic model is still generative, with the associated constraints on the state and transition probability distributions, and classification is based on Bayes' decision rule. Recently, there has been interest in examining discriminative, or direct, models for speech recognition. This paper briefly reviews the forms of discriminative models that have been investigated. These include maximum entropy Markov models, hidden conditional random fields and conditional augmented models. The relationships between the various models, and the issues with applying them to large vocabulary continuous speech recognition, are discussed.
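The "direct" models reviewed here share one core computation: a log-linear posterior over hypotheses. A minimal sketch, with hypothetical names and a user-supplied feature function `phi` (real systems would use, e.g., generative-model scores as features):

```python
import math

def direct_model_posterior(alpha, phi, hypotheses, obs):
    """Posterior of a direct (log-linear) model:
    P(w | O) = exp(alpha . phi(O, w)) / sum_w' exp(alpha . phi(O, w')).
    `alpha` are the model weights, `phi` maps (observation, hypothesis)
    to a feature vector."""
    scores = {w: sum(a * f for a, f in zip(alpha, phi(obs, w)))
              for w in hypotheses}
    m = max(scores.values())                 # log-sum-exp stabilisation
    z = sum(math.exp(s - m) for s in scores.values())
    return {w: math.exp(scores[w] - m) / z for w in scores}
```

Maximum entropy Markov models, hidden CRFs and conditional augmented models differ mainly in how `phi` is structured over the sequence and how the normaliser is computed, not in this basic form.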
Structured log-linear models for noise robust speech recognition
 IEEE Signal Processing Letters, 2010
Abstract

Cited by 19 (10 self)
The use of discriminative models for structured classification tasks, such as automatic speech recognition, is becoming increasingly popular. The major contribution of this work is a large-margin structured log-linear model for noise robust continuous ASR. An important aspect of log-linear models is the form of the features. The features used in our structured log-linear model are derived from generative kernels. This provides an elegant way of combining generative and discriminative models to handle time-varying data. Additionally, since the features are based on the generative models, model-based compensation can easily be performed for noise robustness. Finally, the designed joint feature space can be decomposed at the arc level. This allows efficient decoding and training with lattices, which is important for any larger vocabulary extensions. Previous work in this area is extended in two important directions. First, instead of using conditional maximum likelihood (CML) training, which is commonly used for discriminative models, this paper describes efficient large-margin training for sentence-level log-linear models based on lattices. Depending on the nature of the joint feature space and labels, we prove that this form of model is closely related to structured SVMs and multi-class SVMs. Second, efficient lattice-based classification of continuous data is also performed, incorporating a joint feature space. This novel model combines generative kernels, discriminative models, efficient lattice-based large-margin training and model-based noise compensation. It is evaluated on a noise-corrupted continuous digit task: AURORA 2.0. The results demonstrate that modelling the structure information yields significant improvements.
Combining Derivative and Parametric Kernels for Speaker Verification
, 2007
Abstract

Cited by 17 (1 self)
Support Vector Machine-based speaker verification (SV) has become a standard approach in recent years. These systems typically use dynamic kernels to handle the dynamic nature of the speech utterances. This paper shows that many of these kernels fall into one of two general classes, derivative and parametric kernels. The attributes of these classes are contrasted, and the conditions under which the two forms of kernel are identical are described. By avoiding these conditions, gains may be obtained by combining derivative and parametric kernels. One combination strategy is to combine at the kernel level. This paper describes a maximum-margin based scheme for learning kernel weights for the SV task. Various dynamic kernels and combinations were evaluated on the NIST 2002 SRE task, including derivative and parametric kernels based upon different model structures. The best overall performance was 7.78% EER, achieved when combining five kernels.
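Combining at the kernel level reduces, in the simplest case, to a weighted sum of Gram matrices; non-negative weights keep the result a valid kernel. The sketch below (hypothetical function name, plain Python lists of rows) shows only this summation step, not the maximum-margin weight learning described in the paper.

```python
def combine_kernels(grams, weights):
    """Element-wise weighted sum of kernel Gram matrices.
    Non-negative weights preserve positive semi-definiteness,
    so the combination is itself a valid kernel."""
    if any(w < 0 for w in weights):
        raise ValueError("kernel weights must be non-negative")
    n = len(grams[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, grams))
             for j in range(n)] for i in range(n)]
```

With five kernels, as in the best system here, `grams` would hold five Gram matrices and `weights` the five learned coefficients.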
Derivative kernels for noise robust ASR
 In Proc. ASRU, 2011
Abstract

Cited by 13 (7 self)
Recently there has been interest in combined generative/discriminative classifiers. In these classifiers, features for the discriminative models are derived from generative kernels. One advantage of using generative kernels is that systematic approaches exist for introducing complex dependencies beyond conditional independence assumptions. Furthermore, by using generative kernels, model-based compensation/adaptation techniques can be applied to make discriminative models robust to noise/speaker conditions. This paper extends previous work with combined generative/discriminative classifiers in several directions. First, it introduces derivative kernels based on context-dependent generative models. Second, it describes how derivative kernels can be incorporated in continuous discriminative models. Third, it addresses the issues associated with the large number of classes and parameters when context-dependent models and the high-dimensional features of derivative kernels are used. The approach is evaluated on two noise-corrupted tasks: the small vocabulary AURORA 2 task and the medium-to-large vocabulary AURORA 4 task.
Structured discriminative models for noise robust continuous speech recognition
 In Proc. ICASSP, Prague, Czech Republic, 2011
Abstract

Cited by 11 (8 self)
Recently there has been interest in structured discriminative models for speech recognition. In these models, sentence posteriors are directly modelled, given a set of features extracted from the observation sequence and the hypothesised word sequence. In previous work these discriminative models have been combined with features derived from generative models for noise-robust speech recognition of continuous digits. This paper extends this work to medium-to-large vocabulary tasks. The form of the score-space extracted using the generative models, and the parameter tying of the discriminative model, are both discussed. Update formulae for both conditional maximum likelihood and minimum Bayes' risk training are described. Experimental results are presented on small and medium-to-large vocabulary noise-corrupted speech recognition tasks.
Support vector machines for noise robust ASR
 In Proc. ASRU, 2009
Abstract

Cited by 6 (4 self)
Using discriminative classifiers, such as Support Vector Machines (SVMs), in combination with, or as an alternative to, Hidden Markov Models (HMMs) has a number of advantages for difficult speech recognition tasks. For example, the models can make use of dependencies in the observation sequences beyond those captured by HMMs, provided the appropriate form of kernel is used. However, standard SVMs are binary classifiers, and speech is a multi-class problem. Furthermore, training SVMs to distinguish word pairs requires that each word appears in the training data. This paper examines both of these limitations. Tree-based reduction approaches for multi-class classification are described, as well as some of the issues in applying them to dynamic data, such as speech. To address the training data issues, a simplified version of HMM-based synthesis can be used, which allows data for any word pair to be generated. These approaches are evaluated on two noise-corrupted digit sequence tasks: AURORA 2.0 and actual in-car collected data.
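One tree-based reduction is an elimination tournament: classes are compared pairwise by binary classifiers and the winner advances, so an N-class decision takes about log₂ N rounds of binary decisions. A minimal sketch, with the hypothetical `beats(a, b)` standing in for a trained a-versus-b binary SVM:

```python
def tournament_classify(classes, beats):
    """Reduce a multi-class decision to a tree of binary decisions.
    Each round pairs off the surviving classes; `beats(a, b)` returns
    whichever of the two the binary classifier prefers."""
    pool = list(classes)
    while len(pool) > 1:
        nxt = []
        for i in range(0, len(pool) - 1, 2):
            nxt.append(beats(pool[i], pool[i + 1]))
        if len(pool) % 2:            # odd class out gets a bye
            nxt.append(pool[-1])
        pool = nxt
    return pool[0]
```

The paper's synthesis idea addresses the prerequisite this sketch takes for granted: that a trained `beats` classifier exists for every word pair it might be asked to compare.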
Discriminative Classifiers with Generative Kernels for Noise Robust ASR
Abstract

Cited by 5 (5 self)
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as Support Vector Machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise. Generative kernels, defined using generative models, allow SVMs to handle sequence data. By compensating the generative models for the noise conditions, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. Initial experiments using an idealised version of model-based compensation were run on the AURORA 2.0 continuous digit task. The proposed scheme yielded large gains in performance over the compensated models.
EFFICIENT DECODING WITH GENERATIVE SCORE-SPACES USING THE EXPECTATION SEMIRING
Abstract

Cited by 3 (2 self)
State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). They model a hidden symbol sequence with a Markov process, with the observations independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. An alternative model that allows a wide range of features, including word- and phone-level features, is a log-linear model. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Thus, decoding must find the optimal combination of segmentation of the utterance into words and word sequence. Features must therefore be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time. Index Terms — Speech recognition, log-linear models, weighted finite-state transducers, expectation semiring.
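The joint search over segmentations and word sequences can be illustrated with a segment-level dynamic programme. In the sketch below (hypothetical names), `seg_score(s, t, w)` stands in for the long-span, HMM-derived score of hypothesising word `w` on frames s..t; the double loop over end and start times gives O(T² · |W|) work, quadratic in the utterance length, matching the complexity this paper targets.

```python
def decode_segmentation(T, words, seg_score):
    """Viterbi-style DP over joint (segmentation, word sequence):
    best[t] = max over start s < t and word w of
              best[s] + seg_score(s, t, w).
    Returns the best total score and the decoded word sequence."""
    NEG = float("-inf")
    best = [NEG] * (T + 1)
    back = [None] * (T + 1)
    best[0] = 0.0
    for t in range(1, T + 1):
        for s in range(t):
            for w in words:
                cand = best[s] + seg_score(s, t, w)
                if cand > best[t]:
                    best[t], back[t] = cand, (s, w)
    hyp, t = [], T        # trace back the word sequence
    while t > 0:
        s, w = back[t]
        hyp.append(w)
        t = s
    return best[T], hyp[::-1]
```

The paper's actual contribution, the expectation-semiring formulation over WFSTs, is how the derivative features are folded into such a search without raising this complexity; that machinery is not reproduced in the sketch.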
INFERENCE ALGORITHMS FOR GENERATIVE SCORE-SPACES
Abstract

Cited by 3 (3 self)
Using generative models, for example hidden Markov models (HMMs), to derive features for a discriminative classifier has a number of advantages, including the ability to make the features robust to speaker and noise changes. An interesting attribute of the derived features is that they may not have the same conditional independence assumptions as the underlying generative models, which are typically first-order Markovian. For efficiency, these features are derived given a particular segmentation. This paper describes a general algorithm for obtaining the optimal segmentation with combined generative and discriminative models. Previous results, where the features were constrained to have first-order Markovian dependencies, are extended to allow derivative features to be used which are non-Markovian in nature. As an example, inference with zeroth- and first-order HMM score-spaces is considered. Experimental results are presented on a noise-corrupted continuous digit string recognition task: AURORA 2. Index Terms — Structured discriminative model, generative score-space, inference