“Support vector machines for segmental minimum Bayes risk decoding of continuous speech,” ASRU (2003)

by V Venkataramani, W Byrne

Results 1 - 10 of 16

Augmented statistical models for speech recognition

by M. I. Layton, M. J. F. Gales - in Proc. ICASSP, 2006
Abstract - Cited by 16 (9 self)
Recently there has been significant interest in developing new acoustic models for speech recognition. One such model, that allows complex dependencies to be represented, is the augmented statistical model. This incorporates additional dependencies by constructing a local exponential expansion of a standard HMM. Unfortunately, the resulting model often has an intractable normalisation term, rendering training difficult for all but binary classification tasks. In this paper, conditional augmented (C-Aug) models are proposed as an attractive alternative. Instead of modelling utterance likelihoods and inferring decision boundaries, C-Aug models directly model the posterior probability of class labels, conditioned on the utterance. The resulting model is easy to normalise and can be trained using conditional maximum likelihood estimation. In addition, as a convex model, the optimisation converges to a global maximum.

LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop

by Mark Hasegawa-Johnson, James Baker, Steven Greenberg, Katrin Kirchhoff, Jennifer Muller, Kemal Sönmez, Sarah Borys, Ken Chen, Amit Juneja, Karen Livescu, Srividya Mohan, Emily Coogan, Tianyu Wang, 2005
Abstract - Cited by 14 (1 self)
Abstract not found

Discriminative classifiers with adaptive kernels for noise robust speech recognition

by M. J. F. Gales, F. Flego - Comput. Speech Lang, 2010
Abstract - Cited by 12 (10 self)
Discriminative classifiers are a popular approach to solving classification problems. However, one of the problems with these approaches, in particular kernel-based classifiers such as Support Vector Machines (SVMs), is that they are hard to adapt to mismatches between the training and test data. This paper describes a scheme for overcoming this problem for speech recognition in noise by adapting the kernel rather than the SVM decision boundary. Generative kernels, defined using generative models, are one type of kernel that allows SVMs to handle sequence data. By compensating the parameters of the generative models for each noise condition, noise-specific generative kernels can be obtained. These can be used to train a noise-independent SVM on a range of noise conditions, which can then be used with a test-set noise kernel for classification. The noise-specific kernels used in this paper are based on Vector Taylor Series (VTS) model-based compensation. VTS allows all the model parameters to be compensated and the background noise to be estimated in a maximum likelihood fashion. A brief discussion of VTS, and the optimisation of the mismatch function representing the impact of noise on the clean speech, is also included. Experiments using these VTS-based test-set noise kernels were run on the AURORA 2 continuous digit task. The proposed SVM rescoring scheme yields large gains in performance over the VTS compensated models.
Key words: speech recognition, noise robustness, support vector machines, generative kernels

Minimum Bayes risk estimation and decoding in large vocabulary continuous speech recognition

by William Byrne, 2006
Abstract - Cited by 8 (1 self)
Minimum risk estimation and decoding strategies based on lattice segmentation techniques can be used to refine large vocabulary continuous speech recognition systems through the estimation of the parameters of the underlying hidden Markov models and through the identification of smaller recognition tasks, which provides the opportunity to incorporate novel modeling and decoding procedures in LVCSR. These techniques are discussed in the context of going ‘beyond HMMs’.

SVMs, score-spaces and maximum margin statistical models

by M. J. F. Gales, M. Layton - in Beyond HMM workshop, ATR, 2004
Abstract - Cited by 7 (5 self)
There has been significant interest in developing new forms of acoustic model, in particular models which allow more dependencies to be represented than are allowed within a standard hidden Markov model (HMM). This paper discusses one such class of models, augmented statistical models. Here a locally exponential approximation is made about some point on a base distribution. This allows more dependencies within the data to be modelled than are represented in the base distribution. Augmented models based on Gaussian mixture models (GMMs) and HMMs are briefly described. These augmented models are then related to generative kernels, one approach used for allowing support vector machines (SVMs) to be applied to variable length data. The training of augmented statistical models within an SVM, generative kernel, framework is then discussed. This may be viewed as using maximum margin training to estimate statistical models. Augmented Gaussian mixture models are then evaluated using rescoring on a large vocabulary speech recognition task.

Acoustic modelling using continuous rational kernels

by M. I. Layton, M. J. F. Gales - in MLSP, 2005
Abstract - Cited by 7 (4 self)
There has been significant interest in developing alternatives to hidden Markov models (HMMs) for speech recognition. In particular, interest has been focused upon models that allow additional dependencies to be incorporated. One such model is the Augmented Statistical Model. Here a local exponential approximation, based upon derivatives of a base distribution, is made about some distribution of the base model. Augmented statistical models can be trained using a maximum margin criterion, which may be implemented using an SVM with a generative kernel. Calculating derivatives of the base distribution, in particular higher-order derivatives, to form the generative kernel requires complex dynamic programming algorithms. In this paper a new form of rational kernel, a continuous rational kernel, is proposed. This allows elements of the generative kernel, including those based on higher-order derivatives, to be computed using standard forms of transducer within a rational kernel framework. In addition, the derivatives are shown to be a principled method of defining marginalised kernels. Continuous rational kernels are evaluated using a large vocabulary continuous speech recognition (LVCSR) task.

Augmented Statistical Models for Classifying Sequence Data

by Martin Layton, 2006
Abstract - Cited by 7 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing that is the outcome of work done in collaboration. It has not been submitted in whole or in part for a degree at any other university. Some of the work has been published previously in conference proceedings [66,69], two journal articles [36,68], two workshop papers [35,67] and a technical report [65]. The length of this thesis including appendices, bibliography, footnotes, tables and equations is approximately 60,000 words. This thesis contains 27 figures and 20 tables.

Discriminative models for speech recognition

by M. J. F. Gales - In Information Theory and Applications Workshop, 1997
Abstract - Cited by 6 (1 self)
The vast majority of automatic speech recognition systems use Hidden Markov Models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training criteria, such as maximum mutual information and minimum phone error. However, the underlying acoustic model is still generative, with the associated constraints on the state and transition probability distributions, and classification is based on Bayes' decision rule. Recently, there has been interest in examining discriminative, or direct, models for speech recognition. This paper briefly reviews the forms of discriminative models that have been investigated. These include maximum entropy Markov models, hidden conditional random fields and conditional augmented models. The relationships between the various models and issues with applying them to large vocabulary continuous speech recognition will be discussed.

Lattice segmentation and support vector machines for large vocabulary continuous speech recognition

by Veera Venkataramani, William Byrne - In: Proc. ICASSP, 2005
Abstract - Cited by 5 (2 self)
Lattice segmentation procedures are used to spot possible recognition errors in first-pass recognition hypotheses produced by a large vocabulary continuous speech recognition system. This approach is analyzed in terms of its ability to reliably identify, and provide good alternatives for, incorrectly hypothesized words. A procedure is described to train and apply Support Vector Machines to strengthen the first pass system where it was found to be weak, resulting in small but statistically significant recognition improvements on a large test set of conversational speech.

Recognition of dialogue acts in multiparty meetings using a switching DBN

by Alfred Dielmann, Steve Renals - IEEE Trans. Audio, Speech, and Language Processing, 2008
Abstract - Cited by 4 (2 self)
This paper is concerned with the automatic recognition of Dialogue Acts (DAs) in multiparty conversational speech. We present a joint generative model for DA recognition in which segmentation and classification of DAs are carried out in parallel. Our approach to DA recognition is based on a switching dynamic Bayesian network (DBN) architecture. This generative approach models a set of features, related to lexical content and prosody, and incorporates a weighted interpolated factored language model. The switching DBN coordinates the recognition process by integrating the component models. The factored language model, which is estimated from multiple conversational data corpora, is used in conjunction with additional task specific language models. In conjunction with this joint generative model, we have also investigated the use of a discriminative approach, based on conditional random fields, to perform a reclassification of the segmented DAs. We have carried out experiments on the AMI corpus of multimodal meeting recordings, using both manually transcribed speech, and the output of an automatic speech recogniser, and using different configurations of the generative model. Our results indicate that the system performs well both on reference and fully automatic transcriptions. A further significant improvement in recognition accuracy is obtained by the application of the discriminative reranking approach based on conditional random fields.