Results 11–20 of 20
Code Breaking for Automatic Speech Recognition
Abstract

Cited by 2 (1 self)
Code Breaking is a divide-and-conquer approach for sequential pattern recognition tasks in which we identify weaknesses of an existing system and then use specialized decoders to strengthen the overall system. We study the technique in the context of Automatic Speech Recognition. Using the lattice cutting algorithm, we first analyze lattices generated by a state-of-the-art speech recognizer to spot possible errors in its first-pass hypothesis. We then train specialized decoders for each of these problems and apply them to refine the first-pass hypothesis. We study the use of Support Vector Machines (SVMs) as discriminative models over each of these problems. The estimation of a posterior distribution over hypotheses in these regions of acoustic confusion is posed as a logistic regression problem. GiniSVMs, a variant of SVMs, can be used as an approximation technique to estimate the parameters of the logistic regression problem. We first validate our approach on a small-vocabulary recognition task, namely alphadigits. We show that the use of GiniSVMs can substantially improve the performance of a well-trained MMI-HMM system. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We then analyze lattice cutting in terms of its ability to reliably identify, and provide good alternatives for, incorrectly hypothesized words in the Czech MALACH domain, a large-vocabulary task. We describe a procedure to train and apply SVMs to strengthen the first-pass system, resulting in small but statistically significant recognition improvements. We conclude with a discussion of methods, including clustering, for obtaining further improvements on large-vocabulary tasks.
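As a rough illustration of the posterior-estimation step described above, the sketch below poses a two-word acoustic confusion as a plain logistic regression problem. GiniSVM training itself is not shown, and the 2-D "acoustic" feature vectors are synthetic stand-ins:

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit binary logistic regression by gradient descent on log-loss.
    Returns weights w and bias b; p(y=1|x) = sigmoid(w.x + b)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted posteriors
        grad_w = X.T @ (p - y) / len(y)         # gradient of mean log-loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def posterior(x, w, b):
    """Posterior distribution over the two confusable words."""
    p1 = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return np.array([1.0 - p1, p1])

# Toy confusion set: word A vs. word B, synthetic 2-D features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (50, 2)), rng.normal(1.0, 0.5, (50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])
w, b = train_logistic(X, y)
probs = posterior(np.array([1.0, 1.0]), w, b)
```

In the setting above, GiniSVMs serve as an approximation technique for estimating these same logistic-regression parameters; the gradient-descent fit here is only a minimal stand-in.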
Sparse Bayesian Methods for Continuous Speech Recognition
, 2002
Abstract

Cited by 2 (2 self)
The prominent modeling technique for speech recognition today is the hidden Markov model with Gaussian emission densities. However, these models suffer from an inability to learn discriminative information. Artificial neural networks (ANNs) have been proposed as a replacement for the Gaussian emission probabilities under the belief that ANN models provide better discrimination capabilities. However, the use of ANNs often results in overparameterized models that are prone to overfitting. Techniques such as cross-validation have been suggested as remedies to the overfitting problem, but employing them is wasteful of both resources and computation. Further, cross-validation does not address the issue of model structure and overparameterization. Recent work on machine learning has moved toward automatic methods for controlling generalization and parameterization. A model that has gained much popularity recently is the support vector machine (SVM). SVMs use the principle of structural risk ...
Free energy score space
Abstract
A score function induced by a generative model of the data can provide a feature vector of fixed dimension for each data sample. Data samples themselves may be of differing lengths (e.g., speech segments or other sequence data), but because a score function is based on the properties of the data-generation process, it produces a fixed-length vector in a highly informative space, typically referred to as a “score space”. Discriminative classifiers have been shown to achieve higher performance in appropriately chosen score spaces than is achievable by either the corresponding generative likelihood-based classifiers or discriminative classifiers using standard feature extractors. In this paper, we present a novel score space that exploits the free energy associated with a generative model. The resulting free energy score space (FESS) takes into account the latent structure of the data at various levels, and can be trivially shown to lead to classification performance that at least matches the performance of the free energy classifier based on the same generative model and the same factorization of the posterior. We also show that in several typical vision and computational biology applications the classifiers optimized in FESS outperform the corresponding pure generative approaches, as well as a number of previous approaches to combining discriminative and generative models.
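As a toy illustration of the idea, the sketch below decomposes the free energy of a one-dimensional Gaussian mixture into per-component terms and uses the vector of those terms as a fixed-length feature. This is a minimal stand-in for FESS, not the paper's construction, and all model parameters are made up:

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log-density of a 1-D Gaussian, vectorized over components."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def fess_features(x, pis, mus, variances):
    """Per-component free-energy terms for a 1-D Gaussian mixture.
    The free energy -log p(x) is decomposed, via the exact posterior
    q_k = p(k|x), into terms q_k * (log q_k - log pi_k N(x; theta_k));
    the vector of these terms sums to the free energy and serves as a
    fixed-length feature vector."""
    log_joint = np.log(pis) + log_gauss(x, mus, variances)  # log pi_k N(x)
    log_q = log_joint - np.logaddexp.reduce(log_joint)      # log posterior
    q = np.exp(log_q)
    return q * (log_q - log_joint)

pis = np.array([0.5, 0.5])
mus = np.array([-1.0, 1.0])
variances = np.array([1.0, 1.0])
phi = fess_features(0.3, pis, mus, variances)
# Sanity check: the terms sum to the free energy -log p(x).
free_energy = -np.logaddexp.reduce(np.log(pis) + log_gauss(0.3, mus, variances))
```

With the exact posterior the decomposition is exact; the paper's point is that such per-term vectors expose the model's latent structure to a discriminative classifier instead of collapsing it into one scalar likelihood.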
Combining information theoretic kernels with generative embeddings
Neurocomputing 101 (2013) 161–169
Face recognition based on multiclass mapping of Fisher scores
, 2004
Abstract
A new hidden Markov model (HMM) based feature-generation scheme is proposed for face recognition (FR) in this paper. In this scheme, HMMs are used to model classes of face images. A set of Fisher scores is calculated through partial-derivative analysis of the parameters estimated in each HMM. These Fisher scores are further combined with traditional features, such as log-likelihood and appearance-based features, to form feature vectors that exploit the strengths of both local and holistic features of the human face. Linear discriminant analysis (LDA) is then applied to analyze these feature vectors for FR. Performance improvements are observed over the standalone HMM method and the Fisherface method, which uses appearance-based feature vectors. A further study reveals that, by reducing the number of models involved in the training and testing stages of LDA, the proposed feature-generation scheme can maintain very high discriminative power at much lower computational complexity compared to the traditional HMM-based FR system. Experimental results on a publicly available face database are provided to demonstrate the viability of this scheme.
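To make the partial-derivative step concrete, here is a minimal sketch of Fisher scores for a single diagonal Gaussian, a one-state stand-in for the per-HMM derivative analysis; the parameter values are hypothetical:

```python
import numpy as np

def fisher_scores_gaussian(x, mu, var):
    """Fisher scores (partial derivatives of the log-likelihood) of one
    observation under a diagonal Gaussian N(x; mu, diag(var)).
    d/dmu  log N = (x - mu) / var
    d/dvar log N = 0.5 * ((x - mu)^2 / var^2 - 1 / var)"""
    d_mu = (x - mu) / var
    d_var = 0.5 * ((x - mu) ** 2 / var ** 2 - 1.0 / var)
    return np.concatenate([d_mu, d_var])

mu = np.array([0.0, 1.0])
var = np.array([1.0, 2.0])
x = np.array([0.5, 0.0])
phi = fisher_scores_gaussian(x, mu, var)  # fixed-length feature vector
```

Note that the mean-derivative components vanish when the observation sits exactly at the model mean, so the scores measure how the sample pulls on each parameter: exactly the kind of local, per-parameter information the scheme above feeds into LDA.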
Acoustic Modelling using Continuous Rational Kernels
Journal of VLSI Signal Processing
Abstract
Many discriminative classification algorithms are designed for tasks where samples can be represented by fixed-length vectors. However, many examples in the fields of text processing, computational biology and speech recognition are best represented as variable-length sequences of vectors. Although several dynamic kernels have been proposed for mapping sequences of discrete observations into fixed-dimensional feature spaces, few kernels exist for sequences of continuous observations. This paper introduces continuous rational kernels, an extension of standard rational kernels, as a general framework for classifying sequences of continuous observations. In addition to allowing new task-dependent kernels to be defined, continuous rational kernels allow existing continuous dynamic kernels, such as Fisher and generative kernels, to be calculated using standard weighted finite-state transducer algorithms. Preliminary results on both a large vocabulary continuous speech recognition (LVCSR) task and the TIMIT database are presented.
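The weighted finite-state transducer machinery is beyond a short sketch, but the underlying idea of mapping a variable-length sequence of continuous observations to a fixed-length score vector can be illustrated with a frame-independent generative score space; the two class models below are hypothetical stand-ins for HMMs:

```python
import numpy as np

def seq_log_lik(frames, mu, var):
    """Log-likelihood of a variable-length frame sequence under a
    frame-independent 1-D Gaussian model (a stand-in for an HMM)."""
    return np.sum(-0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var))

def generative_score_vector(frames, models):
    """Map a variable-length sequence to a fixed-length score vector:
    one log-likelihood per generative model, normalized by sequence
    length so different durations remain comparable."""
    return np.array([seq_log_lik(frames, mu, var)
                     for mu, var in models]) / len(frames)

models = [(-1.0, 1.0), (1.0, 1.0)]                    # two class models
short_seq = np.array([0.9, 1.1, 1.0])                 # 3 frames near +1
long_seq = np.array([-1.0, -0.8, -1.2, -1.1, -0.9])   # 5 frames near -1
phi_short = generative_score_vector(short_seq, models)
phi_long = generative_score_vector(long_seq, models)
```

Both sequences, despite differing lengths, land in the same two-dimensional score space, where any fixed-length discriminative classifier can operate; the continuous rational kernels of the paper generalize this construction to genuine HMM score spaces via transducer algorithms.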
A Multiclass Classification Strategy for Fisher Scores: Application to Signer Independent Sign Language Recognition
Abstract
Fisher kernels combine the powers of discriminative and generative classifiers by mapping variable-length sequences to a new fixed-length feature space, called the Fisher score space. The mapping is based on a single generative model, and the classifier is intrinsically binary. We propose a strategy that applies multiclass classification on each Fisher score space and combines the decisions of the multiclass classifiers. We experimentally show that the Fisher scores of one class provide discriminative information for the other classes as well. We compare several multiclass classification strategies for Fisher scores generated from the hidden Markov models (HMMs) of sign sequences. The proposed multiclass classification strategy increases classification accuracy in comparison with state-of-the-art strategies based on combining binary classifiers. To reduce the computational complexity of the Fisher score extraction and training phases, we also propose a score-space selection method and show that similar or even higher accuracies can be obtained using only a subset of the score spaces. Based on the proposed score-space selection method, a signer adaptation technique is also presented that does not require any retraining.
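A minimal sketch of the stacking idea, assuming simple Gaussian class models in place of the paper's HMMs (all values hypothetical): Fisher scores with respect to every class model are concatenated, so a single multiclass rule sees information from all models at once rather than one binary score space per class.

```python
import numpy as np

def fisher_score(x, mu, var):
    """Fisher score with respect to the mean of one Gaussian class model."""
    return (x - mu) / var

def stacked_scores(x, class_models):
    """Concatenate the Fisher scores from every class model into one
    vector, so a multiclass classifier can exploit the discriminative
    information each class's score block carries about the others."""
    return np.concatenate([fisher_score(x, mu, var)
                           for mu, var in class_models])

# Three hypothetical sign classes, 2-D features, unit variance.
class_models = [(np.array([0.0, 0.0]), 1.0),
                (np.array([3.0, 0.0]), 1.0),
                (np.array([0.0, 3.0]), 1.0)]
phi = stacked_scores(np.array([2.8, 0.1]), class_models)
# A minimal multiclass rule on the stacked space: the Fisher score
# vanishes at a model's mean, so pick the model whose own score block
# has the smallest norm.
blocks = phi.reshape(len(class_models), -1)
pred = int(np.argmin(np.linalg.norm(blocks, axis=1)))
```

In the paper's setting a trained multiclass classifier replaces this nearest-model rule, and score-space selection corresponds to keeping only the most informative blocks of the stacked vector.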