Results 11 - 20 of 83
Robust sequential data modeling using an outlier-tolerant hidden Markov model
- IEEE Transactions on Pattern Analysis and Machine Intelligence, 2009
"... ..."
(Show Context)
Machine Learning Paradigms for Speech Recognition: An Overview
, 2013
"... Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasional ..."
Abstract - Cited by 11 (1 self)
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it remains largely an unsolved problem: for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This article provides readers with an overview of modern ML techniques as utilized in current ASR research and systems, and as relevant to future ASR research. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either already popular or have the potential to make significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments in deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Partially Observed Maximum Entropy Discrimination Markov Networks
"... Learning graphical models with hidden variables can offer semantic insights to complex data and lead to salient structured predictors without relying on expensive, sometime unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, le ..."
Abstract - Cited by 11 (5 self)
Learning graphical models with hidden variables can offer semantic insights into complex data and lead to salient structured predictors without relying on expensive, sometimes unattainable fully annotated training data. While likelihood-based methods have been extensively explored, to our knowledge, learning structured prediction models with latent variables based on the max-margin principle remains largely an open problem. In this paper, we present a partially observed Maximum Entropy Discrimination Markov Network (PoMEN) model that attempts to combine the advantages of the Bayesian and margin-based paradigms for learning Markov networks from partially labeled data. PoMEN leads to an averaging prediction rule that resembles a Bayes predictor and is more robust to overfitting, but is also built on desirable discriminative laws resembling those of the M^3N. We develop an EM-style algorithm utilizing existing convex optimization algorithms for the M^3N as a subroutine. We demonstrate competent performance of PoMEN over existing methods on a real-world web data extraction task.
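For orientation, the averaging prediction rule referred to above has, in the maximum entropy discrimination family, the general form below. This is a sketch in our own notation rather than the paper's exact statement, with F(x, y; w) the discriminant function of an M^3N and p(w) the learned posterior over its parameters:

```latex
\hat{y}(x) \;=\; \arg\max_{y}\; \int p(w)\, F(x, y; w)\, dw
```

Averaging F over p(w), instead of committing to a single point estimate of w, is what gives the rule its Bayes-predictor flavor and, plausibly, its robustness to overfitting.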
A fast online algorithm for large margin training of continuous density hidden Markov models
- In Proceedings of Interspeech, 2009
"... We propose an online learning algorithm for large margin training of continuous density hidden Markov models. The online algorithm updates the model parameters incrementally after the decoding of each training utterance. For large margin training, the algorithm attempts to separate the log-likelihoo ..."
Abstract - Cited by 9 (3 self)
We propose an online learning algorithm for large margin training of continuous density hidden Markov models. The online algorithm updates the model parameters incrementally after the decoding of each training utterance. For large margin training, the algorithm attempts to separate the log-likelihoods of correct and incorrect transcriptions by an amount proportional to their Hamming distance. We evaluate this approach to hidden Markov modeling on the TIMIT speech database. We find that the algorithm yields significantly lower phone error rates than other approaches, both online and batch, that do not attempt to enforce a large margin. We also find that the algorithm converges much more quickly than analogous batch optimizations for large margin training. Index Terms: hidden Markov models, online learning, large margin classification, discriminative training, automatic speech recognition
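As a concrete illustration of the update described here, below is a self-contained toy sketch in Python. A linear frame-level scorer stands in for the reparameterized Gaussian log-likelihoods of an actual CD-HMM, and transition structure is omitted, so this is an assumption-laden miniature of the idea rather than the authors' algorithm: after decoding each training sequence, the parameters are adjusted only if the correct labeling fails to outscore the strongest competitor by at least their Hamming distance.

```python
import numpy as np

def seq_score(W, frames, labels):
    """Sequence score: sum of per-frame scores W[y] . x (stand-in for log-likelihood)."""
    return sum(W[y] @ x for x, y in zip(frames, labels))

def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def online_margin_step(W, frames, y_true, n_labels, lr=0.05):
    # Loss-augmented decoding: with no transition terms, the most competitive
    # labeling decomposes per frame as argmax of score + 1[label is wrong].
    y_hat = [max(range(n_labels), key=lambda c: W[c] @ x + (c != y))
             for x, y in zip(frames, y_true)]
    margin = seq_score(W, frames, y_true) - seq_score(W, frames, y_hat)
    if margin < hamming(y_true, y_hat):   # required separation not yet met
        for x, yt, yh in zip(frames, y_true, y_hat):
            if yt != yh:
                W[yt] += lr * x           # pull the correct label's score up
                W[yh] -= lr * x           # push the competitor's score down
    return W

# Toy usage: 3 labels, 4-dimensional "acoustic" frames, one synthetic utterance.
rng = np.random.default_rng(0)
W = np.zeros((3, 4))
frames = rng.normal(size=(6, 4))
y_true = [0, 0, 1, 1, 2, 2]
for _ in range(10):
    W = online_margin_step(W, frames, y_true, n_labels=3)
```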
Maximum Mutual Information Estimation with Unlabeled Data for Phonetic Classification
"... This paper proposes a new training framework for mixed labeled and unlabeled data and evaluates it on the task of binary phonetic classification. Our training objective function combines Maximum Mutual Information (MMI) for labeled data and Maximum Likelihood (ML) for unlabeled data. Through the mod ..."
Abstract - Cited by 8 (3 self)
This paper proposes a new training framework for mixed labeled and unlabeled data and evaluates it on the task of binary phonetic classification. Our training objective function combines Maximum Mutual Information (MMI) for labeled data and Maximum Likelihood (ML) for unlabeled data. Through the modified training objective, MMI estimates are smoothed with ML estimates obtained from unlabeled data. In addition, our training criterion can help an existing model adapt to new speech characteristics from unlabeled speech. In our phonetic classification experiments, there is a consistent reduction in error rate from MLE to MMIE with I-smoothing, and then to MMIE with unlabeled-smoothing. Error rates can be further reduced by transductive MMIE. We also experimented with the gender-mismatched case, in which the best result shows that MMIE with unlabeled data has a 9.3% absolute lower error rate than MLE and a 2.35% absolute lower error rate than MMIE with I-smoothing. Index Terms: unlabeled speech, maximum mutual information, Gaussian mixture models
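In symbols, one plausible form of the combined objective, written as a sketch consistent with the abstract rather than the paper's exact equation (with L the labeled set, U the unlabeled set, lambda the model parameters, and alpha a smoothing weight):

```latex
\mathcal{F}(\lambda) \;=\;
\underbrace{\sum_{i \in \mathcal{L}} \log \frac{p_\lambda(x_i \mid y_i)\, P(y_i)}{\sum_{y'} p_\lambda(x_i \mid y')\, P(y')}}_{\text{MMI on labeled data}}
\;+\; \alpha
\underbrace{\sum_{j \in \mathcal{U}} \log \sum_{y'} p_\lambda(x_j \mid y')\, P(y')}_{\text{ML on unlabeled data}}
```

The first term sharpens class separation where labels are available; the second pulls the parameters toward regions that explain the unlabeled speech, which is what produces the smoothing and adaptation effects described above.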
Ratio Semi-Definite Classifiers
"... We present a novel classification model that is formulated as a ratio of semi-definite polynomials. We derive an efficient learning algorithm for this classifier, and apply it to two separate phoneme classification corpora. Results show that our disciminatively trained model can achieve accuracies c ..."
Abstract - Cited by 7 (6 self)
We present a novel classification model that is formulated as a ratio of semi-definite polynomials. We derive an efficient learning algorithm for this classifier and apply it to two separate phoneme classification corpora. Results show that our discriminatively trained model can achieve accuracies comparable with state-of-the-art techniques such as multi-layer perceptrons, but does not possess the overconfident bias often found in models based on ratios of exponentials. Index Terms: pattern recognition, speech recognition
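To make the model family concrete, here is a minimal Python sketch of class posteriors formed as a ratio of positive semi-definite quadratic forms, the simplest instance of a ratio of semi-definite polynomials. The augmented bias coordinate and the parameter shapes are illustrative assumptions, not the authors' exact formulation:

```python
import numpy as np

def psd(M):
    """Project a square matrix onto the PSD cone so its quadratic form is non-negative."""
    S = (M + M.T) / 2
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(np.clip(vals, 0, None)) @ vecs.T

def posteriors(Phis, x):
    z = np.append(x, 1.0)                             # augment input with a bias coordinate
    forms = np.array([z @ Phi @ z for Phi in Phis])   # each form >= 0 since Phi is PSD
    return forms / forms.sum()                        # normalize into a distribution

# Toy usage: 3 classes over 4-dimensional inputs.
rng = np.random.default_rng(1)
Phis = [psd(rng.normal(size=(5, 5))) for _ in range(3)]
print(posteriors(Phis, rng.normal(size=4)))
```

Because each quadratic form is non-negative by construction, the ratio normalizes without exponentiation; this is plausibly the property the abstract credits for avoiding the overconfident bias of exponential-ratio models.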
Large Margin Taxonomy Embedding with an Application to Document Categorization
"... Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat ” classification through incorporation of class hierarchies [4]. We present a novel algorithm that ..."
Abstract - Cited by 6 (0 self)
Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond "flat" classification through the incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype, and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal database yield state-of-the-art results on topic categorization.
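A small Python sketch of the prediction side may help fix ideas: each class owns a prototype in the learned semantic space, documents are mapped into that space, and the nearest prototype wins. The linear map and the margin check below are illustrative stand-ins; in the paper the embedding is learned by convex optimization under exactly these large-margin constraints:

```python
import numpy as np

def nearest_prototype(W, prototypes, x):
    z = W @ x                                     # embed the document in the semantic space
    d = np.linalg.norm(prototypes - z, axis=1)    # distance to each class prototype
    return int(np.argmin(d))                      # simple nearest neighbor rule

def margin_violations(W, prototypes, X, y, margin=1.0):
    """Count instances whose correct prototype is not closer (by `margin`)
    than every other prototype: the constraints the learning step enforces."""
    bad = 0
    for x, c in zip(X, y):
        d = np.linalg.norm(prototypes - (W @ x), axis=1)
        bad += d[c] + margin > np.delete(d, c).min()
    return bad
```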
Context-sensitive Dynamic Ordinal Regression for Intensity Estimation of Facial Action Units
- IEEE Transactions on Pattern Analysis and Machine Intelligence
"... Abstract-Modeling intensity of facial action units from spontaneously displayed facial expressions is challenging mainly because of high variability in subject-specific facial expressiveness, head-movements, illumination changes, etc. These factors make the target problem highly context-sensitive. ..."
Abstract - Cited by 6 (6 self)
Modeling the intensity of facial action units from spontaneously displayed facial expressions is challenging, mainly because of high variability in subject-specific facial expressiveness, head movements, illumination changes, etc. These factors make the target problem highly context-sensitive. However, existing methods usually ignore this context-sensitivity of the target problem. We propose a novel Conditional Ordinal Random Field (CORF) model for context-sensitive modeling of facial action unit intensity, where the W5+ (who, when, what, where, why and how) definition of the context is used. While the proposed model is general enough to handle all six context questions, in this paper we focus on the context questions who (the observed subject), how (the changes in facial expressions), and when (the timing of facial expressions and their intensity). The context questions who and how are modeled by means of the newly introduced context-dependent covariate effects, and the context question when is modeled in terms of temporal correlation between the ordinal outputs, i.e., intensity levels of action units. We also introduce a weighted softmax-margin learning of CRFs from data with a skewed distribution of intensity levels, which is commonly encountered in spontaneous facial data. The proposed model is evaluated on intensity estimation of pain and facial action units using two recently published datasets (UNBC Shoulder Pain and DISFA) of spontaneously displayed facial expressions. Our experiments show that the proposed model performs significantly better on the target tasks compared to the state-of-the-art approaches. Furthermore, compared to traditional learning of CRFs, we show that the proposed weighted learning results in more robust parameter estimation from the imbalanced intensity data.
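For readers unfamiliar with ordinal models, here is a simplified Python sketch of the ordinal building block underlying CORF-style models: cumulative threshold probabilities over ordered intensity levels, with a context-dependent shift added to the linear predictor as a stand-in for the paper's who/how covariate effects. The temporal CRF coupling between consecutive frames, central to the actual model, is omitted here:

```python
import numpy as np

def ordinal_probs(w, thresholds, x, context_shift=0.0):
    """P(y = k | x) from cumulative logits P(y <= k) = sigmoid(b_k - w.x - shift)."""
    eta = w @ x + context_shift                       # context-adjusted linear predictor
    cdf = 1.0 / (1.0 + np.exp(-(np.asarray(thresholds) - eta)))
    cdf = np.concatenate([[0.0], cdf, [1.0]])         # pad with P(y <= lowest-1) and P(y <= highest)
    return np.diff(cdf)                               # per-level probabilities, summing to 1

# Toy usage: 4 ordered intensity levels defined by 3 increasing thresholds.
w = np.array([0.8, -0.3])
thresholds = [-1.0, 0.0, 1.5]
print(ordinal_probs(w, thresholds, np.array([0.5, 1.0]), context_shift=0.2))
```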
Using Gaussian Mixtures for Hindi Speech Recognition System
- International Journal of Signal Processing, Image Processing and Pattern Recognition, 2011
"... The goal of automatic speech recognition (ASR) system is to accurately and efficiently convert a speech signal into a text message independent of the device, speaker or the environment. In general the speech signal is captured and pre-processed at front-end for feature extraction and evaluated at ba ..."
Abstract - Cited by 5 (0 self)
The goal of an automatic speech recognition (ASR) system is to accurately and efficiently convert a speech signal into a text message, independent of the device, speaker, or environment. In general, the speech signal is captured and pre-processed at the front-end for feature extraction and evaluated at the back-end using the Gaussian mixture hidden Markov model. In this statistical approach, since the evaluation of Gaussian likelihoods dominates the total computational load, the appropriate selection of Gaussian mixtures, depending upon the amount of training data, is very important. As only small databases are available to train ASR systems for Indian languages, the higher range of Gaussian mixtures (i.e., 64 and above) normally used for European languages cannot be applied to them. This paper reviews the statistical framework and presents an iterative procedure to select the optimum number of Gaussian mixtures that exhibits maximum accuracy in the context of a Hindi speech recognition system.
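The kind of selection loop described here can be sketched as follows. This is a hypothetical reconstruction using scikit-learn's GaussianMixture, choosing the mixture count by held-out classification accuracy; the paper's exact procedure and features may differ:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_mixtures(X_tr, y_tr, X_dev, y_dev, counts=(1, 2, 4, 8, 16, 32)):
    """Train one GMM per class for each candidate mixture count and keep the
    count that maximizes held-out classification accuracy."""
    classes = np.unique(y_tr)
    best = None
    for m in counts:
        # One diagonal-covariance GMM per class, fit on that class's frames.
        gmms = {c: GaussianMixture(n_components=m, covariance_type='diag',
                                   reg_covar=1e-3).fit(X_tr[y_tr == c])
                for c in classes}
        # Classify dev frames by the highest per-class log-likelihood.
        scores = np.column_stack([gmms[c].score_samples(X_dev) for c in classes])
        acc = np.mean(classes[scores.argmax(axis=1)] == y_dev)
        if best is None or acc > best[1]:
            best = (m, acc)
    return best   # (selected mixture count, held-out accuracy)
```

With small training sets, accuracy typically peaks at a modest mixture count and then degrades as per-component data becomes too sparse, which matches the abstract's point about 64-and-above mixtures being unsuitable for small Indian-language databases.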