Results 1–10 of 83
Supervised sequence labelling with recurrent neural networks
, 2008
"... Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams ..."
Abstract

Cited by 59 (6 self)
 Add to MetaCart
Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. Long short-term memory is an especially promising recurrent architecture, able to bridge long time delays between relevant input and output events, and thereby access long-range context. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks in general, and long short-term memory in particular. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of long short-term memory to multidimensional data, such as images and video sequences. Experimental results are presented on speech recognition, online and offline handwriting recognition, keyword spotting, image segmentation and image classification, demonstrating the advantages of advanced recurrent networks over other sequential algorithms, such as hidden Markov models.
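The first contribution described in this abstract, an output layer trained without a known input-label alignment, corresponds to connectionist temporal classification (CTC). Its core device is a many-to-one map that collapses a frame-by-frame output path into a label sequence. A minimal sketch in Python, assuming the standard blank symbol; the function name is illustrative:

```python
def ctc_collapse(path, blank='-'):
    """Collapse a framewise CTC path: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return ''.join(out)

print(ctc_collapse('--c-aa-t-'))   # 'cat'
print(ctc_collapse('cc-ccaat'))    # 'ccat': a blank keeps a genuinely repeated label distinct
```

Because many paths collapse to the same labelling, training sums their probabilities with a forward-backward recursion, which is what removes the need for a frame-level alignment.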
Large Margin Training for Hidden Markov Models with Partially Observed States, by Trinh-Minh-Tri Do
"... Large margin learning of Continuous Density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the nonconvexity of the optimization problem, previous works usually rely on severe approximations so that it is still an open ..."
Abstract

Cited by 42 (4 self)
 Add to MetaCart
(Show Context)
Large margin learning of continuous-density HMMs with a partially labeled dataset has been extensively studied in the speech and handwriting recognition fields. Yet due to the non-convexity of the optimization problem, previous works usually rely on severe approximations, so the problem remains open. We propose a new learning algorithm that relies on non-convex optimization and bundle methods and allows tackling the original optimization problem as is. It is proved to converge to a solution with accuracy ɛ at a rate O(1/ɛ). We provide experimental results on speech and handwriting recognition that demonstrate the potential of the method.
A primal-dual message-passing algorithm for approximated large scale structured prediction
 In Advances in Neural Information Processing Systems 23
, 2010
"... In this paper we propose an approximated structured prediction framework for large scale graphical models and derive messagepassing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the logpartition function, known as ..."
Abstract

Cited by 38 (19 self)
 Add to MetaCart
(Show Context)
In this paper we propose an approximated structured prediction framework for large-scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the softmax, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation, and derive an efficient message-passing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to efficiently learn graphical models with cycles and a very large number of parameters.
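The claim that the softmax (a scaled log-partition) smoothly approximates the structured hinge can be checked on a toy unstructured example: log-sum-exp upper-bounds the hard max and tightens as a temperature parameter shrinks. The function names, the 0/1 cost, and the temperature below are illustrative, not taken from the paper:

```python
import math

def structured_hinge(scores, gold):
    # hard max over cost-augmented scores (0/1 cost on competing labels)
    aug = [s + (0.0 if y == gold else 1.0) for y, s in enumerate(scores)]
    return max(aug) - scores[gold]

def soft_hinge(scores, gold, temp=1.0):
    # replace the hard max with a temperature-scaled log-sum-exp ("softmax")
    aug = [s + (0.0 if y == gold else 1.0) for y, s in enumerate(scores)]
    lse = temp * math.log(sum(math.exp(a / temp) for a in aug))
    return lse - scores[gold]

scores, gold = [2.0, 0.5, -1.0], 0
assert soft_hinge(scores, gold, 1.0) > structured_hinge(scores, gold)               # smooth upper bound
assert abs(soft_hinge(scores, gold, 0.01) - structured_hinge(scores, gold)) < 1e-6  # tightens as temp -> 0
```

In the structured case the same log-sum-exp is computed over exponentially many labellings, which is exactly the log-partition function that message passing evaluates.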
Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models
 In Proceedings of ICASSP 2007
, 2007
"... In this paper we compare three frameworks for discriminative training of continuousdensity hidden Markov models (CDHMMs). Specifically, we compare two popular frameworks, based on conditional maximum likelihood (CML) and minimum classification error (MCE), to a new framework based on margin maximi ..."
Abstract

Cited by 36 (4 self)
 Add to MetaCart
(Show Context)
In this paper we compare three frameworks for discriminative training of continuous-density hidden Markov models (CD-HMMs). Specifically, we compare two popular frameworks, based on conditional maximum likelihood (CML) and minimum classification error (MCE), to a new framework based on margin maximization. Unlike CML and MCE, our formulation of large margin training explicitly penalizes incorrect decodings by an amount proportional to the number of mislabeled hidden states. It also leads to a convex optimization over the parameter space of CD-HMMs, thus avoiding the problem of spurious local minima. We used discriminatively trained CD-HMMs from all three frameworks to build phonetic recognizers on the TIMIT speech corpus. The different recognizers employed exactly the same acoustic front end and hidden state space, thus enabling us to isolate the effect of different cost functions, parameterizations, and numerical optimizations. Experimentally, we find that our framework for large margin training yields significantly lower error rates than both CML and MCE training. Index Terms: speech recognition, discriminative training, MMI, MCE, large margin, phoneme recognition.
Softmax-margin CRFs: Training log-linear models with loss functions
 In Proc. of NAACL
, 2010
"... We describe a method of incorporating taskspecific cost functions into standard conditional loglikelihood (CLL) training of linear structured prediction models. Recently introduced in the speech recognition community, we describe the method generally for structured models, highlight connections to ..."
Abstract

Cited by 30 (3 self)
 Add to MetaCart
We describe a method of incorporating task-specific cost functions into standard conditional log-likelihood (CLL) training of linear structured prediction models. The method was recently introduced in the speech recognition community; we describe it generally for structured models, highlight connections to CLL and max-margin learning for structured prediction (Taskar et al., 2003), and show that it optimizes a bound on risk. The approach is simple, efficient, and easy to implement, requiring very little change to an existing CLL implementation. We present experimental results comparing with several commonly used methods for training structured predictors for named-entity recognition.
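The "bound on risk" property can be illustrated on a toy multiclass example: the softmax-margin loss (cost-augmented log-partition minus the gold score) upper-bounds the task cost of the argmax prediction. The function names and the numbers are illustrative, not from the paper:

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def softmax_margin_loss(scores, cost, gold):
    # cost-augmented log-partition minus the score of the gold label
    return logsumexp([s + c for s, c in zip(scores, cost)]) - scores[gold]

def risk(scores, cost):
    # task cost incurred by the argmax (decoded) label
    best = max(range(len(scores)), key=lambda y: scores[y])
    return cost[best]

scores = [1.2, 2.0, -0.3]   # model scores for three labels
cost = [0.0, 1.0, 1.0]      # task cost: zero for the gold label
gold = 0
assert softmax_margin_loss(scores, cost, gold) >= risk(scores, cost)
```

Setting the cost vector to all zeros recovers plain CLL, which is why so little change to an existing implementation is needed: only the scores inside the partition function are shifted by the cost.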
Discriminative models for speech recognition
 In Information Theory and Applications Workshop
, 1997
"... Abstract — The vast majority of automatic speech recognition systems use Hidden Markov Models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training crit ..."
Abstract

Cited by 22 (8 self)
 Add to MetaCart
(Show Context)
The vast majority of automatic speech recognition systems use hidden Markov models (HMMs) as the underlying acoustic model. Initially these models were trained based on the maximum likelihood criterion. Significant performance gains have been obtained by using discriminative training criteria, such as maximum mutual information and minimum phone error. However, the underlying acoustic model is still generative, with the associated constraints on the state and transition probability distributions, and classification is based on Bayes' decision rule. Recently, there has been interest in examining discriminative, or direct, models for speech recognition. This paper briefly reviews the forms of discriminative models that have been investigated. These include maximum entropy Markov models, hidden conditional random fields and conditional augmented models. The relationships between the various models and issues with applying them to large vocabulary continuous speech recognition will be discussed.
Structured log-linear models for noise robust speech recognition
 IEEE Signal Processing Letters
, 2010
"... [ The use of discriminative models for structured classification tasks, such as automatic speech recognition is becoming increasingly popular. The major contribution of this work is we proposed a large margin structured loglinear model for noise robust continuous ASR. 1 An important aspect of logl ..."
Abstract

Cited by 19 (10 self)
 Add to MetaCart
The use of discriminative models for structured classification tasks, such as automatic speech recognition (ASR), is becoming increasingly popular. The major contribution of this work is a proposed large margin structured log-linear model for noise-robust continuous ASR. An important aspect of log-linear models is the form of the features. The features used in our structured log-linear model are derived from generative kernels. This provides an elegant way of combining generative and discriminative models to handle time-varying data. Additionally, since the features are based on the generative models, model-based compensation can be easily performed for noise robustness. Finally, the designed joint feature space can be decomposed at the arc level. This allows efficient decoding and training with lattices, which is important for any larger vocabulary extensions. Previous work in this area is extended in two important directions. First, instead of using CML training, which is commonly used for discriminative models, this paper describes efficient large margin training for sentence-level log-linear models based on lattices. Depending on the nature of the joint feature space and labels, we have proved that this form of model is closely related to structured SVMs and multiclass SVMs. Second, efficient lattice-based classification of continuous data is also performed incorporating a joint feature space. This novel model combines generative kernels, discriminative models, efficient lattice-based large margin training and model-based noise compensation. It is evaluated on a noise-corrupted continuous digit task, AURORA 2.0. Results on AURORA 2 demonstrate that modelling the structure information yields significant improvements.
Conditional random fields for integrating local discriminative classifiers
 IEEE Transactions on Audio, Speech, and Language Processing
, 2008
"... Abstract—Conditional random fields (CRFs) are a statistical framework that has recently gained in popularity in both the automatic speech recognition (ASR) and natural language processing communities because of the different nature of assumptions that are made in predicting sequences of labels compa ..."
Abstract

Cited by 18 (2 self)
 Add to MetaCart
Conditional random fields (CRFs) are a statistical framework that has recently gained in popularity in both the automatic speech recognition (ASR) and natural language processing communities because of the different nature of the assumptions made in predicting sequences of labels compared to the more traditional hidden Markov model (HMM). In the ASR community, CRFs have been employed in a method similar to that of HMMs, using the sufficient statistics of input data to compute the probability of label sequences given acoustic input. In this paper, we explore the application of CRFs to combine local posterior estimates provided by multilayer perceptrons (MLPs) corresponding to the frame-level prediction of phone classes and phonological attribute classes. We compare phonetic recognition using CRFs to an HMM system trained on the same input features and show that the monophone-label CRF is able to achieve performance superior to a monophone-based HMM and comparable to a 16-Gaussian-mixture triphone-based HMM; in both of these cases, the CRF obtains these results with far fewer free parameters. The CRF is also able to better combine these posterior estimators, achieving a substantial increase in performance over an HMM-based triphone system by mixing the two highly correlated sets of phone class and phonetic attribute class posteriors. Index Terms: automatic speech recognition (ASR), random fields.
Hybrid Generative/Discriminative Learning for Automatic Image Annotation
"... Automatic image annotation (AIA) raises tremendous challenges to machine learning as it requires modeling of data that are both ambiguous in input and output, e.g., images containing multiple objects and labeled with multiple semantic tags. Even more challenging is that the number of candidate tags ..."
Abstract

Cited by 13 (1 self)
 Add to MetaCart
(Show Context)
Automatic image annotation (AIA) raises tremendous challenges to machine learning, as it requires modeling data that are ambiguous in both input and output, e.g., images containing multiple objects and labeled with multiple semantic tags. Even more challenging is that the number of candidate tags is usually huge (as large as the vocabulary size), yet each image is related to only a few of them. This paper presents a hybrid generative-discriminative classifier to simultaneously address the extreme data-ambiguity and overfitting-vulnerability issues in tasks such as AIA. In particular: (1) an Exponential-Multinomial Mixture (EMM) model is established to capture both the input and output ambiguity and at the same time to encourage prediction sparsity; and (2) the prediction ability of the EMM model is explicitly maximized through discriminative learning that integrates variational inference of graphical models and the pairwise formulation of ordinal regression. Experiments show that our approach achieves both superior annotation performance and better tag scalability.