
## Large margin hidden Markov models for automatic speech recognition (2007)



### Other Repositories/Bibliography

Venue: Advances in Neural Information Processing Systems 19

Citations: 83 (7 self)

### Citations

13215 | Statistical learning theory
- Vapnik
- 1998

Citation Context: ...y support vector machines (SVMs), the learning algorithm in large margin GMMs is designed to maximize the distance between labeled examples and the decision boundaries that separate different classes [19]. Under mild assumptions, the required optimization is convex, without any spurious local minima. In contrast to SVMs, however, large margin GMMs are very naturally suited to problems in multiway (as ...

1222 | Nonlinear Programming, Athena Scientific, 2nd edition
- Bertsekas
- 1999

Citation Context: ...ive function:

$$
L = \sum_{n,\, c \neq y_n} \mathrm{hinge}\!\left( 1 + z_n^{\top} (\Phi_{y_n} - \Phi_c)\, z_n \right) + \gamma \sum_c \mathrm{trace}(\Psi_c), \qquad (15)
$$

which is convex in terms of the positive semidefinite matrices Φ_c. We minimize L using a projected subgradient method [2], taking steps along the subgradient of L, then projecting the matrices {Φ_c} back onto the set of positive semidefinite matrices after each update. This method is guaranteed to converge to the global ...
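The update described in this excerpt — a subgradient step on L followed by projection of each Φ_c back onto the positive semidefinite cone — can be sketched in a few lines of NumPy. The function names and toy values below are illustrative, not from the paper; the projection clips negative eigenvalues to zero, which is the Euclidean projection of a symmetric matrix onto the PSD cone:

```python
import numpy as np

def project_psd(M):
    # Euclidean projection of a symmetric matrix onto the PSD cone:
    # symmetrize, eigendecompose, and clip negative eigenvalues to zero.
    M = 0.5 * (M + M.T)
    eigvals, eigvecs = np.linalg.eigh(M)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T

def projected_subgradient_step(Phi, subgrad, step_size):
    # One update: step along the subgradient of the hinge-loss objective,
    # then project back onto the positive semidefinite matrices.
    return project_psd(Phi - step_size * subgrad)

# Example: starting from an indefinite matrix, one step lands in the PSD cone.
Phi = np.array([[1.0, 2.0],
                [2.0, -3.0]])
Phi_next = projected_subgradient_step(Phi, np.eye(2), 0.1)
print(np.linalg.eigvalsh(Phi_next))  # all eigenvalues >= 0
```

The guarantee of convergence to the global minimum mentioned in the excerpt is the standard projected subgradient result (with a suitably diminishing step size) attributed to [2].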

1100 | Semidefinite programming
- Vandenberghe, Boyd
- 1994

Citation Context: ...we propose a convex optimization that selects the “smallest” parameters that satisfy the large margin constraints in eq. (4). In this case, the optimization is an instance of semidefinite programming [18]:

$$
\begin{aligned}
\min \quad & \sum_c \mathrm{trace}(\Psi_c) \\
\text{s.t.} \quad & 1 + z_n^{\top} (\Phi_{y_n} - \Phi_c)\, z_n \le 0, \quad \forall\, c \neq y_n,\ n = 1, 2, \ldots, N \qquad (5) \\
& \Phi_c \succeq 0, \quad c = 1, 2, \ldots, C
\end{aligned}
$$

Note that the trace of the matrix Ψ_c appears in the above objective function, as opposed ...
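To make the notation in eq. (5) concrete, here is a small NumPy check on a made-up two-class example. Everything below — the data, the diagonal matrices, the scale `a` — is a toy assumption, not from the paper, and the hand-picked matrices are merely feasible rather than trace-optimal; an off-the-shelf semidefinite programming solver would return the actual minimizer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: class 0 lies near the x-axis, class 1 near the y-axis, so a
# feasible pair of PSD matrices for the margin constraints clearly exists.
Z = np.vstack([
    np.column_stack([rng.uniform(0.8, 1.2, 5), rng.uniform(-0.1, 0.1, 5)]),  # class 0
    np.column_stack([rng.uniform(-0.1, 0.1, 5), rng.uniform(0.8, 1.2, 5)]),  # class 1
])
y = np.array([0] * 5 + [1] * 5)

a = 5.0  # hand-picked scale, large enough to satisfy the unit margin
Phi = [np.diag([0.0, a]), np.diag([a, 0.0])]  # diagonal, hence trivially PSD

# Every competing class c != y_n must lose by at least the unit margin:
# 1 + z_n^T (Phi_{y_n} - Phi_c) z_n <= 0.
margins_ok = all(
    1 + Z[n] @ (Phi[y[n]] - Phi[c]) @ Z[n] <= 0
    for n in range(len(Z)) for c in range(2) if c != y[n]
)
objective = sum(np.trace(P) for P in Phi)  # the quantity eq. (5) minimizes
print(margins_ok, objective)  # → True 10.0
```

Each class's matrix assigns near-zero cost along that class's own axis and high cost along the other, so every constraint holds with room to spare; the SDP of eq. (5) would shrink the traces until some constraints become tight.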

603 | Max-Margin Markov Networks
- Taskar, Guestrin, et al.
- 2003

Citation Context: ...ining of CD-HMMs builds on ideas from many previous studies in machine learning and ASR. It has similar motivation as recent frameworks for sequential classification in the machine learning community [1, 6, 17], but differs in its focus on the real-valued acoustic feature representations used in ASR. It has similar motivation as other discriminative paradigms in ASR [3, 4, 5, 11, 13, 20], but differs in its...

344 | Speaker-independent phone recognition using hidden Markov models
- Lee, Hon
- 1989

Citation Context: ...ber of margin constraints makes it possible to train on larger data sets. We discuss how to perform the optimization efficiently in appendix A. 3.3 Phoneme recognition We used the TIMIT speech corpus [7, 9, 12] to perform experiments in phonetic recognition. We followed standard practices in preparing the training, development, and test data. Our signal processing front-end computed 39-dimensional acoustic ...

246 | Hidden Markov support vector machines
- Altun, Tsochantaridis, et al.
- 2003

Citation Context: ...ining of CD-HMMs builds on ideas from many previous studies in machine learning and ASR. It has similar motivation as recent frameworks for sequential classification in the machine learning community [1, 6, 17], but differs in its focus on the real-valued acoustic feature representations used in ASR. It has similar motivation as other discriminative paradigms in ASR [3, 4, 5, 11, 13, 20], but differs in its...

228 | Discriminative Learning for Minimum Error Classification
- Juang, Katagiri
- 1992

Citation Context: ... metrics for ASR. Noting this weakness, many researchers in ASR have studied alternative frameworks for parameter estimation based on conditional maximum likelihood [11], minimum classification error [4] and maximum mutual information [20]. The learning algorithms in these frameworks optimize discriminative criteria that more closely track actual error rates, as opposed to the EM algorithm for maximu...

225 | An application of recurrent nets to phone probability estimation, Neural Networks
- Robinson
- 1994

Citation Context: ...ber of margin constraints makes it possible to train on larger data sets. We discuss how to perform the optimization efficiently in appendix A. 3.3 Phoneme recognition We used the TIMIT speech corpus [7, 9, 12] to perform experiments in phonetic recognition. We followed standard practices in preparing the training, development, and test data. Our signal processing front-end computed 39-dimensional acoustic ...

179 | Speech database development: Design and analysis Report no.
- Lamel, Kassel, et al.
- 1986

Citation Context: ...ber of margin constraints makes it possible to train on larger data sets. We discuss how to perform the optimization efficiently in appendix A. 3.3 Phoneme recognition We used the TIMIT speech corpus [7, 9, 12] to perform experiments in phonetic recognition. We followed standard practices in preparing the training, development, and test data. Our signal processing front-end computed 39-dimensional acoustic ...

123 | Large scale discriminative training of hidden Markov models for speech recognition
- Woodland, Povey
- 2002

Citation Context: ...ess, many researchers in ASR have studied alternative frameworks for parameter estimation based on conditional maximum likelihood [11], minimum classification error [4] and maximum mutual information [20]. The learning algorithms in these frameworks optimize discriminative criteria that more closely track actual error rates, as opposed to the EM algorithm for maximum likelihood estimation. These algor...

117 | An inequality for rational functions with applications to some statistical estimation problems
- Gopalakrishnan, Kanevsky, et al.
- 1991

Citation Context: ...the machine learning community [1, 6, 17], but differs in its focus on the real-valued acoustic feature representations used in ASR. It has similar motivation as other discriminative paradigms in ASR [3, 4, 5, 11, 13, 20], but differs in its goal of margin maximization and its formulation of the learning problem as a convex optimization over positive semidefinite matrices. The recent margin-based approach of [10] is c...

114 | Comparison of learning algorithms for handwritten digit recognition
- LeCun
- 1995

Citation Context: ...s how to perform the optimization efficiently for large data sets in appendix A. 2.4 Handwritten digit recognition We trained large margin GMMs for multiway classification of MNIST handwritten digits [8]. The MNIST data set has 60000 training examples and 10000 test examples. Table 1 shows that the large margin GMMs yielded significantly lower test error rates than GMMs trained by maximum likelihood ...

64 | Large margin Gaussian mixture modeling for phonetic classification and recognition
- Sha, Saul
- 2006

Citation Context: ...killfully implemented, they lead to lower error rates [13, 20]. Recently, in a new approach to discriminative acoustic modeling, we proposed the use of “large margin GMMs” for multiway classification [15]. Inspired by support vector machines (SVMs), the learning algorithm in large margin GMMs is designed to maximize the distance between labeled examples and the decision boundaries that separate differ...

46 | MMI training for continuous phoneme recognition on the TIMIT database
- Kapadia, Valtchev, et al.
- 1993

Citation Context: ...the machine learning community [1, 6, 17], but differs in its focus on the real-valued acoustic feature representations used in ASR. It has similar motivation as other discriminative paradigms in ASR [3, 4, 5, 11, 13, 20], but differs in its goal of margin maximization and its formulation of the learning problem as a convex optimization over positive semidefinite matrices. The recent margin-based approach of [10] is c...

36 | Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models, in ICASSP
- Sha, Saul
- 2007

Citation Context: ...simple case where the observations in each hidden state are modeled by a single ellipsoid. The extension to multiple mixture components closely follows the approach in section 2.3 and can be found in [14, 16]. Margin-based learning of transition probabilities is likewise straightforward but omitted for brevity. Both these extensions were implemented, however, for the experiments on phonetic recognition in...

24 | Large margin HMMs for speech recognition
- Li, Jiang, et al.
- 2005

Citation Context: ..., 13, 20], but differs in its goal of margin maximization and its formulation of the learning problem as a convex optimization over positive semidefinite matrices. The recent margin-based approach of [10] is closest in terms of its goals, but entirely different in its mechanics; moreover, its learning is limited to the mean parameters in GMMs. 2 Large margin GMMs for multiway classification Before dev...

16 | A decision-theoretic formulation of a training problem in speech recognition and a comparison of training by unconditional versus conditional maximum likelihood
- Nádas
- 1983

Citation Context: ...rror rates, which are more relevant metrics for ASR. Noting this weakness, many researchers in ASR have studied alternative frameworks for parameter estimation based on conditional maximum likelihood [11], minimum classification error [4] and maximum mutual information [20]. The learning algorithms in these frameworks optimize discriminative criteria that more closely track actual error rates, as oppo...

13 | Large margin training of acoustic models for speech recognition
- Sha
- 2006

Citation Context: ...simple case where the observations in each hidden state are modeled by a single ellipsoid. The extension to multiple mixture components closely follows the approach in section 2.3 and can be found in [14, 16]. Margin-based learning of transition probabilities is likewise straightforward but omitted for brevity. Both these extensions were implemented, however, for the experiments on phonetic recognition in...

8 | Optimization methods for discriminative training
- Roux, McDermott

Citation Context: ...or maximum likelihood estimation. These algorithms do not enjoy the simple update rules and relatively fast convergence of EM, but carefully and skillfully implemented, they lead to lower error rates [13, 20]. Recently, in a new approach to discriminative acoustic modeling, we proposed the use of “large margin GMMs” for multiway classification [15]. Inspired by support vector machines (SVMs), the learning...

6 | Acoustic Modeling for Large Vocabulary Continuous Speech Recognition
- Young
- 1999

4 | Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Lafferty, McCallum, et al.
- 2001

Citation Context: ...ining of CD-HMMs builds on ideas from many previous studies in machine learning and ASR. It has similar motivation as recent frameworks for sequential classification in the machine learning community [1, 6, 17], but differs in its focus on the real-valued acoustic feature representations used in ASR. It has similar motivation as other discriminative paradigms in ASR [3, 4, 5, 11, 13, 20], but differs in its...