## Discriminative kernel-based phoneme sequence recognition (2006)

Venue: Proc. of ICSLP

Citations: 7 (2 self)

### BibTeX

@INPROCEEDINGS{Keshet06discriminativekernel-based,

author = {Joseph Keshet and Samy Bengio and Dan Chazan and Shai Shalev-Shwartz and Yoram Singer},

title = {Discriminative kernel-based phoneme sequence recognition},

booktitle = {Proc. of ICSLP},

year = {2006}

}

### Abstract

We describe a new method for phoneme sequence recognition given a speech utterance. In contrast to HMM-based approaches, our method uses a kernel-based discriminative training procedure in which the learning process is tailored to the goal of minimizing the Levenshtein distance between the predicted phoneme sequence and the correct sequence. The phoneme sequence predictor is devised by mapping the speech utterance, along with a proposed phoneme sequence, to a vector space endowed with an inner product that is realized by a Mercer kernel. Building on large margin techniques for predicting whole sequences, we devise a learning algorithm that reduces to separating the correct phoneme sequence from all other sequences. We describe an iterative algorithm for learning the phoneme sequence recognizer and an efficient implementation of it. We present encouraging initial experimental results on the TIMIT corpus and compare the proposed method to an HMM-based approach.
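The Levenshtein distance named in the abstract can be illustrated with a minimal sketch. This is the standard unit-cost dynamic program, not code from the paper; the function name `levenshtein` and the phoneme labels in the usage note are hypothetical, and the abstract does not say whether the authors normalize or weight the distance.

```python
def levenshtein(predicted, target):
    """Unit-cost edit distance between two sequences.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn `predicted` into `target`.
    Illustrative only: the paper may use a weighted or normalized variant.
    """
    # previous[j] holds the distance between the predicted prefix
    # processed so far and target[:j]; start with the empty prefix.
    previous = list(range(len(target) + 1))
    for i, p in enumerate(predicted, start=1):
        # Turning predicted[:i] into an empty target takes i deletions.
        current = [i]
        for j, t in enumerate(target, start=1):
            substitution = previous[j - 1] + (p != t)  # free if symbols match
            deletion = previous[j] + 1                 # drop p
            insertion = current[j - 1] + 1             # add t
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For example, with made-up phoneme labels, `levenshtein(["sil", "k", "ae", "t"], ["sil", "k", "aa", "t", "sil"])` counts one substitution and one insertion. The function works on any sequences of comparable items, so phoneme lists and plain strings are handled the same way.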

### Citations

9946 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ... needs to perform. In addition, there is both theoretical and empirical evidence that discriminative learning algorithms are likely to outperform generative models for the same task (see for instance [17, 5]). One of the main goals of this work is to extend the notion of discriminative learning to the complex task of phoneme sequence prediction. Our proposed method is based on recent advances in kernel m...

1641 | Fundamentals of Speech Recognition
- Rabiner, Juang
- 1993
Citation Context: ...l operator we have presented is decomposable and thus the best phoneme sequence can be found in polynomial time using dynamic programming (similarly to the Viterbi procedure often implemented in HMMs [12]). 5 Experimental Results. To validate the effectiveness of the proposed approach we performed experiments with the TIMIT corpus. All the experiments described here have followed the same methodology. ...

1053 | An Introduction to Support Vector Machines
- Cristianini, Shawe-Taylor
- 2000
Citation Context: ... needs to perform. In addition, there is both theoretical and empirical evidence that discriminative learning algorithms are likely to outperform generative models for the same task (see for instance [17, 5]). One of the main goals of this work is to extend the notion of discriminative learning to the complex task of phoneme sequence prediction. Our proposed method is based on recent advances in kernel m...

517 | Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms
- Collins
Citation Context: ...lassification. The phoneme sequence recognition problem is more complex, since we need to predict a whole sequence rather than a single number. Previous kernel machine methods for sequence prediction [12, 8, 13] introduce optimization problems which require long run-time and high memory resources, and are thus problematic for the large datasets that are typically encountered in speech processing. We propose ...

461 | Max-margin Markov networks
- Taskar, Guestrin, et al.
- 2003
Citation Context: ...the notion of discriminative learning to the complex task of phoneme sequence prediction. Our proposed method is based on recent advances in kernel machines and large margin classifiers for sequences [15, 14], which in turn build on the pioneering work of Vapnik and colleagues [17, 5]. The phoneme sequence recognizer we devise is based on mapping the speech signal along with the target phoneme sequence in...

329 | Support vector machine learning for interdependent and structured output spaces
- Tsochantaridis, Hofmann, et al.
- 2004
Citation Context: ...lassification. The phoneme sequence recognition problem is more complex, since we need to predict a whole sequence rather than a single number. Previous kernel machine methods for sequence prediction [15, 16] introduce optimization problems which require long run-time and high memory resources, and are thus problematic for the large datasets that are typically encountered in speech processing. We propose ...

272 | Speaker-independent Phone Recognition Using Hidden Markov Models
- Lee, Hon
- 1989
Citation Context: ...and compare the proposed method to an HMM-based approach. (IDIAP–RR 06-14) 1 Introduction. Most previous work on phoneme sequence recognition has focused on Hidden Markov Models (HMM). See for example [10, 7, 2] and the references therein. Despite their popularity, HMM-based approaches have several drawbacks such as convergence of the EM procedure to local maximum and overfitting effects due to the large num...

234 | From HMM To Segment Models: A Unified View of Stochastic Modeling of Speech Recognition
- Ostendorf, Digalakis, et al.
- 1996
Citation Context: ...to the large number of parameters. Moreover, HMMs do not faithfully reflect the underlying structure of speech signals as they assume conditional independence of observations given the state sequence [11] and often require uncorrelated acoustic features [18]. Another problem with HMMs is that they do not directly address discriminative tasks. In particular, for the task of phoneme sequence prediction,...

140 | On the generalization ability of on-line learning algorithms
- Cesa-Bianchi, Conconi, et al.
- 2002
Citation Context: ...pers demonstrated that, under some mild technical conditions, the cumulative Levenshtein distance of the iterative procedure, $\sum_{i=1}^{m} \gamma(\bar{p}_i, \bar{p}'_i)$, is likely to be small. Moreover, it can be shown [1] that if the cumulative Levenshtein distance of the iterative procedure is small, there exists at least one weight vector among the vectors $\{w_1, \ldots, w_m\}$ which attains small averaged Levenshtein distanc...

115 | Torch: a modular machine learning software library
- Collobert, Bengio, et al.
- 2002
Citation Context: ...ation constraints using a forced alignment. Minimum values of the variances for each Gaussian were set to 20% of the global variance of the data. All HMM experiments were done using the Torch package [3]. All hyper-parameters including number of states, number of Gaussians per state, variance flooring factor, were tuned using the validation set. The overall results are given in Table 1. We report the...

65 | Online passive aggressive algorithms
- Crammer, Dekel, et al.
- 2006
Citation Context: ...ves as a complexity-accuracy trade-off parameter as in the SVM algorithm (see [5]). The specific definition of $\alpha_i$ that we employ is based on an ongoing work on online learning algorithms appearing in [4, 14, 9]. These papers demonstrated that, under some mild technical conditions, the cumulative Levenshtein distance of the iterative procedure, $\sum_{i=1}^{m} \gamma(\bar{p}_i, \bar{p}'_i)$, is likely to be small. Moreover, it can...

20 | Fast Algorithms for phone classification and recognition using Segment-based Models
- Digalakis, Ostendorf, et al.
- 1992
Citation Context: ...and compare the proposed method to an HMM-based approach. 1 Introduction. Most previous work on phoneme sequence recognition has focused on Hidden Markov Models (HMM). See for example [10, 7, 2] and the references therein. Despite their popularity, HMM-based approaches have several drawbacks such as convergence of the EM procedure to local maximum and overfitting effects due to the large num...

20 | Framewise phone classification using support vector machines
- Salomon, King, et al.
- 2002
Citation Context: ... that is defined by a kernel operator. One of the well-known discriminative learning algorithms is the support vector machine (SVM), which has already been successfully applied in speech applications [8, 13]. Building on techniques used for learning SVMs, our phoneme sequence recognizer distills to a classifier in this vector-space which is aimed at separating correct phoneme sequences from incorrect one...

19 | Speech trajectory discrimination using the minimum classification error learning
- Rathinavalu, Deng
- 1998
Citation Context: ...and compare the proposed method to an HMM-based approach. 1 Introduction. Most previous work on phoneme sequence recognition has focused on Hidden Markov Models (HMM). See for example [10, 7, 2] and the references therein. Despite their popularity, HMM-based approaches have several drawbacks such as convergence of the EM procedure to local maximum and overfitting effects due to the large num...

19 | Online algorithm for hierarchical phoneme classification
- Dekel, Keshet, et al.
- 2004
Citation Context: ... on discriminative supervised learning and Mercer kernels. The work presented in this report is part of an ongoing research trying to apply discriminative kernel methods to speech processing problems [8, 6, 9]. So far, the experimental results we obtained with our method for the task of phoneme recognition are still inferior to state-of-the-art results obtained by HMMs. However, while there has been extens...

17 | Learning to align polyphonic music
- Shalev-Shwartz, Keshet, et al.
Citation Context: ...the notion of discriminative learning to the complex task of phoneme sequence prediction. Our proposed method is based on recent advances in kernel machines and large margin classifiers for sequences [15, 14], which in turn build on the pioneering work of Vapnik and colleagues [17, 5]. The phoneme sequence recognizer we devise is based on mapping the speech signal along with the target phoneme sequence in...

15 | A Review of Large-Vocabulary Continuous Speech Recognition
- Young
- 1996
Citation Context: ...ot faithfully reflect the underlying structure of speech signals as they assume conditional independence of observations given the state sequence [11] and often require uncorrelated acoustic features [18]. Another problem with HMMs is that they do not directly address discriminative tasks. In particular, for the task of phoneme sequence prediction, HMMs as well as other generative models, are not trai...

10 | Phoneme alignment based on discriminative learning
- Keshet, Shalev-Shwartz, et al.
Citation Context: ...ves as a complexity-accuracy trade-off parameter as in the SVM algorithm (see [5]). The specific definition of $\alpha_i$ that we employ is based on an ongoing work on online learning algorithms appearing in [4, 14, 9]. These papers demonstrated that, under some mild technical conditions, the cumulative Levenshtein distance of the iterative procedure, $\sum_{i=1}^{m} \gamma(\bar{p}_i, \bar{p}'_i)$, is likely to be small. Moreover, it can...

9 | Plosive spotting with margin classifiers
- Keshet, Chazan, et al.
- 2001
Citation Context: ... that is defined by a kernel operator. One of the well-known discriminative learning algorithms is the support vector machine (SVM), which has already been successfully applied in speech applications [8, 13]. Building on techniques used for learning SVMs, our phoneme sequence recognizer distills to a classifier in this vector-space which is aimed at separating correct phoneme sequences from incorrect one...