## Code Breaking for Automatic Speech Recognition

Citations: 2 (1 self)

### BibTeX

@MISC{Venkataramani_codebreaking,
  author = {Veera Venkataramani},
  title  = {Code Breaking for Automatic Speech Recognition},
  year   = {}
}

### Abstract

Code Breaking is a divide-and-conquer approach for sequential pattern recognition tasks in which we identify weaknesses of an existing system and then use specialized decoders to strengthen the overall system. We study the technique in the context of Automatic Speech Recognition. Using the lattice cutting algorithm, we first analyze lattices generated by a state-of-the-art speech recognizer to spot possible errors in its first-pass hypothesis. We then train specialized decoders for each of these problems and apply them to refine the first-pass hypothesis. We study the use of Support Vector Machines (SVMs) as discriminative models over each of these problems. The estimation of a posterior distribution over hypotheses in these regions of acoustic confusion is posed as a logistic regression problem. GiniSVMs, a variant of SVMs, can be used as an approximation technique to estimate the parameters of the logistic regression problem. We first validate our approach on a small vocabulary recognition task, namely alphadigits. We show that the use of GiniSVMs can substantially improve the performance of a well-trained MMI-HMM system. We also find that it is possible to derive reliable confidence scores over the GiniSVM hypotheses and that these can be used to good effect in hypothesis combination. We then analyze lattice cutting in terms of its ability to reliably identify, and provide good alternatives for, incorrectly hypothesized words in the Czech MALACH domain, a large vocabulary task. We describe a procedure to train and apply SVMs to strengthen the first-pass system, resulting in small but statistically significant recognition improvements. We conclude with a discussion of methods, including clustering, for obtaining further improvements on large vocabulary tasks.

### Citations

8842 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...model parameters to obtain new estimates. We can now iterate with these new estimates. The EM algorithm guarantees not to decrease the likelihood assigned to the training data. A. P. Dempster et al. [21] give a formal treatment of the EM algorithm. If θ is the set of the current parameters for the HMMs, the auxiliary function is given by Q(θ, θ′) = Σ_{s∈Q} P(s|O, W; θ) log P(O, s|W; θ′) (2.6), where Q... |
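The iterate-and-never-decrease behavior of EM described in the context above can be illustrated with a minimal, self-contained sketch for a two-component 1-D Gaussian mixture. The data and initial parameters below are invented for illustration; this is not the thesis's HMM estimation code.

```python
import math

def gauss(x, mu, var):
    # 1-D Gaussian density.
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_step(data, weights, means, variances):
    # E-step: posterior responsibility of each component for each point
    # under the current parameters.
    resp = []
    for x in data:
        p = [w * gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
        s = sum(p)
        resp.append([pi / s for pi in p])
    # M-step: responsibility-weighted re-estimates, which maximize the
    # auxiliary function Q(theta, theta').
    n = [sum(r[k] for r in resp) for k in range(2)]
    weights = [nk / len(data) for nk in n]
    means = [sum(r[k] * x for r, x in zip(resp, data)) / n[k] for k in range(2)]
    variances = [max(sum(r[k] * (x - means[k]) ** 2
                         for r, x in zip(resp, data)) / n[k], 1e-6)
                 for k in range(2)]
    return weights, means, variances

def log_likelihood(data, weights, means, variances):
    return sum(math.log(sum(w * gauss(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

data = [-2.1, -1.9, -2.0, 1.8, 2.2, 2.0]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
ll_before = log_likelihood(data, *params)
params = em_step(data, *params)
ll_after = log_likelihood(data, *params)
# EM guarantees ll_after >= ll_before.
```

One iteration already pulls the component means toward the two clusters while the data likelihood increases, as the EM guarantee requires.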

2460 | A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
- Freund, Schapire
- 1997
Citation Context: ...ose the instances of confusion pairs we want to process based on a threshold that results in high quality segment sets; this filtering of the segment sets will be elaborated in Chapter 7. AdaBoost [36] is a learning algorithm that first identifies training data that are erroneously classified. Several instances of the original classifier are trained over different distributions of the training data... |

2337 | Support-Vector Networks
- Cortes, Vapnik
- 1995
Citation Context: ...dot products by a kernel function K(xi, xj), which would imply that we perform a nonlinear feature transformation ζ(·) on the data prior to performing the dot product, i.e., K(xi, xj) = ζ(xi) · ζ(xj) [19]. Mathematically, the dual is written as argmax_{αi} Σ_i αi − ½ Σ_{ij} αi αj yi yj K(xi, xj) (4.15), with the same constraints as given by Equations 4.11 and 4.12. New observations x will then be classified... |
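The kernel identity K(xi, xj) = ζ(xi) · ζ(xj) quoted in the context above can be made concrete with a small sketch: for the homogeneous quadratic kernel K(x, y) = (x · y)² on 2-D inputs, an explicit feature map ζ exists whose ordinary dot product reproduces the kernel. The example vectors are invented for illustration.

```python
import math

def kernel(x, y):
    # Homogeneous quadratic kernel on 2-D inputs.
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def zeta(x):
    # Explicit feature map: (x1^2, sqrt(2) x1 x2, x2^2).
    # Its dot product with zeta(y) expands to exactly (x . y)^2.
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, y = (1.0, 2.0), (3.0, -1.0)
# kernel(x, y) == dot(zeta(x), zeta(y)) == (1*3 + 2*(-1))**2 == 1.0
```

The point of the "kernel trick" in the quoted passage is that the dual optimization only ever needs K(xi, xj), so ζ never has to be computed explicitly.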

1605 | Fundamentals of Speech Recognition
- Rabiner, Juang
- 1993
Citation Context: ...a T-length observation vector sequence O = o1 · · · oT by the Acoustic Processor, or front-end, of the speech recognizer. The maximum a posteriori (MAP) recognizer can then be formulated as follows [2, 81, 53]: choose the most likely word string Ŵ given the acoustic data: Ŵ = argmax_{W∈W} P(W|O) (2.1), where W represents all possible word strings. Using Bayes' rule we can write, Ŵ = argmax_{W∈W} P(O|W... |

1393 | A Training Algorithm for Optimal Margin Classifiers
- Boser, Guyon, et al.
- 1992
Citation Context: ...lower error rates on the training set, while low values of C imply a larger margin and therefore better generalization capabilities. The minimization is usually done using Lagrangian multipliers {αi} [6], one for each constraint. The reasons for introducing Lagrangian multipliers are twofold: (i) the training data appear only in the form of dot products between vectors, a property we will soon take a... |

1316 | Binary codes capable of correcting deletions, insertions and reversals. Cybernetics Control Theory 10:707–710
- Levenshtein
- 1966
Citation Context: ...umber of recognition errors is the minimum number of insertions, deletions and substitutions required to obtain the truth from the recognizer output. This measure, also called the Levenshtein distance [61], can be efficiently calculated using dynamic programming techniques. The truth is usually taken to be human transcriptions; humans listen to the speech closely and write down what they think was spoke... |
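The dynamic-programming calculation mentioned in the context above can be sketched in a few lines: the classic edit-distance recurrence, applied here to word sequences as in word error rate scoring. The hypothesis/reference pair is invented for illustration.

```python
def levenshtein(ref, hyp):
    # d[i][j] = minimum number of edits to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)]

ref = "the cat sat on the mat".split()
hyp = "the cat sat on mat".split()
# one word deleted -> distance 1
```

The same function works on character strings, since it only indexes and compares elements of the two sequences.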

1254 | Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm
- Viterbi
- 1967
Citation Context: ...of as a large HMM. MAP decoding according to Equation 2.4 involves searching this huge network and determining the most likely path given the acoustic observations. Usually a form of Viterbi decoding [96] is used to obtain the MAP hypothesis. The token passing model [84] is one such formulation of the Viterbi algorithm. As we traverse the large HMM structure, we associate a token with each state j at... |
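The most-likely-path search described in the context above can be sketched with a minimal Viterbi decoder over a toy two-state HMM. All probabilities here are invented for illustration; a real recognizer works in the log domain over a vastly larger network.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[t][s] = probability of the best path ending in state s at time t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t.
            prev, p = max(((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                          key=lambda pair: pair[1])
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}
path = viterbi(["x", "x", "y"], states, start_p, trans_p, emit_p)
# path -> ["A", "A", "B"]
```

The token passing formulation mentioned in the quote carries the same best-score-plus-backpointer information as `best` and `back`, but propagates it as tokens moving through the HMM network.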

1190 | Practical methods of optimization
- Fletcher
- 1987
Citation Context: ...age of, and (ii) the constraints (Equations 4.1) on the training data will be replaced with constraints on the Lagrangian multipliers themselves, which are much easier to handle. The primal Lagrangian [34] is given by argmin_{φ,b} ½||φ||² + C Σ_i ξi − Σ_i αi(yi(xi · φ + b) − 1 + ξi) − Σ_i λi ξi (4.5), where λi are Lagrangian multipliers to enforce the positivity of ξi. Maximizing the primal w.r.t. φ,... |

782 | Statistical Methods for Speech Recognition
- Jelinek
- 1998
Citation Context: ...a T-length observation vector sequence O = o1 · · · oT by the Acoustic Processor, or front-end, of the speech recognizer. The maximum a posteriori (MAP) recognizer can then be formulated as follows [2, 81, 53]: choose the most likely word string Ŵ given the acoustic data: Ŵ = argmax_{W∈W} P(W|O) (2.1), where W represents all possible word strings. Using Bayes' rule we can write, Ŵ = argmax_{W∈W} P(O|W... |

544 | An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes, Inequalities 3
- Baum
- 1972
Citation Context: ...e phone or state level are not available. We assume that the state information is hidden or that the training data is incomplete. There is an efficient algorithm, called the Baum-Welch (BW) algorithm [3] (sometimes called the Forward-Backward (FB) algorithm), that enables the estimation of model parameters given incomplete training data under the Maximum-Likelihood (ML) criterion. Baum-Welch is an i... |

482 | Connectionist speech recognition: A hybrid approach
- Bourlard, Morgan
- 1994
Citation Context: ...lustering approach to break down the task of discriminating between thousands of classes (HMM states) into smaller classification tasks; their motivation is the application of connectionist methods [7] for large vocabulary recognition. The smaller tasks are determined ahead of recognition by agglomerative clustering on information divergence. During recognition, the posterior probability of a class... |

462 | Max-margin Markov networks
- Taskar, Guestrin, et al.
- 2003
Citation Context: ...such as context-dependency, efficient parameter estimation procedures, etc. A framework that incorporates a large number of SVMs in a sequence classification task is still an active research problem [1, 92]. We will first discuss methods to transform variable length sequences into vectors of fixed dimension. Towards this end, we would also like to use the HMMs that we have trained... |

458 | Selection of relevant features and examples in machine learning, Artificial Intelligence
- Blum, Langley
- 1997
Citation Context: ...e full matrix Σ̂_sc is problematic. For efficiency and modeling robustness there may be value in reducing the dimensionality of the score-space. There has been research [5, 90] to estimate the information content of each dimension so that non-informative dimensions can be discarded. Assuming independence between dimensions, the goodness of a dimension can be found based on... |

414 | Exploiting generative models in discriminative classifiers
- Jaakkola, Haussler
- 1998
Citation Context: ...so like to use the HMMs that we have trained so that some of the advantages of the generative models can be used along with the discriminatively trained models. Fisher scores [48] are a method that transforms variable length sequences into vectors of fixed dimension. It assumes the existence of a parametrized generative model for the observed data. Each component of the Fisher... |

351 | Fisher discriminant analysis with kernels
- Mika, Ratsch, et al.
- 1999
Citation Context: ...hey avoid the problem of dealing with variable-length feature vectors. They notice the problem of scaling in SVMs when given massive amounts of training data and use the Kernel Fisher Discriminant [65] to alleviate the problem. I. Bazzi and D. Katabi [4] create a fixed-length observation by selecting a fixed number of the most dissimilar feature vectors. They used Principal Component Analysis to re... |

201 | Hidden Markov support vector machines
- Altun, Tsochantaridis, et al.
- 2003
Citation Context: ...such as context-dependency, efficient parameter estimation procedures, etc. A framework that incorporates a large number of SVMs in a sequence classification task is still an active research problem [1, 92]. We will first discuss methods to transform variable length sequences into vectors of fixed dimension. Towards this end, we would also like to use the HMMs that we have trained... |

190 | Extracting support data for a given task
- Schölkopf, Burges, et al.
- 1995
Citation Context: ...e definite for any choice of the gain factor d [77]. For the regular SVMs, this can result in non-convex optimization functions. In spite of this, tanh kernels have been used successfully in practice [88]. GiniSVMs have the advantage that, unlike regular SVMs, they can employ non-positive-definite kernels and still produce convex optimization functions. This can be seen by inspecting Equation 5.13. Th... |

154 | Support vector regression machines
- Drucker, Burges, et al.
- 1997
Citation Context: ...are this modeling approach to other prior work in this area. Support Vector Machines (SVMs) are discriminative pattern classifiers that have shown remarkable performance [8, 24, 59] in static pattern classification tasks (by static we mean that the observations of the patterns to be classified are of a fixed dimension, d). Some of these tasks include handwriting recognition [59]... |

152 | Improving the accuracy and speed of support vector machines
- Burges, Schölkopf
- 1996
Citation Context: ...are this modeling approach to other prior work in this area. Support Vector Machines (SVMs) are discriminative pattern classifiers that have shown remarkable performance [8, 24, 59] in static pattern classification tasks (by static we mean that the observations of the patterns to be classified are of a fixed dimension, d). Some of these tasks include handwriting recognition [59]... |

128 | The use of context in large vocabulary speech recognition
- Odell
- 1995
Citation Context: ...orm distribution. The phone sequence B in Figure 2.2 shows HMMs modeling individual phones. These kinds of HMMs are termed monophone models. However, monophone models cannot capture contextual effects [75] between phones. To see this effect, consider the pronunciation of ae in the two words: EXAM eh g z ae m; BRAG b r ae g. It is reasonable to expect ae to be pronounced differently (especially at the pho... |

127 | Speech recognition by composition of weighted finite automata
- Pereira, Riley
- 1997
Citation Context: ...ntinuous speech recognition. We hope we have convinced the reader of the same. (Appendix A: Weighted Finite State Automata) We closely follow the presentation of F. C. N. Pereira and M. D. Riley [79]. A semiring (K, +K, ×K) is defined as a set K with two binary operations, collection +K and extension ×K, such that: collection is associative and commutative with identity 0K; extension is associa... |

124 | Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models
- Leggetter, Woodland
- 1995
Citation Context: ...ven by P(o|s) = Σ_{i=1}^{K} [c_{i,s} / ((2π)^{D/2} |Σ_{i,s}|^{1/2})] exp(−½ (o − Aµ_{i,s})ᵀ Σ_{i,s}^{−1} (o − Aµ_{i,s})) (2.7). The most popular transform used is the Maximum Likelihood Linear Regression (MLLR) transform [60]. This transform (A) is computed so as to increase the likelihood assigned by the SD models to the hypothesis of the SI system. Referring to Equation 2.6, the parameters to be re-estimated are now the... |

114 | Probabilities for SV machines
- Platt
- 2000
Citation Context: ...ces as given by Equation 4.16. We need to post-process the outputs of the SVMs to map them to normalized scores. There has been research to generate probabilities from SVM outputs using held-out data [80] and by approximate inference schemes [58]. We will now look at how we can generate normalized scores directly from large-margin classifiers... |

109 | Probabilistic Kernel Regression Models
- Jaakkola, Haussler
- 1999
Citation Context: ...ic SVM is that its raw outputs are unnormalized scores and have to be transformed to obtain conditional probability estimates. We then present a unified framework called Probabilistic Kernel Regression [49] that subsumes SVMs. Normalized scores can be generated from some large-margin classifiers under this framework. Finally, we present the Gini Support Vector Machine [15], an approximation to the Kernel... |

102 | An inequality for rational functions with applications to some statistical estimation problems
- Gopalakrishnan, Kanevsky, et al.
- 1991

102 | A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications
- Moreno, Ho, et al.
- 2004
Citation Context: ...rt extremely good results even with a simple single full-covariance Gaussian model. Moreno et al. have also applied these Kullback-Leibler Divergence Based Kernels to image classification tasks [68]. SVMs have been used for other speech tasks as well. C. Cortes et al. [18] developed rational kernels that use the word (or phone) strings in two lattices to determine if the lattices are similar... |

92 | Comparison of Learning Algorithms for Handwritten Digit Recognition
- LeCun
- 1995
Citation Context: ...are this modeling approach to other prior work in this area. Support Vector Machines (SVMs) are discriminative pattern classifiers that have shown remarkable performance [8, 24, 59] in static pattern classification tasks (by static we mean that the observations of the patterns to be classified are of a fixed dimension, d). Some of these tasks include handwriting recognition [59]... |

81 | Applications of Support Vector Machines to Speech Recognition
- Ganapathiraju, Hamaker, et al.
Citation Context: ...ast as powerful a classifier as the underlying generative model. Fisher scores were extended to the case where the generative model is an HMM by A. Ganapathiraju [39] and by N. Smith et al. [91]. They were also extended to better model the case when there are two competing HMMs [90] for a given observation sequence. This formulation has the added benefit that the... |

75 | Large Scale Discriminative Training for Speech Recognition
- Woodland, Povey
- 2000
Citation Context: ...eworks that, besides increasing the training set likelihood under the corresponding models, also attempt to lower the likelihood assigned by competing model sequences. Maximum Mutual Information (MMI) [74, 43, 102] is one such criterion, which tries to maximize the mutual information between the training word sequences W and the observation sequences O. Formally, MMI attempts to estimate parameters as θ′ = argma... |

73 | On the use of support vector machines for phonetic classification
- Clarkson, Moreno
- 1999
Citation Context: ...f low-confidence identified in lattices; we do not apply SVMs for every segment. Also, our SVM features are derived from the HMMs themselves and not from the input features. P. Clarkson and P. Moreno [17] used the same 3-4-3 idea as [40] and studied the performance of SVMs in vowel and phone classification. They also give a detailed analysis of the challenges involved in applying SVMs to speech recogniti... |

62 | Discriminative language modeling with conditional random fields and the perceptron algorithm
- Roark, Saraclar, et al.
Citation Context: ...the estimated distribution, i.e., ŵ = argmax_{w∈C} P(w|Φ) (9.1). This is an explicit form of building language models to discriminate between specific word pairs. This is in contrast to earlier work [83], where language model weights in large lattices were modified to reduce the WER. An immediate extension to our experiments towards obtaining further improvements is the u... |

59 | Minimum Bayes-Risk Automatic Speech Recognition, Computer Speech and Language
- Goel, Byrne
Citation Context: ...all possible alignments of each path in the lattice to the reference hypothesis and then choosing the best among them. This is simply intractable. However, there is an approximate efficient algorithm [41, 57] that transforms the original lattice to a form (see Figure 3.1, middle) that contains all the information needed to find the best alignments of every word string to the reference hypothesis W. The inf... |

53 | Large vocabulary decoding and confidence estimation using word posterior probabilities
- Evermann, Woodland
- 2000
Citation Context: ...of comparing two links in a lattice and a measure of confidence of the system in its hypothesis. We follow the framework introduced by Wessel et al. [101] and then developed by Evermann and Woodland [30]. The link posterior probability γ(l|O) is defined as the ratio of the sum of the likelihoods of all paths passing through l to the likelihood of the observed data. Formally, γ(l|O) = Σ_{W∈Ql} P(W, O) / P(O)... |

49 | Using Word Probabilities as Confidence Measures
- Wessel, Macherey, et al.
- 1998
Citation Context: ...ink in the lattice; this will give us both a means of comparing two links in a lattice and a measure of confidence of the system in its hypothesis. We follow the framework introduced by Wessel et al. [101] and then developed by Evermann and Woodland [30]. The link posterior probability γ(l|O) is defined as the ratio of the sum of the likelihoods of all paths passing through l to the likelihood of the o... |
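The link posterior γ(l|O) defined in the two contexts above can be illustrated with a toy sketch: sum the likelihoods of all lattice paths passing through a link and normalize by the total likelihood of all paths. The tiny three-path "lattice" and its joint likelihoods are invented for illustration, and links are identified only by their word label, a simplification of real time-bound lattice links.

```python
# Each path is a (sequence of link labels, joint likelihood P(W, O)) pair.
paths = [
    (("the", "cat", "sat"), 0.5),
    (("the", "cap", "sat"), 0.3),
    (("a",   "cat", "sat"), 0.2),
]

def link_posterior(link):
    # P(O): total likelihood summed over all paths in the lattice.
    total = sum(p for _, p in paths)
    # Summed likelihood of the paths passing through this link.
    through = sum(p for words, p in paths if link in words)
    return through / total

# gamma("cat"|O) = (0.5 + 0.2) / 1.0 = 0.7
# gamma("the"|O) = (0.5 + 0.3) / 1.0 = 0.8
```

In a real recognizer these sums are computed efficiently with forward-backward passes over the lattice rather than by enumerating paths.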

46 | Moderating the outputs of support vector machine classifiers
- Kwok
- 1999
Citation Context: ...post-process the outputs of the SVMs to map them to normalized scores. There has been research to generate probabilities from SVM outputs using held-out data [80] and by approximate inference schemes [58]. We will now look at how we can generate normalized scores directly from large-margin classifiers... |

40 | Tools for the analysis of benchmark speech recognition tests
- Pallett, Fisher, et al.
- 1990
Citation Context: ...these gains are statistically significant and stable with respect to λ: we obtained this performance improvement for λ = 0.4, 0.5, 0.6, and 0.7, and in all instances the significance test p-values [78] were less than 0.001. The experiments in this chapter were designed to show the feasibility of codebreaking on an LVCSR task. We showed that we can use GiniSVMs in combinat... |

39 | The Switchboard Transcription Project
- Greenberg
- 1996
Citation Context: ...blished for the phonetic evaluation component of the 2000 Large Vocabulary Conversational Speech Recognition evaluation [35] that makes use of the ICSI phonetically transcribed SWITCHBOARD collection [45]. Baseform acoustic models P(O|B; θB) consisting of 48 monophone models were trained as in the JHU 2000 evaluation system [35]. The models were estimated on the training portion of the ICSI data using... |

38 | Detection of Abrupt Spectral Changes using Support Vector Machines. An Application to Audio Signal Segmentation
- Davy, Godsill
- 2002
Citation Context: ...pitch, energy, and spectral contours [89]. C. Ma and M. A. Rudolph [62] used SVMs for utterance verification. Kernel ideas have also been used in eigenvoice adaptation [63]. M. Davy and S. Godsill [20] use SVMs as a novelty detector for audio signal representation. B. Krishnapuram and L. Carin [56] use Fisher Scores for multiaspect target recognition. SVMs have also been used to detect stop consona... |

37 | Discriminative, Generative and Imitative Learning
- Jebara
- 2001
Citation Context: ...ized scores can be generated from some large-margin classifiers under this framework. Finally, we present the Gini Support Vector Machine [15], an approximation to Kernel Logistic Regression (KLR) [50]. Unlike KLR, the GiniSVM both produces sparse solutions and has a quadratic optimization function. We will conclude this chapter with a brief discussion justifying the use of large margin classifiers... |

33 | Distinctive feature detection using support vector machines
- Niyogi, Burges, et al.
- 1999
Citation Context: ...tector for audio signal representation. B. Krishnapuram and L. Carin [56] use Fisher Scores for multiaspect target recognition. SVMs have also been used to detect stop consonants in continuous speech [71]. SVMs have been used to classify speech as either adult- or child-voiced [70] based on acoustic and linguistic scores. N. Mesgarani et al. [64] use SVMs to discriminate speech from non-speech sounds u... |

33 | Support vector machines for segmental minimum bayes risk decoding of continuous speech
- Venkataramani, Chakrabartty, et al.
Citation Context: ...gests that code-breaking should be done so that the baseline posterior distribution over the confusion pairs is considered in the decoding process. We have developed simple voting procedures for this [95, 94], which were described in Section 5.6. We now proceed to training specialized decoders for our code-breaking test set. The next step is to use the baseline HMMs we... |

32 | Segmental minimum Bayes-risk decoding for automatic speech recognition
- Goel, Kumar, et al.
- 2004
Citation Context: ...s [72] whose vocabulary is alphabets and digits alone. Thus the letter B and the number 8 will be among the words to be recognized by the system. For our purposes, lattice cutting [42, 57] is a procedure that segments an input lattice into sub-lattices. These sub-lattices, when concatenated together, can represent all the paths in the original lattice. More importantly, the sub-lattices... |

29 | A Hybrid GMM/SVM approach to speaker identification
- Fine, Navrátil, et al.
- 2001
Citation Context: ...ross-entropy. FDKM techniques were mainly developed to implement speech algorithms in low-power VLSI technology. There has also been considerable use of SVMs in speaker verification. Fine et al. [32, 31] used Fisher Kernels to obtain features for the SVMs. They exploited the fact that when GMM and SVM classifiers with roughly the same level of performance exhibit uncorrelated errors, they can be combi... |

29 | Maximum Mutual Information Estimation of Hidden Markov Models, Automatic Speech and Speaker Recognition: Advanced Topics
- Normandin
- 1996
Citation Context: ...eworks that, besides increasing the training set likelihood under the corresponding models, also attempt to lower the likelihood assigned by competing model sequences. Maximum Mutual Information (MMI) [74, 43, 102] is one such criterion, which tries to maximize the mutual information between the training word sequences W and the observation sequences O. Formally, MMI attempts to estimate parameters as θ′ = argma... |

28 | Training LVCSR systems on thousands of hours of data
- Evermann, Chan, et al.
- 2005
Citation Context: ...HMM parameter estimation procedure is coupled with that of the SVMs. This leads to scores that are more discriminative in nature. They report a decrease in error counts over an RT04 development set [29] but no improvements in error rates. While their framework is very similar to ours, when we apply code-breaking to large vocabulary recognition, we will choose the instances of confusion pairs we want... |

26 | Data-dependent kernel in SVM Classification of speech patterns
- Smith, Gales, et al.
- 2002
Citation Context: ...pt of low-confidence regions and of training specialized decoders only for specific confusions of the original decoder. M. J. F. Gales and M. Layton [38] extend the framework introduced by Smith et al. [91] to large vocabulary continuous speech recognition using the ideas presented by V. Venkataramani et al. [95]. Lattices are converted into a sequence of confusion networks, similar in structure to confu... |

25 | Speech emotion recognition combining acoustic features and linguistic information in a hybrid support vector machine-belief network architecture
- Schuller, Rigoll, et al.
- 2004
Citation Context: ...o classify an utterance as one of a finite number of classes. SVMs have also been used to guess the emotion of the speaker using features derived from the signal, pitch, energy, and spectral contours [89]. C. Ma and M. A. Rudolph [62] used SVMs for utterance verification. Kernel ideas have also been used in eigenvoice adaptation [63]. M. Davy and S. Godsill [20] use SVMs as a novelty detector for aud... |

21 | Speech discrimination based on multiscale spectrotemporal modulations
- Mesgarani, Shamma, et al.
- 2004
Citation Context: ...been used to detect stop consonants in continuous speech [71]. SVMs have been used to classify speech as either adult- or child-voiced [70] based on acoustic and linguistic scores. N. Mesgarani et al. [64] discriminate speech from non-speech sounds using SVMs trained on auditory features. We have now introduced and described all the various stages an... |

21 | Using the Fisher Kernel Method for Web Audio Classification
- Moreno, Rifkin
- 2000
Citation Context: ...n, i.e., a vector of unigram and bigram statistics, as the feature space for training SVMs. Y. Liu et al. [103] account for different costs of misclassification when training SVMs. P. Moreno and R. Rifkin [69] have also used Fisher Scores for web audio classification. P. J. Moreno and P. Ho [67] later used SVMs with Fisher Scores for speaker verification. They developed a Kullback-Leibler Divergence Based... |

21 | Efficient Lattice Representation and Generation
- Weng, Stolcke, et al.
- 1998
Citation Context: ...These pronunciation models were then used to rescore word-level lattices on the 2003 NIST rich text transcription task. The word lattices were restricted to confusion networks using local information [100]. They report WER reductions on training data and on a subset of the development test data, but no statistically significant WER reductions on the complete test set. While we also propose using SVMs only... |