Results 1-10 of 12
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
ADVANCES IN LARGE MARGIN CLASSIFIERS, 1999
Cited by 503 (0 self)
The output of a classifier should be a calibrated posterior probability to enable post-processing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score will produce non-sparse kernel machines. Instead, we train an SVM, then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities. This chapter compares classification error rate and likelihood scores for an SVM plus sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three data-mining-style data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness of the SVM.
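The "SVM plus sigmoid" step described in this abstract can be sketched as follows. This is a minimal illustration, not the chapter's actual procedure: it assumes the parametric form 1 / (1 + exp(a*f + b)) for the sigmoid, fits `a` and `b` with plain gradient descent on hard 0/1 labels, and uses made-up scores in place of real SVM outputs. The names `sigmoid_prob` and `fit_sigmoid` are hypothetical.

```python
import math


def sigmoid_prob(f, a, b):
    """Map a raw classifier output f to a probability via 1 / (1 + exp(a*f + b))."""
    z = a * f + b
    # Evaluate in a numerically stable way for either sign of z.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_sigmoid(scores, labels, lr=0.1, steps=5000):
    """Fit (a, b) by gradient descent on the mean negative log-likelihood.

    Assumption: labels are hard 0/1 targets and the optimizer is plain
    gradient descent; both are simplifications chosen for this sketch.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for f, y in zip(scores, labels):
            p = sigmoid_prob(f, a, b)
            # With z = a*f + b and p = 1 / (1 + exp(z)), d(NLL)/dz = y - p.
            grad_a += (y - p) * f
            grad_b += (y - p)
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up, non-separable scores standing in for held-out SVM outputs.
demo_scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
demo_labels = [0, 0, 0, 1, 0, 1, 1, 1]
demo_a, demo_b = fit_sigmoid(demo_scores, demo_labels)
print(sigmoid_prob(2.0, demo_a, demo_b), sigmoid_prob(-2.0, demo_a, demo_b))
```

Because only two scalar parameters are trained on top of the existing SVM outputs, the underlying sparse kernel expansion is left untouched, which is the property the abstract emphasizes.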
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Cited by 45 (9 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system. Part I, Introduction: Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
A tutorial on energy-based learning
Predicting Structured Data, 2006
Cited by 27 (6 self)
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
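The inference step described in this abstract (clamp the observed variable, then pick the configuration of the remaining variable with the lowest energy) can be illustrated with a toy example. The quadratic energy function, the weight `w`, and the candidate set below are assumptions made purely for illustration; they do not come from the tutorial.

```python
def energy(x, y, w=2.0):
    """Toy energy: low when y is close to w * x (hypothetical choice)."""
    return (y - w * x) ** 2


def infer(x, candidates, energy_fn=energy):
    """Clamp the observed x and return the candidate y with minimal energy."""
    return min(candidates, key=lambda y: energy_fn(x, y))


# The observed variable is clamped to 1.5; y ranges over a small discrete set.
best_y = infer(1.5, [0, 1, 2, 3, 4])
print(best_y)
```

Note that `infer` only compares energies across candidates, so no normalization over all configurations is ever computed.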
Speech Recognition using Neural Networks
1995
Cited by 21 (0 self)
This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, and they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that an NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
Articulatory Methods for Speech Production and Recognition
1996
Cited by 9 (0 self)
roduction-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of co-articulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically-driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive statistically-based model of co-articulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to re-score N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco
Probability Estimation By Feed-Forward Networks In Continuous Speech Recognition
In Proceedings IEEE Workshop on Neural Networks for Signal Processing, 1991
Cited by 7 (3 self)
We review the use of feed-forward networks as estimators of probability densities in hidden Markov modelling. In this paper we are mostly concerned with radial basis functions (RBF) networks. We note the isomorphism of RBF networks to tied mixture density estimators; additionally we note that RBF networks are trained to estimate posteriors rather than the likelihoods estimated by tied mixture density estimators. We show how the neural network training should be modified to resolve this mismatch. We also discuss problems with discriminative training, particularly the problem of dealing with unlabelled training data and the mismatch between model and data priors. L&H Speechproducts, Ieper, B-8900, Belgium. Introduction: In continuous speech recognition we wish to estimate $P(W_1^W \mid X_1^T, M)$, the posterior probability of a word sequence $W_1^W = w_1, \ldots, w_W$ given the acoustic evidence $X_1^T = x_1, \ldots, x_T$ and the parameters of the models used Q. This probability canno...
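The distinction this abstract draws between posteriors and likelihoods can be made concrete with Bayes' rule: dividing a class posterior P(q | x) by the class prior P(q) gives the likelihood p(x | q) up to the class-independent factor p(x). The sketch below shows only that textbook identity; it is not necessarily the training modification the paper proposes, and the function name and numbers are hypothetical.

```python
def scaled_likelihoods(posteriors, priors):
    """Return P(q | x) / P(q) per class, which equals p(x | q) / p(x)."""
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [p / q for p, q in zip(posteriors, priors)]


# Made-up network outputs for one frame and made-up class priors.
frame_posteriors = [0.7, 0.2, 0.1]
class_priors = [0.5, 0.25, 0.25]
print(scaled_likelihoods(frame_posteriors, class_priors))
```

Since p(x) is the same for every class at a given frame, the scaled values rank and combine like likelihoods inside an HMM decoder.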
Using asymmetric distributions to improve classifier probabilities: A comparison of new and standard parametric methods
2002
Cited by 2 (0 self)
For many discriminative classifiers, it is desirable to convert an unnormalized confidence score output from the classifier to a normalized probability estimate. Such a method can also be used for creating better estimates from a probabilistic classifier that outputs poor estimates. Typical parametric methods have an underlying assumption that the score distribution for a class is symmetric; we motivate why this assumption is undesirable, especially when the scores are output by a classifier. Two asymmetric families, an asymmetric generalization of a Gaussian and a Laplace distribution, are presented, and a method of fitting them in expected linear time is described. Finally, an experimental analysis of parametric fits to the outputs of two text classifiers, naïve Bayes (which is known to emit poor probabilities) and a linear SVM, is conducted. The analysis shows that one of these asymmetric families is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
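To illustrate the idea of fitting an asymmetric distribution to classifier scores, the sketch below uses one common parameterization of an asymmetric Laplace density, with a mode `theta` and separate decay rates `beta` (left of the mode) and `gamma` (right of the mode), and then turns two class-conditional densities into a posterior with Bayes' rule. The parameterization, the parameter values, and the function names are assumptions for illustration; the paper's exact families and its linear-time fitting method are not reproduced here.

```python
import math


def asym_laplace_pdf(x, theta, beta, gamma):
    """Asymmetric Laplace density with mode theta and rates beta (left), gamma (right)."""
    # The two tails integrate to 1/beta and 1/gamma, hence this normalizer.
    norm = beta * gamma / (beta + gamma)
    if x <= theta:
        return norm * math.exp(-beta * (theta - x))
    return norm * math.exp(-gamma * (x - theta))


def posterior_positive(score, pos_params, neg_params, prior_pos):
    """P(positive | score) from the two class-conditional score densities."""
    p_pos = asym_laplace_pdf(score, *pos_params) * prior_pos
    p_neg = asym_laplace_pdf(score, *neg_params) * (1.0 - prior_pos)
    return p_pos / (p_pos + p_neg)


# Hypothetical (theta, beta, gamma) values for each class.
positive_class = (1.0, 1.0, 0.5)
negative_class = (-1.0, 0.5, 1.0)
print(posterior_positive(2.0, positive_class, negative_class, prior_pos=0.5))
```

Setting `beta == gamma` recovers the ordinary symmetric Laplace density, so the asymmetric family adds one parameter per class.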
Large vocabulary continuous speech recognition using linguistic features and constraints
2005
Cited by 1 (0 self)
Automatic speech recognition (ASR) is a process of applying constraints, as encoded in the computer system (the recognizer), to the speech signal until ambiguity is satisfactorily resolved to the extent that only one sequence of words is hypothesized. Such constraints fall naturally into two categories. One deals with the ordering of words (syntax) and organization of their meanings (semantics, pragmatics, etc). The other governs how speech signals are related to words, a process often termed “lexical access”. This thesis studies the Huttenlocher-Zue lexical access model, its implementation in a modern probabilistic speech recognition framework and its application to continuous speech from an open vocabulary. The Huttenlocher-Zue model advocates a two-pass lexical access paradigm. In the first pass, the lexicon is effectively pruned using broad linguistic constraints. In the original Huttenlocher-Zue model, the authors had proposed six linguistic features motivated by the manner of pronunciation.
Speech Recognition
1994
Contents
1 Introduction
2 The Human Speech
2.1 Phonemes
2.1.1 Other Speech Units
2.2 Kinds of Phonemes
2.2.1 Consonants
2.2.1.1 Voicing
2.2.1.2 Place of Articulation
2.2.1.3 Manner of Articulation
2.2.2 Vowels
2.2.3 Diphthongs
2.3 Formants
3 A Signal Processing View of the Human Speech
3.1 Def
DYNAMO: An Algorithm for Dynamic Acoustic Modeling
1998
This paper summarizes part of SRI's effort to improve acoustic modeling in the context of the Large Vocabulary Continuous Speech Recognition (LVCSR) project. It concentrates on two problems that are believed to contribute to the large error rates observed with LVCSR databases: (1) the lack of discriminative power of the speech models in the acoustic space, and (2) the discrepancy between the criterion used to train the models (typically frame-level maximum likelihood) and the task expected from the models (word-level recognition). We address the first issue by searching for features that help in narrowing the model distributions, and by proposing a neural-network-based architecture to combine these features. The neural networks (NNET) are used in association with a set of large Gaussian mixture models (GMM) whose mixture weights are dynamically estimated by the neural networks, for each frame of incoming data. We call the resulting algorithm DYNAMO, for dynamic acoustic modeling. To a...
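The core idea named in this abstract, a Gaussian mixture whose weights are re-estimated for every frame by a neural network, can be sketched in a few lines. Everything concrete below is an assumption for illustration: the Gaussians are one-dimensional, the "network" is reduced to a vector of logits passed through a softmax, and the function names are hypothetical. The actual DYNAMO architecture and features are not described in enough detail here to reproduce.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def softmax(logits):
    """Convert arbitrary scores into mixture weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def frame_likelihood(x, logits, means, variances):
    """Mixture likelihood of frame x with weights derived from per-frame logits.

    `logits` stands in for the output of a network evaluated on this frame.
    """
    weights = softmax(logits)
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


# Two fixed components; only the weights change from frame to frame.
component_means = [0.0, 3.0]
component_vars = [1.0, 1.0]
print(frame_likelihood(0.0, [5.0, 0.0], component_means, component_vars))
print(frame_likelihood(0.0, [0.0, 5.0], component_means, component_vars))
```

With the components held fixed, shifting weight toward the component that matches the frame raises the frame likelihood, which is the effect a per-frame weight estimator is meant to exploit.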

