Results 1–10 of 12
Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods
Advances in Large Margin Classifiers, 1999
Abstract

Cited by 699 (0 self)
The output of a classifier should be a calibrated posterior probability to enable postprocessing. Standard SVMs do not provide such probabilities. One method to create probabilities is to directly train a kernel classifier with a logit link function and a regularized maximum likelihood score. However, training with a maximum likelihood score will produce nonsparse kernel machines. Instead, we train an SVM, then train the parameters of an additional sigmoid function to map the SVM outputs into probabilities. This chapter compares classification error rate and likelihood scores for an SVM plus sigmoid versus a kernel method trained with a regularized likelihood error function. These methods are tested on three data-mining-style data sets. The SVM+sigmoid yields probabilities of comparable quality to the regularized maximum likelihood kernel method, while still retaining the sparseness of the SVM.
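The sigmoid mapping this abstract describes has the form P(y=1|f) = 1/(1 + exp(A·f + B)), where f is the SVM output. Below is a minimal sketch of fitting A and B by gradient descent on the negative log-likelihood; Platt's original procedure uses a more careful second-order optimizer, and the scores here are illustrative, so treat this as an idea sketch rather than the paper's implementation.

```python
import math

def fit_platt_sigmoid(scores, labels, lr=0.01, iters=5000):
    """Fit P(y=1|f) = 1 / (1 + exp(A*f + B)) by gradient descent on the
    negative log-likelihood (a simplified stand-in for Platt's fit)."""
    A, B = 0.0, 0.0
    for _ in range(iters):
        gA = gB = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            err = p - y
            # d(NLL)/dA = -(p - y) * f and d(NLL)/dB = -(p - y),
            # because dp/dA = -p * (1 - p) * f under this parameterization
            gA += -err * f
            gB += -err
        A -= lr * gA / len(scores)
        B -= lr * gB / len(scores)
    return A, B

# Illustrative usage: positive scores labeled 1, negative labeled 0
A, B = fit_platt_sigmoid([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0], [0, 0, 0, 1, 1, 1])
```

After fitting on separable scores like these, A comes out negative, so large positive SVM outputs map to probabilities near 1 and large negative outputs to probabilities near 0.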
Connectionist Probability Estimation in HMM Speech Recognition
IEEE Transactions on Speech and Audio Processing, 1992
Abstract

Cited by 61 (16 self)
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state-of-the-art HMM system. Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
A tutorial on energy-based learning
Predicting Structured Data, 2006
Abstract

Cited by 42 (6 self)
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists of clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists of finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
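The clamp-and-minimize inference loop this abstract describes can be illustrated with a toy scalar example. The energy function below is entirely hypothetical, chosen only to show the mechanics of clamping an observed x and searching over candidate values of y:

```python
def energy(x, y):
    # Hypothetical energy: low when y matches the sign of x,
    # with a small penalty on |y| to break ties
    return (y - (1 if x > 0 else -1)) ** 2 + 0.1 * abs(y)

def infer(x, candidates):
    """EBM-style inference: clamp the observed x, then return the
    candidate y with minimal energy E(x, y)."""
    return min(candidates, key=lambda y: energy(x, y))

# Illustrative usage over a small discrete candidate set
best = infer(0.5, [-1, 0, 1])   # -> 1, the lowest-energy configuration
```

In a real EBM the candidate set is usually continuous or combinatorial, so the `min` over an explicit list is replaced by gradient-based or dynamic-programming search; the clamping idea is the same.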
Speech Recognition using Neural Networks
1995
Abstract

Cited by 29 (0 self)
This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, and they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic ...
Articulatory Methods for Speech Production and Recognition
1996
Abstract

Cited by 9 (0 self)
...production-based knowledge into the recognition framework. By using an explicit time-domain articulatory model of the mechanisms of coarticulation, it is hoped to obtain a more accurate model of contextual effects in the acoustic signal, while using fewer parameters than traditional acoustically driven approaches. Separate articulatory and acoustic models are provided, and in each case the parameters of the models are automatically optimised over a training data set. A predictive, statistically based model of coarticulation is described, and found to yield improved articulatory modelling accuracy compared with X-ray articulatory traces. Parameterised acoustic vectors are synthesised by a set of artificial neural networks, and the resulting acoustic representations are used to rescore N-best recognition hypothesis lists produced by an HMM-based recogniser. The system is evaluated on two test databases, one including speaker-specific X-ray training data and the other aco...
Probability Estimation By Feed-Forward Networks In Continuous Speech Recognition
In Proceedings IEEE Workshop on Neural Networks for Signal Processing, 1991
Abstract

Cited by 7 (3 self)
We review the use of feed-forward networks as estimators of probability densities in hidden Markov modelling. In this paper we are mostly concerned with radial basis function (RBF) networks. We note the isomorphism of RBF networks to tied mixture density estimators; additionally we note that RBF networks are trained to estimate posteriors rather than the likelihoods estimated by tied mixture density estimators. We show how the neural network training should be modified to resolve this mismatch. We also discuss problems with discriminative training, particularly the problem of dealing with unlabelled training data and the mismatch between model and data priors. In continuous speech recognition we wish to estimate P(w_1^W | x_1^T, Θ), the posterior probability of a word sequence w_1^W = w_1, ..., w_W given the acoustic evidence x_1^T = x_1, ..., x_T and the parameters Θ of the models used. This probability canno...
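One standard way to resolve the posterior/likelihood mismatch this abstract notes, common in hybrid NN/HMM decoding though not necessarily the exact modification the paper proposes, is to divide each network posterior p(q|x) by the class prior p(q), giving a scaled likelihood p(x|q)/p(x). A minimal sketch with illustrative numbers:

```python
def scaled_likelihoods(posteriors, priors):
    """Convert network outputs p(q|x) into scaled likelihoods
    p(x|q) / p(x) = p(q|x) / p(q), as used in hybrid NN/HMM decoding.
    The p(x) factor is constant per frame, so it cancels in decoding."""
    return [post / prior for post, prior in zip(posteriors, priors)]

# Illustrative usage: three states with unequal priors
liks = scaled_likelihoods([0.5, 0.3, 0.2], [0.25, 0.25, 0.5])
```

Note how the third state's posterior of 0.2 becomes a low scaled likelihood (0.4) because its prior is high: the division removes the bias that frequent states receive from the training-set label distribution.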
Using asymmetric distributions to improve classifier probabilities: A comparison of new and standard parametric methods
2002
Abstract

Cited by 3 (0 self)
For many discriminative classifiers, it is desirable to convert an unnormalized confidence score output from the classifier to a normalized probability estimate. Such a method can also be used for creating better estimates from a probabilistic classifier that outputs poor estimates. Typical parametric methods have an underlying assumption that the score distribution for a class is symmetric; we motivate why this assumption is undesirable, especially when the scores are output by a classifier. Two asymmetric families, an asymmetric generalization of a Gaussian and a Laplace distribution, are presented, and a method of fitting them in expected linear time is described. Finally, an experimental analysis of parametric fits to the outputs of two text classifiers, naïve Bayes (which is known to emit poor probabilities) and a linear SVM, is conducted. The analysis shows that one of these asymmetric families is theoretically attractive (introducing few new parameters while increasing flexibility), computationally efficient, and empirically preferable.
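As an illustration of the kind of asymmetric family this abstract mentions, a two-piece Gaussian puts different widths on either side of the mode; the parameter names below are my own, not the paper's, and this is a sketch of the density only, not the paper's expected-linear-time fitting method.

```python
import math

def asym_gauss_pdf(s, mode, sigma_left, sigma_right):
    """Density of a two-piece (asymmetric) Gaussian: Gaussian shape with
    width sigma_left for s < mode and sigma_right for s >= mode, with a
    shared normalizer so the pieces join continuously and integrate to 1."""
    norm = math.sqrt(2.0 / math.pi) / (sigma_left + sigma_right)
    sigma = sigma_left if s < mode else sigma_right
    return norm * math.exp(-((s - mode) ** 2) / (2.0 * sigma ** 2))

# Illustrative usage: a right-skewed score distribution
density_at_mode = asym_gauss_pdf(0.0, 0.0, 1.0, 2.0)
```

The shared normalizer sqrt(2/π)/(σ_left + σ_right) is what makes the two half-Gaussians meet at the mode with equal height while keeping the total mass at 1; fitting then amounts to choosing the mode and the two widths from the observed scores.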
Large vocabulary continuous speech recognition using linguistic features and constraints
2005
Abstract

Cited by 1 (0 self)
Automatic speech recognition (ASR) is a process of applying constraints, as encoded in the computer system (the recognizer), to the speech signal until ambiguity is satisfactorily resolved to the extent that only one sequence of words is hypothesized. Such constraints fall naturally into two categories. One deals with the ordering of words (syntax) and the organization of their meanings (semantics, pragmatics, etc.). The other governs how speech signals are related to words, a process often termed "lexical access". This thesis studies the Huttenlocher-Zue lexical access model, its implementation in a modern probabilistic speech recognition framework, and its application to continuous speech from an open vocabulary. The Huttenlocher-Zue model advocates a two-pass lexical access paradigm. In the first pass, the lexicon is effectively pruned using broad linguistic constraints. In the original Huttenlocher-Zue model, the authors had proposed six linguistic features motivated by the manner of pronunciation.
Speech Recognition
1994
Abstract
Contents:
1 Introduction
2 The Human Speech
  2.1 Phonemes
    2.1.1 Other Speech Units
  2.2 Kinds of Phonemes
    2.2.1 Consonants
      2.2.1.1 Voicing
      2.2.1.2 Place of Articulation
      2.2.1.3 Manner of Articulation
    2.2.2 Vowels
    2.2.3 Diphthongs
  2.3 Formants
3 A Signal Processing View of the Human Speech
  3.1 Def...
DYNAMO: An Algorithm for Dynamic Acoustic Modeling
1998
Abstract
This paper summarizes part of SRI's effort to improve acoustic modeling in the context of the Large Vocabulary Continuous Speech Recognition (LVCSR) project. It concentrates on two problems that are believed to contribute to the large error rates observed with LVCSR databases: (1) the lack of discriminative power of the speech models in the acoustic space, and (2) the discrepancy between the criterion used to train the models (typically frame-level maximum likelihood) and the task expected from the models (word-level recognition). We address the first issue by searching for features that help in narrowing the model distributions, and by proposing a neural-network-based architecture to combine these features. The neural networks (NNET) are used in association with a set of large Gaussian mixture models (GMM) whose mixture weights are dynamically estimated by the neural networks, for each frame of incoming data. We call the resulting algorithm DYNAMO, for dynamic acoustic modeling. To a...
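The per-frame mixture weighting described above can be sketched as follows: a network's output logits for the current frame are turned into softmax mixture weights that combine a fixed bank of Gaussians. All names and values here are illustrative, not SRI's implementation (in which the network itself would be evaluated on the incoming frame to produce the logits passed in below):

```python
import math

def gaussian(x, mean, var):
    """Univariate Gaussian density (scalar stand-in for a full GMM component)."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def softmax(logits):
    """Numerically stable softmax: nonnegative weights summing to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def frame_likelihood(frame, nnet_logits, means, variances):
    """Likelihood of one scalar frame under a GMM whose mixture weights
    are produced per frame from a network's logits (DYNAMO-style idea)."""
    weights = softmax(nnet_logits)
    return sum(w * gaussian(frame, m, v)
               for w, m, v in zip(weights, means, variances))

# Illustrative usage: two fixed components; the logits favor the first
lik = frame_likelihood(0.0, [5.0, 0.0], [0.0, 5.0], [1.0, 1.0])
```

Because the Gaussians stay fixed while only the weights change per frame, the network can sharpen the effective distribution around whichever components best explain the current frame, which is the discriminative effect the abstract is after.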