Support vector machines for speech recognition
 Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 76 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
unknown title
The continuous latent variable modelling formalism. This chapter gives the theoretical basis for continuous latent variable models. Section 2.1 defines intuitively the concept of latent variable models and gives a brief historical introduction to them. Section 2.2 uses a simple example, inspired by the mechanics of a mobile point, to justify and explain latent variables. Section 2.3 gives a more rigorous definition, which we will use throughout this thesis. Section 2.6 describes the most important specific continuous latent variable models and section 2.7 defines mixtures of continuous latent variable models. The chapter discusses other important topics, including parameter estimation, identifiability, interpretability and marginalisation in high dimensions. Section 2.9 on dimensionality reduction will be the basis for part II of the thesis. Section 2.10 very briefly mentions some applications of continuous latent variable models for dimensionality reduction. Section 2.11 shows a worked example of a simple continuous latent variable model. Section 2.12 gives some complementary mathematical results, in particular the derivation of a diagonal noise GTM model and of its EM algorithm. 2.1 Introduction and historical overview of latent variable models. Latent variable models are probabilistic models that try to explain a (relatively) high-dimensional process in
Chapter 4 Dimensionality reduction
This chapter introduces and defines the problem of dimensionality reduction, discusses the curse of dimensionality and the intrinsic dimensionality, and then surveys nonprobabilistic methods for dimensionality reduction, that is, methods that do not define a probabilistic model for the data. These include linear methods (PCA, projection pursuit), nonlinear autoassociators, kernel methods, local dimensionality reduction, principal curves, vector quantisation methods (elastic net, self-organising map) and multidimensional scaling methods. One of these methods (the elastic net) does define a probabilistic model but not a continuous dimensionality reduction mapping. If one is interested in stochastically modelling the dimensionality reduction mapping then the natural choice is latent variable models, discussed in chapter 2. We close the chapter with a summary and with some thoughts on dimensionality reduction with discrete variables. Consider an application in which a system processes data in the form of a collection of real-valued vectors: speech signals, images, etc. Suppose that the system is only effective if the dimension of each individual vector (the number of components of the vector) is not too high, where high depends on the particular application. The problem of dimensionality reduction appears when the data are in fact of a higher dimension
unknown title
, 2001
Continuous latent variable models for dimensionality reduction and sequential data reconstruction by
Evaluation of a stack decoder on a Japanese Newspaper Dictation Task
, 1996
This paper describes some of the implementation details of the "Nozomi" stack decoder for LVCSR. The decoder was tested on a Japanese Newspaper Dictation Task using a 5000-word vocabulary. Using continuous density acoustic models with 2000 and 3000 states trained on the JNAS/ASJ corpora and a 3-gram LM trained on the RWC text corpus, both models provided by the IPA group [9], it was possible to reach more than 95% word accuracy on the standard test set. With computationally cheap acoustic models we could achieve around 89% accuracy in nearly real time on a 300 MHz Pentium II. Using a disk-based LM the memory usage could be optimized to 4 MB in total. Key words: speech recognition, Japanese newspaper dictation, one-pass stack decoder. 1 INTRODUCTION. LVCSR is currently limited to workstations and fast high-end laptops with a lot of memory. To make LVCSR work on PDAs, cellular phones, user interfaces, wrist watches etc., it is necessary to find time- and memory-efficient algorithms...
MIXTURE DENSITY NETWORKS, HUMAN ARTICULATORY DATA AND ACOUSTIC-TO-ARTICULATORY INVERSION OF CONTINUOUS SPEECH
Researchers have been investigating methods for retrieving the articulation underlying an acoustic speech signal for more than three decades. A successful method would find many applications, for example: low bitrate speech coding, helping individuals with speech and hearing disorders by providing visual feedback during speech training, and the possibility of improved automatic speech
A Survey of Discriminative and Connectionist Methods for Speech Processing
, 2002
Discriminative speech processing techniques attempt to compute the maximum a posteriori probability of some speech event, such as a particular phoneme being spoken, given the observed data. Nondiscriminative techniques compute the likelihood of the observed data assuming an event. Nondiscriminative methods such as simple HMMs (hidden Markov models) achieved success despite their lack of discriminative modelling. This survey will look at enhancements to the HMM model which have improved their discrimination ability and hence their overall performance. This survey also reviews alternative discriminative methods, namely connectionist methods such as ANNs (artificial neural networks). We will also draw comparisons between discriminative HMMs and connectionist models, showing that connectionist models can be viewed as a generalisation of discriminative HMMs.
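The distinction this abstract draws, between the likelihood of the data given an event and the posterior probability of the event given the data, can be illustrated with a minimal Python sketch of Bayes' rule. It is not taken from the survey: the function name `posteriors`, the phoneme labels, and all numbers are hypothetical, chosen only to show that the two criteria can pick different events.

```python
def posteriors(likelihoods, priors):
    """Apply Bayes' rule: P(e | x) = p(x | e) P(e) / sum_k p(x | k) P(k)."""
    joint = {e: likelihoods[e] * priors[e] for e in likelihoods}
    evidence = sum(joint.values())  # p(x), the normalising constant
    return {e: j / evidence for e, j in joint.items()}


# Hypothetical values for one observed frame x and two phoneme classes.
likelihoods = {"aa": 0.02, "iy": 0.05}  # p(x | phoneme), assumed
priors = {"aa": 0.8, "iy": 0.2}         # P(phoneme), assumed

post = posteriors(likelihoods, priors)

# Likelihood-based choice: the event under which the data is most probable.
ml_choice = max(likelihoods, key=likelihoods.get)

# Maximum a posteriori choice: the most probable event given the data.
map_choice = max(post, key=post.get)

print("likelihood picks:", ml_choice)
print("posterior picks:", map_choice)
```

With these assumed numbers the likelihood alone favours one phoneme while the posterior favours the other, because the prior outweighs the likelihood ratio.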