Results 1–10 of 17
Support vector machines for speech recognition
 Proceedings of the International Conference on Spoken Language Processing
, 1998
Cited by 83 (2 self)

Abstract
Statistical techniques based on hidden Markov Models (HMMs) with Gaussian emission densities have dominated the signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and overparameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.
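The static classification experiment the abstract describes can be sketched with an off-the-shelf SVM. This is purely illustrative: the two Gaussian "vowel-like" classes below are synthetic stand-ins for the Deterding vowel data, and the kernel and C value are assumptions, not the paper's settings.

```python
# Illustrative sketch only: an RBF-kernel SVM on synthetic 2-D data
# standing in for the static vowel-classification task. The margin
# parameter C controls the generalisation/complexity trade-off that
# the abstract contrasts with maximum-likelihood HMM training.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two synthetic classes as Gaussian clouds in a 2-D feature space.
X = np.vstack([rng.normal([0, 0], 0.5, (100, 2)),
               rng.normal([2, 2], 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel="rbf", C=1.0).fit(X, y)
accuracy = clf.score(X, y)
print(f"training accuracy: {accuracy:.2f}")
```

The decision function depends only on the support vectors near the class boundary, which is what gives the automatic control of parameterization mentioned in the abstract.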
Framewise phoneme classification with bidirectional lstm and other neural network architectures
 Neural Networks
, 2005
Cited by 51 (17 self)

Abstract
In this paper, we apply bidirectional training to a Long Short-Term Memory (LSTM) network for the first time. We also present a modified, full-gradient version of the LSTM learning algorithm. On the TIMIT speech database, we measure the framewise phoneme classification ability of bidirectional and unidirectional variants of both LSTM and conventional Recurrent Neural Networks (RNNs). We find that the LSTM architecture outperforms conventional RNNs and that bidirectional networks outperform unidirectional ones.
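The bidirectional idea can be sketched in a few lines of NumPy: run the same LSTM cell over the frame sequence left-to-right and right-to-left, then concatenate the two hidden states per frame, so each framewise decision sees both past and future context. Weights here are random and sizes made up; this illustrates the architecture, not a trained classifier.

```python
# Minimal NumPy sketch of a bidirectional LSTM pass over a framewise
# feature sequence. Random weights; architecture illustration only.
import numpy as np

def lstm_pass(frames, W, U, b, n_hid):
    """Standard LSTM forward pass; returns one hidden vector per frame."""
    h = np.zeros(n_hid)
    c = np.zeros(n_hid)
    outputs = []
    for x in frames:
        z = W @ x + U @ h + b                 # all four gates in one matmul
        i, f, o, g = np.split(z, 4)
        i, f, o = (1 / (1 + np.exp(-v)) for v in (i, f, o))  # sigmoid gates
        c = f * c + i * np.tanh(g)            # cell-state update
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)

rng = np.random.default_rng(0)
T, n_in, n_hid = 10, 13, 8                    # e.g. 13 acoustic features/frame
frames = rng.normal(size=(T, n_in))
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

fwd = lstm_pass(frames, W, U, b, n_hid)               # past context
bwd = lstm_pass(frames[::-1], W, U, b, n_hid)[::-1]   # future context
bidir = np.concatenate([fwd, bwd], axis=1)            # (T, 2 * n_hid)
print(bidir.shape)
```

A framewise phoneme classifier would then put a softmax layer on each row of `bidir`.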
UNIDIRECTIONAL LONG SHORT-TERM MEMORY RECURRENT NEURAL NETWORK WITH RECURRENT OUTPUT LAYER FOR LOW-LATENCY SPEECH SYNTHESIS
Cited by 2 (0 self)

Abstract
Long short-term memory recurrent neural networks (LSTM-RNNs) have been applied to various speech applications including acoustic modeling for statistical parametric speech synthesis. One of the concerns in applying them to text-to-speech applications is their effect on latency. To address this concern, this paper proposes a low-latency, streaming speech synthesis architecture using unidirectional LSTM-RNNs with a recurrent output layer. The use of a unidirectional RNN architecture allows frame-synchronous streaming inference of output acoustic features given input linguistic features. The recurrent output layer further encourages smooth transitions between acoustic features at consecutive frames. Experimental results in subjective listening tests show that the proposed architecture can synthesize natural-sounding speech without requiring utterance-level batch processing. Index Terms: Statistical parametric speech synthesis; recurrent neural networks; long short-term memory; low latency
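Why a unidirectional recurrence permits streaming can be shown directly: the output at frame t depends only on frames 0..t, so feeding frames one at a time gives exactly the same result as utterance-level batch processing. In this sketch a plain tanh RNN with a recurrent output layer stands in for the paper's LSTM-RNN; weights are random and purely illustrative.

```python
# Sketch: frame-synchronous streaming inference with a unidirectional
# recurrence plus a recurrent output layer. Random weights, toy sizes.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out, T = 5, 6, 4, 12
Wh = rng.normal(scale=0.3, size=(n_hid, n_in))
Uh = rng.normal(scale=0.3, size=(n_hid, n_hid))
Wo = rng.normal(scale=0.3, size=(n_out, n_hid))
Uo = rng.normal(scale=0.3, size=(n_out, n_out))   # recurrent output layer

def run(frames):
    h = np.zeros(n_hid)
    y = np.zeros(n_out)
    ys = []
    for x in frames:
        h = np.tanh(Wh @ x + Uh @ h)
        # The output feeds back into itself, smoothing frame-to-frame
        # transitions, as the abstract describes.
        y = np.tanh(Wo @ h + Uo @ y)
        ys.append(y)
    return np.array(ys)

frames = rng.normal(size=(T, n_in))
batch = run(frames)                  # whole utterance at once
stream_last = run(frames[:5])[-1]    # only the first 5 frames seen so far
# Identical: frame 5's output never needed future frames.
print(np.allclose(batch[4], stream_last))
```

A bidirectional network would fail this check, since its per-frame outputs depend on the whole utterance.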
DEEP MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS
Cited by 1 (1 self)

Abstract
Statistical parametric speech synthesis (SPSS) using deep neural networks (DNNs) has shown its potential to produce naturally-sounding synthesized speech. However, there are limitations in the current implementation of DNN-based acoustic modeling for speech synthesis, such as the unimodal nature of its objective function and its lack of ability to predict variances. To address these limitations, this paper investigates the use of a mixture density output layer. It can estimate full probability density functions over real-valued output features conditioned on the corresponding input features. Experimental results in objective and subjective evaluations show that the use of the mixture density output layer improves the prediction accuracy of acoustic features and the naturalness of the synthesized speech. Index Terms: Statistical parametric speech synthesis; hidden Markov models; deep neural networks; mixture density networks
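The mixture density output layer can be sketched as a mapping from raw network outputs to valid Gaussian-mixture parameters: a softmax makes the weights sum to one, and an exponential makes the standard deviations strictly positive, which is exactly what lets the model predict variances and multimodal targets. Sizes and values below are made up for illustration.

```python
# Sketch of a mixture density output layer for one scalar acoustic
# feature at one frame. Raw values are hypothetical network outputs.
import numpy as np

def mdn_params(raw, n_mix):
    """Split raw outputs into mixture weights, means and std devs."""
    logits, mu, log_sigma = np.split(raw, [n_mix, 2 * n_mix])
    w = np.exp(logits - logits.max())
    w /= w.sum()                     # softmax -> weights sum to 1
    sigma = np.exp(log_sigma)        # exp -> strictly positive std devs
    return w, mu, sigma

def mdn_density(x, w, mu, sigma):
    """Full probability density of the mixture at scalar x."""
    comp = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return float(w @ comp)

raw = np.array([0.2, -0.1, 1.0,      # mixture logits
                -1.0, 0.0, 2.0,      # component means
                -0.5, 0.1, 0.3])     # log standard deviations
w, mu, sigma = mdn_params(raw, n_mix=3)
print(w.sum(), mdn_density(0.0, w, mu, sigma) > 0.0)
```

Training maximizes this density over the data, so a single output layer yields a full (possibly multimodal) predictive distribution rather than one unimodal mean.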
unknown title
Abstract
The continuous latent variable modelling formalism
This chapter gives the theoretical basis for continuous latent variable models. Section 2.1 defines intuitively the concept of latent variable models and gives a brief historical introduction to them. Section 2.2 uses a simple example, inspired by the mechanics of a mobile point, to justify and explain latent variables. Section 2.3 gives a more rigorous definition, which we will use throughout this thesis. Section 2.6 describes the most important specific continuous latent variable models and section 2.7 defines mixtures of continuous latent variable models. The chapter discusses other important topics, including parameter estimation, identifiability, interpretability and marginalisation in high dimensions. Section 2.9 on dimensionality reduction will be the basis for part II of the thesis. Section 2.10 very briefly mentions some applications of continuous latent variable models for dimensionality reduction. Section 2.11 shows a worked example of a simple continuous latent variable model. Section 2.12 gives some complementary mathematical results, in particular the derivation of a diagonal-noise GTM model and of its EM algorithm.
2.1 Introduction and historical overview of latent variable models
Latent variable models are probabilistic models that try to explain a (relatively) high-dimensional process in
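The formalism the excerpt defines can be illustrated with its simplest instance, a linear-Gaussian continuous latent variable model (probabilistic PCA): a low-dimensional latent z generates a higher-dimensional observation x = W z + mu + noise, and marginalising z gives x ~ N(mu, W W^T + sigma^2 I). All sizes and parameter values below are illustrative.

```python
# Sketch: sampling from a linear-Gaussian latent variable model and
# checking its marginal covariance empirically. Illustrative values.
import numpy as np

rng = np.random.default_rng(2)
d_latent, d_obs, n = 2, 5, 200_000
W = rng.normal(size=(d_obs, d_latent))
mu = rng.normal(size=d_obs)
sigma = 0.1

z = rng.normal(size=(n, d_latent))                 # latent causes
x = z @ W.T + mu + sigma * rng.normal(size=(n, d_obs))

# The sample covariance of x approaches the marginal W W^T + sigma^2 I.
empirical = np.cov(x, rowvar=False)
theoretical = W @ W.T + sigma ** 2 * np.eye(d_obs)
print(np.abs(empirical - theoretical).max())       # small for large n
```

The point of the formalism is that a 5-D process is fully explained by 2 latent dimensions plus isotropic noise, which is what makes such models useful for dimensionality reduction later in the thesis.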
Chapter 4 Dimensionality reduction
Abstract
This chapter introduces and defines the problem of dimensionality reduction, discusses the topics of the curse of dimensionality and the intrinsic dimensionality, and then surveys non-probabilistic methods for dimensionality reduction, that is, methods that do not define a probabilistic model for the data. These include linear methods (PCA, projection pursuit), nonlinear auto-associators, kernel methods, local dimensionality reduction, principal curves, vector quantisation methods (elastic net, self-organising map) and multidimensional scaling methods. One of these methods (the elastic net) does define a probabilistic model, but not a continuous dimensionality reduction mapping. If one is interested in stochastically modelling the dimensionality reduction mapping, then the natural choice is a latent variable model, as discussed in chapter 2. We close the chapter with a summary and with some thoughts on dimensionality reduction with discrete variables. Consider an application in which a system processes data in the form of a collection of real-valued vectors: speech signals, images, etc. Suppose that the system is only effective if the dimension of each individual vector (the number of components of the vector) is not too high, where high depends on the particular application. The problem of dimensionality reduction appears when the data are in fact of a higher dimension
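The simplest linear method the survey mentions, PCA, can be sketched via the SVD: project the centred data onto its top singular vectors and measure how much structure the reduced dimension retains. The data here are synthetic points lying near a 2-D plane in 5-D space, chosen so the intrinsic dimensionality is obvious.

```python
# Sketch: PCA by SVD on data with intrinsic dimensionality 2 embedded
# in 5 dimensions. Synthetic data, illustrative sizes.
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 500, 5, 2
basis = rng.normal(size=(k, d))
X = rng.normal(size=(n, k)) @ basis + 0.01 * rng.normal(size=(n, d))

Xc = X - X.mean(axis=0)                 # centre the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                       # n points reduced to k dimensions
X_hat = Z @ Vt[:k]                      # map back to the original space

# Because the data are intrinsically ~2-D, almost nothing is lost.
err = np.linalg.norm(Xc - X_hat) / np.linalg.norm(Xc)
print(Z.shape, err)
```

Note that this defines a mapping but no probabilistic model of the data, which is exactly the distinction the chapter draws between these methods and the latent variable models of chapter 2.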
unknown title
, 2001
Abstract
Continuous latent variable models for dimensionality reduction and sequential data reconstruction by
MIXTURE DENSITY NETWORKS, HUMAN ARTICULATORY DATA AND ACOUSTIC-TO-ARTICULATORY INVERSION OF
Abstract
A relatively small number of empirical learning models applied to human articulatory data have been described in the literature. These include extended Kalman filtering ([5]), artificial neural networks ([14]), self-organising HMMs ([16]) and codebook methods ([7]). However, these efforts have mostly been limited to some subsection of full speech, such as a few stop consonants or vowel transitions.