Results 1–10 of 20
Dynamical modeling with kernels for nonlinear time series prediction
Cited by 18 (1 self)
Abstract
We consider the question of predicting nonlinear time series. Kernel Dynamical Modeling, a new method based on kernels, is proposed as an extension of linear dynamical models. The kernel trick is used twice: first, to learn the parameters of the model, and second, to compute preimages of the time series predicted in the feature space by means of Support Vector Regression. Our model shows a strong connection with the classic Kalman Filter model, with the kernel feature space as the hidden state space. Kernel Dynamical Modeling is tested against two benchmark time series and achieves high-quality predictions.
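As a rough illustration of the kernel trick this abstract relies on, the following is a minimal kernel ridge regression sketch for one-step-ahead prediction on a delay embedding. This is a generic stand-in, not the authors' Kernel Dynamical Modeling; the RBF kernel, regulariser `lam`, embedding order and all parameter values are arbitrary assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - y_j||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_predict(series, order=3, lam=1e-3, gamma=1.0):
    # Delay embedding: x_t = last `order` values, target y_t = next value.
    X = np.array([series[i:i + order] for i in range(len(series) - order)])
    y = np.array(series[order:])
    # Dual solution of kernel ridge regression: (K + lam I) alpha = y.
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(K)), y)
    # Predict the value following the last observed window.
    x_new = np.array(series[-order:])[None, :]
    return float(rbf_kernel(x_new, X, gamma) @ alpha)
```

All computation happens through the Gram matrix, which is the point of the kernel trick: the feature space is never represented explicitly.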
Linear Gaussian models for speech recognition
 CAMBRIDGE UNIVERSITY
, 2004
Cited by 15 (0 self)
Abstract
Currently the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evolution ...
Factor analysed hidden Markov models for Speech Recognition
 COMPUTER SPEECH AND LANGUAGE
, 2004
Cited by 14 (6 self)
Abstract
Recently, various techniques to improve the correlation model of feature vector elements in speech recognition systems have been proposed. Such techniques include semi-tied covariance HMMs and systems based on factor analysis. All these schemes have been shown to improve speech recognition performance without dramatically increasing the number of model parameters compared to standard diagonal-covariance Gaussian mixture HMMs. This paper introduces a general form of acoustic model, the factor analysed HMM (FAHMM). A variety of configurations of this model and parameter sharing schemes, some of which correspond to standard systems, were examined. An EM algorithm for parameter optimisation is presented along with a number of methods to increase the efficiency of training. The performance of FAHMMs on medium to large vocabulary continuous speech recognition tasks was investigated. The experiments show that, without elaborate complexity control, equivalent or better performance compared to a standard diagonal-covariance Gaussian mixture HMM system can be achieved with considerably fewer parameters.
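To make the parameter-count argument concrete, the factor-analysed covariance structure the abstract builds on can be sketched as Sigma = C C^T + diag(psi), with a d×k loading matrix C and diagonal noise psi. This is a toy sketch of the covariance model only (dimensions chosen arbitrarily), not the FAHMM itself.

```python
import numpy as np

def fa_covariance(C, psi):
    # Factor-analysed covariance: Sigma = C C^T + diag(psi).
    # Parameter count: d*k (loadings) + d (noise variances), versus
    # d*(d+1)/2 for an unconstrained full covariance matrix.
    return C @ C.T + np.diag(psi)

d, k = 13, 2                       # e.g. a 13-dim feature vector, 2 factors
rng = np.random.default_rng(0)
C = rng.normal(size=(d, k))        # factor loadings
psi = np.ones(d)                   # diagonal noise variances
Sigma = fa_covariance(C, psi)
# Sigma is symmetric positive definite whenever all psi > 0,
# yet captures inter-element correlation through the rank-k term.
```

With d = 13 and k = 2, this costs 39 parameters instead of 91 for a full covariance, which is the trade-off the abstract refers to.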
Switching Linear Dynamical Systems For Speech Recognition
, 2003
Cited by 14 (6 self)
Abstract
This paper describes the application of Rao-Blackwellised Gibbs sampling (RBGS) to speech recognition using switching linear dynamical systems (SLDSs) as the acoustic model.
Product of Gaussians for speech recognition
 Computer Speech & Language
, 2003
Cited by 8 (1 self)
Abstract
Mixtures of Gaussians (MoG) are commonly used as the state representation in hidden Markov model (HMM) based speech recognition. These Gaussian mixture models are easy to train using expectation maximisation (EM) techniques [4] and are able to approximate any distribution given a sufficient number of components. However, only a limited number of parameters can be effectively trained given a finite quantity of training data. This limitation restricts the ability of MoG systems to model highly complex distributions. A range of distributed representations have been developed to overcome this problem. These distributed representations may be split into two basic forms: the first assumes that the sources are asynchronous; the second assumes that the sources are synchronous. [Figure: graphical model with observations o_{t-1}, o_t, o_{t+1} and hidden states q_{t-1}, q_t, q_{t+1}]
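The EM training mentioned above can be sketched for a one-dimensional mixture of Gaussians. This is generic textbook EM, not any particular recogniser's recipe; the number of components, initialisation and iteration count are arbitrary choices.

```python
import numpy as np

def gmm_em_1d(x, K=2, iters=50, seed=0):
    # Minimal EM for a 1-D mixture of K Gaussians.
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, K, replace=False)   # initialise means from the data
    var = np.full(K, x.var())
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: responsibilities r[n, k] proportional to pi_k N(x_n | mu_k, var_k)
        logp = -0.5 * ((x[:, None] - mu) ** 2 / var + np.log(2 * np.pi * var))
        r = pi * np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: re-estimate weights, means and variances from soft counts
        Nk = r.sum(0)
        pi = Nk / len(x)
        mu = (r * x[:, None]).sum(0) / Nk
        var = (r * (x[:, None] - mu) ** 2).sum(0) / Nk
    return pi, mu, var
```

Each EM iteration increases the data likelihood, which is why MoG models are "easy to train" in the sense the passage describes.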
Implicit Pronunciation Modelling in ASR
, 2002
Cited by 7 (2 self)
Abstract
Modelling of pronunciation variability is an important part of the acoustic model of a speech recognition system. Good pronunciation models contribute to the robustness and portability of a speech recogniser. Usually, pronunciation modelling is associated with the recognition lexicon, which allows direct control of HMM selection. However, in state-of-the-art systems the use of clustering techniques has considerable cross-effects on the dictionary design. Most large vocabulary speech recognition systems make use of a dictionary with multiple possible pronunciation variants per word. In this paper a method for a consistent reduction of the number of pronunciation variants to one pronunciation per word is described. Using the single-pronunciation dictionaries, similar or better word error rate performance is achieved on both Wall Street Journal and Switchboard data.
Nonlinear Time Series Filtering, Smoothing and Learning using the Kernel Kalman Filter
Cited by 3 (0 self)
Abstract
In this paper, we propose a new model, the Kernel Kalman Filter, to perform various nonlinear time series processing tasks. This model is based on the use of Mercer kernel functions in the framework of the Kalman Filter, or Linear Dynamical Systems. Thanks to the kernel trick, all the equations involved in our model for filtering, smoothing and learning require only matrix algebra, whilst providing the ability to model complex time series. In particular, it is possible to learn dynamics from noisy nonlinear time series by implementing an exact EM procedure. When predictions in the original input space are needed, an efficient and original preimage learning strategy is proposed.
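For reference, the linear Kalman filter recursion that such kernel methods lift into feature space involves only matrix algebra, which is what makes kernelisation feasible. Below is the standard textbook predict/update step with given model matrices, not the paper's kernelised equations.

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    # One predict/update cycle of the linear Kalman filter.
    # m, P: prior state mean and covariance; y: new observation.
    # A: transition, C: observation, Q/R: process/observation noise covariances.
    m_pred = A @ m                         # predict mean
    P_pred = A @ P @ A.T + Q               # predict covariance
    S = C @ P_pred @ C.T + R               # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)    # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)  # correct mean with innovation
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new
```

Repeated observation of a constant signal drives the state estimate toward that value while the posterior covariance shrinks to a steady state.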
Transformation Streams and the HMM Error Model
 Computer Speech and Language
, 2001
Cited by 1 (1 self)
Abstract
The most popular model used in automatic speech recognition is the hidden Markov model (HMM). Though good performance has been obtained with such models, there are well-known limitations to their ability to model speech. A variety of modifications to the standard HMM topology have been proposed to handle these problems. One such scheme is the factorial HMM. This paper introduces a new form of factorial HMM which makes use of transformation streams. This new scheme is a generalisation of the standard factorial HMM and other related schemes in speech processing. A particular form of this model, the HMM error model (HEM), is described in detail. The HEM is evaluated on two standard large vocabulary speaker-independent speech recognition tasks. On both tasks, significant reductions in word error rate are obtained over standard HMM-based systems.
KALMAN FILTER BASED SPEECH SYNTHESIS
Abstract
Preliminary results are reported from a very simple speech synthesis system based on clustered-diphone, Kalman-filter-based modeling of line-spectral-frequency features. Parameters were estimated using maximum-likelihood EM training, with a constraint enforced that prevented eigenvalue magnitudes in the transition matrix from exceeding 1. Frames of training data were assigned diphone unit labels by forced alignment with an HMM recognition system. The HMM cluster tree was also used for Kalman Filter unit cluster assignments. The result is a simple synthesis system that has few parameters, synthesizes intelligible speech without audible discontinuities, and can be adapted using MLLR techniques to support synthesis of a broad panoply of speakers from a single base model with small amounts of training data. The result is interesting for embedded synthesis applications. Index Terms: Speech synthesis, Kalman filtering
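The abstract states only that eigenvalue magnitudes in the transition matrix were kept from exceeding 1, not how the constraint was enforced. One simple (hypothetical) way to impose such a stability constraint is a global rescaling of the matrix by its spectral radius:

```python
import numpy as np

def constrain_transition(A, max_mag=1.0):
    # Rescale A so no eigenvalue magnitude exceeds max_mag.
    # This is an illustrative projection, not necessarily the
    # mechanism used in the system described above.
    rho = max(abs(np.linalg.eigvals(A)))   # spectral radius
    return A if rho <= max_mag else A * (max_mag / rho)
```

Keeping the spectral radius at or below 1 prevents the state (and hence the synthesized trajectory) from diverging over long segments.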
Contributions to the Estimation of Mixed-State Conditionally Heteroscedastic Latent Factor Models: A Comparative Study
Abstract
Mixed-state conditionally heteroscedastic latent factor models attempt to describe a complex nonlinear dynamic system with a succession of linear latent factor models indexed by a switching variable. Unfortunately, despite the framework’s simplicity, exact state and parameter estimation remain intractable because of the interdependency across the latent factor volatility processes. Recently, a broad class of learning and inference algorithms for time series models has been successfully cast in the framework of dynamic Bayesian networks (DBNs). This paper describes a novel DBN-based switching conditionally heteroscedastic latent factor model. The key methodological contribution of this paper is the novel use of the Generalized Pseudo-Bayesian method GPB2, a structured variational learning approach and an approximated version of the Viterbi algorithm, in conjunction with the EM algorithm, for overcoming the intractability of exact inference in mixed-state latent factor models. The conditional EM algorithm that we have developed for maximum likelihood estimation is based on an extended switching Kalman filter approach, which yields inferences about the unobservable path of the common factors and their variances, and the latent variable of the state process. Extensive Monte Carlo simulations show promising results for tracking, interpolation, synthesis, and classification using learned models.