Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
- Journal of Machine Learning Research
, 2003
Cited by 196 (8 self)
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.
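The "fit locally, think globally" of the title is assumed here to refer to locally linear embedding (LLE), which the last entry in this list also names. The following is a minimal NumPy sketch of the textbook two-step LLE procedure, written for illustration only: the function name `lle` and the regularization constant `reg` are our own choices, not taken from the paper.

```python
import numpy as np

def lle(X, n_neighbors=5, n_components=2, reg=1e-3):
    """Minimal LLE sketch (hypothetical helper, not the paper's code).

    X: (N, D) data. Returns (Y, W): the (N, n_components) embedding and
    the (N, N) matrix of local reconstruction weights.
    """
    N = X.shape[0]
    # Pairwise squared distances; a point is never its own neighbor.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 1 ("fit locally"): weights that best reconstruct each point
    # from its neighbors, constrained to sum to one.
    W = np.zeros((N, N))
    for i in range(N):
        Z = X[nbrs[i]] - X[i]                 # neighbors centered on x_i, (K, D)
        C = Z @ Z.T                           # local Gram matrix, (K, K)
        C += reg * np.trace(C) * np.eye(n_neighbors)  # needed when K > D
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()

    # Step 2 ("think globally"): coordinates that preserve those weights,
    # i.e. bottom eigenvectors of M = (I - W)^T (I - W).
    I = np.eye(N)
    M = (I - W).T @ (I - W)
    _, eigvecs = np.linalg.eigh(M)            # eigenvalues in ascending order
    # Skip the constant eigenvector (eigenvalue 0) and keep the next ones.
    Y = eigvecs[:, 1:n_components + 1]
    return Y, W
```

The dense distance matrix keeps the sketch short; for large N a k-d tree and a sparse eigensolver would be the usual substitutes.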
Maximum Likelihood Modeling With Gaussian Distributions For Classification
- Proceedings of ICASSP
, 1998
Cited by 81 (26 self)
Maximum Likelihood (ML) modeling of multiclass data for classification often suffers from the following problems: a) data insufficiency, implying overtrained or unreliable models; b) large storage requirements; c) large computational requirements; and/or d) ML modeling does not discriminate between classes. Sharing parameters across classes (or constraining the parameters) clearly tends to alleviate the first three problems. In this paper we show that in some cases it can also lead to better discrimination (as evidenced by reduced misclassification error). The parameters considered are the means and variances of the Gaussians and linear transformations of the feature space (or equivalently the Gaussian means). Some constraints on the parameters are shown to lead to Linear Discriminant Analysis (a well-known result), while others are shown to lead to optimal feature spaces (a relatively new result). Applications of some of these ideas to the speech recognition problem are also given.
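As a rough illustration of how a parameter constraint yields a linear discriminant, the sketch below ties a single ML covariance matrix across all classes. This is the standard textbook case, not necessarily the exact constraints studied in the paper: with a shared $\Sigma$, the term $-\tfrac12 x^T\Sigma^{-1}x$ is identical for every class and cancels, so each class score reduces to $w_c^T x + b_c$. All function names are hypothetical.

```python
import numpy as np

def fit_tied_gaussians(X, y):
    """ML class means, priors and ONE pooled covariance shared by all classes."""
    classes = np.unique(y)                       # sorted class labels
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    # Subtract each sample's own class mean, then pool the scatter.
    centered = X - means[np.searchsorted(classes, y)]
    cov = centered.T @ centered / X.shape[0]
    return classes, means, cov, priors

def linear_discriminants(means, cov, priors):
    """Per-class weights w_c = Sigma^{-1} mu_c and biases b_c."""
    P = np.linalg.inv(cov)
    W = means @ P                                # row c equals (P mu_c)^T
    b = -0.5 * np.sum(W * means, axis=1) + np.log(priors)
    return W, b

def predict(X, classes, W, b):
    """Pick the class with the largest linear score."""
    return classes[np.argmax(X @ W.T + b, axis=1)]
```

Giving each class its own covariance instead would bring back the quadratic term and a quadratic decision boundary, at the cost of one d x d matrix per class.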
Modeling Inverse Covariance Matrices by Basis Expansion
, 2003
Cited by 31 (9 self)
This paper proposes a new covariance modeling technique for Gaussian Mixture Models. Specifically, the inverse covariance (precision) matrix of each Gaussian is expanded in a rank-1 basis, i.e., $P_j = \Sigma_j^{-1} = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$, with $\lambda_k^j \in \mathbb{R}$ and $a_k \in \mathbb{R}^d$. A generalized EM algorithm is proposed to obtain maximum likelihood parameter estimates for the basis set $\{a_k a_k^T\}_{k=1}^{D}$ and the expansion coefficients $\{\lambda_k^j\}$. This model, called the Extended Maximum Likelihood Linear Transform (EMLLT) model, is extremely flexible: by varying the number of basis elements from $D = d$ to $D = d(d+1)/2$, one gradually moves from a Maximum Likelihood Linear Transform (MLLT) model to a full-covariance model. Experimental results on two speech recognition tasks show that the EMLLT model can give relative gains of up to 35% in word error rate over a standard diagonal covariance model and 30% over a standard MLLT model.
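A minimal NumPy sketch of evaluating one Gaussian whose precision matrix is a rank-1 expansion $P_j = \sum_{k=1}^{D} \lambda_k^j a_k a_k^T$. It assumes the basis vectors $a_k$ are stored as the rows of a $D \times d$ array `A` and the coefficients of one Gaussian in a vector `lam`; both names, and the helper functions, are hypothetical. It shows why the expansion is convenient: the quadratic term becomes $\sum_k \lambda_k^j \big(a_k^T(x-\mu_j)\big)^2$, so the $D$ projections $a_k^T x$ can be shared by every Gaussian using the same basis. The generalized EM estimation itself is not shown.

```python
import numpy as np

def emllt_precision(lam, A):
    """P = sum_k lam[k] * a_k a_k^T, with a_k stored as the rows of A (D x d)."""
    return (A.T * lam) @ A

def emllt_loglik(x, mu, lam, A):
    """log N(x; mu, P^{-1}) for a rank-1-expanded precision matrix P."""
    d = x.shape[0]
    P = emllt_precision(lam, A)
    # Cholesky raises LinAlgError unless P is positive definite,
    # which the coefficients must guarantee for a valid Gaussian.
    L = np.linalg.cholesky(P)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    proj = A @ (x - mu)              # a_k^T (x - mu) for every basis vector
    quad = np.dot(lam, proj ** 2)    # equals (x - mu)^T P (x - mu)
    return 0.5 * logdet - 0.5 * d * np.log(2 * np.pi) - 0.5 * quad
```

With `A` equal to the identity (D = d) this reduces to a diagonal-covariance Gaussian; with D = d(d+1)/2 generic basis vectors the expansion can represent a full precision matrix.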
Generalised linear Gaussian models
, 2001
Cited by 17 (7 self)
This paper addresses the time-series modelling of high dimensional data. Currently, the hidden Markov model (HMM) is the most popular and successful model, especially in speech recognition. However, HMMs have well-known shortcomings, particularly in modelling the correlation between successive observation vectors, that is, inter-frame correlation. Standard diagonal covariance matrix HMMs also do not model the spatial correlation within the feature vectors, that is, intra-frame correlation. Several other time-series models, such as Gauss-Markov and dynamical system segment models, have recently been proposed, especially within the segment model framework, to address the inter-frame correlation problem. The lack of intra-frame correlation has been compensated for with transform schemes such as semi-tied full covariance matrices (STC). All these models can be regarded as belonging to the broad class of generalised linear Gaussian models. Linear Gaussian models (LGMs) are popular because many forms may be trained efficiently using the expectation maximisation algorithm. In this paper, several LGMs and generalised LGMs are reviewed. The models can be roughly categorised into four combinations of two state evolution processes and two observation processes. The state evolution process can be based on a discrete finite state machine, as in HMMs, or on a linear first-order Gauss-Markov process, as in traditional linear dynamical systems. The observation process can be represented as a factor analysis model or a linear discriminant analysis model. General HMMs and schemes proposed to improve their performance, such as STC, can be regarded as special cases in this framework.
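To make one of the four combinations concrete, the sketch below samples from a linear Gaussian model with a first-order Gauss-Markov state evolution, $x_t = A x_{t-1} + w_t$, and a factor-analysis-style observation process, $o_t = C x_t + v_t$, with $w_t \sim \mathcal{N}(0, Q)$ and $v_t \sim \mathcal{N}(0, R)$. The function and argument names are hypothetical, and this is a generic linear dynamical system rather than any specific model from the paper. Replacing the continuous recursion on $x_t$ with a discrete Markov chain over states would give the HMM-style branch instead.

```python
import numpy as np

def sample_lds(A, C, Q, R, x0, T, rng):
    """Draw T steps of x_t = A x_{t-1} + w_t,  o_t = C x_t + v_t."""
    k, p = A.shape[0], C.shape[0]
    xs = np.zeros((T, k))
    obs = np.zeros((T, p))
    x = x0
    for t in range(T):
        # State evolution: linear first-order Gauss-Markov process.
        x = A @ x + rng.multivariate_normal(np.zeros(k), Q)
        xs[t] = x
        # Observation: linear map of the state plus Gaussian noise
        # (a diagonal R gives the factor-analysis form).
        obs[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
    return xs, obs
```

Because consecutive states are correlated through `A`, consecutive observations are correlated too, which is the inter-frame correlation a standard HMM ignores.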
Maximum Likelihood Multiple Projection Schemes For Hidden Markov Models
- IEEE Transactions on Speech and Audio Processing
, 2000
Cited by 17 (7 self)
The first stage in many pattern recognition tasks is to generate a good set of features from the observed data. In many complex pattern recognition tasks, the choice of a good feature space in which to model the data varies depending on the signal content, for example phone-dependent subspaces in speech recognition. In these cases multiple feature subspaces should be used. Handling multiple subspaces whilst still maintaining meaningful likelihood comparisons between classes is a complex problem. This paper deals with this problem by viewing multiple subspace projections within a maximum likelihood framework. Two new multiple projection schemes are described. These new schemes are evaluated on a large vocabulary speech recognition task in terms of performance, speed of likelihood calculation and number of model parameters.
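One generic ingredient for keeping likelihoods comparable across different projections is the change-of-variables (Jacobian) term: if class $j$ models $z = A_j x$ with a square, invertible $A_j$, then $\log p(x) = \log|\det A_j| + \log p(z)$. The sketch below shows only this standard identity, assuming a diagonal Gaussian in the projected space; it is not a reconstruction of the two schemes proposed in the paper, and `projected_loglik` is a hypothetical name.

```python
import numpy as np

def projected_loglik(x, A, mu, var):
    """log p(x) when z = A x is modelled by N(mu, diag(var)).

    Adding log|det A| expresses the score in the ORIGINAL feature space,
    so classes using different projections A can be compared directly.
    """
    z = A @ x
    _, logabsdet = np.linalg.slogdet(A)
    ll_z = -0.5 * np.sum(np.log(2 * np.pi * var) + (z - mu) ** 2 / var)
    return logabsdet + ll_z
```

Dropping `logabsdet` would let a class gain likelihood simply by choosing a projection that shrinks the data, which is why the term is needed before comparing scores.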
Factor Analysis Invariant To Linear Transformations Of Data
- In Proceedings International Conference on Speech and Language Processing
, 1998
Cited by 16 (3 self)
Modeling data with Gaussian distributions is an important statistical problem. To obtain robust models, one imposes constraints on the means and covariances of these distributions [6, 4, 10, 8]. Constrained ML modeling implies the existence of optimal feature spaces where the constraints are more valid [2, 3]. This paper introduces one such constrained ML modeling technique, called factor analysis invariant to linear transformations (FACILT), which is essentially factor analysis in optimal feature spaces. FACILT is a generalization of several existing methods for modeling covariances. This paper presents an EM algorithm for FACILT modeling.
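For reference, the sketch below implements the standard EM updates for plain factor analysis, $\Sigma = \Lambda\Lambda^T + \Psi$ with diagonal $\Psi$, which FACILT builds on. It deliberately omits the optimal feature-space transform that distinguishes FACILT, so it should be read as the baseline model only, not the paper's algorithm. Function names, the random initialization, and the variance floor are our own assumptions.

```python
import numpy as np

def fa_loglik(S, N, Lam, psi):
    """Total log-likelihood of N centered samples with ML covariance S."""
    d = S.shape[0]
    Sigma = Lam @ Lam.T + np.diag(psi)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * N * (d * np.log(2 * np.pi) + logdet
                       + np.trace(np.linalg.solve(Sigma, S)))

def fa_em(X, k, n_iter=50, seed=0):
    """Plain factor analysis by EM (no feature-space transform)."""
    N, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N                          # ML sample covariance
    rng = np.random.default_rng(seed)
    Lam = rng.normal(scale=0.1, size=(d, k))   # hypothetical initialization
    psi = np.diag(S).copy()
    history = []
    for _ in range(n_iter):
        # E-step: posterior of the factors z given x.
        Sigma = Lam @ Lam.T + np.diag(psi)
        beta = np.linalg.solve(Sigma, Lam).T   # Lam^T Sigma^{-1}, (k, d)
        Ezz = np.eye(k) - beta @ Lam + beta @ S @ beta.T  # average E[z z^T]
        # M-step: loadings, then diagonal noise variances.
        Lam = S @ beta.T @ np.linalg.inv(Ezz)
        psi = np.maximum(np.diag(S - Lam @ beta @ S), 1e-6)  # floor for safety
        history.append(fa_loglik(S, N, Lam, psi))
    return Lam, psi, history
```

Each iteration should not decrease the log-likelihood (as long as the variance floor is inactive), which gives a cheap sanity check on an implementation.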
Model selection in acoustic modeling
- In Proc. Eurospeech
, 1999
Cited by 14 (0 self)
Recently several classes of models have been suggested for use in continuous density HMMs for speech recognition. This paper proposes to choose both the model type and model size (number of parameters) by optimizing the Bayesian information criterion. Specifically we apply this to Gaussian mixture density estimation to determine both the number of Gaussians and the covariance structure of each Gaussian, and decision tree clustering of HMM states. A numerical algorithm similar to the EM algorithm for mixture density estimation is proposed for optimizing BIC.
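A minimal sketch of BIC-based model selection, assuming the standard form $\mathrm{BIC} = \log L - \tfrac{k}{2}\log N$ with $k$ free parameters and $N$ samples (the paper may use a weighted penalty). It picks the covariance structure of a single Gaussian; choosing the number of mixture components would compare fitted mixtures in the same way. Function names are hypothetical.

```python
import numpy as np

def gaussian_bic(X, cov_type):
    """BIC of a single ML-fitted Gaussian with 'diag' or 'full' covariance."""
    N, d = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N
    if cov_type == "diag":
        cov = np.diag(np.diag(S))
        n_params = 2 * d                       # means + variances
    elif cov_type == "full":
        cov = S
        n_params = d + d * (d + 1) // 2        # means + symmetric covariance
    else:
        raise ValueError(f"unknown covariance type: {cov_type}")
    _, logdet = np.linalg.slogdet(cov)
    loglik = -0.5 * N * (d * np.log(2 * np.pi) + logdet
                         + np.trace(np.linalg.solve(cov, S)))
    return loglik - 0.5 * n_params * np.log(N)

def select_covariance(X):
    """Return the covariance structure with the higher BIC."""
    return max(("diag", "full"), key=lambda c: gaussian_bic(X, c))
```

The full model always fits at least as well, so it wins only when the likelihood gain from the extra d(d-1)/2 correlation parameters exceeds the penalty.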
Factor analysed hidden Markov models for Speech Recognition
- Computer Speech and Language
, 2004
Cited by 12 (6 self)
Recently, various techniques to improve the correlation model of feature vector elements in speech recognition systems have been proposed. Such techniques include semi-tied covariance HMMs and systems based on factor analysis. All these schemes have been shown to improve speech recognition performance without dramatically increasing the number of model parameters compared to standard diagonal covariance Gaussian mixture HMMs. This paper introduces a general form of acoustic model, the factor analysed HMM (FAHMM). A variety of configurations of this model and parameter sharing schemes, some of which correspond to standard systems, were examined. An EM algorithm for the parameter optimisation is presented, along with a number of methods to increase the efficiency of training. The performance of FAHMMs on medium to large vocabulary continuous speech recognition tasks was investigated. The experiments show that, without elaborate complexity control, equivalent or better performance compared to a standard diagonal covariance Gaussian mixture HMM system can be achieved with considerably fewer parameters.
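Factor-analysed output densities have covariance $\Sigma = \Lambda\Lambda^T + \Psi$ with a $d \times k$ loading matrix ($k \ll d$) and diagonal $\Psi$. One generic way to keep per-frame likelihood evaluation cheap is to apply the Woodbury identity and the matrix determinant lemma, so only a $k \times k$ system is solved instead of a $d \times d$ one. Whether the paper uses exactly this trick is not stated in the abstract; the sketch and its function name are illustrative assumptions.

```python
import numpy as np

def fa_gaussian_loglik(x, mu, Lam, psi):
    """log N(x; mu, Lam Lam^T + diag(psi)) without forming the d x d inverse."""
    d, k = Lam.shape
    r = x - mu
    Pinv_r = r / psi                        # Psi^{-1} r
    Pinv_L = Lam / psi[:, None]             # Psi^{-1} Lam
    M = np.eye(k) + Lam.T @ Pinv_L          # k x k "capacitance" matrix
    # Matrix determinant lemma: log|Sigma| = log|M| + sum(log psi).
    _, logdet_M = np.linalg.slogdet(M)
    logdet = logdet_M + np.sum(np.log(psi))
    # Woodbury: r^T Sigma^{-1} r = r^T Psi^{-1} r - u^T M^{-1} u.
    u = Lam.T @ Pinv_r
    quad = r @ Pinv_r - u @ np.linalg.solve(M, u)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)
```

Per frame this costs O(dk + k^3) after M is precomputed, compared with O(d^2) for a full-covariance Gaussian with a cached inverse.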
Linear Gaussian models for speech recognition
- Cambridge University
, 2004
Cited by 10 (0 self)
Currently, the most popular acoustic model for speech recognition is the hidden Markov model (HMM). However, HMMs are based on a series of assumptions, some of which are known to be poor. In particular, the assumption that successive speech frames are conditionally independent given the discrete state that generated them is not a good assumption for speech recognition. State space models may be used to address some shortcomings of this assumption. State space models are based on a continuous state vector evolving through time according to a state evolution ...
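For a linear Gaussian state space model ($x_t = A x_{t-1} + w_t$, $o_t = C x_t + v_t$, Gaussian noise with covariances $Q$ and $R$, and $x_1 \sim \mathcal{N}(m_0, P_0)$), the standard Kalman filter gives the exact likelihood of an observation sequence. Unlike an HMM state, the continuous state carries information between frames, so the observations are not conditionally independent. The sketch below is the textbook filter, not code from the thesis; all names are hypothetical.

```python
import numpy as np

def kalman_loglik(obs, A, C, Q, R, m0, P0):
    """Exact log p(o_1..o_T) for a linear Gaussian state space model."""
    p = C.shape[0]
    m, P = m0, P0                      # prior mean/covariance of x_1
    total = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            # Predict: propagate the state through the dynamics.
            m = A @ m
            P = A @ P @ A.T + Q
        # Innovation and its covariance give the predictive density of o_t.
        e = o - C @ m
        S = C @ P @ C.T + R
        _, logdet = np.linalg.slogdet(S)
        total += -0.5 * (p * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))
        # Update: condition the state estimate on o_t.
        K = P @ C.T @ np.linalg.inv(S)
        m = m + K @ e
        P = P - K @ C @ P
    return total
```

The cost is linear in the sequence length, whereas evaluating the same joint Gaussian directly would require a (Tp) x (Tp) covariance matrix.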
Learning high dimensional correspondences from low dimensional manifolds
- In: Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining
, 2003
Cited by 10 (0 self)
Many different high dimensional data sets are characterized by the same underlying modes of variability. When these modes of variability are continuous and few in number, they can be viewed as parameterizing a low dimensional manifold. The manifold provides a compact shared representation of the data, suggesting correspondences between the high dimensional examples from different data sets. These correspondences, though naturally induced by the underlying manifold, are difficult to learn using traditional methods in supervised learning. In this paper, we generalize three methods in unsupervised learning (principal components analysis, factor analysis, and locally linear embedding) to discover subspaces and manifolds that provide common low dimensional representations of different high dimensional data sets. We use the shared representations discovered by these algorithms to put high dimensional examples from different data sets into correspondence. Finally, we show that a notion of "self-correspondence" between examples in the same data set can be used to improve the performance of these algorithms on small data sets. The algorithms are demonstrated on images and text.
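A much simplified sketch of the PCA case, assuming every training example in one data set has a known partner in the other (the paper also handles unpaired examples, which this does not). PCA on the concatenated pairs $[x; y]$ gives shared loadings; a new $x$ alone is mapped to the shared coordinates by least squares on the $x$-block and then decoded into the other data set. Function names are hypothetical.

```python
import numpy as np

def fit_joint_pca(X, Y, n_components):
    """PCA on paired examples [x, y]; returns per-data-set means and loadings."""
    Z = np.hstack([X, Y])
    mean = Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Z - mean, full_matrices=False)
    W = Vt[:n_components].T             # shared loadings, (Dx + Dy, c)
    dx = X.shape[1]
    return mean[:dx], mean[dx:], W[:dx], W[dx:]

def predict_y_from_x(x_new, mx, my, Wx, Wy):
    """Infer shared coordinates from the x-block alone, then decode y."""
    z, *_ = np.linalg.lstsq(Wx, (x_new - mx).T, rcond=None)   # (c, M)
    return (Wy @ z).T + my
```

This works only when the x-block of the loadings has full column rank, i.e. when x on its own determines the shared low dimensional coordinates.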

