Results 1  10
of
117
Unsupervised learning of finite mixture models
 IEEE Transactions on pattern analysis and machine intelligence
, 2002
"... AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization ..."
Abstract

Cited by 267 (20 self)
 Add to MetaCart
AbstractÐThis paper proposes an unsupervised algorithm for learning a finite mixture model from multivariate data. The adjective ªunsupervisedº is justified by two properties of the algorithm: 1) it is capable of selecting the number of components and 2) unlike the standard expectationmaximization (EM) algorithm, it does not require careful initialization. The proposed method also avoids another drawback of EM for mixture fitting: the possibility of convergence toward a singular estimate at the boundary of the parameter space. The novelty of our approach is that we do not use a model selection criterion to choose one among a set of preestimated candidate models; instead, we seamlessly integrate estimation and model selection in a single algorithm. Our technique can be applied to any type of parametric mixture model for which it is possible to write an EM algorithm; in this paper, we illustrate it with experiments involving Gaussian mixtures. These experiments testify for the good performance of our approach. Index TermsÐFinite mixtures, unsupervised learning, model selection, minimum message length criterion, Bayesian methods, expectationmaximization algorithm, clustering. æ 1
Learning with Labeled and Unlabeled Data
, 2001
"... In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as ..."
Abstract

Cited by 165 (3 self)
 Add to MetaCart
In this paper, on the one hand, we aim to give a review on literature dealing with the problem of supervised learning aided by additional unlabeled data. On the other hand, being a part of the author's first year PhD report, the paper serves as a frame to bundle related work by the author as well as numerous suggestions for potential future work. Therefore, this work contains more speculative and partly subjective material than the reader might expect from a literature review. We give a rigorous definition of the problem and relate it to supervised and unsupervised learning. The crucial role of prior knowledge is put forward, and we discuss the important notion of inputdependent regularization. We postulate a number of baseline methods, being algorithms or algorithmic schemes which can more or less straightforwardly be applied to the problem, without the need for genuinely new concepts. However, some of them might serve as basis for a genuine method. In the literature revi...
Gaussian process latent variable models for visualisation of high dimensional data
 Adv. in Neural Inf. Proc. Sys
, 2004
"... We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the ex ..."
Abstract

Cited by 132 (5 self)
 Add to MetaCart
We introduce a variational inference framework for training the Gaussian process latent variable model and thus performing Bayesian nonlinear dimensionality reduction. This method allows us to variationally integrate out the input variables of the Gaussian process and compute a lower bound on the exact marginal likelihood of the nonlinear latent variable model. The maximization of the variational lower bound provides a Bayesian training procedure that is robust to overfitting and can automatically select the dimensionality of the nonlinear latent space. We demonstrate our method on real world datasets. The focus in this paper is on dimensionality reduction problems, but the methodology is more general. For example, our algorithm is immediately applicable for training Gaussian process models in the presence of missing or uncertain inputs. 1
Propagation Algorithms for Variational Bayesian Learning
 In Advances in Neural Information Processing Systems 13
, 2001
"... Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugateexponential graphical models. We show how the belief propagation and the junction tree algorithms ..."
Abstract

Cited by 110 (14 self)
 Add to MetaCart
Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugateexponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linearGaussian statespace models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the statespace model in a variety of synthetic problems and one real highdimensional data set.
Incremental Online Learning in High Dimensions
 Neural Computation
, 2005
"... Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally e ..."
Abstract

Cited by 104 (15 self)
 Add to MetaCart
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally e#cient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic leaveoneout cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of  possibly redundant  inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and e#ciently operate in very high dimensional spaces.
Tutorial on Variational Approximation Methods
 In Advanced Mean Field Methods: Theory and Practice
, 2000
"... We provide an introduction to the theory and use of variational methods for inference and estimation in the context of graphical models. Variational methods become useful as ecient approximate methods when the structure of the graph model no longer admits feasible exact probabilistic calculations. T ..."
Abstract

Cited by 73 (1 self)
 Add to MetaCart
We provide an introduction to the theory and use of variational methods for inference and estimation in the context of graphical models. Variational methods become useful as ecient approximate methods when the structure of the graph model no longer admits feasible exact probabilistic calculations. The emphasis of this tutorial is on illustrating how inference and estimation problems can be transformed into variational form along with describing the resulting approximation algorithms and their properties insofar as these are currently known. 1 Introduction The term variational methods refers to a large collection of optimization techniques. The classical context for these methods involves nding the extremum of an integral depending on an unknown function and its derivatives. This classical de nition, however, and the accompanying calculus of variation no longer adequately characterizes modern variational methods. Modern variational approaches have become indispensable tools in...
Segmentation of multivariate mixed data via lossy coding and compression
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2007
"... Abstract—In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmen ..."
Abstract

Cited by 68 (13 self)
 Add to MetaCart
Abstract—In this paper, based on ideas from lossy data coding and compression, we present a simple but effective technique for segmenting multivariate mixed data that are drawn from a mixture of Gaussian distributions, which are allowed to be almost degenerate. The goal is to find the optimal segmentation that minimizes the overall coding length of the segmented data, subject to a given distortion. By analyzing the coding length/rate of mixed data, we formally establish some strong connections of data segmentation to many fundamental concepts in lossy data compression and ratedistortion theory. We show that a deterministic segmentation is approximately the (asymptotically) optimal solution for compressing mixed data. We propose a very simple and effective algorithm that depends on a single parameter, the allowable distortion. At any given distortion, the algorithm automatically determines the corresponding number and dimension of the groups and does not involve any parameter estimation. Simulation results reveal intriguing phasetransitionlike behaviors of the number of segments when changing the level of distortion or the amount of outliers. Finally, we demonstrate how this technique can be readily applied to segment real imagery and bioinformatic data. Index Terms—Multivariate mixed data, data segmentation, data clustering, rate distortion, lossy coding, lossy compression, image segmentation, microarray data clustering. 1
Automatic choice of dimensionality for PCA
, 2000
"... A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, we show how to use Bayesian model selection to estimate the true dimensionality of the data. The resulting estimate is simple to compute ..."
Abstract

Cited by 67 (1 self)
 Add to MetaCart
A central issue in principal component analysis (PCA) is choosing the number of principal components to be retained. By interpreting PCA as density estimation, we show how to use Bayesian model selection to estimate the true dimensionality of the data. The resulting estimate is simple to compute yet guaranteed to pick the correct dimensionality, given enough data. The estimate involves an integral over the Steifel manifold of kframes, which is difficult to compute exactly. But after choosing an appropriate parameterization and applying Laplace's method, an accurate and practical estimator is obtained. In simulations, it is convincingly better than crossvalidation and other proposed algorithms, plus it runs much faster.
Ensemble learning for independent component analysis
 in Advances in Independent Component Analysis
, 2000
"... i Abstract This thesis is concerned with the problem of Blind Source Separation. Specifically we considerthe Independent Component Analysis (ICA) model in which a set of observations are modelled by xt = Ast: (1) where A is an unknown mixing matrix and st is a vector of hidden source components atti ..."
Abstract

Cited by 49 (2 self)
 Add to MetaCart
i Abstract This thesis is concerned with the problem of Blind Source Separation. Specifically we considerthe Independent Component Analysis (ICA) model in which a set of observations are modelled by xt = Ast: (1) where A is an unknown mixing matrix and st is a vector of hidden source components attime t. The ICA problem is to find the sources given only a set of observations. In chapter 1, the blind source separation problem is introduced. In chapter 2 the methodof Ensemble Learning is explained. Chapter 3 applies Ensemble Learning to the ICA model and chapter 4 assesses the use of Ensemble Learning for model selection.Chapters 57 apply the Ensemble Learning ICA algorithm to data sets from physics (a medical imaging data set consisting of images of a tooth), biology (data sets from cDNAmicroarrays) and astrophysics (Planck image separation and galaxy spectra separation).