Results 1-10 of 15
Hierarchical Models of Variance Sources
 SIGNAL PROCESSING
, 2003
"... In many models, variances are assumed to be constant although this assumption is often unrealistic in practice. Joint modelling of means and variances is di#cult in many learning approaches, because it can lead into infinite probability densities. We show that a Bayesian variational technique which ..."
Abstract

Cited by 32 (12 self)
 Add to MetaCart
In many models, variances are assumed to be constant, although this assumption is often unrealistic in practice. Joint modelling of means and variances is difficult in many learning approaches, because it can lead to infinite probability densities. We show that a Bayesian variational technique which is sensitive to probability mass, rather than density, is able to jointly model both variances and means. We consider a model structure where a Gaussian variable, called a variance node, controls the variance of another Gaussian variable. Variance nodes make it possible to build hierarchical models for both variances and means. We report experiments with artificial data which demonstrate the ability of the learning algorithm to find variance sources that explain and characterize well the variances in multidimensional data. Experiments with biomedical MEG data show that variance sources are present in real-world signals.
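A minimal synthetic sketch of the variance-node construction (all names illustrative; the log-variance parameterisation via exp is an assumption, chosen so the controlling variable can itself be an ordinary Gaussian):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100_000

# Variance node u(t): a Gaussian variable whose value sets the
# log-variance of the observation node x(t).
u = rng.normal(0.0, 1.0, size=T)

# Observation node: x(t) ~ N(0, exp(u(t))).  The exp keeps the
# variance positive, so u can stay an unconstrained Gaussian.
x = rng.normal(0.0, np.exp(u / 2.0))

# A fluctuating variance makes x heavy-tailed: its kurtosis
# E[x^4]/E[x^2]^2 exceeds the Gaussian value of 3 (in expectation
# it equals 3*exp(1) for this choice of u).
kurtosis = np.mean(x**4) / np.mean(x**2) ** 2
print(kurtosis)
```

Such heavy-tailed marginals are exactly the signature the variance sources in the abstract are meant to explain.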
Advances in nonlinear blind source separation
 In Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003)
, 2003
"... Abstract — In this paper, we briefly review recent advances in blind source separation (BSS) for nonlinear mixing models. After a general introduction to the nonlinear BSS and ICA (independent Component Analysis) problems, we discuss in more detail uniqueness issues, presenting some new results. A f ..."
Abstract

Cited by 30 (2 self)
 Add to MetaCart
In this paper, we briefly review recent advances in blind source separation (BSS) for nonlinear mixing models. After a general introduction to the nonlinear BSS and ICA (Independent Component Analysis) problems, we discuss uniqueness issues in more detail, presenting some new results. A fundamental difficulty in the nonlinear BSS problem, and even more so in the nonlinear ICA problem, is that they are non-unique without extra constraints, which are often implemented by using a suitable regularization. Post-nonlinear mixtures are an important special case, in which a nonlinearity is applied to linear mixtures. For such mixtures, the ambiguities are essentially the same as for the linear ICA or BSS problems. In the later part of this paper, various separation techniques proposed for post-nonlinear mixtures and general nonlinear mixtures are reviewed.
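The post-nonlinear special case can be written down directly. The sketch below (illustrative numbers, with an invertible cube-root nonlinearity standing in for a generic componentwise f) shows why its ambiguities reduce to the linear case once f is undone:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 5_000

# Two independent super-Gaussian (Laplacian) sources.
s = rng.laplace(size=(2, T))

# Post-nonlinear model: linear mixing followed by a componentwise
# invertible nonlinearity, x = f(A s).
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
f = np.cbrt                      # invertible, componentwise
x = f(A @ s)

# Inverting the componentwise nonlinearity recovers an ordinary
# linear mixture, so the remaining ambiguities are those of linear
# ICA/BSS (permutation and scaling).
linear_part = x ** 3
print(np.max(np.abs(linear_part - A @ s)))
```

In practice f is unknown and must be estimated jointly with the unmixing, which is what the post-nonlinear separation methods reviewed in the paper address.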
Nonlinear Independent Factor Analysis by Hierarchical Models
 in Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003)
, 2003
"... The building blocks introduced earlier by us in [1] are used for constructing a hierarchical nonlinear model for nonlinear factor analysis. We call the resulting method hierarchical nonlinear factor analysis (HNFA). The variational Bayesian learning algorithm used in this method has a linear computa ..."
Abstract

Cited by 25 (13 self)
 Add to MetaCart
The building blocks we introduced earlier in [1] are used to construct a hierarchical nonlinear model for nonlinear factor analysis. We call the resulting method hierarchical nonlinear factor analysis (HNFA). The variational Bayesian learning algorithm used in this method has linear computational complexity, and it is able to infer the structure of the model in addition to estimating the unknown parameters. We show how nonlinear mixtures can be separated by first estimating a nonlinear subspace using HNFA and then rotating the subspace using linear independent component analysis. Experimental results show that the cost function minimised during learning is a good predictor of the quality of the estimated subspace.
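HNFA itself is not reproduced here, but the second stage of the separation recipe, rotating an already-estimated whitened subspace with linear ICA, can be sketched with a one-unit FastICA-style fixed-point iteration (the rotation angle, the Laplacian source model, and the contrast g(u) = u^3 are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50_000

# Stand-in for the HNFA output: whitened 2-D subspace coordinates
# that are an unknown rotation of two independent sources.
s = rng.laplace(size=(2, T)) / np.sqrt(2.0)   # unit-variance sources
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
y = R @ s

# One-unit FastICA fixed point with contrast g(u) = u^3:
#   w <- E[y g(w.y)] - E[g'(w.y)] w, then renormalise.
# For whitened y and unit-norm w, E[g'(u)] = 3 E[u^2] is about 3.
w = np.array([1.0, 0.0])
for _ in range(200):
    u = w @ y
    w = (y * u**3).mean(axis=1) - 3.0 * w
    w /= np.linalg.norm(w)

# w should align with a column of R, i.e. recover one source direction.
alignment = np.max(np.abs(R.T @ w))
print(alignment)
```

The whitening assumed here is what the nonlinear first stage has to supply; any rotation-recovering linear ICA can play the second-stage role.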
On the effect of the form of the posterior approximation in variational learning of ICA models
 in Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003)
, 2003
"... Abstract. We show that the choice of posterior approximation affects the solution found in Bayesian variational learning of linear independent component analysis models. Assuming the sources to be independent a posteriori favours a solution which has orthogonal mixing vectors. Linear mixing models w ..."
Abstract

Cited by 20 (6 self)
 Add to MetaCart
We show that the choice of posterior approximation affects the solution found in Bayesian variational learning of linear independent component analysis models. Assuming the sources to be independent a posteriori favours a solution in which the mixing vectors are orthogonal. Linear mixing models with either temporally correlated sources or non-Gaussian source models are considered, but the analysis extends to nonlinear mixtures as well.
Variational Bayesian Learning of ICA with Missing Data
, 2003
"... this article, we extend the variational Bayesian ICA method to problemswith missing data. More important, the probability density estimate of the missing entries can be used to #ll in the missing values. This allows the density model to be re#ned and made more accurate ..."
Abstract

Cited by 13 (0 self)
 Add to MetaCart
In this article, we extend the variational Bayesian ICA method to problems with missing data. More importantly, the probability density estimate of the missing entries can be used to fill in the missing values. This allows the density model to be refined and made more accurate.
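The fill-in step can be illustrated with a plain multivariate Gaussian standing in for the variational Bayesian ICA density (the covariance values and function names below are illustrative): missing entries are replaced by their conditional mean given the observed entries.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in density model: a full-covariance Gaussian fitted to
# complete training data.  The paper uses a variational Bayesian
# ICA density instead; the conditioning step is analogous.
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[2.0, 1.2, 0.5],
                             [1.2, 1.5, 0.3],
                             [0.5, 0.3, 1.0]],
                            size=50_000)
mu = X.mean(axis=0)
S = np.cov(X, rowvar=False)

def impute(x, missing):
    """Fill missing entries with their conditional mean under N(mu, S)."""
    obs = ~missing
    S_mo = S[np.ix_(missing, obs)]
    S_oo = S[np.ix_(obs, obs)]
    cond_mean = mu[missing] + S_mo @ np.linalg.solve(S_oo, x[obs] - mu[obs])
    x_filled = x.copy()
    x_filled[missing] = cond_mean
    return x_filled

x = np.array([1.0, np.nan, 0.5])
missing = np.isnan(x)
x_filled = impute(x, missing)
print(x_filled)
```

The full conditional density (not just its mean) is what allows the density model itself to be refined, as the abstract notes.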
Building Blocks For Variational Bayesian Learning Of Latent Variable Models
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
"... We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including variance models a ..."
Abstract

Cited by 11 (8 self)
 Add to MetaCart
We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including variance models and nonlinear modelling, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package, which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity.
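A toy forward-sampling sketch of how such blocks compose (names and structure are illustrative; the actual package additionally derives the variational update rules, which are omitted here):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 1_000

def gaussian(mean, log_var, size=None):
    """Gaussian-variable block: sample given mean and log-variance."""
    return rng.normal(mean, np.exp(0.5 * log_var), size=size)

# Compose blocks into a small generative graph:
u = gaussian(0.0, 0.0, size=T)        # top-level Gaussian block
h = np.tanh(u)                        # nonlinearity block
v = gaussian(0.0, -1.0, size=T)       # another Gaussian block
x = gaussian(h + 2.0 * v, -2.0)       # summation feeds a Gaussian's mean
print(x.shape)
```

Because every node is one of a few standard block types, each with local parents, the update rules for any such graph can be generated mechanically, which is the point of the framework.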
Overlearning in Marginal Distribution-Based ICA: Analysis and Solutions
 J. Mach. Learn. Res.
, 2003
"... The present paper is written as a word of caution, with users of independent component analysis (ICA) in mind, to overlearning phenomena that are often observed. We consider two types of overlearning, typical to highorder statistics based ICA. These algorithms can be seen to maximise the negentropy ..."
Abstract

Cited by 10 (4 self)
 Add to MetaCart
The present paper is written as a word of caution, with users of independent component analysis (ICA) in mind, about overlearning phenomena that are often observed. We consider two types of overlearning, typical of high-order-statistics-based ICA. These algorithms can be seen to maximise the negentropy of the source estimates. The first kind of overlearning results in the generation of spike-like signals if there are not enough samples in the data or a considerable amount of noise is present. It is argued that, if the data has a power spectrum characterised by a 1/f curve, we face a more severe problem, which cannot be solved inside the strict ICA model. This overlearning is better characterised by bumps instead of spikes. Both overlearning types are demonstrated on artificial signals as well as magnetoencephalograms (MEG). Several methods are suggested to circumvent both types, either by making the estimation of the ICA model more robust or by including further modelling of the data.
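The spike-type overlearning has a simple linear-algebra core: when there are fewer samples T than observed dimensions D, some demixing vector reproduces any desired T-length output, including a maximally non-Gaussian unit spike. A small sketch (sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
D, T = 50, 20                       # more dimensions than samples

X = rng.normal(size=(D, T))         # observed data, one column per sample

# Ask for a demixing vector w whose "source" w @ X is a pure spike.
# X.T @ w = e1 is underdetermined (T equations, D unknowns), so an
# exact minimum-norm solution exists for generic X.
e1 = np.zeros(T)
e1[0] = 1.0
w, *_ = np.linalg.lstsq(X.T, e1, rcond=None)

s = w @ X                           # spike-like "source" with huge negentropy
print(np.round(s, 6))
```

A negentropy-maximising ICA algorithm will happily converge to such solutions when the effective sample size is too small, which is exactly the behaviour the paper warns about.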
Handling Missing Data with Variational Bayesian Estimation of ICA
 Institute for Neural Computation, Caltech
, 2002
"... Missing data is common in real world datasets and is a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on highdimensional data containing missing entries. Missing data are handled naturally in the Bayesian frame ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
Missing data is common in real-world datasets and is a problem for many estimation techniques. We have developed a variational Bayesian method to perform Independent Component Analysis (ICA) on high-dimensional data containing missing entries. Missing data are handled naturally in the Bayesian framework by integrating the generative density model. Modeling the distributions of the independent sources with mixtures of Gaussians allows sources to be estimated with different kurtosis and skewness. The variational Bayesian method automatically determines the dimensionality of the data and yields an accurate density model for the observed data without overfitting problems. This allows direct probability estimation of missing values in the high-dimensional space and avoids dimension-reduction preprocessing, which is not feasible with missing data.
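The flexibility gained by mixture-of-Gaussians source models can be seen in a small sketch (component weights, means and variances are illustrative): a two-component mixture already produces skewed, heavy-tailed sources that a single Gaussian cannot.

```python
import numpy as np

rng = np.random.default_rng(6)
T = 200_000

def mog_sample(weights, means, stds, size):
    """Sample from a 1-D mixture of Gaussians."""
    comp = rng.choice(len(weights), p=weights, size=size)
    return rng.normal(np.asarray(means)[comp], np.asarray(stds)[comp])

# A skewed, heavy-tailed source distribution: impossible for a
# single Gaussian, easy for a two-component mixture.
s = mog_sample([0.8, 0.2], [0.0, 3.0], [1.0, 2.0], T)

skew = np.mean((s - s.mean())**3) / s.std()**3
print(skew)
```

A single Gaussian has zero skewness by construction, so a positive sample skewness like this one is only reachable with the richer source model.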
Learning Hierarchical Dynamics Using Independent Component Analysis
, 2003
"... Mixture modelling techniques such as Mixtures of Principal component and Factor analysers [1, 2] are very powerful in representing and segmenting Gaussian clusters in data. Meaningful segmentations may be lost, however, if these selfsimilar areas are nonGaussian. For such data, an intuitive model ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Mixture modelling techniques such as Mixtures of Principal Component and Factor Analysers [1, 2] are very powerful for representing and segmenting Gaussian clusters in data. Meaningful segmentations may be lost, however, if these self-similar areas are non-Gaussian. For such data, an intuitive model is a Mixture of Independent Component Analysers [3, 4, 5]. Such a model, however, ignores dynamics, both between clusters and within clusters. The former can be remedied by enforcing a Markov prior over the component mixture variables, leading to a Hidden Markov Model (HMM) with ICA generators. The latter can be modelled if the source models of these ICA generators are themselves dynamic, for example by utilising HMM sources. HMMs are models for picking up dynamic changes of state in the underlying data-generation process, and are therefore useful in capturing high-order temporal information. The proposed method is a piecewise approach to detecting dynamic movement, focussing on abrupt changes in the observation model and/or in the source model, while assuming static statistics in between. The hierarchical approach allows the analysis of signals which have macro- and micro-dynamics, such as stock indices.
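The generative side of the proposed model, a Markov prior switching between ICA generators with static statistics inside each segment, can be sketched as follows (transition probabilities, mixing matrices and the Laplacian source model are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
T = 400

# Sticky Markov prior over two regimes: long, piecewise-stationary
# segments with abrupt switches between them.
P = np.array([[0.99, 0.01],
              [0.01, 0.99]])

# Each regime has its own ICA generator (mixing matrix).
A = [np.array([[1.0, 0.3], [0.2, 1.0]]),
     np.array([[0.2, 1.0], [1.0, 0.1]])]

states = np.empty(T, dtype=int)
states[0] = 0
for t in range(1, T):
    states[t] = rng.choice(2, p=P[states[t - 1]])

s = rng.laplace(size=(2, T))                      # independent sources
x = np.stack([A[states[t]] @ s[:, t] for t in range(T)], axis=1)
print(x.shape)
```

Inference then runs in the opposite direction: an HMM over the regime variable detects the abrupt changes in the observation model, while ICA handles the within-segment structure.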
EM in High-Dimensional Spaces
"... Abstract—This paper considers fitting a mixture of Gaussians model to highdimensional data in scenarios where there are fewer data samples than feature dimensions. Issues that arise when using principal component analysis (PCA) to represent Gaussian distributions inside ExpectationMaximization (EM ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
This paper considers fitting a mixture-of-Gaussians model to high-dimensional data in scenarios where there are fewer data samples than feature dimensions. Issues that arise when using principal component analysis (PCA) to represent Gaussian distributions inside Expectation-Maximization (EM) are addressed, and a practical algorithm results. Unlike other algorithms that have been proposed, this algorithm does not try to compress the data to fit low-dimensional models. Instead, it models Gaussian distributions in the (N − 1)-dimensional space spanned by the N data samples. We are able to show that this algorithm converges on data sets where low-dimensional techniques do not. Index Terms—Expectation-Maximization, image classification, maximum likelihood estimation, principal component analysis, unsupervised learning.
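The core trick, working in the low-dimensional subspace actually spanned by the centred samples rather than in the full feature space, can be sketched with an SVD (sizes illustrative; the EM iterations themselves are omitted):

```python
import numpy as np

rng = np.random.default_rng(8)
N, D = 30, 1000                    # fewer samples than dimensions

X = rng.normal(size=(N, D))
Xc = X - X.mean(axis=0)            # centred data

# The centred samples span an (N - 1)-dimensional subspace of R^D.
# Represent each sample by its coordinates in that subspace; EM for
# a Gaussian mixture can then run in N - 1 dimensions instead of D.
U, svals, Vt = np.linalg.svd(Xc, full_matrices=False)
rank = int(np.sum(svals > 1e-10))
Z = Xc @ Vt[:rank].T               # low-dimensional coordinates

print(rank, Z.shape)
```

No information is lost in the change of basis (the projection is exact on the sample span), which is why this differs from the compressive low-dimensional modelling the paper argues against.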