Results 1–10 of 12
Hierarchical Models of Variance Sources
 SIGNAL PROCESSING
, 2003
Abstract

Cited by 33 (12 self)
In many models, variances are assumed to be constant, although this assumption is often unrealistic in practice. Joint modelling of means and variances is difficult in many learning approaches, because it can lead to infinite probability densities. We show that a Bayesian variational technique which is sensitive to probability mass instead of density is able to jointly model both variances and means. We consider a model structure where a Gaussian variable, called a variance node, controls the variance of another Gaussian variable. Variance nodes make it possible to build hierarchical models for both variances and means. We report experiments with artificial data which demonstrate the ability of the learning algorithm to find variance sources that explain and characterize well the variances in multidimensional data. Experiments with biomedical MEG data show that variance sources are present in real-world signals.
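The variance-node construction described in this abstract can be sketched as a small generative model: one Gaussian variable u controls the variance of another Gaussian x. The exp(u) variance mapping and all numbers below are illustrative assumptions, not taken from the paper.

```python
import math
import random

random.seed(0)

def sample_variance_model(n, mean_u=0.0, std_u=1.0, mean_x=0.0):
    """Draw n samples from a two-level model: a Gaussian variance node u
    controls the variance of an observed Gaussian x via exp(u)
    (an illustrative choice of variance mapping)."""
    samples = []
    for _ in range(n):
        u = random.gauss(mean_u, std_u)              # variance node
        x = random.gauss(mean_x, math.exp(0.5 * u))  # std = exp(u/2)
        samples.append((u, x))
    return samples

data = sample_variance_model(10000)

# Samples drawn while u was large should show a larger spread in x.
hi = [x for u, x in data if u > 1.0]
lo = [x for u, x in data if u < -1.0]
var = lambda xs: sum(v * v for v in xs) / len(xs) - (sum(xs) / len(xs)) ** 2
print(var(hi) > var(lo))  # → True
```

A variance source in the paper's sense is exactly such a u shared across several output dimensions, so that it explains co-occurring bursts of variance.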
Advances in nonlinear blind source separation
 In Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003)
, 2003
Abstract

Cited by 31 (2 self)
Abstract — In this paper, we briefly review recent advances in blind source separation (BSS) for nonlinear mixing models. After a general introduction to the nonlinear BSS and ICA (Independent Component Analysis) problems, we discuss in more detail uniqueness issues, presenting some new results. A fundamental difficulty in the nonlinear BSS problem, and even more so in the nonlinear ICA problem, is that they are non-unique without extra constraints, which are often implemented by using a suitable regularization. Post-nonlinear mixtures are an important special case, where a nonlinearity is applied to linear mixtures. For such mixtures, the ambiguities are essentially the same as for the linear ICA or BSS problems. In the latter part of this paper, various separation techniques proposed for post-nonlinear mixtures and general nonlinear mixtures are reviewed. I. THE NONLINEAR ICA AND BSS PROBLEMS Consider N samples of the observed data vector x, modeled by
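The post-nonlinear mixing model mentioned in this abstract applies a component-wise nonlinearity to a linear mixture of the sources. A minimal generator sketch follows; the tanh nonlinearity, the 2×2 mixing matrix, and the uniform sources are illustrative choices, not taken from the paper.

```python
import math
import random

random.seed(1)

def postnonlinear_mix(sources, A, f=math.tanh):
    """Post-nonlinear mixture: linear mixing A s followed by a
    component-wise invertible nonlinearity f, i.e. x_i = f(sum_j A_ij s_j)."""
    mixtures = []
    for s in sources:
        linear = [sum(A[i][j] * s[j] for j in range(len(s)))
                  for i in range(len(A))]
        mixtures.append([f(v) for v in linear])
    return mixtures

# Two independent uniform sources and an arbitrary 2x2 mixing matrix.
S = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(5)]
A = [[0.8, 0.3], [0.4, 0.9]]
X = postnonlinear_mix(S, A)
print(len(X), len(X[0]))  # → 5 2
```

Because f acts component-wise and is invertible, separating such mixtures reduces, after estimating the inverse nonlinearities, to the linear BSS problem with its usual permutation and scaling ambiguities.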
Variational learning and bits-back coding: an information-theoretic view to Bayesian learning
 IEEE Transactions on Neural Networks
Abstract

Cited by 17 (7 self)
Abstract—The bits-back coding first introduced by Wallace in 1990 and later by Hinton and van Camp in 1993 provides an interesting link between Bayesian learning and information-theoretic minimum-description-length (MDL) learning approaches. The bits-back coding allows interpreting the cost function used in the variational Bayesian method called ensemble learning as a code length, in addition to the Bayesian view of misfit of the posterior approximation and a lower bound of model evidence. Combining these two viewpoints provides interesting insights into the learning process and the functions of different parts of the model. In this paper, the problem of variational Bayesian learning of hierarchical latent variable models is used to demonstrate the benefits of the two views. The code-length interpretation provides new views of many parts of the problem, such as model comparison and pruning, and helps explain many phenomena occurring in learning. Index Terms—Bits-back coding, ensemble learning, hierarchical latent variable models, minimum description length, variational Bayesian learning.
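The cost function in question can be illustrated on a toy discrete model: the variational free energy F(q) = E_q[log q(θ) − log p(x, θ)] is both a code length in the bits-back sense and an upper bound on −log p(x), tight exactly at the true posterior. The two-state model and numbers below are made up for illustration.

```python
import math

def free_energy(q, prior, lik):
    """Bits-back / variational cost F(q) = E_q[log q(θ) - log p(x, θ)].
    As a code length: nats to encode model and data, minus the nats
    'gotten back' from the randomness of the coding distribution q."""
    return sum(q[t] * (math.log(q[t]) - math.log(prior[t] * lik[t]))
               for t in q if q[t] > 0)

prior = {0: 0.5, 1: 0.5}   # p(θ)
lik = {0: 0.2, 1: 0.6}     # p(x | θ) for the observed x
evidence = sum(prior[t] * lik[t] for t in prior)           # p(x) = 0.4
posterior = {t: prior[t] * lik[t] / evidence for t in prior}

# The cost upper-bounds -log p(x), with equality at the exact posterior.
print(round(free_energy(posterior, prior, lik), 6)
      == round(-math.log(evidence), 6))                     # → True
print(free_energy({0: 0.5, 1: 0.5}, prior, lik)
      > -math.log(evidence))                                # → True
```

The gap between F(q) and −log p(x) is exactly KL(q ‖ posterior), which is the "misfit of the posterior approximation" in the Bayesian reading of the same quantity.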
Accelerating cyclic update algorithms for parameter estimation by pattern searches
 Neural Processing Letters
Abstract

Cited by 16 (9 self)
Abstract. A popular strategy for dealing with large parameter estimation problems is to split the problem into manageable subproblems and solve them cyclically one by one until convergence. A well-known drawback of this strategy is slow convergence in low-noise conditions. We propose using so-called pattern searches, which consist of an exploratory phase followed by a line search. During the exploratory phase, a search direction is determined by combining the individual updates of all subproblems. The approach can be used to speed up several well-known learning methods, such as variational Bayesian learning (ensemble learning) and the expectation-maximization algorithm, with modest algorithmic modifications. Experimental results show that the proposed method is able to reduce the required convergence time by 60–85% in realistic variational Bayesian learning problems.
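The exploratory-phase-plus-line-search idea can be sketched on a toy two-variable quadratic where plain cyclic updates zig-zag slowly. The quadratic objective, the step candidates, and the coordinate update rule below are illustrative stand-ins for the VB or EM cost, not the paper's algorithm.

```python
def pattern_search_step(x, coord_update, cost, step_candidates=(1, 2, 4, 8)):
    """One accelerated iteration: run the cyclic coordinate updates once
    (exploratory phase), take the combined change as a search direction,
    then try longer steps along it and keep the cheapest (line search)."""
    base = list(x)
    for i in range(len(x)):
        x = coord_update(x, i)                      # cyclic update of subproblem i
    direction = [xi - bi for xi, bi in zip(x, base)]
    best = min((tuple(b + a * d for b, d in zip(base, direction))
                for a in step_candidates), key=cost)
    return list(best)

# Toy problem: a correlated quadratic where cyclic updates converge slowly.
cost = lambda p: p[0] ** 2 + p[1] ** 2 + 1.8 * p[0] * p[1]

def coord_update(x, i):
    x = list(x)
    x[i] = -0.9 * x[1 - i]        # exact minimiser of coordinate i given the other
    return x

x = [1.0, 1.0]
for _ in range(5):
    x = pattern_search_step(x, coord_update, cost)
print(cost(x) < 1e-3)  # → True
```

Plain cyclic updates on this problem shrink the error by only a factor of 0.81 per sweep; the line search along the combined direction reaches the same accuracy in a handful of iterations.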
Building Blocks For Variational Bayesian Learning Of Latent Variable Models
 JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Abstract

Cited by 11 (8 self)
We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including variance models and nonlinear modelling, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package, which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present
Missing Values in Hierarchical Nonlinear Factor Analysis
 In Proc. of the Int. Conf. on Artificial Neural Networks and Neural Information Processing (ICANN/ICONIP 2003)
, 2003
Abstract

Cited by 10 (6 self)
The properties of hierarchical nonlinear factor analysis (HNFA), recently introduced by Valpola and others [3], are studied by reconstructing missing values. The variational Bayesian learning algorithm for HNFA has linear computational complexity and is able to infer the structure of the model in addition to estimating the parameters. To compare HNFA with other methods, we continued the experiments with speech spectrograms in [1], comparing nonlinear factor analysis (NFA) with linear factor analysis (FA) and with the self-organising map. Experiments suggest that HNFA lies between FA and NFA in handling nonlinear problems. Furthermore, HNFA gives better reconstructions than FA, and it is more reliable than NFA.
Partially observed values
 In Proc. of the Int. Joint Conf. on Neural Networks (IJCNN 2004)
, 2004
Abstract

Cited by 7 (4 self)
It is common to have both observed and missing values in data. This paper concentrates on the case where a value can be somewhere between those two ends: partially observed and partially missing. To achieve that, a method of using evidence nodes in a Bayesian network is studied. Different ways of handling inaccuracies are discussed in examples, and the proposed approach is justified in experiments with real image data.
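The idea of a value between "observed" and "missing" can be illustrated with virtual (soft) evidence in a single discrete variable: instead of clamping the variable to a state, each state is weighted by a likelihood. The two-state variable and the numbers below are hypothetical, chosen only to show the mechanism.

```python
def posterior_with_soft_evidence(prior, evidence_lik):
    """Virtual-evidence update: weight each state of the prior by a
    likelihood in [0, 1] and renormalise. A likelihood of (1, 1, ...)
    means fully missing; a one-hot likelihood means fully observed;
    anything in between is a partially observed value."""
    unnorm = {s: prior[s] * evidence_lik[s] for s in prior}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}

prior = {"dark": 0.3, "light": 0.7}
fully_missing = posterior_with_soft_evidence(prior, {"dark": 1.0, "light": 1.0})
partially_obs = posterior_with_soft_evidence(prior, {"dark": 0.9, "light": 0.2})
fully_obs = posterior_with_soft_evidence(prior, {"dark": 1.0, "light": 0.0})
print(fully_missing["dark"], fully_obs["dark"])  # → 0.3 1.0
```

With a fully uninformative likelihood the posterior equals the prior (the value is effectively missing), while a one-hot likelihood clamps the value; partial evidence lands in between.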
Using kernel PCA for initialisation of variational Bayesian nonlinear blind source separation method
 Proc. of the Fifth Int. Conf. on Independent Component Analysis and Blind Signal Separation (ICA 2004), volume 3195 of Lecture Notes in Computer Science
, 2004
Abstract

Cited by 5 (2 self)
Abstract. The variational Bayesian nonlinear blind source separation method introduced by Lappalainen and Honkela in 2000 is initialised with linear principal component analysis (PCA). Because of the multilayer perceptron (MLP) network used to model the nonlinearity, the method is susceptible to local minima and therefore sensitive to the initialisation used. As the method is used for nonlinear separation, the linear initialisation may in some cases lead it astray. In this paper we study the use of kernel PCA (KPCA) in the initialisation. KPCA is a rather straightforward generalisation of linear PCA, and it is much faster to compute than the variational Bayesian method. The experiments show that it can produce significantly better initialisations than linear PCA. Additionally, the model comparison methods provided by the variational Bayesian framework can be easily applied to compare different kernels.
Online variational Bayesian learning
 In Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003)
, 2003
Abstract

Cited by 5 (0 self)
Variational Bayesian learning is an approximation to exact Bayesian learning where the true posterior is approximated with a simpler distribution. In this paper we present an online variant of variational Bayesian learning. The method is based on collecting likelihood information as the training samples are processed one at a time, and on decaying the old likelihood information. The decay, or forgetting, is very important, since otherwise the system would get stuck in the first reasonable solution it finds. The method is tested with a simple linear independent component analysis (ICA) problem, but it can easily be applied to other, more difficult problems.
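The decay-of-old-evidence mechanism can be sketched with exponentially forgotten sufficient statistics for a single Gaussian; this is an illustrative stand-in for the paper's online VB update, not its actual algorithm.

```python
def online_stats(stream, decay=0.9):
    """Online accumulation of decayed sufficient statistics: evidence is
    down-weighted by `decay` at each step, so the estimate can track a
    drifting source instead of sticking to the first solution found."""
    n = s1 = s2 = 0.0
    for x in stream:
        n = decay * n + 1.0          # effective sample count
        s1 = decay * s1 + x          # decayed sum
        s2 = decay * s2 + x * x      # decayed sum of squares
        mean = s1 / n
        var = max(s2 / n - mean * mean, 0.0)
    return mean, var

# A stream whose mean jumps from 0 to 5 halfway through: with forgetting,
# the final estimate follows the new regime rather than the stale average.
stream = [0.0] * 500 + [5.0] * 500
mean, var = online_stats(stream)
print(round(mean, 2))  # → 5.0
```

Without the decay (decay = 1.0) the same statistics would converge to the global average of 2.5, which is the "stuck to the first solution" behaviour the abstract warns about.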
Nonlinear relational Markov networks with an application to the game of Go
 In Proceedings of the International Conference on Artificial Neural Networks (ICANN 2005)
, 2005
Abstract

Cited by 4 (2 self)
Abstract. It would be useful to have a joint probabilistic model for a general relational database. Objects in a database can be related to each other by indices, and they are described by a number of discrete and continuous attributes. Many models have been developed for relational discrete data, and for data with nonlinear dependencies between continuous values. This paper combines two of these methods, relational Markov networks and hierarchical nonlinear factor analysis, resulting in nonlinear models joined in a structure determined by the relations in the data. The experiments on collective regression in the board game Go suggest that regression accuracy can be improved by taking into account both relations and nonlinearities.