Results 1–10 of 148
Survey on Independent Component Analysis
 NEURAL COMPUTING SURVEYS, 1999
Abstract

Cited by 2241 (104 self)
A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research is finding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the original data. Well-known linear transformation methods include, for example, principal component analysis, factor analysis, and projection pursuit. A recently developed linear transformation method is independent component analysis (ICA), in which the desired representation is the one that minimizes the statistical dependence of the components of the representation. Such a representation seems to capture the essential structure of the data in many applications. In this paper, we survey the existing theory and methods for ICA.
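The idea the survey describes can be illustrated in a few lines of NumPy. The sources, mixing matrix, and the symmetric FastICA-style fixed-point iteration below are illustrative choices, not the survey's own algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
t = np.linspace(0, 8, n)
# Two non-Gaussian sources (hypothetical): a square wave and Laplacian noise
s = np.vstack([np.sign(np.sin(3 * t)), rng.laplace(size=n)])
A = np.array([[1.0, 0.6], [0.5, 1.0]])   # assumed mixing matrix
x = A @ s                                 # observed linear mixtures

# Whiten the observations (zero mean, identity covariance)
xc = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(xc @ xc.T / n)
z = E @ np.diag(d ** -0.5) @ E.T @ xc

# Symmetric fixed-point iteration with tanh nonlinearity
W = rng.standard_normal((2, 2))
for _ in range(200):
    g = np.tanh(W @ z)
    W = g @ z.T / n - np.diag((1 - g ** 2).mean(axis=1)) @ W
    U, _, Vt = np.linalg.svd(W)
    W = U @ Vt                            # symmetric decorrelation

y = W @ z                                 # recovered components
```

Up to permutation and sign, each row of `y` should match one source; this scale and ordering indeterminacy is inherent to ICA.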
Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources, 1999
Abstract

Cited by 307 (22 self)
An extension of the infomax algorithm of Bell and Sejnowski (1995) is presented that is able to blindly separate mixed signals with sub- and supergaussian source distributions. This was achieved by using a simple type of learning rule first derived by Girolami (1997) by choosing negentropy as a projection pursuit index. Parameterized probability distributions that have sub- and supergaussian regimes were used to derive a general learning rule that preserves the simple architecture proposed by Bell and Sejnowski (1995), is optimized using the natural gradient of Amari (1998), and uses the stability analysis of Cardoso and Laheld (1996) to switch between sub- and supergaussian regimes. We demonstrate that the extended infomax algorithm is able to easily separate 20 sources with a variety of source distributions. Applied to high-dimensional data from electroencephalographic recordings, it is effective at separating artifacts such as eye blinks and line noise from weaker electrical signals that arise from sources in the brain.
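A batch sketch of the rule described above, with the stability-based switching applied per component, might look as follows. The sources, mixing matrix, initialization, learning rate, and iteration count are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
# One super-Gaussian (Laplacian) and one sub-Gaussian (uniform) source
s = np.vstack([rng.laplace(size=n), rng.uniform(-1, 1, n) * np.sqrt(3)])
A = np.array([[1.0, 0.4], [0.3, 1.0]])
x = A @ s

# Whiten the mixtures
xc = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(xc @ xc.T / n)
z = E @ np.diag(d ** -0.5) @ E.T @ xc

W = np.eye(2) + 0.1 * rng.standard_normal((2, 2))
lr = 0.05
for _ in range(400):
    u = W @ z
    tu = np.tanh(u)
    # Switching criterion: k_i = +1 for super-Gaussian, -1 for sub-Gaussian
    k = np.sign((1 - tu ** 2).mean(axis=1) * (u ** 2).mean(axis=1)
                - (u * tu).mean(axis=1))
    # Extended-infomax natural-gradient update:
    # dW = lr * (I - K tanh(u) u^T - u u^T) W
    W += lr * (np.eye(2) - np.diag(k) @ (tu @ u.T) / n - (u @ u.T) / n) @ W
```

The `u @ u.T` term is what lets a single rule handle both regimes; setting `k` to all ones recovers a rule of the original super-Gaussian-only form.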
A Unifying Information-Theoretic Framework for Independent Component Analysis, 1999
Abstract

Cited by 104 (8 self)
We show that different theories recently proposed for Independent Component Analysis (ICA) lead to the same iterative learning algorithm for blind separation of mixed independent sources. We review those theories and suggest that information theory can be used to unify several lines of research. Pearlmutter and Parra (1996) and Cardoso (1997) showed that the infomax approach of Bell and Sejnowski (1995) and the maximum likelihood estimation approach are equivalent. We show that negentropy maximization also has equivalent properties and therefore all three approaches yield the same learning rule for a fixed nonlinearity. Girolami and Fyfe (1997a) have shown that the nonlinear Principal Component Analysis (PCA) algorithm of Karhunen and Joutsensalo (1994) and Oja (1997) can also be viewed from information-theoretic principles since it minimizes the sum of squares of the fourth-order marginal cumulants and therefore approximately minimizes the mutual information (Comon, 1994). Lambert (19...
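The common rule these equivalences converge on can be written compactly. The following is a standard natural-gradient presentation from the general literature (a sketch, not quoted from the paper), with u = Wx and a score function determined by the assumed source density:

```latex
\Delta W \;\propto\; \bigl(I - \varphi(\mathbf{u})\,\mathbf{u}^{\top}\bigr)\,W,
\qquad
\varphi_i(u_i) \;=\; -\,\frac{d}{du_i}\,\log p_i(u_i).
```

For infomax with the logistic nonlinearity g, the score is φ_i(u_i) = 2g(u_i) − 1 = tanh(u_i/2); maximum likelihood with the same assumed density p_i, and negentropy maximization, yield the same fixed points, which is the sense in which the three approaches coincide for a fixed nonlinearity.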
Nonlinear Independent Component Analysis Using Ensemble Learning: Experiments And Discussion, 2000
Abstract

Cited by 63 (21 self)
In this paper, we present experimental results on a nonlinear independent component analysis approach based on Bayesian ensemble learning. The theory of the method is discussed in a companion paper. Simulations with artificial and natural data demonstrate the feasibility and good performance of the proposed approach. We also discuss the relationships of the method to other existing methods.
Kernel methods for measuring independence
 Journal of Machine Learning Research, 2005
Abstract

Cited by 58 (19 self)
We introduce two new functionals, the constrained covariance and the kernel mutual information, to measure the degree of independence of random variables. These quantities are both based on the covariance between functions of the random variables in reproducing kernel Hilbert spaces (RKHSs). We prove that when the RKHSs are universal, both functionals are zero if and only if the random variables are pairwise independent. We also show that the kernel mutual information is an upper bound near independence on the Parzen window estimate of the mutual information. Analogous results apply for two correlation-based dependence functionals introduced earlier: we show the kernel canonical correlation and the kernel generalised variance to be independence measures for universal kernels, and prove the latter to be an upper bound on the mutual information near independence. The performance of the kernel dependence functionals in measuring independence is verified in the context of independent component analysis.
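The flavor of such RKHS covariance statistics can be illustrated with the closely related (and simpler) biased empirical HSIC, tr(KHLH)/m². This is a cousin of the paper's two functionals rather than either of them, and the Gaussian kernel width and toy data below are arbitrary choices:

```python
import numpy as np

def rbf_gram(v, sigma=1.0):
    """Gaussian-kernel Gram matrix of a 1-D sample."""
    d2 = (v[:, None] - v[None, :]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / m^2.
    Zero in the population limit iff x and y are independent,
    provided the kernels are universal."""
    m = len(x)
    H = np.eye(m) - np.ones((m, m)) / m    # centering matrix
    return np.trace(rbf_gram(x, sigma) @ H @ rbf_gram(y, sigma) @ H) / m ** 2

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
y_dep = np.sin(3 * x) + 0.05 * rng.standard_normal(300)  # strongly dependent on x
y_ind = rng.uniform(-1, 1, 300)                          # independent of x
```

A nonlinear dependence like `y_dep` is invisible to linear correlation when it averages out, but inflates the kernel statistic well above its value for an independent pair.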
Information-Theoretic Approach to Blind Separation of Sources in Nonlinear Mixture, 1998
Abstract

Cited by 51 (4 self)
The linear mixture model is assumed in most papers devoted to blind separation, but a more realistic mixture model is nonlinear. In this paper, a two-layer perceptron is used as a demixing system to separate sources in a nonlinear mixture. The learning algorithms for the demixing system are derived by two approaches: maximum entropy and minimum mutual information. The algorithms derived from the two approaches have a common structure. The new learning equations for the hidden layer are different from those for the output layer. The natural gradient descent method is applied in maximizing entropy and minimizing mutual information. The information (entropy or mutual information) backpropagation method is proposed to derive the learning equations for the hidden layer.
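The demixing architecture can be sketched as a plain forward pass plus the familiar natural-gradient entropy update for the output layer. The weight shapes, initialization, and choice of tanh hidden units here are illustrative; the paper's hidden-layer equations come from backpropagating the entropy or mutual-information objective and are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_hid, n_src, T = 3, 4, 3, 1000
x = rng.standard_normal((n_obs, T))              # stand-in for nonlinear mixtures

W1 = 0.1 * rng.standard_normal((n_hid, n_obs))   # hidden layer (illustrative init)
b1 = np.zeros((n_hid, 1))
W2 = 0.1 * rng.standard_normal((n_src, n_hid))   # output (demixing) layer
b2 = np.zeros((n_src, 1))

# Forward pass of the two-layer demixer
h = np.tanh(W1 @ x + b1)
u = W2 @ h + b2

# Natural-gradient entropy-maximization step for the output layer,
# assuming a logistic output nonlinearity (same form as linear infomax)
lr = 0.01
phi = np.tanh(u / 2)                             # score for the logistic density
W2 += lr * (np.eye(n_src) - phi @ u.T / T) @ W2
```

The output-layer update has exactly the linear-infomax structure; what the paper adds is the corresponding backpropagated rule for `W1` and `b1`.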
Learning and inference in the brain, 2003
Abstract

Cited by 51 (8 self)
This article is about how the brain data mines its sensory inputs. There are several architectural principles of functional brain anatomy that have emerged from careful anatomic and physiologic studies over the past century. These principles are considered in the light of representational learning to see if they could have been predicted a priori on the basis of purely theoretical considerations. We first review the organisation of hierarchical sensory cortices, paying special attention to the distinction between forward and backward connections. We then review various approaches to representational learning as special cases of generative models, starting with supervised learning and ending with learning based upon empirical Bayes. The latter predicts many features, such as a hierarchical cortical system, prevalent top-down backward influences and functional asymmetries between forward and backward connections that are seen in the real brain. The key points made in this article are: (i) hierarchical generative models enable the learning of empirical priors and eschew prior assumptions about the causes of sensory input that are inherent in nonhierarchical models. These assumptions are necessary for learning schemes based on information theory and efficient or sparse coding, but are not necessary in a hierarchical context. Critically, the anatomical infrastructure that may implement generative models in the brain is hierarchical. Furthermore, learning based on empirical Bayes can proceed in a biologically plausible way. (ii) The second point is that backward connections are essential if the processes generating inputs cannot be inverted, or the inversion cannot be parameterised. Because these processes involve many-to-one mappings, are nonlinear and dynamic in nature, they are generally noninvertible. This enforces an explicit parameterisation of generative models (i.e. backward
Advances in nonlinear blind source separation
 In Proc. of the 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation (ICA2003), 2003
Abstract

Cited by 41 (2 self)
Abstract — In this paper, we briefly review recent advances in blind source separation (BSS) for nonlinear mixing models. After a general introduction to the nonlinear BSS and ICA (Independent Component Analysis) problems, we discuss in more detail uniqueness issues, presenting some new results. A fundamental difficulty in the nonlinear BSS problem, and even more so in the nonlinear ICA problem, is that they are nonunique without extra constraints, which are often implemented by using a suitable regularization. Post-nonlinear mixtures are an important special case, where a nonlinearity is applied to linear mixtures. For such mixtures, the ambiguities are essentially the same as for the linear ICA or BSS problems. In the later part of this paper, various separation techniques proposed for post-nonlinear mixtures and general nonlinear mixtures are reviewed.
I. THE NONLINEAR ICA AND BSS PROBLEMS
Consider N samples of the observed data vector x, modeled by
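A post-nonlinear mixture is easy to write down. In the sketch below (sources, mixing matrix, and the choice of tanh for the componentwise nonlinearities are all illustrative), inverting known nonlinearities reduces the problem exactly to a linear mixture, which is why the ambiguities match the linear case:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
s = np.vstack([rng.uniform(-1, 1, n), rng.laplace(size=n)])  # independent sources
A = np.array([[1.0, 0.5], [0.4, 1.0]])                       # linear mixing stage
v = A @ s                                                    # linear mixtures
x = np.tanh(v)            # componentwise invertible nonlinearity f_i = tanh

# With the f_i known and invertible, the problem collapses to linear BSS:
v_rec = np.arctanh(x)
```

In practice the f_i are unknown and must be estimated jointly with the unmixing matrix, which is what the post-nonlinear separation techniques reviewed in the paper address.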
Factorial coding of natural images: how effective are linear models in removing higher-order dependencies?
 JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A, 2006
Abstract

Cited by 39 (12 self)
The performance of unsupervised learning models for natural images is evaluated quantitatively by means of information theory. We estimate the gain in statistical independence (the multi-information reduction) achieved with independent component analysis (ICA), principal component analysis (PCA), zero-phase whitening, and predictive coding. Predictive coding is translated into the transform coding framework, where it can be characterized by the constraint of a triangular filter matrix. A randomly sampled whitening basis and the Haar wavelet are included in the comparison as well. The comparison of all these methods is carried out for different patch sizes, ranging from 2x2 to 16x16 pixels. In spite of large differences in the shape of the basis functions, we find only small differences in the multi-information between all decorrelation transforms (5% or less) for all patch sizes. Among the second-order methods, PCA is optimal for small patch sizes and predictive coding performs best for large patch sizes. The extra gain achieved with ICA is always less than 2%. In conclusion, the 'edge filters' found with ICA lead only to a surprisingly small improvement in terms of its actual objective.
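The second-order transforms being compared are easy to state concretely. Below is a sketch (random 4-dimensional data standing in for image patches; shapes and seeds are arbitrary) of PCA whitening, zero-phase (symmetric) whitening, and the triangular "predictive coding" filter obtained from a Cholesky factor:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((4, 10000))
X = rng.standard_normal((4, 4)) @ X          # correlated data ("patches")
C = X @ X.T / X.shape[1]                     # sample covariance

d, E = np.linalg.eigh(C)
W_pca = np.diag(d ** -0.5) @ E.T             # PCA whitening (eigenbasis rotation)
W_zca = E @ np.diag(d ** -0.5) @ E.T         # zero-phase whitening (symmetric)
W_tri = np.linalg.inv(np.linalg.cholesky(C)) # lower-triangular filter: each output
                                             # is the current input minus a linear
                                             # prediction from the preceding ones
```

All three matrices decorrelate, giving W C Wᵀ = I; they differ only by a rotation, which is exactly the degree of freedom ICA then fixes using higher-order statistics.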