Results 1  10
of
64
Independent Component Analysis
 Neural Computing Surveys
, 2001
"... A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research, is nding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the ..."
Abstract

Cited by 1492 (93 self)
 Add to MetaCart
A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research, is nding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the original data. Wellknown linear transformation methods include, for example, principal component analysis, factor analysis, and projection pursuit. A recently developed linear transformation method is independent component analysis (ICA), in which the desired representation is the one that minimizes the statistical dependence of the components of the representation. Such a representation seems to capture the essential structure of the data in many applications. In this paper, we survey the existing theory and methods for ICA. 1
Fast and robust fixedpoint algorithms for independent component analysis
 IEEE TRANS. NEURAL NETW
, 1999
"... Independent component analysis (ICA) is a statistical method for transforming an observed multidimensional random vector into components that are statistically as independent from each other as possible. In this paper, we use a combination of two different approaches for linear ICA: Comon’s informat ..."
Abstract

Cited by 511 (34 self)
 Add to MetaCart
Independent component analysis (ICA) is a statistical method for transforming an observed multidimensional random vector into components that are statistically as independent from each other as possible. In this paper, we use a combination of two different approaches for linear ICA: Comon’s informationtheoretic approach and the projection pursuit approach. Using maximum entropy approximations of differential entropy, we introduce a family of new contrast (objective) functions for ICA. These contrast functions enable both the estimation of the whole decomposition by minimizing mutual information, and estimation of individual independent components as projection pursuit directions. The statistical properties of the estimators based on such contrast functions are analyzed under the assumption of the linear mixture model, and it is shown how to choose contrast functions that are robust and/or of minimum variance. Finally, we introduce simple fixedpoint algorithms for practical optimization of the contrast functions. These algorithms optimize the contrast functions very fast and reliably.
Neural Networks and Statistical Models
, 1994
"... There has been much publicity about the ability of artificial neural networks to learn and generalize. In fact, the most commonly used artificial neural networks, called multilayer perceptrons, are nothing more than nonlinear regression and discriminant models that can be implemented with standard s ..."
Abstract

Cited by 99 (1 self)
 Add to MetaCart
There has been much publicity about the ability of artificial neural networks to learn and generalize. In fact, the most commonly used artificial neural networks, called multilayer perceptrons, are nothing more than nonlinear regression and discriminant models that can be implemented with standard statistical software. This paper explains what neural networks are, translates neural network jargon into statistical jargon, and shows the relationships between neural networks and statistical models such as generalized linear models, maximum redundancy analysis, projection pursuit, and cluster analysis.
A Unifying Informationtheoretic Framework for Independent Component Analysis
, 1999
"... We show that different theories recently proposed for Independent Component Analysis (ICA) lead to the same iterative learning algorithm for blind separation of mixed independent sources. We review those theories and suggest that information theory can be used to unify several lines of research. Pea ..."
Abstract

Cited by 82 (8 self)
 Add to MetaCart
We show that different theories recently proposed for Independent Component Analysis (ICA) lead to the same iterative learning algorithm for blind separation of mixed independent sources. We review those theories and suggest that information theory can be used to unify several lines of research. Pearlmutter and Parra (1996) and Cardoso (1997) showed that the infomax approach of Bell and Sejnowski (1995) and the maximum likelihood estimation approach are equivalent. We show that negentropy maximization also has equivalent properties and therefore all three approaches yield the same learning rule for a fixed nonlinearity. Girolami and Fyfe (1997a) have shown that the nonlinear Principal Component Analysis (PCA) algorithm of Karhunen and Joutsensalo (1994) and Oja (1997) can also be viewed from informationtheoretic principles since it minimizes the sum of squares of the fourthorder marginal cumulants and therefore approximately minimizes the mutual information (Comon, 1994). Lambert (19...
A Theory of Learning Classification Rules
, 1992
"... The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error whe ..."
Abstract

Cited by 79 (6 self)
 Add to MetaCart
The main contributions of this thesis are a Bayesian theory of learning classification rules, the unification and comparison of this theory with some previous theories of learning, and two extensive applications of the theory to the problems of learning class probability trees and bounding error when learning logical rules. The thesis is motivated by considering some current research issues in machine learning such as bias, overfitting and search, and considering the requirements placed on a learning system when it is used for knowledge acquisition. Basic Bayesian decision theory relevant to the problem of learning classification rules is reviewed, then a Bayesian framework for such learning is presented. The framework has three components: the hypothesis space, the learning protocol, and criteria for successful learning. Several learning protocols are analysed in detail: queries, logical, noisy, uncertain and positiveonly examples. The analysis is done by interpreting a protocol as a...
New Approximations of Differential Entropy for Independent Component Analysis and Projection Pursuit
, 1998
"... We derive a firstorder approximation of the density of maximum entropy for a continuous 1D random variable, given a number of simple constraints. This results in a density expansion which is somewhat similar to the classical polynomial density expansions by GramCharlier and Edgeworth. Using this ..."
Abstract

Cited by 73 (6 self)
 Add to MetaCart
We derive a firstorder approximation of the density of maximum entropy for a continuous 1D random variable, given a number of simple constraints. This results in a density expansion which is somewhat similar to the classical polynomial density expansions by GramCharlier and Edgeworth. Using this approximation of density, an approximation of 1D differential entropy is derived. The approximation of entropy is both more exact and more robust against outliers than the classical approximation based on the polynomial density expansions, without being computationally more expensive. The approximation has applications, for example, in independent component analysis and projection pursuit.
A Review of Kernel Methods in Machine Learning
, 2006
"... We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticate ..."
Abstract

Cited by 35 (3 self)
 Add to MetaCart
We review recent methods for learning with positive definite kernels. All these methods formulate learning and estimation problems as linear tasks in a reproducing kernel Hilbert space (RKHS) associated with a kernel. We cover a wide range of methods, ranging from simple classifiers to sophisticated methods for estimation with structured data.
Modefinding for mixtures of Gaussian distributions
 Dept. of Computer Science, University of Sheffield
, 1999
"... I consider the problem of finding all the modes of a mixture of multivariate Gaussian distributions, which has applications in clustering and regression. I derive exact formulas for the gradient and Hessian and give a partial proof that the number of modes cannot be more than the number of component ..."
Abstract

Cited by 34 (8 self)
 Add to MetaCart
I consider the problem of finding all the modes of a mixture of multivariate Gaussian distributions, which has applications in clustering and regression. I derive exact formulas for the gradient and Hessian and give a partial proof that the number of modes cannot be more than the number of components, and are contained in the convex hull of the component centroids. Then, I develop two exhaustive mode search algorithms: one based on combined quadratic maximisation and gradient ascent and the other one based on a fixedpoint iterative scheme. Appropriate values for the search control parameters are derived by taking into account theoretical results regarding the bounds for the gradient and Hessian of the mixture. The significance of the modes is quantified locally (for each mode) by error bars, or confidence intervals (estimated using the values of the Hessian at each mode); and globally by the sparseness of the mixture, measured by its differential entropy (estimated through bounds). I conclude with some reflections about bumpfinding.
A review of dimension reduction techniques
, 1997
"... The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in highdimensional spaces and as a modelling tool for such data. It is defined as the search for a lowdimensional manifold that embeds the highdimensional data. A cl ..."
Abstract

Cited by 30 (4 self)
 Add to MetaCart
The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in highdimensional spaces and as a modelling tool for such data. It is defined as the search for a lowdimensional manifold that embeds the highdimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.