Results 1-10 of 44
Independent Component Analysis
Neural Computing Surveys, 2001
Cited by 1492 (93 self)
A common problem encountered in such disciplines as statistics, data analysis, signal processing, and neural network research, is finding a suitable representation of multivariate data. For computational and conceptual simplicity, such a representation is often sought as a linear transformation of the original data. Well-known linear transformation methods include, for example, principal component analysis, factor analysis, and projection pursuit. A recently developed linear transformation method is independent component analysis (ICA), in which the desired representation is the one that minimizes the statistical dependence of the components of the representation. Such a representation seems to capture the essential structure of the data in many applications. In this paper, we survey the existing theory and methods for ICA.
Fast and robust fixed-point algorithms for independent component analysis
IEEE Trans. Neural Netw., 1999
Cited by 511 (34 self)
Independent component analysis (ICA) is a statistical method for transforming an observed multidimensional random vector into components that are statistically as independent from each other as possible. In this paper, we use a combination of two different approaches for linear ICA: Comon’s information-theoretic approach and the projection pursuit approach. Using maximum entropy approximations of differential entropy, we introduce a family of new contrast (objective) functions for ICA. These contrast functions enable both the estimation of the whole decomposition by minimizing mutual information, and estimation of individual independent components as projection pursuit directions. The statistical properties of the estimators based on such contrast functions are analyzed under the assumption of the linear mixture model, and it is shown how to choose contrast functions that are robust and/or of minimum variance. Finally, we introduce simple fixed-point algorithms for practical optimization of the contrast functions. These algorithms optimize the contrast functions very fast and reliably.
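The fixed-point idea mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper itself. It assumes the commonly described one-unit update w <- E[z g(w^T z)] - E[g'(w^T z)] w followed by renormalization, with g = tanh, and it assumes the data z are already whitened. The names `fastica_one_unit` and `make_rotated_sources`, and the two-dimensional toy data, are hypothetical choices for illustration.

```python
import math
import random


def fastica_one_unit(z, iters=100, tol=1e-8, seed=0):
    """One-unit fixed-point iteration on whitened 2-D samples z = [(z1, z2), ...].

    Assumed update (g = tanh): w <- E[z g(w.z)] - E[g'(w.z)] w, then normalize.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    norm = math.hypot(w[0], w[1])
    w = [w[0] / norm, w[1] / norm]
    n = len(z)
    for _ in range(iters):
        ezg = [0.0, 0.0]  # running sum for E[z g(w.z)]
        edg = 0.0         # running sum for E[g'(w.z)]
        for z1, z2 in z:
            g = math.tanh(w[0] * z1 + w[1] * z2)
            ezg[0] += z1 * g
            ezg[1] += z2 * g
            edg += 1.0 - g * g  # derivative of tanh
        w_new = [ezg[0] / n - (edg / n) * w[0],
                 ezg[1] / n - (edg / n) * w[1]]
        norm = math.hypot(w_new[0], w_new[1])
        w_new = [w_new[0] / norm, w_new[1] / norm]
        # The sign of w may flip between iterations, so compare up to sign.
        converged = abs(w_new[0] * w[0] + w_new[1] * w[1]) > 1.0 - tol
        w = w_new
        if converged:
            break
    return w


def make_rotated_sources(n=2000, angle=0.5, seed=1):
    """Hypothetical toy data: two independent unit-variance uniform sources,
    mixed by a rotation so that the mixture is approximately white."""
    rng = random.Random(seed)
    half = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
    c, s = math.cos(angle), math.sin(angle)
    data = []
    for _ in range(n):
        s1 = rng.uniform(-half, half)
        s2 = rng.uniform(-half, half)
        data.append((c * s1 - s * s2, s * s1 + c * s2))
    return data
```

In this toy setting the returned unit vector would be expected to line up, up to sign, with one column of the rotation matrix. Real data would first need centering and whitening, which the sketch leaves out.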
Learning low-level vision
International Journal of Computer Vision, 2000
Cited by 468 (25 self)
We show a learning-based method for low-level vision problems. We set up a Markov network of patches of the image and the underlying scene. A factorization approximation allows us to easily learn the parameters of the Markov network from synthetic examples of image/scene pairs, and to efficiently propagate image information. Monte Carlo simulations justify this approximation. We apply this to the "super-resolution" problem (estimating high frequency details from a low-resolution image), showing good results. For the motion estimation problem, we show resolution of the aperture problem and filling-in arising from application of the same probabilistic machinery.
Independent Component Filters Of Natural Images Compared With Simple Cells In Primary Visual Cortex
1998
Cited by 273 (0 self)
In this article we investigate to what extent the statistical properties of natural images can be used to understand the variation of receptive field properties of simple cells in the mammalian primary visual cortex. The receptive fields of simple cells have been studied extensively (e.g., Hubel & Wiesel 1968, DeValois et al. 1982a, DeAngelis et al. 1993): they are localised in space and time, have bandpass characteristics in the spatial and temporal frequency domains, are oriented, and are often sensitive to the direction of motion of a stimulus. Here we will concentrate on the spatial properties of simple cells. Several hypotheses as to the function of these cells have been proposed. As the cells preferentially respond to oriented edges or lines, they can be viewed as edge or line detectors. Their joint localisation in both the spatial domain and the spatial frequency domain has led to the suggestion that they mimic Gabor filters, minimising uncertainty in both domains (Daugman 1980, Marcelja 1980). More recently, the match between the operations performed by simple cells and the wavelet transform has attracted attention (e.g., Field 1993). The approaches based on Gabor filters and wavelets basically consider processing by the visual cortex as a general image processing strategy, relatively independent of detailed assumptions about image statistics. On the other hand, the edge and line detector hypothesis is based on the intuitive notion that edges and lines are both abundant and important in images. This theme of relating simple cell properties with the statistics of natural images was explored extensively by Field (1987, 1994). He proposed that the cells are optimized specifically for coding natural images. He argued that one possibility for such a code, sparse coding...
An equivalence between sparse approximation and Support Vector Machines
A.I. Memo 1606, MIT Artificial Intelligence Laboratory, 1997
Cited by 205 (7 self)
This publication can be retrieved by anonymous ftp to publications.ai.mit.edu. The pathname for this publication is: aipublications/15001999/AIM1606.ps.Z This paper shows a relationship between two different approximation techniques: the Support Vector Machines (SVM), proposed by V. Vapnik (1995), and a sparse approximation scheme that resembles the Basis Pursuit De-Noising algorithm (Chen, 1995; Chen, Donoho and Saunders, 1995). SVM is a technique which can be derived from the Structural Risk Minimization Principle (Vapnik, 1982) and can be used to estimate the parameters of several different approximation schemes, including Radial Basis Functions, algebraic/trigonometric polynomials, B-splines, and some forms of Multilayer Perceptrons. Basis Pursuit De-Noising is a sparse approximation technique, in which a function is reconstructed by using a small number of basis functions chosen from a large set (the dictionary). We show that, if the data are noiseless, the modified version of Basis Pursuit De-Noising proposed in this paper is equivalent to SVM in the following sense: if applied to the same data set the two techniques give the same solution, which is obtained by solving the same quadratic programming problem. In the appendix we also present a derivation of the SVM technique in the framework of regularization theory, rather than statistical learning theory, establishing a connection between SVM, sparse approximation and regularization theory.
Blind Source Separation by Sparse Decomposition in a Signal Dictionary
2000
Cited by 193 (32 self)
Introduction. In blind source separation an $N$-channel sensor signal $x(t)$ arises from $M$ unknown scalar source signals $s_i(t)$, linearly mixed together by an unknown $N \times M$ matrix $A$, and possibly corrupted by additive noise $\xi(t)$:

$$x(t) = A s(t) + \xi(t) \qquad (1.1)$$

We wish to estimate the mixing matrix $A$ and the $M$-dimensional source signal $s(t)$. Many natural signals can be sparsely represented in a proper signal dictionary:

$$s_i(t) = \sum_{k=1}^{K} C_{ik} \varphi_k(t) \qquad (1.2)$$

The scalar functions $\varphi_k$ ...
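As a rough illustration of the two model equations in this abstract, the sketch below builds one source value from dictionary coefficients and then forms one noisy mixture sample. It is a toy forward model only, not the separation method of the paper. The function names `synthesize_source` and `mix`, the Gaussian noise, and the `noise_std` parameter are assumptions made for the example.

```python
import random


def synthesize_source(coeffs, dictionary, t):
    """Evaluate s_i(t) = sum_k C_ik * phi_k(t) for one source.

    coeffs is the row C_i; dictionary is a list of callables phi_k (assumed).
    """
    return sum(c * phi(t) for c, phi in zip(coeffs, dictionary))


def mix(A, s, noise_std=0.0, rng=None):
    """Form x = A s + noise for one time instant.

    A is a list of N rows with M entries each; s has M entries.
    Noise is drawn as zero-mean Gaussian, which is an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    x = []
    for row in A:
        value = sum(a * v for a, v in zip(row, s))
        if noise_std > 0.0:
            value += rng.gauss(0.0, noise_std)
        x.append(float(value))
    return x
```

A sparse source in this picture is one whose coefficient row has only a few nonzero entries, so most dictionary functions do not contribute to it.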
Emergence of Phase- and Shift-Invariant Features by Decomposition of Natural Images into Independent Feature Subspaces
2000
Cited by 169 (33 self)
In this article, we show that the same principle of independence maximization can explain the emergence of phase- and shift-invariant features, similar to those found in complex cells. This new kind of emergence is obtained by maximizing the independence between norms of projections on linear subspaces (instead of the independence of simple linear filter outputs). The norms of the projections on such "independent feature subspaces" then indicate the values of invariant features
Generative models for discovering sparse distributed representations
Philosophical Transactions of the Royal Society B, 1997
Cited by 120 (5 self)
We describe a hierarchical, generative model that can be viewed as a nonlinear generalization of factor analysis and can be implemented in a neural network. The model uses bottom-up, top-down and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.
Incremental Online Learning in High Dimensions
Neural Computation, 2005
Cited by 104 (15 self)
Locally weighted projection regression (LWPR) is a new algorithm for incremental nonlinear function approximation in high dimensional spaces with redundant and irrelevant input dimensions. At its core, it employs nonparametric regression with locally linear models. In order to stay computationally efficient and numerically robust, each local model performs the regression analysis with a small number of univariate regressions in selected directions in input space in the spirit of partial least squares regression. We discuss when and how local learning techniques can successfully work in high dimensional spaces and review the various techniques for local dimensionality reduction before finally deriving the LWPR algorithm. The properties of LWPR are that it i) learns rapidly with second order learning methods based on incremental training, ii) uses statistically sound stochastic leave-one-out cross validation for learning without the need to memorize training data, iii) adjusts its weighting kernels based only on local information in order to minimize the danger of negative interference of incremental learning, iv) has a computational complexity that is linear in the number of inputs, and v) can deal with a large number of (possibly redundant) inputs, as shown in various empirical evaluations with up to 90 dimensional data sets. For a probabilistic interpretation, predictive variance and confidence intervals are derived. To our knowledge, LWPR is the first truly incremental spatially localized learning method that can successfully and efficiently operate in very high dimensional spaces.
Sparse Code Shrinkage: Denoising of Nongaussian Data by Maximum Likelihood Estimation
1999
Cited by 93 (15 self)
Sparse coding is a method for finding a representation of data in which each of the components of the representation is only rarely significantly active. Such a representation is closely related to redundancy reduction and independent component analysis, and has some neurophysiological plausibility. In this paper, we show how sparse coding can be used for denoising. Using maximum likelihood estimation of nongaussian variables corrupted by gaussian noise, we show how to apply a soft-thresholding (shrinkage) operator on the components of sparse coding so as to reduce noise. Our method is closely related to the method of wavelet shrinkage, but it has the important benefit over wavelet methods that the representation is determined solely by the statistical properties of the data. The wavelet representation, on the other hand, relies heavily on certain mathematical properties (like self-similarity) that may be only weakly related to the properties of natural data.
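The shrinkage step described in this abstract can be sketched in a simplified form. The example below assumes a plain soft-threshold with a fixed threshold `t` and an orthonormal transform `W`, both of which are illustrative assumptions. The paper derives its shrinkage functions from maximum likelihood estimation under specific sparse densities, which this sketch does not reproduce. The names `soft_threshold` and `shrink_denoise` are hypothetical.

```python
import math


def soft_threshold(y, t):
    """Shrink y toward zero by t; values with |y| <= t become zero."""
    if t < 0:
        raise ValueError("threshold must be non-negative")
    return math.copysign(max(abs(y) - t, 0.0), y)


def shrink_denoise(x, W, t):
    """Transform, shrink, and transform back.

    W is a list of rows assumed to be orthonormal, so its transpose
    acts as its inverse. Steps: s = W x, shrink each s_k, x_hat = W^T s_hat.
    """
    s = [sum(w * v for w, v in zip(row, x)) for row in W]
    s_hat = [soft_threshold(v, t) for v in s]
    return [sum(W[k][j] * s_hat[k] for k in range(len(W)))
            for j in range(len(x))]
```

Small components, which are more likely to be noise under a sparse model, are set to zero, while large components are kept with a reduced magnitude.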