Nonlinear component analysis as a kernel eigenvalue problem
, 1996
Cited by 775 (63 self)
We describe a new method for performing a nonlinear form of Principal Component Analysis. By the use of integral operator kernel functions, we can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance the space of all possible 5-pixel products in 16x16 images. We give the derivation of the method, along with a discussion of other techniques which can be made nonlinear with the kernel approach; and present first experimental results on nonlinear feature extraction for pattern recognition.
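The kernel eigenvalue formulation summarized in this abstract can be illustrated with a minimal NumPy sketch. It assumes a homogeneous polynomial kernel k(x, y) = (x · y)^d; the function name `kernel_pca` and its parameters are hypothetical and not taken from the paper. The sketch centers the kernel matrix in feature space, solves the eigenvalue problem, and projects the training points onto the leading nonlinear components.

```python
import numpy as np

def kernel_pca(X, n_components=2, degree=2):
    """Sketch of kernel PCA with an assumed polynomial kernel (x . y)^degree."""
    n = X.shape[0]
    # Kernel (Gram) matrix: dot products in the implicit feature space.
    K = (X @ X.T) ** degree
    # Center the data in feature space without computing the map explicitly.
    one_n = np.full((n, n), 1.0 / n)
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigendecomposition of the symmetric centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Rescale expansion coefficients so the feature-space eigenvectors
    # have unit norm.
    alphas = eigvecs / np.sqrt(eigvals)
    # Projections of the training points onto the nonlinear components.
    return Kc @ alphas, eigvals

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 3))
Z_demo, lam_demo = kernel_pca(X_demo, n_components=2, degree=2)
```

With this normalization the projected components are mutually uncorrelated, and the squared norm of each component column equals the corresponding eigenvalue of the centered kernel matrix.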
The "Independent Components" of Natural Scenes are Edge Filters
, 1997
Cited by 381 (24 self)
It has previously been suggested that neurons with line and edge selectivities found in primary visual cortex of cats and monkeys form a sparse, distributed representation of natural scenes, and it has been reasoned that such responses should emerge from an unsupervised learning algorithm that attempts to find a factorial code of independent visual features. We show here that a new unsupervised learning algorithm based on information maximization, a nonlinear "infomax" network, when applied to an ensemble of natural scenes produces sets of visual filters that are localized and oriented. Some of these filters are Gabor-like and resemble those produced by the sparseness-maximization network. In addition, the outputs of these filters are as independent as possible, since this infomax network performs Independent Components Analysis or ICA, for sparse (super-gaussian) component distributions. We compare the resulting ICA filters and their associated basis functions, with other decorrelating filters produced by Principal Components Analysis (PCA) and zero-phase whitening filters (ZCA). The ICA filters have more sparsely distributed (kurtotic) outputs on natural scenes. They also resemble the receptive fields of simple cells in visual cortex, which suggests that these neurons form a natural, information-theoretic
Regularization networks and support vector machines
- Advances in Computational Mathematics
, 2000
Cited by 215 (28 self)
Regularization Networks and Support Vector Machines are techniques for solving certain problems of learning from examples – in particular the regression problem of approximating a multivariate function from sparse data. Radial Basis Functions, for example, are a special case of both regularization and Support Vector Machines. We review both formulations in the context of Vapnik’s theory of statistical learning which provides a general foundation for the learning problem, combining functional analysis and statistics. The emphasis is on regression: classification is treated as a special case.
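As an illustration of the regression setting this abstract describes, the following is a minimal sketch of a regularization network with Gaussian radial basis functions. The function names, the `gamma` width parameter, and the scaling of the regularization term `lam` are assumptions made for this example, not the paper's notation. The fitted function is a kernel expansion over the training points, with coefficients obtained from a regularized linear system.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def fit_regularization_network(X, y, lam=1e-2, gamma=1.0):
    """Solve (K + lam * I) c = y for the expansion coefficients c."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def predict(X_train, coeffs, X_new, gamma=1.0):
    """Evaluate f(x) = sum_i c_i k(x, x_i) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, gamma) @ coeffs

# Sparse samples of a smooth one-dimensional function (synthetic data).
X_demo = np.linspace(0.0, 6.0, 15)[:, None]
y_demo = np.sin(X_demo[:, 0])
c_demo = fit_regularization_network(X_demo, y_demo, lam=1e-2, gamma=1.0)
y_fit = predict(X_demo, c_demo, X_demo, gamma=1.0)
```

Larger values of `lam` give smoother fits that follow the training values less closely; smaller values approach interpolation.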
Learning Overcomplete Representations
, 2000
Cited by 188 (8 self)
In an overcomplete basis, the number of basis vectors is greater than the dimensionality of the input, and the representation of an input is not a unique combination of basis vectors. Overcomplete representations have been advocated because they have greater robustness in the presence of noise, can be sparser, and can have greater flexibility in matching structure in the data. Overcomplete codes have also been proposed as a model of some of the response properties of neurons in primary visual cortex. Previous work has focused on finding the best representation of a signal using a fixed overcomplete basis (or dictionary). We present an algorithm for learning an overcomplete basis by viewing it as a probabilistic model of the observed data. We show that overcomplete bases can yield a better approximation of the underlying statistical distribution of the data and can thus lead to greater coding efficiency. This can be viewed as a generalization of the technique of independent component analysis and provides a method for Bayesian reconstruction of signals in the presence of noise and for blind source separation when there are more sources than mixtures.
Learning to Probabilistically Identify Authoritative Documents
- In Proceedings of the 17th International Conference on Machine Learning
, 2000
Cited by 109 (2 self)
We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the world wide web, or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumptions, our model provides probabilistic estimates that have clear semantics. We also find that in general, the identified authoritative documents correspond better to human intuition.
1. Introduction
Bibliometrics has been described as a "series of techniques that seek to quantify the process of written communication" (Ikpaahindi, 1985). It typically attempts to give quantified answers to questions involving the relationships among documents, or authors and documents: "Who are the most authoritative authors in this field?" "What are the seminal papers?" "How many distinct communities are studying this subject?" and many others (see White & McCain, 1989 for details). Traditionally, the s...
Convolutive Blind Separation of Non-Stationary
Cited by 86 (3 self)
Acoustic signals recorded simultaneously in a reverberant environment can be described as sums of differently convolved sources. The task of source separation is to identify the multiple channels and possibly to invert those in order to obtain estimates of the underlying sources. We tackle the problem by explicitly exploiting the nonstationarity of the acoustic sources. Changing cross-correlations at multiple times give a sufficient set of constraints for the unknown channels. A least squares optimization allows us to estimate a forward model, thus identifying the multi-path channel. In the same manner we can find an FIR backward model, which generates well-separated model sources. Furthermore, for more than three channels we have sufficient conditions to estimate underlying additive sensor noise powers. We show good performance in real room environments and demonstrate the algorithm's utility for automatic speech recognition.
A Probabilistic Framework for the Adaptation and Comparison of Image Codes
- J. Opt. Soc. Am. A
, 1999
Independent Component Representations for Face Recognition
Cited by 82 (8 self)
In a task such as face recognition, much of the important information may be contained in the high-order relationships among the image pixels. A number of face recognition algorithms employ principal component analysis (PCA), which is based on the second-order statistics of the image set, and does not address high-order statistical dependencies such as the relationships among three or more pixels. Independent component analysis (ICA) is a generalization of PCA which separates the high-order moments of the input in addition to the second-order moments. ICA was performed on a set of face images by an unsupervised learning algorithm derived from the principle of optimal information transfer through sigmoidal neurons. The algorithm maximizes the mutual information between the input and the output, which produces statistically independent outputs under certain conditions. ICA was performed on the face images under two different architectures. The first architecture provided a statistica...
A Unifying Information-theoretic Framework for Independent Component Analysis
, 1999
Cited by 74 (5 self)
We show that different theories recently proposed for Independent Component Analysis (ICA) lead to the same iterative learning algorithm for blind separation of mixed independent sources. We review those theories and suggest that information theory can be used to unify several lines of research. Pearlmutter and Parra (1996) and Cardoso (1997) showed that the infomax approach of Bell and Sejnowski (1995) and the maximum likelihood estimation approach are equivalent. We show that negentropy maximization also has equivalent properties and therefore all three approaches yield the same learning rule for a fixed nonlinearity. Girolami and Fyfe (1997a) have shown that the nonlinear Principal Component Analysis (PCA) algorithm of Karhunen and Joutsensalo (1994) and Oja (1997) can also be viewed from information-theoretic principles since it minimizes the sum of squares of the fourth-order marginal cumulants and therefore approximately minimizes the mutual information (Comon, 1994). Lambert (19...
Multichannel Blind Deconvolution and Equalization Using the Natural Gradient
- In The First Signal Processing Workshop on Signal Processing Advances in Wireless Communications
, 1997
Cited by 70 (21 self)
Multichannel deconvolution and equalization is an important task for numerous applications in communications, signal processing, and control. In this paper, we extend the efficient natural gradient search method in [1] to derive a set of online algorithms for combined multichannel blind source separation and time-domain deconvolution/equalization of additive, convolved signal mixtures. Through formal analysis, we prove that the doubly-infinite multichannel equalizer based on the maximum entropy cost function with natural gradient possesses the so-called "equivariance property" such that its asymptotic performance depends on the normalized stochastic distribution of the source signals and not on the mixing characteristics of the unknown channel. We also provide the necessary approximations to enable a computationally simple finite-impulse-response implementation of the natural-gradient-based multichannel deconvolution scheme. Simulations indicate the ability of the algorithm to perform e...

