Results 1 - 10
of
42
Regularization Theory and Neural Networks Architectures
- Neural Computation
, 1995
"... We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Ba ..."
Abstract
-
Cited by 257 (30 self)
- Add to MetaCart
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Boosting Image Retrieval
, 2000
"... We present an approach for image retrieval using a very large number of highly selective features and efficient online learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual “causes ” and that images which are visually similar share causes. We p ..."
Abstract
-
Cited by 217 (4 self)
- Add to MetaCart
We present an approach for image retrieval using a very large number of highly selective features and efficient online learning. Our approach is predicated on the assumption that each image is generated by a sparse set of visual “causes ” and that images which are visually similar share causes. We propose a mechanism for computing a very large number of highly selective features which capture some aspects of this causal structure (in our implementation there are over 45,000 highly selective features). At query time a user selects a few example images, and a technique known as “boosting ” is used to learn a classification function in this feature space. By construction, the boosting procedure learns a simple classifier which only relies on 20 of the features. As a result a very large database of images can be scanned rapidly, perhaps a million images per second. Finally we will describe a set of experiments performed using our retrieval system on a database of 3000 images.
Exploratory projection pursuit
- Journal of the American Statistical Association
, 1987
"... Exploratory projection pursuit is concerned with finding relatively highly revealing lower dimensional projections of high dimensional data. The intent is to discover views of the multivariate data set that exhibit nonlinear effects-clustering, concentrations near nonlinear manifolds- that are not c ..."
Abstract
-
Cited by 206 (0 self)
- Add to MetaCart
Exploratory projection pursuit is concerned with finding relatively highly revealing lower dimensional projections of high dimensional data. The intent is to discover views of the multivariate data set that exhibit nonlinear effects-clustering, concentrations near nonlinear manifolds- that are not captured by the linear correlation structure. This paper presents a new algorithm for this purpose that has both statistical and computational advantages over previous methods. A connection to density estimation is established. Examples are presented and issues related to practical application are discussed.
Objective Function Formulation of the BCM Theory of Visual Cortical Plasticity: Statistical Connections, Stability Conditions
- NEURAL NETWORKS
, 1992
"... In this paper, we present an objective function formulation of the BCM theory of visual cortical plasticity that permits us to demonstrate the connection between the unsupervised BCM learning procedure and various statistical methods, in particular, that of Projection Pursuit. This formulation provi ..."
Abstract
-
Cited by 77 (33 self)
- Add to MetaCart
In this paper, we present an objective function formulation of the BCM theory of visual cortical plasticity that permits us to demonstrate the connection between the unsupervised BCM learning procedure and various statistical methods, in particular, that of Projection Pursuit. This formulation provides a general method for stability analysis of the fixed points of the theory and enables us to analyze the behavior and the evolution of the network under various visual rearing conditions. It also allows comparison with many existing unsupervised methods. This model has been shown successful in various applications such as phoneme and 3D object recognition. We thus have the striking and possibly highly significant result that a biological neuron is performing a sophisticated statistical procedure.
Multiple Regimes in Northern Hemisphere Height Fields via Mixture Model Clustering
- J. Atmos. Sci
, 1998
"... Mixture model clustering is applied to Northern Hemisphere (NH) 700-mb geopotential height anomalies. A mixture model is a flexible probability density estimation technique, consisting of a linear combination of k component densities. A key feature of the mixture modeling approach to clustering is t ..."
Abstract
-
Cited by 37 (24 self)
- Add to MetaCart
Mixture model clustering is applied to Northern Hemisphere (NH) 700-mb geopotential height anomalies. A mixture model is a flexible probability density estimation technique, consisting of a linear combination of k component densities. A key feature of the mixture modeling approach to clustering is the ability to estimate a posterior probability distribution for k, the number of clusters, given the data and the model, and thus objectively determine the number of clusters that is most likely to fit the data. A data set of 44 winters of NH 700-mb fields is projected onto its two leading empirical orthogonal functions (EOFs) and analyzed using mixtures of Gaussian components. Crossvalidated likelihood is used to determine the best value of k, the number of clusters. The posterior probability so determined peaks at k = 3 and thus yields clear evidence for 3 clusters in the NH 700-mb data. The 3-cluster result is found to be robust with respect to variations in data preprocessing and data an...
Supervised Classification in High Dimensional Space: Geometrical, Statistical, and Asymptotical Properties of Multivariate Data
- IEEE Transactions on System, Man, and Cybernetics, Volume 28, Part C
, 1998
"... © 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other w ..."
Abstract
-
Cited by 35 (4 self)
- Add to MetaCart
© 1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. This paper appeared in the IEEE Transactions on Systems, Man, and
A review of dimension reduction techniques
, 1997
"... The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A cl ..."
Abstract
-
Cited by 29 (4 self)
- Add to MetaCart
The problem of dimension reduction is introduced as a way to overcome the curse of the dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generalised topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.
Geometric Methods for Feature Extraction and Dimensional Reduction
- In L. Rokach and O. Maimon (Eds.), Data
, 2005
"... Abstract We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component anal ..."
Abstract
-
Cited by 24 (1 self)
- Add to MetaCart
Abstract We give a tutorial overview of several geometric methods for feature extraction and dimensional reduction. We divide the methods into projective methods and methods that model the manifold on which the data lies. For projective methods, we review projection pursuit, principal component analysis (PCA), kernel PCA, probabilistic PCA, and oriented PCA; and for the manifold methods, we review multidimensional scaling (MDS), landmark MDS, Isomap, locally linear embedding, Laplacian eigenmaps and spectral clustering. The Nyström method, which links several of the algorithms, is also reviewed. The goal is to provide a self-contained review of the concepts and mathematics underlying these algorithms.
Learning as Extraction of Low-Dimensional Representations
- Mechanisms of Perceptual Learning
, 1996
"... Psychophysical findings accumulated over the past several decades indicate that perceptual tasks such as similarity judgment tend to be performed on a low-dimensional representation of the sensory data. Low dimensionality is especially important for learning, as the number of examples required for a ..."
Abstract
-
Cited by 23 (7 self)
- Add to MetaCart
Psychophysical findings accumulated over the past several decades indicate that perceptual tasks such as similarity judgment tend to be performed on a low-dimensional representation of the sensory data. Low dimensionality is especially important for learning, as the number of examples required for attaining a given level of performance grows exponentially with the dimensionality of the underlying representation space. In this chapter, we argue that, whereas many perceptual problems are tractable precisely because their intrinsic dimensionality is low, the raw dimensionality of the sensory data is normally high, and must be reduced by a nontrivial computational process, which, in itself, may involve learning. Following a survey of computational techniques for dimensionality reduction, we show that it is possible to learn a low-dimensional representation that captures the intrinsic low-dimensional nature of certain classes of visual objects, thereby facilitating further learning of tasks...
Searching for Filters With "Interesting" Output Distributions: An Uninteresting Direction to Explore?
- Network
, 1996
"... . It has been proposed that the receptive fields of neurons in V1 are optimised to generate "sparse", Kurtotic, or "interesting" output probability distributions (Barlow & Tolhurst, 1992; Barlow, 1994; Field, 1994; Intrator & Cooper, 1991; Intrator, 1992). We investigate the empirical evidence for t ..."
Abstract
-
Cited by 20 (1 self)
- Add to MetaCart
. It has been proposed that the receptive fields of neurons in V1 are optimised to generate "sparse", Kurtotic, or "interesting" output probability distributions (Barlow & Tolhurst, 1992; Barlow, 1994; Field, 1994; Intrator & Cooper, 1991; Intrator, 1992). We investigate the empirical evidence for this further and argue that filters can produce "interesting" output distributions simply because natural images have variable local intensity variance. If the proposed filters have zero D.C., then the probability distribution of filter outputs (and hence the output Kurtosis) is well predicted simply from these effects of variable local variance. This suggests that finding filters with high output Kurtosis does not necessarily signal interesting image structure. It is then argued that finding filters that maximise output Kurtosis generates filters that are incompatible with observed physiology. In particular the optimal difference--of--Gaussian (DOG) filter should have the smallest possible s...

