A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology
, 1989
"... Learning an inputoutput mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is solving the problem of hypersurface reconstruction. From this point of view, t ..."
Abstract

Cited by 237 (25 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
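The strict radial-basis-function interpolation that GRBF generalizes can be sketched in a few lines: one Gaussian is centered on every training point and the weights are solved exactly. This is a hedged illustration, not the paper's GRBF construction; the Gaussian width `sigma` and the toy data are assumptions.

```python
import numpy as np

def rbf_interpolate(X, y, x_query, sigma=1.0):
    """Strict RBF interpolation: one Gaussian per training point,
    weights solved so the network fits every example exactly."""
    # Pairwise squared distances between training points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    G = np.exp(-d2 / (2 * sigma**2))   # interpolation matrix
    w = np.linalg.solve(G, y)          # exact fit to the examples
    # Evaluate the weighted sum of Gaussians at the query point
    q2 = ((X - x_query) ** 2).sum(-1)
    return np.exp(-q2 / (2 * sigma**2)) @ w

X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 4.0])          # samples of f(x) = x^2
print(rbf_interpolate(X, y, np.array([1.0])))  # reproduces the training value, ~1.0
```

GRBF relaxes this scheme by using fewer centers than examples and learning their positions, which is where the prototype interpretation mentioned above comes from.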
Exploring strategies for training deep neural networks
 Journal of Machine Learning Research
"... Département d’informatique et de recherche opérationnelle ..."
Abstract

Cited by 88 (12 self)
Département d’informatique et de recherche opérationnelle
Predictability and redundancy of natural images
 Journal of the Optical Society of America A
, 1987
"... One aspect of human image understanding is the ability to estimate missing parts of a natural image. This ability depends on the redundancy of the representation used to describe the class of images. In 1951, Shannon [Bell. Syst. Tech. J. 30,50 (1951)] showed how to estimate bounds on the entropy an ..."
Abstract

Cited by 51 (1 self)
One aspect of human image understanding is the ability to estimate missing parts of a natural image. This ability depends on the redundancy of the representation used to describe the class of images. In 1951, Shannon [Bell Syst. Tech. J. 30, 50 (1951)] showed how to estimate bounds on the entropy and redundancy of an information source from predictability data. The entropy, in turn, gives a measure of the limits to error-free information compaction. An experiment was devised in which human observers interactively restored missing gray levels from 128 × 128 pixel pictures with 16 gray levels. For eight images, the redundancy ranged from 46%, for a complicated picture of foliage, to 74%, for a picture of a face. For almost-complete pictures, but not for noisy pictures, this performance can be matched by a nearest-neighbor predictor. One of the distinguishing characteristics of intelligent systems is the ability to make accurate and reliable predictions from partial data. Our own ability to interpret the images that our eyes receive involves making inferences about the environmental causes of image intensities, often from incomplete data. This ability to make predictions or inferences depends on the existence of statistical dependencies or ...
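A crude lower bound on the redundancy described above can be computed directly from the gray-level histogram: R ≥ 1 − H / log2(levels). This is a sketch only; Shannon's guessing-game bounds, which the observers' predictions estimate, exploit spatial context and are tighter. The toy image and its skewed distribution are illustrative assumptions.

```python
import numpy as np

def redundancy_bound(pixels, levels=16):
    """Lower bound on redundancy from the zeroth-order (histogram)
    entropy, ignoring all spatial context between pixels."""
    counts = np.bincount(pixels, minlength=levels)
    p = counts / counts.sum()
    p = p[p > 0]
    H = -(p * np.log2(p)).sum()        # bits per pixel, context-free
    return 1 - H / np.log2(levels)     # R >= 1 - H / H_max

# Toy "image": a heavily skewed gray-level distribution is redundant
rng = np.random.default_rng(0)
img = rng.choice(16, size=128 * 128, p=[0.5] + [0.5 / 15] * 15)
print(f"redundancy >= {redundancy_bound(img):.2f}")
```

The measured human redundancies (46% to 74%) exceed what a context-free histogram bound typically gives, which is exactly the point: predictability from surrounding pixels carries most of the redundancy.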
Feed-Forward Neural Networks and Topographic Mappings for Exploratory Data Analysis
 Neural Computing and Applications
, 1996
"... A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feedforward neural network is utilised to effect a topographic, structurepreserving, dimensionreducing ..."
Abstract

Cited by 50 (2 self)
A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feed-forward neural network is utilised to effect a topographic, structure-preserving, dimension-reducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. The properties of this transformation are illustrated on synthetic and real datasets, including the 1992 UK Research Assessment Exercise for funding in higher education. The method is compared and contrasted to established techniques for feature extraction, and related to topographic mappings, the Sammon projection and the statistical field of multidimensional scaling. 1 INTRODUCTION The visualisation and analysis of high-dimensional data is a difficult problem and one that may be helpfully viewed in the context of feature extraction, which provides a useful commo...
Bayesian Neural Networks and Density Networks
 Nuclear Instruments and Methods in Physics Research, A
, 1994
"... This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecied. When a probability distribution is placed on the unknown inputs, a latent ..."
Abstract

Cited by 47 (7 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated. 1 Introduction to the Bayesian view of learning A binary classifier is a parameterized mapping from an input x to an output y ∈ [0, 1]; when its parameters w are specified, the classifier states the probability that an input x belongs to class t = 1, rather than the alternative t = 0. Consider a binary classifier which models the probability as a sigmoid function of x: P(t = 1 | x, w, H) = y(x, w, H) = 1 / (1 + e^(−w·x)) (1) This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
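Eq. (1), the linear logistic model, is straightforward to evaluate; a minimal sketch, where the function name and toy inputs are illustrative:

```python
import math

def logistic_prob(x, w):
    """Eq. (1): P(t=1 | x, w) = 1 / (1 + exp(-w.x)).
    x and w are plain lists of floats of equal length."""
    a = sum(wi * xi for wi, xi in zip(w, x))   # linear activation w.x
    return 1.0 / (1.0 + math.exp(-a))

print(logistic_prob([1.0, 2.0], [0.0, 0.0]))  # zero weights -> 0.5
```

With zero weights the activation is zero and the sigmoid returns exactly 0.5, i.e., maximal uncertainty about the class, which is the natural starting point for the Bayesian treatment the paper develops.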
Computational consequences of a bias toward short connections
 Journal of Cognitive Neuroscience
, 1992
"... Abstract. A fundamenral observation in the neurosciences is that the brain is a modular system in which different regions perform different tasks. Recent evidence, however, raises questions about the accuracy of this characterization with respect to neonates. One possible interpretation of this evid ..."
Abstract

Cited by 41 (0 self)
Abstract. A fundamental observation in the neurosciences is that the brain is a modular system in which different regions perform different tasks. Recent evidence, however, raises questions about the accuracy of this characterization with respect to neonates. One possible interpretation of this evidence is that certain aspects of the modular organization of the adult brain arise developmentally. To explore this hypothesis we wish to characterize the computational principles that underlie the development of modular systems. In previous work we have considered computational schemes that allow a learning system to discover the modular structure that is present in the environment (Jacobs, Jordan, & Barto, 1991). In the current paper we present a complementary approach in which the development of modularity is due to an architectural bias in the learner. In particular, we examine the computational consequences of a simple architectural bias toward short-range connections. We present simulations that show that systems that learn under the influence of such a bias have a number of desirable properties including a tendency to decompose tasks into subtasks, to decouple the dynamics of recurrent subsystems, and to develop location-sensitive internal representations. Furthermore, the system's units develop local receptive and projective fields, and the system develops characteristics that are typically associated with topographic maps.
Unsupervised Neural Network Learning Procedures . . .
, 1996
"... In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: informationpreserving methods, densi ..."
Abstract

Cited by 31 (2 self)
In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: information-preserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to real-world problems.
Concept-Learning In The Absence Of Counter-Examples: An Autoassociation-Based Approach To Classification
, 1999
"... The overwhelming majority of research currently pursued within the framework of conceptlearning concentrates on discriminationbased learning, an inductive learning paradigm that relies on both examples and counterexamples of the concept. This emphasis, however, can present a practical problem: th ..."
Abstract

Cited by 31 (4 self)
The overwhelming majority of research currently pursued within the framework of concept-learning concentrates on discrimination-based learning, an inductive learning paradigm that relies on both examples and counter-examples of the concept. This emphasis, however, can present a practical problem: there are real-world engineering problems for which counter-examples are both scarce and difficult to gather. For these problems, recognition-based learning systems are much more appropriate because they do not use counter-examples in the concept-learning phase. The purpose of this dissertation is to analyze a connectionist recognition-based learning system, autoassociation-based classification, and answer the following questions: What features of the autoassociator make it ca...
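The core idea, classifying by how well a model trained only on positive examples can reconstruct an input, can be sketched with a linear autoassociator, whose optimum coincides with PCA. This is an assumption-laden illustration, not the dissertation's network; the toy data, function names, and one-component bottleneck are all assumptions.

```python
import numpy as np

def fit_autoassociator(X_pos, k=1):
    """Linear autoassociator trained only on positive examples:
    its optimal k-unit bottleneck spans the top-k principal directions."""
    mu = X_pos.mean(0)
    _, _, Vt = np.linalg.svd(X_pos - mu, full_matrices=False)
    V = Vt[:k].T                        # encoder/decoder weights
    return mu, V

def reconstruction_error(x, mu, V):
    z = (x - mu) @ V                    # encode into the bottleneck
    x_hat = mu + z @ V.T                # decode (reconstruct)
    return np.linalg.norm(x - x_hat)    # novelty score

# Positive class lies near a line; a point off the line reconstructs
# poorly and is rejected -- no counter-examples were ever needed.
rng = np.random.default_rng(1)
t = rng.normal(size=(50, 1))
X_pos = np.hstack([t, 2 * t]) + 0.01 * rng.normal(size=(50, 2))
mu, V = fit_autoassociator(X_pos)
e_in = reconstruction_error(np.array([1.0, 2.0]), mu, V)
e_out = reconstruction_error(np.array([2.0, -1.0]), mu, V)
print(e_in < e_out)  # True: the in-class point reconstructs far better
```

Thresholding the reconstruction error then yields an accept/reject decision for the concept, which is the recognition-based setting the dissertation analyzes.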
Continuous latent variable models for dimensionality reduction and sequential data reconstruction
, 2001
"... ..."
Learning population codes by minimizing description length
 Neural Computation
, 1994
"... The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Ea ..."
Abstract

Cited by 24 (4 self)
The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center of this bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump. The coordinates of the hidden units in the implicit space are also learned, thus allowing flexibility, as the network develops a discontinuous topography when presented with different input classes.
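The cheap bump-center encoding can be illustrated as follows. This is a sketch under simplifying assumptions, not the paper's MDL training procedure: the activity-weighted-mean estimator, the 1-D unit grid, and the fixed bump width are all illustrative choices.

```python
import numpy as np

def bump(coords, center, width=1.0):
    """The standard Gaussian bump the hidden activities are
    trained to resemble in the implicit space."""
    return np.exp(-((coords - center) ** 2) / (2 * width**2))

def bump_center(coords, activities):
    """Summarize a whole population of activities by one number:
    the estimated center of the bump (activity-weighted mean)."""
    a = np.clip(activities, 0, None)
    return (coords * a).sum() / a.sum()

units = np.linspace(0.0, 10.0, 11)   # implicit 1-D coordinates of 11 hidden units
acts = bump(units, center=4.0)       # an ideal bump centered at 4.0
print(round(bump_center(units, acts), 2))  # -> 4.0
```

Describing eleven activities with a single center coordinate is exactly the kind of description-length saving the training objective rewards, while the redundancy of the population makes the code robust.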