Results 1-10 of 26
A Theory of Networks for Approximation and Learning
 Laboratory, Massachusetts Institute of Technology, 1989
Abstract
Cited by 194 (24 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions, and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology-preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
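As a rough illustration of the radial-basis-function idea this abstract describes (a minimal sketch, not the paper's exact GRBF formulation, which also learns the centers), a Gaussian RBF expansion fitted by least squares:

```python
import numpy as np

def rbf_fit(X, y, centers, sigma=1.0):
    """Least-squares weights for f(x) = sum_i w_i * exp(-||x - c_i||^2 / (2 sigma^2)).

    Centers and sigma are fixed here; the GRBF framework in the paper
    additionally optimizes the centers during learning.
    """
    # Design matrix: one Gaussian basis function per center.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.exp(-d2 / (2 * sigma ** 2))
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def rbf_predict(X, centers, w, sigma=1.0):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ w

# Approximate sin on [0, 2*pi] from 20 samples with 8 fixed centers.
rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, (20, 1))
y = np.sin(X[:, 0])
centers = np.linspace(0, 2 * np.pi, 8)[:, None]
w = rbf_fit(X, y, centers)
```

With far fewer basis functions than samples, this is approximation rather than the strict interpolation the abstract contrasts against.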
Feed-Forward Neural Networks and Topographic Mappings for Exploratory Data Analysis
 Neural Computing and Applications, 1996
Abstract
Cited by 42 (2 self)
A recent novel approach to the visualisation and analysis of datasets, and one which is particularly applicable to those of a high dimension, is discussed in the context of real applications. A feed-forward neural network is utilised to effect a topographic, structure-preserving, dimension-reducing transformation of the data, with an additional facility to incorporate different degrees of associated subjective information. The properties of this transformation are illustrated on synthetic and real datasets, including the 1992 UK Research Assessment Exercise for funding in higher education. The method is compared and contrasted to established techniques for feature extraction, and related to topographic mappings, the Sammon projection and the statistical field of multidimensional scaling.
1 INTRODUCTION The visualisation and analysis of high-dimensional data is a difficult problem and one that may be helpfully viewed in the context of feature extraction, which provides a useful commo...
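The comparison to the Sammon projection rests on a stress measure between input-space and map-space distances; a minimal sketch of that stress (our illustration, not the paper's neural-network method, and assuming all points are distinct):

```python
import numpy as np

def pairwise_dists(X):
    return np.sqrt(((X[:, None] - X[None, :]) ** 2).sum(-1))

def sammon_stress(X, Y):
    """Sammon stress between data X and its low-dimensional map Y.

    Zero iff the map preserves every pairwise distance exactly; small
    errors on short distances are penalised more than on long ones.
    """
    D, d = pairwise_dists(X), pairwise_dists(Y)
    mask = ~np.eye(len(X), dtype=bool)   # ignore self-distances
    D, d = D[mask], d[mask]
    return ((D - d) ** 2 / D).sum() / D.sum()

# A map that copies the data exactly has zero stress.
X_demo = np.random.default_rng(1).normal(size=(10, 2))
```

A topographic mapping network can be viewed as minimising a stress of this general kind over the map coordinates.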
Exploring strategies for training deep neural networks
 Journal of Machine Learning Research
Abstract
Cited by 41 (8 self)
Département d’informatique et de recherche opérationnelle
Bayesian Neural Networks and Density Networks
 Nuclear Instruments and Methods in Physics Research, A, 1994
Abstract
Cited by 39 (8 self)
This paper reviews the Bayesian approach to learning in neural networks, then introduces a new adaptive model, the density network. This is a neural network for which target outputs are provided, but the inputs are unspecified. When a probability distribution is placed on the unknown inputs, a latent variable model is defined that is capable of discovering the underlying dimensionality of a data set. A Bayesian learning algorithm for these networks is derived and demonstrated.
1 Introduction to the Bayesian view of learning A binary classifier is a parameterized mapping from an input x to an output y ∈ [0, 1]; when its parameters w are specified, the classifier states the probability that an input x belongs to class t = 1, rather than the alternative t = 0. Consider a binary classifier which models the probability as a sigmoid function of x: P(t = 1 | x, w, H) = y(x, w, H) = 1 / (1 + e^(-w·x)) (1) This form of model is known to statisticians as a linear logistic model, and in the neural networks ...
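Equation (1) is the standard linear logistic model; a minimal numerical sketch of it (illustrative only, not the paper's Bayesian treatment of the weights):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def p_class1(x, w):
    """P(t=1 | x, w) = 1 / (1 + e^(-w.x)), as in equation (1) above."""
    return sigmoid(np.dot(w, x))

# Zero activation gives probability 0.5; a large positive w.x approaches 1.
w = np.array([2.0, -1.0])
```

In the Bayesian view the paper reviews, one would place a prior over w and integrate over it rather than fixing a single weight vector as done here.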
Learning population codes by minimizing description length
 Neural Computation, 1994
Abstract
Cited by 24 (5 self)
The Minimum Description Length principle (MDL) can be used to train the hidden units of a neural network to extract a representation that is cheap to describe but nonetheless allows the input to be reconstructed accurately. We show how MDL can be used to develop highly redundant population codes. Each hidden unit has a location in a low-dimensional implicit space. If the hidden unit activities form a bump of a standard shape in this space, they can be cheaply encoded by the center of this bump. So the weights from the input units to the hidden units in an autoencoder are trained to make the activities form a standard bump. The coordinates of the hidden units in the implicit space are also learned, thus allowing flexibility, as the network develops a discontinuous topography when presented with different input classes.
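The "cheap encoding" step can be caricatured as follows: given hidden-unit activities at known implicit-space positions, describe the whole pattern by the center of the best-fitting standard bump (a hedged sketch under assumed Gaussian bump shape and grid search, not the paper's learning procedure):

```python
import numpy as np

positions = np.linspace(0.0, 1.0, 20)   # hidden-unit coordinates in implicit space

def bump(c, width=0.1):
    """Standard-shape (here Gaussian) population bump centred at c."""
    return np.exp(-((positions - c) ** 2) / (2 * width ** 2))

def encode_center(activities, grid=np.linspace(0.0, 1.0, 201)):
    """Cheap description of an activity pattern: the centre of the
    best-fitting standard bump (grid search for clarity, not speed)."""
    errs = [np.sum((activities - bump(c)) ** 2) for c in grid]
    return grid[int(np.argmin(errs))]
```

The description cost is then one number (the center) plus the cheap-to-describe residuals, which is what MDL training drives the hidden activities toward.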
Unsupervised Neural Network Learning Procedures . . .
, 1996
Abstract
Cited by 23 (1 self)
In this article, we review unsupervised neural network learning procedures which can be applied to the task of preprocessing raw data to extract useful features for subsequent classification. The learning algorithms reviewed here are grouped into three sections: information-preserving methods, density estimation methods, and feature extraction methods. Each of these major sections concludes with a discussion of successful applications of the methods to real-world problems.
Concept-Learning In The Absence Of Counter-Examples: An Autoassociation-Based Approach To Classification
, 1999
Abstract
Cited by 19 (4 self)
The overwhelming majority of research currently pursued within the framework of concept-learning concentrates on discrimination-based learning, an inductive learning paradigm that relies on both examples and counter-examples of the concept. This emphasis, however, can present a practical problem: there are real-world engineering problems for which counter-examples are both scarce and difficult to gather. For these problems, recognition-based learning systems are much more appropriate because they do not use counter-examples in the concept-learning phase. The purpose of this dissertation is to analyze a connectionist recognition-based learning system, autoassociation-based classification, and answer the following questions:
- What features of the autoassociator make it ca...
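The recognition-based idea can be caricatured with a linear autoassociator: learn a subspace from positive examples only, then reject inputs it cannot reconstruct. This is a hedged sketch (PCA via SVD standing in for the dissertation's connectionist autoassociator):

```python
import numpy as np

class Autoassociator:
    """Linear autoassociator trained on examples of a single concept only.

    Inputs that the learned subspace reconstructs poorly are rejected,
    so no counter-examples are needed during learning.
    """

    def __init__(self, n_components=2):
        self.k = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Principal subspace of the positive examples via SVD.
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.V_ = Vt[: self.k].T
        return self

    def reconstruction_error(self, X):
        Z = (X - self.mean_) @ self.V_          # encode
        Xr = Z @ self.V_.T + self.mean_         # decode (autoassociate)
        return np.linalg.norm(X - Xr, axis=1)

# Positive examples lie in a plane; an off-plane input reconstructs badly.
rng = np.random.default_rng(0)
pos = np.c_[rng.normal(size=(50, 2)), np.zeros(50)]
model = Autoassociator(n_components=2).fit(pos)
```

Classification then amounts to thresholding the reconstruction error, with the threshold chosen from positive examples alone.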
Image Redundancy Reduction for Neural Network Classification using Discrete Cosine Transforms
 in Proceedings of the International Joint Conference on Neural Networks, 2000
Abstract
Cited by 13 (1 self)
High information redundancy and strong correlations in face images result in inefficiencies when such images are used directly in recognition tasks. In this paper, Discrete Cosine Transforms (DCTs) are used to reduce image information redundancy, because only a subset of the transform coefficients is necessary to preserve the most important facial features, such as hair outline, eyes and mouth. We demonstrate experimentally that when DCT coefficients are fed into a backpropagation neural network for classification, high recognition rates can be achieved using only a small proportion (0.19%) of available transform components. This makes DCT-based face recognition more than two orders of magnitude faster than other approaches.
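The coefficient-truncation step can be sketched with SciPy's DCT (our illustration of the general technique; the paper's exact coefficient selection may differ):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_reduce(img, k=8):
    """Keep only the top-left k x k block of 2-D DCT coefficients
    (the lowest spatial frequencies) and reconstruct the image from them."""
    C = dctn(img, norm="ortho")
    kept = np.zeros_like(C)
    kept[:k, :k] = C[:k, :k]       # low frequencies carry most image structure
    return idctn(kept, norm="ortho")

# A 64x64 image reduced to an 8x8 coefficient block keeps 64/4096, about
# 1.6% of the coefficients; a smooth stand-in image for illustration.
face_like = np.outer(np.linspace(0, 1, 64), np.linspace(1, 0, 64))
```

In the paper's pipeline the retained coefficients themselves, not the reconstructed image, would be fed to the backpropagation classifier.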
High Speed Face Recognition Based on Discrete Cosine Transforms and Neural Networks
, 1999
Abstract
Cited by 12 (1 self)
High information redundancy and correlation in face images result in inefficiencies when such images are used directly for recognition. In this paper, discrete cosine transforms are used to reduce image information redundancy because only a subset of the transform coefficients are necessary to preserve the most important facial features such as hair outline, eyes and mouth. We demonstrate experimentally that when DCT coefficients are fed into a backpropagation neural network for classification, a high recognition rate can be achieved by using a very small proportion of transform coefficients. This makes DCT-based face recognition much faster than other approaches. Key words: Face recognition, neural networks, feature extraction, discrete cosine transform.
1 Introduction High information redundancy present in face images results in inefficiencies when these images are used directly for recognition, identification and classification. Typically one builds a computational model to transform pixel i...
Exploring Case-Based Building Design - CADRE
 In Artificial Intelligence in Engineering Design, Analysis, and Manufacturing, 1993
Abstract
Cited by 7 (0 self)
Case-based design promises important advantages over rule-based design systems. However, the actual implementation of the paradigm poses many problems which put the advantages into question. In our work on CADRE, a case-based building design system, we have encountered seven fundamental problems which we think are common to most case-based design systems. We describe the problems and the ways we either solved or worked around them in the CADRE system. This leads us to conclusions about the general applicability of case-based reasoning to building design.