Results 1-10 of 263
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Cited by 314 (31 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
Using mutual information for selecting features in supervised neural net learning
IEEE Transactions on Neural Networks, 1994
Cited by 198 (1 self)
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the “information content” of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a “greedy” selection of the features and that takes both the mutual information with respect to the output class and with respect to the already-selected features into account. Finally, the results of a series of experiments are discussed. Index Terms: Feature extraction, neural network pruning, dimensionality reduction, mutual information, supervised learning,
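The greedy selection described in this abstract can be illustrated with a small sketch. It is one plausible reading, not the paper's implementation: the scoring rule (relevance to the class minus `beta` times the summed mutual information with already-selected features), the parameter name `beta`, the function names, and the restriction to discrete features with plug-in probability estimates are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mi_selection(features, labels, k, beta=0.5):
    """Greedily pick up to k feature names.

    features: dict mapping a feature name to its list of discrete values.
    beta: assumed weight penalizing redundancy with already-selected features.
    """
    remaining = set(features)
    selected = []
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            )
            return relevance - beta * redundancy

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `beta=0` the procedure ranks features by relevance alone, so an exact copy of an already-chosen feature can be picked again; a larger `beta` steers the choice toward features that are less redundant with the current subset.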
A Theory of Networks for Approximation and Learning
Laboratory, Massachusetts Institute of Technology, 1989
Cited by 195 (24 self)
Learning an input-output mapping from a set of examples, of the type that many neural networks have been constructed to perform, can be regarded as synthesizing an approximation of a multidimensional function, that is, solving the problem of hypersurface reconstruction. From this point of view, this form of learning is closely related to classical approximation techniques, such as generalized splines and regularization theory. This paper considers the problems of an exact representation and, in more detail, of the approximation of linear and nonlinear mappings in terms of simpler functions of fewer variables. Kolmogorov's theorem concerning the representation of functions of several variables in terms of functions of one variable turns out to be almost irrelevant in the context of networks for learning. We develop a theoretical framework for approximation based on regularization techniques that leads to a class of three-layer networks that we call Generalized Radial Basis Functions (GRBF), since they are mathematically related to the well-known Radial Basis Functions, mainly used for strict interpolation tasks. GRBF networks are not only equivalent to generalized splines, but are also closely related to pattern recognition methods such as Parzen windows and potential functions and to several neural network algorithms, such as Kanerva's associative memory, backpropagation and Kohonen's topology preserving map. They also have an interesting interpretation in terms of prototypes that are synthesized and optimally combined during the learning stage. The paper introduces several extensions and applications of the technique and discusses intriguing analogies with neurobiological data.
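As a rough illustration of the radial basis function scheme this abstract refers to, the sketch below fits f(x) = sum_i c_i G(||x - x_i||) by solving (G + lam I) c = y, with one Gaussian center per training example. The Gaussian basis, the parameter names `lam` and `width`, and the helper names are assumptions; the GRBF networks of the paper are more general (for example, they may use fewer centers than examples), so this is only the simplest special case.

```python
from math import exp


def gaussian_kernel(x, c, width=1.0):
    """Gaussian radial basis function of the distance between x and c."""
    sq = sum((a - b) ** 2 for a, b in zip(x, c))
    return exp(-sq / (2.0 * width ** 2))


def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def fit_rbf(points, targets, lam=0.0, width=1.0):
    """Return coefficients c solving (G + lam * I) c = y."""
    n = len(points)
    G = [
        [gaussian_kernel(points[i], points[j], width) + (lam if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return solve_linear(G, targets)


def predict_rbf(x, points, coeffs, width=1.0):
    """Evaluate f(x) = sum_i c_i * G(||x - x_i||)."""
    return sum(c * gaussian_kernel(x, p, width) for c, p in zip(coeffs, points))
```

With `lam=0` the fit interpolates the training targets exactly (strict interpolation); a positive `lam` trades exactness at the data points for a smoother function.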
Gaussian Networks for Direct Adaptive Control
IEEE Transactions on Neural Networks, 1991
Cited by 132 (8 self)
A direct adaptive tracking control architecture is proposed and evaluated for a class of continuous-time nonlinear dynamic systems for which an explicit linear parameterization of the uncertainty in the dynamics is either unknown or impossible. The architecture employs a network of Gaussian radial basis functions to adaptively compensate for the plant nonlinearities. Under mild assumptions about the degree of smoothness exhibited by the nonlinear functions, the algorithm is proven to be globally stable, with tracking errors converging to a neighborhood of zero. A constructive procedure is detailed, which directly translates the assumed smoothness properties of the nonlinearities involved into a specification of the network required to represent the plant to a chosen degree of accuracy. A stable weight adjustment mechanism is then determined using Lyapunov theory. The network construction and performance of the resulting controller are illustrated through simulations with example syst...
Multilayer Feedforward Networks With a Nonpolynomial Activation Function Can Approximate Any Function
1993
Cited by 118 (2 self)
Several researchers characterized the activation function under which multilayer feedforward networks can act as universal approximators. We show that most of all the characterizations that were reported thus far in the literature are special cases of the following general result: A standard multilayer feedforward network with a locally bounded piecewise continuous activation function can approximate any continuous function to any degree of accuracy if and only if the network's activation function is not a polynomial. We also emphasize the important role of the threshold, asserting that without it the last theorem does not hold.
Dimension Reduction by Local Principal Component Analysis
1997
Cited by 101 (0 self)
Reducing or eliminating statistical redundancy between the components of high-dimensional vector data enables a lower-dimensional representation without significant loss of information. Recognizing the limitations of principal component analysis (PCA), researchers in the statistics and neural network communities have developed nonlinear extensions of PCA. This article develops a local linear approach to dimension reduction that provides accurate representations and is fast to compute. We exercise the algorithms on speech and image data, and compare performance with PCA and with neural network implementations of nonlinear PCA. We find that both nonlinear techniques can provide more accurate representations than PCA and show that the local linear techniques outperform neural network implementations.
Networks and the Best Approximation Property
Biological Cybernetics, 1989
Cited by 96 (7 self)
Networks can be considered as approximation schemes. Multilayer networks of the backpropagation type can approximate arbitrarily well continuous functions (Cybenko, 1989; Funahashi, 1989; Stinchcombe and White, 1989). We prove that networks derived from regularization theory and including Radial Basis Functions (Poggio and Girosi, 1989), have a similar property. From the point of view of approximation theory, however, the property of approximating continuous functions arbitrarily well is not sufficient for characterizing good approximation schemes. More critical is the property of best approximation. The main result of this paper is that multilayer networks, of the type used in backpropagation, do not have the best approximation property. For regularization networks (in particular Radial Basis Function networks) we prove existence and uniqueness of best approximation.
Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
1993
Predictive Application-Performance Modeling in a Computational Grid Environment
1999
Cited by 60 (12 self)
This paper describes and evaluates the application of three local learning algorithms (nearest-neighbor, weighted-average, and locally-weighted polynomial regression) for the prediction of run-specific resource usage on the basis of run-time input parameters supplied to tools. A two-level knowledge base allows the learning algorithms to track short-term fluctuations in the performance of computing systems, and the use of instance editing techniques improves the scalability of the performance-modeling system. The learning algorithms assist PUNCH, a network-computing system at Purdue University, in emulating an ideal user in terms of its resource management and usage policies. 1. Introduction It is now recognized that the heterogeneous nature of the network-computing environment cannot be effectively exploited without some form of adaptive or demand-driven resource management (e.g., [10, 11, 12, 14, 18, 27]). A demand-driven resource management system can be characterized by its a...
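The three local learning algorithms named in this abstract can be sketched for a single scalar input parameter. This is a generic textbook-style illustration, not the system described in the paper: the inverse-distance weights, the Gaussian weighting with a `bandwidth` parameter, the restriction to a degree-1 local polynomial, and all function names are assumptions.

```python
from math import exp


def nearest_neighbor(query, xs, ys):
    """Predict the target of the stored example closest to the query."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - query))
    return ys[i]


def weighted_average(query, xs, ys, eps=1e-9):
    """Inverse-distance-weighted average of all stored targets."""
    weights = [1.0 / (abs(x - query) + eps) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def locally_weighted_linear(query, xs, ys, bandwidth=1.0):
    """Degree-1 locally weighted regression with Gaussian weights."""
    w = [exp(-((x - query) / bandwidth) ** 2) for x in xs]
    sw = sum(w)
    # Weighted means of inputs and targets.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted least-squares slope around the query point.
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)
```

For example, `xs` could hold a hypothetical input size from past runs and `ys` the observed run times; each function then estimates the run time for a new input size from the stored examples alone, with no global model fitted in advance.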
Approximating the Semantics of Logic Programs by Recurrent Neural Networks
Cited by 55 (9 self)
In [18] we have shown how to construct a 3-layered recurrent neural network that computes the fixed point of the meaning function TP of a given propositional logic program P, which corresponds to the computation of the semantics of P. In this article we consider the first order case. We define a notion of approximation for interpretations and prove that there exists a 3-layered feed forward neural network that approximates the calculation of TP for a given first order acyclic logic program P with an injective level mapping arbitrarily well. Extending the feed forward network by recurrent connections we obtain a recurrent neural network whose iteration approximates the fixed point of TP. This result is proven by taking advantage of the fact that for acyclic logic programs the function TP is a contraction mapping on a complete metric space defined by the interpretations of the program. Mapping this space to the metric space ℝ with Euclidean distance, a real valued function fP can be defined which corresponds to TP and is continuous as well as a contraction. Consequently it can be approximated by an appropriately chosen class of feed forward neural networks.