Results 1–10 of 10
Regularization Theory and Neural Networks Architectures
Neural Computation, 1995
Abstract
Cited by 309 (31 self)
We had previously shown that regularization principles lead to approximation schemes which are equivalent to networks with one layer of hidden units, called Regularization Networks. In particular, standard smoothness functionals lead to a subclass of regularization networks, the well known Radial Basis Functions approximation schemes. This paper shows that regularization networks encompass a much broader range of approximation schemes, including many of the popular general additive models and some of the neural networks. In particular, we introduce new classes of smoothness functionals that lead to different classes of basis functions. Additive splines as well as some tensor product splines can be obtained from appropriate classes of smoothness functionals. Furthermore, the same generalization that extends Radial Basis Functions (RBF) to Hyper Basis Functions (HBF) also leads from additive models to ridge approximation models, containing as special cases Breiman's hinge functions, som...
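The RBF subclass of regularization networks described above can be made concrete with a short sketch: Gaussian basis functions centered on the training points, with output weights fitted under a Tikhonov (ridge) penalty. The function names, the kernel width, and the penalty value below are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def rbf_fit(X, y, width=0.3, reg=1e-4):
    # Gram matrix of Gaussian basis functions centered on the training points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    G = np.exp(-d2 / (2.0 * width ** 2))
    # Tikhonov (ridge) regularization stabilizes the linear solve
    c = np.linalg.solve(G + reg * np.eye(len(X)), y)
    return c

def rbf_predict(X_train, c, X_new, width=0.3):
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * width ** 2)) @ c

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0])
c = rbf_fit(X, y)
y_hat = rbf_predict(X, c, X)
```

The solve is the standard regularization-network normal equation (G + λI)c = y; the choice of smoothness functional only changes the basis function, not this linear structure.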
Prediction risk and architecture selection for neural networks
1994
Abstract
Cited by 73 (2 self)
We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC) and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD) respectively.
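The algebraic risk estimates mentioned above can be illustrated with a toy PSE computation: estimated prediction risk = training MSE plus the complexity penalty 2*sigma^2*p/N. The linear least-squares model and the assumed noise variance below are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

def pse(y, y_hat, n_params, sigma2):
    # PSE = training MSE + 2 * sigma^2 * p / N (Barron-style algebraic estimate)
    n = len(y)
    mse = np.mean((y - y_hat) ** 2)
    return mse + 2.0 * sigma2 * n_params / n

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)

# Fit by least squares (p = 3 parameters), then score with PSE.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w
risk = pse(y, y_hat, n_params=3, sigma2=0.01)
```

Because the penalty grows with the parameter count p, PSE penalizes larger architectures even when their training error is lower, which is exactly why such estimates can drive model selection.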
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
IEEE Transactions on Neural Networks, 1997
Abstract
Cited by 66 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.
Keywords: Constructive algorithm, structure learning, state space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.
I. Introduction
A. Problems with Fixed Size Networks
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
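The grow-until-satisfactory loop described above can be sketched as follows. The random-direction sigmoid units fitted by least squares on the output layer are a toy stand-in for dynamic node creation, not any specific algorithm from the survey.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grow_network(X, y, tol=0.05, max_units=50, seed=0):
    rng = np.random.default_rng(seed)
    H = [np.ones(len(X))]                     # start small: a bias "unit" only
    for _ in range(max_units):
        Phi = np.column_stack(H)
        w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
        resid = y - Phi @ w
        if np.mean(resid ** 2) < tol:         # satisfactory solution found
            break
        v = rng.normal(size=X.shape[1] + 1)   # add one new hidden unit
        H.append(sigmoid(X @ v[:-1] + v[-1]))
    Phi = np.column_stack(H)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w, len(H) - 1                      # output weights, hidden-unit count

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.tanh(2 * X[:, 0])
w, n_units = grow_network(X, y)
```

Viewed as a state space search, each loop iteration is one state transition (add a unit), and the stopping test is the goal criterion; real constructive algorithms differ mainly in how the new unit's parameters are chosen.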
Constructive Feedforward Neural Networks for Regression Problems: A Survey
1995
Abstract
Cited by 21 (0 self)
In this paper, we review the procedures for constructing feedforward neural networks in regression problems. While standard backpropagation performs gradient descent only in the weight space of a network with fixed topology, constructive procedures start with a small network and then grow additional hidden units and weights until a satisfactory solution is found. The constructive procedures are categorized according to the resultant network architecture and the learning algorithm for the network weights.
The Hong Kong University of Science & Technology, Department of Computer Science, Technical Report Series.
1 Introduction
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among them, the class of multilayer feedforward networks is perhaps the most popular. Standard backpropagation performs gradient descent only in the weight space of a network with fixed topology; this approach is analogous to ...
A Smoothing Regularizer for Feedforward and Recurrent Neural Networks
1996
Abstract
Cited by 12 (1 self)
We derive a smoothing regularizer for dynamic network models by requiring robustness in prediction performance to perturbations of the training data. The regularizer can be viewed as a generalization of the first-order Tikhonov stabilizer to dynamic models. For two-layer networks with recurrent connections described by

Y(t) = f( W Y(t − τ) + V X(t) ),   Z(t) = U Y(t),

the training criterion with the regularizer is

D = (1/N) Σ_{t=1}^{N} ||Z(t) − Z(Φ; I(t))||² + λ ρ_τ²(Φ),

where Φ = {U, V, W} is the network parameter set, Z(t) are the targets, I(t) = {X(s); s = 1, 2, ..., t} represents the current and all historical input information, N is the size of the training data set, ρ_τ²(Φ) is the regularizer, and λ is a regularization parameter. The closed-form expression for the regularizer for time-lagged recurrent networks is

ρ_τ(Φ) = (γ ||U|| ||V||) / (1 − γ ||W||) [ 1 − e^{(γ||W|| − 1)/τ} ], ...
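The closed-form regularizer can be evaluated numerically. The sketch below assumes spectral (2-) norms and reads the exponent as (gamma*||W|| - 1)/tau; both the symbol names and that exponent are a best-effort reconstruction of the abstract, not a verified transcription of the paper.

```python
import numpy as np

def smoothing_regularizer(U, V, W, gamma=1.0, tau=1.0):
    # rho_tau(Phi) = gamma*||U||*||V|| / (1 - gamma*||W||) * (1 - e^{(gamma*||W||-1)/tau})
    # Norm choice (spectral) and exponent are assumptions reconstructed from the abstract.
    nU, nV, nW = (np.linalg.norm(M, 2) for M in (U, V, W))
    assert gamma * nW < 1.0, "stability condition on the recurrent loop"
    return (gamma * nU * nV) / (1.0 - gamma * nW) * (1.0 - np.exp((gamma * nW - 1.0) / tau))

U = np.array([[0.5, 0.1], [0.0, 0.4]])
V = np.array([[0.3, 0.2], [0.1, 0.3]])
W = np.array([[0.2, 0.0], [0.0, 0.1]])
rho = smoothing_regularizer(U, V, W)
```

Note how the factor 1/(1 − γ||W||) blows up as the recurrent weights approach the stability boundary, which matches the intuition that a nearly unstable dynamic model is maximally sensitive to training-data perturbations.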
Implementing Projection Pursuit Learning
1996
Abstract
Cited by 11 (0 self)
This paper examines the implementation of projection pursuit regression (PPR) in the context of machine learning and neural networks. We propose a parametric PPR with direct training which achieves improved training speed and accuracy when compared with nonparametric PPR. Analysis and simulations are done for heuristics to choose good initial projection directions. A comparison of a projection pursuit learning network with a one hidden layer sigmoidal neural network shows why grouping hidden units in a projection pursuit learning network is useful. Learning robot arm inverse dynamics is used as an example problem.
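A toy version of the projection-pursuit idea underlying this work: approximate y by a univariate smoother applied to one learned projection aᵀx. The cubic-polynomial smoother and the random direction search below are deliberate simplifications for illustration, not the parametric direct-training scheme the paper proposes.

```python
import numpy as np

def fit_ridge_term(X, y, n_tries=200, seed=0):
    # Search random unit directions; for each, fit a cubic smoother along
    # the projection and keep the direction with the lowest residual error.
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_tries):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)
        z = X @ a
        coef = np.polyfit(z, y, 3)            # cubic stand-in for the smoother
        err = np.mean((np.polyval(coef, z) - y) ** 2)
        if err < best[0]:
            best = (err, a, coef)
    return best

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
a_true = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
y = (X @ a_true) ** 3 - (X @ a_true)          # a single ridge function of x
mse, a_hat, coef = fit_ridge_term(X, y)
```

Real PPR/PPL instead optimizes the projection direction and the smoother jointly, and adds further ridge terms to fit the residual, but the projection-then-smooth structure is the same.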
Smoothing Regularizers for Projective Basis Function Networks
1996
Abstract
Cited by 10 (1 self)
Smoothing regularizers for radial basis functions have been studied extensively, but no general smoothing regularizers for projective basis functions (PBFs), such as the widely-used sigmoidal PBFs, have heretofore been proposed. We derive new classes of algebraically-simple m-th-order smoothing regularizers for networks of N projective basis functions

f(W, x) = Σ_{j=1}^{N} u_j g[ v_j · x + v_{j0} ] + u_0,

with general transfer functions g[·]. These regularizers are:

R_G(W, m) = Σ_j u_j² ||v_j||^{2m}   (Global Form)
R_L(W, m) = Σ_j u_j² ||v_j||^{2m}   (Local Form)

With appropriate constant factors, these regularizers bound the corresponding m-th-order smoothing integral of Ω(x) ||∂^m f(W, x)/∂x^m||². In the above expressions, {v_j} are the projection vectors, W denotes all the network weights {u_j, u_0, v_j, v_{j0}}, and Ω(x) is a weighting function (not necessarily the input density) on the D-dimensional input space. The global and local cases are distinguished by different choices of Ω(x).
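Assuming the global form reads R_G(W, m) = Σ_j u_j² ||v_j||^(2m), which is a best-effort reading of this abstract rather than a verified transcription, the regularizer is a one-liner to compute for a set of PBF weights:

```python
import numpy as np

def global_smoothing_regularizer(u, V, m=2):
    # u: output weights (N,); V: projection vectors as rows (N, D).
    # R_G(W, m) = sum_j u_j^2 * ||v_j||^(2m)  -- reconstructed form, an assumption.
    return float(np.sum(u ** 2 * np.linalg.norm(V, axis=1) ** (2 * m)))

u = np.array([0.5, -1.0, 0.25])
V = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
R = global_smoothing_regularizer(u, V, m=1)   # 0.25*1 + 1.0*0.5 + 0.0625*4 = 1.0
```

The appeal of such algebraically-simple forms is that they avoid evaluating the smoothing integral itself: the penalty depends only on the weights, so its gradient is cheap to add to any training loop.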
Use of Bias Term in Projection Pursuit Learning Improves Approximation and Convergence Properties
IEEE Trans. Neural Networks, 1996
Abstract
Cited by 6 (1 self)
In a regression problem, one is given a d-dimensional random vector X, the components of which are called predictor variables, and a random variable, Y, called the response. A regression surface describes a general relationship between the variables X and Y. One nonparametric regression technique that has been successfully applied to high-dimensional data is projection pursuit regression (PPR). In this method, the regression surface is approximated by a sum of empirically determined univariate functions of linear combinations of the predictors. Projection pursuit learning (PPL) proposed by Hwang et al. formulates PPR using a two-layer feedforward neural network. One of the main differences between PPR and PPL is that the smoothers in PPR are nonparametric, whereas those in PPL are based on Hermite functions of some predefined highest order R. While the convergence property of PPR is already known, that for PPL has not been thoroughly studied. In this paper, we demonstrate that PPL networks...
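The role of the bias term can be seen in one dimension: Hermite functions carry a Gaussian envelope exp(−z²/2), so without a per-unit bias every unit is pinned to the origin. The toy fit below, an assumed setup rather than the paper's construction, approximates a constant target with order-0 Hermite functions (shifted Gaussians), with and without shifts.

```python
import numpy as np

z = np.linspace(-4, 4, 401)
target = np.ones_like(z)          # a constant target on the interval

def best_fit_error(biases):
    # Features: order-0 Hermite functions exp(-(z - b)^2 / 2), one per bias b.
    B = np.asarray(biases, dtype=float)
    Phi = np.exp(-0.5 * (z[:, None] - B[None, :]) ** 2)
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return float(np.max(np.abs(Phi @ w - target)))

err_without_bias = best_fit_error([0.0])               # all units pinned at 0
err_with_bias = best_fit_error(np.linspace(-4, 4, 9))  # shifted copies
```

Without shifts the fit decays to zero at the interval edges no matter how the weight is chosen, so the worst-case error stays near 1; with per-unit biases the shifted envelopes tile the interval and the error drops sharply, mirroring the paper's point that the bias restores approximation capability for any finite order R.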
Improving the Approximation and Convergence Capabilities of Projection Pursuit Learning
Neural Processing Letters, 1995
Abstract
Cited by 1 (1 self)
Projection pursuit regression (PPR) is a statistical technique that has been successfully applied to high-dimensional data. Projection pursuit learning (PPL) formulates PPR in a neural network framework. One major difference between PPR and PPL is that the smoothers in PPR are nonparametric, whereas those in PPL are based on Hermite functions of some predefined highest order R. While the convergence property of PPR is already known, we demonstrate that PPL networks do not have the universal approximation and strong convergence properties for any finite R. But, by including a bias term in each linear combination of the predictor variables, PPL networks can regain these capabilities, independent of the exact choice of R. It is also shown experimentally that this modification improves the generalization performance in regression problems, and creates smoother decision surfaces for classification problems.
INTRODUCTION
The multilayer feedforward network is a very popular tool for pattern ...