Results 1–10 of 28
The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems
, 1992
Cited by 172 (2 self)
We present an analysis of how the generalization performance (expected test set error) relates to the expected training set error for nonlinear learning systems, such as multilayer perceptrons and radial basis functions. The principal result is the following relationship (computed to second order) between the expected test set and training set errors:

$$\langle E_{\mathrm{test}}(\lambda)\rangle \approx \langle E_{\mathrm{train}}(\lambda)\rangle + 2\,\sigma^{2}_{\mathrm{eff}}\,\frac{p_{\mathrm{eff}}(\lambda)}{n}. \qquad (1)$$

Here, n is the size of the training sample, σ²_eff is the effective noise variance in the response variable(s), λ is a regularization or weight decay parameter, and p_eff(λ) is the effective number of parameters in the nonlinear model. The expectations ⟨·⟩ of training set and test set errors are taken over possible training sets and training and test sets respectively. The effective number of parameters p_eff(λ) usually differs from the true number of model parameters p for nonlinear or regularized models; this theoretical conclusion is supported by M...
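A minimal numeric sketch of Eq. (1), restricted to the special case of a linear model with weight decay (ridge), where the effective number of parameters has the closed form p_eff(λ) = Σ_i s_i²/(s_i² + λ), with s_i the singular values of the design matrix. The nonlinear case treated in the paper is more involved; the function names and data below are illustrative, not from the paper.

```python
import numpy as np

# Linear/ridge special case of the effective number of parameters:
#   p_eff(lambda) = sum_i s_i^2 / (s_i^2 + lambda)
def p_eff(X, lam):
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2 / (s**2 + lam)))

# Eq. (1): <E_test> ≈ <E_train> + 2 * sigma2_eff * p_eff(lambda) / n
def expected_test_error(train_err, sigma2_eff, X, lam):
    n = X.shape[0]
    return train_err + 2.0 * sigma2_eff * p_eff(X, lam) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
print(p_eff(X, 0.0))   # 5.0 -- unregularized, p_eff equals the raw p
print(p_eff(X, 1e6))   # near 0 -- heavy regularization removes parameters
```

Note how increasing λ smoothly interpolates p_eff between the full parameter count and zero, which is exactly why the test-error penalty in Eq. (1) shrinks under regularization.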
Prediction risk and architecture selection for neural networks
, 1994
Cited by 73 (2 self)
Abstract. We describe two important sets of tools for neural network modeling: prediction risk estimation and network architecture selection. Prediction risk is defined as the expected performance of an estimator in predicting new observations. Estimated prediction risk can be used both for estimating the quality of model predictions and for model selection. Prediction risk estimation and model selection are especially important for problems with limited data. Techniques for estimating prediction risk include data resampling algorithms such as nonlinear cross-validation (NCV) and algebraic formulae such as the predicted squared error (PSE) and generalized prediction error (GPE). We show that exhaustive search over the space of network architectures is computationally infeasible even for networks of modest size. This motivates the use of heuristic strategies that dramatically reduce the search complexity. These strategies employ directed search algorithms, such as selecting the number of nodes via sequential network construction (SNC) and pruning inputs and weights via sensitivity-based pruning (SBP) and optimal brain damage (OBD), respectively.
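A quick sketch of the two algebraic risk estimates this abstract names, in forms commonly stated for a model with p parameters (p_eff effective parameters) fit to n points; consult the paper for the exact definitions it uses. All identifiers below are illustrative.

```python
# PSE penalizes by the raw parameter count, GPE by the effective count:
#   PSE = MSE_train + 2 * sigma2_hat * p / n
#   GPE = MSE_train + 2 * sigma2_hat * p_eff / n
def pse(mse_train, sigma2_hat, p, n):
    return mse_train + 2.0 * sigma2_hat * p / n

def gpe(mse_train, sigma2_hat, p_eff, n):
    return mse_train + 2.0 * sigma2_hat * p_eff / n

# A regularized network has p_eff <= p, so GPE penalizes it less than PSE:
print(pse(0.10, 0.05, 20, 200))    # ≈ 0.11
print(gpe(0.10, 0.05, 12.5, 200))  # ≈ 0.10625
```

Both are cheap to evaluate once the training error and a noise-variance estimate are available, which is what makes them usable inside an architecture-selection loop.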
Regression Modeling in Back-Propagation and Projection Pursuit Learning
, 1994
Cited by 67 (1 self)
In this paper we study and compare two types of connectionist learning methods for model-free regression problems. One is the popular back-propagation learning (BPL), well known in the artificial neural networks literature; the other is projection pursuit learning (PPL), which emerged in recent years in the statistical estimation literature. Both BPL and PPL are based on projections of the data in directions determined from interconnection weights. However, unlike the fixed nonlinear activations (usually sigmoidal) used for the hidden neurons in BPL, PPL systematically approximates the unknown nonlinear activations. Moreover, BPL estimates all the weights simultaneously at each iteration, while PPL estimates the weights cyclically (neuron-by-neuron and layer-by-layer) at each iteration. Although BPL and PPL have comparable training speed when based on a Gauss-Newton optimization algorithm, PPL proves more parsimonious in that it requires fewer hi...
Constructive Algorithms for Structure Learning in Feedforward Neural Networks for Regression Problems
 IEEE Transactions on Neural Networks
, 1997
Cited by 66 (2 self)
In this survey paper, we review the constructive algorithms for structure learning in feedforward neural networks for regression problems. The basic idea is to start with a small network, then add hidden units and weights incrementally until a satisfactory solution is found. By formulating the whole problem as a state-space search, we first describe the general issues in constructive algorithms, with special emphasis on the search strategy. A taxonomy, based on the differences in the state transition mapping, the training algorithm and the network architecture, is then presented.

Keywords: constructive algorithm, structure learning, state-space search, dynamic node creation, projection pursuit regression, cascade-correlation, resource-allocating network, group method of data handling.

I. Introduction
A. Problems with Fixed-Size Networks
In recent years, many neural network models have been proposed for pattern classification, function approximation and regression problems. Among...
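A toy illustration of the constructive idea described above (not any specific algorithm from the survey): grow a random-feature network one tanh hidden unit at a time, refit the output weights by least squares after each addition, and keep the size with the lowest validation error. The data, names, and stopping rule here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x[:, 0]) + 0.1 * rng.normal(size=200)
x_tr, y_tr = x[:150], y[:150]          # training split
x_va, y_va = x[150:], y[150:]          # validation split

W = np.empty((1, 0))                   # input-to-hidden weights, grown below
b = np.empty(0)                        # hidden biases
best_err, best_units = np.inf, 0
for units in range(1, 21):             # add one hidden unit per step
    W = np.hstack([W, rng.normal(size=(1, 1))])
    b = np.append(b, rng.normal())
    H = np.tanh(x_tr @ W + b)
    beta, *_ = np.linalg.lstsq(H, y_tr, rcond=None)   # refit output layer
    err = np.mean((np.tanh(x_va @ W + b) @ beta - y_va) ** 2)
    if err < best_err:                 # record the best network size so far
        best_err, best_units = err, units
print(best_units, best_err < 0.5)
```

Real constructive algorithms also train the newly added unit's input weights; freezing them at random values keeps this sketch short while preserving the grow-and-evaluate search structure.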
Selecting Neural Network Architectures via the Prediction Risk: Application to Corporate Bond Rating Prediction
 In Proc. of the First Int'l Conf. on AI Applications on Wall Street
, 1991
Cited by 33 (2 self)
The notion of generalization can be defined precisely as the prediction risk, the expected performance of an estimator on new observations. In this paper, we propose the prediction risk as a measure of the generalization ability of multilayer perceptron networks and use it to select the optimal network architecture. The prediction risk must be estimated from the available data; here we approximate the prediction risk by v-fold cross-validation and asymptotic estimates of generalized cross-validation or Akaike's final prediction error. We apply the technique to the problem of predicting corporate bond ratings. This problem is very attractive as a case study, since it is characterized by the limited availability of the data and by the lack of complete a priori information that could be used to impose a structure on the network architecture.

1 Generalization and Prediction Risk
The notion of generalization can be defined precisely as the prediction risk, the expected performance of ...
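The two asymptotic estimates this abstract names have simple standard forms for a model with p parameters and n training points (the paper's exact variants may differ); a hedged sketch:

```python
# Generalized cross-validation and Akaike's final prediction error:
#   GCV = MSE_train / (1 - p/n)^2
#   FPE = MSE_train * (1 + p/n) / (1 - p/n)
def gcv(mse_train, p, n):
    return mse_train / (1.0 - p / n) ** 2

def fpe(mse_train, p, n):
    return mse_train * (1.0 + p / n) / (1.0 - p / n)

# Both inflate the training error, more strongly as p approaches n:
print(gcv(0.09, 10, 100))   # ≈ 0.1111
print(fpe(0.09, 10, 100))   # ≈ 0.11
```

Unlike v-fold cross-validation, these require only one fit per candidate architecture, which matters when the data are as limited as in the bond-rating setting described here.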
Impact of Active Dendrites and Structural Plasticity on the Memory Capacity of Neural Tissue
, 2001
Cited by 29 (3 self)
values averaged over longer timescales. Nevertheless, to the extent that short-term synaptic dynamics are a pervasive phenomenon in vivo, involving substantial changes in synaptic efficacy from moment to moment based on the recent activation history of the synapse, the straightforward mapping of stable numerical weights from a connectionist learning system onto synapses in the brain becomes more strained (Liaw and Berger, 1996; Abbott et al. ... ...age capacities for cells with nonlinear subunits and show that this capacity is accessible to a structural learning rule that combines random synapse formation with activity-dependent stabilization/elimination. In a departure from the common view that memories are encoded in the overall connection strengths between neurons, our results suggest that long-term information storage in neural tissue could reside primar...
Knowledge Warehouse: An Architectural Integration of Knowledge Management, Decision Support, Artificial Intelligence and Data Warehousing
, 2002
Cited by 23 (0 self)
Decision support systems (DSS) are becoming increasingly more critical to the daily operation of organizations. Data warehousing, an integral part of this, provides an infrastructure that enables businesses to extract, cleanse, and store vast amounts of data. The basic purpose of a data warehouse is to empower the knowledge workers with information that allows them to make decisions based on a solid foundation of fact. However, only a fraction of the needed information exists on computers; the vast majority of a firm's intellectual assets exist as knowledge in the minds of its employees. What is needed is a new generation of knowledge-enabled systems that provides the infrastructure needed to capture, cleanse, store, organize, leverage, and disseminate not only data and information but also the knowledge of the firm. The purpose of this paper is to propose, as an extension to the data warehouse model, a knowledge warehouse (KW) architecture that will not only facilitate the capturing and coding of knowledge but also enhance the retrieval and sharing of knowledge across the organization. The knowledge warehouse proposed here suggests a different direction for DSS in the next decade. This new direction is based on an expanded purpose of DSS. That is, the purpose of DSS is knowledge improvement. This expanded purpose of DSS also suggests that the effectiveness of a DSS will, in the future, be measured based on how well it promotes and enhances knowledge, how well it improves the mental model(s) and understanding of the decision maker(s), and thereby how well it improves his/her decision making. © 2002 Elsevier Science B.V. All rights reserved.
Design of Neural Network Filters
 Electronics Institute, Technical University of Denmark
, 1993
Cited by 21 (12 self)
The subject of the present licentiate thesis is the design of neural network filters. Filters based on neural networks can be viewed as extensions of the classical linear adaptive filter, aimed at modeling nonlinear relationships. The main emphasis is on a neural network implementation of the nonrecursive, nonlinear adaptive model with additive noise. The aim is to clarify a number of phases involved in the design of neural network architectures for carrying out various "black-box" modeling tasks such as: system identification, inverse modeling, and time-series prediction. The principal contributions comprise: formulation of a neural-network-based canonical filter representation, which forms the basis for the development of an architecture classification system. In essence, this concerns a distinction between global and local models. This allows a number of known neural network architectures to be classified, and it further opens the possibility of developing entirely new structures. In this connection, a review of a number of well-known architectures is given. Particular emphasis is placed on the treatment of the multilayer perceptron neural network.
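The nonrecursive model class emphasized in this thesis computes the filter output from a window of past inputs, y(t) = f(x(t), ..., x(t-L)) + e(t); building the lagged (tapped-delay-line) design matrix is the usual first step for any such filter. A small sketch, with an illustrative function name:

```python
import numpy as np

# Row t holds the most recent sample first: [x(t), x(t-1), ..., x(t-L)].
def lag_matrix(x, L):
    n = len(x) - L
    return np.stack([x[t : t + L + 1][::-1] for t in range(n)])

x = np.arange(10.0)
X = lag_matrix(x, 3)
print(X.shape)   # (7, 4)
print(X[0])      # [3. 2. 1. 0.] -> x(t), x(t-1), x(t-2), x(t-3) at t = 3
```

A linear adaptive filter regresses y on these rows directly; the neural network filters of the thesis replace that linear map with a nonlinear one such as a multilayer perceptron.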
On the regularization of forgetting recursive least square
 IEEE Transactions on Neural Networks
, 1999
Cited by 16 (10 self)
Abstract — In this paper, the regularization effect of employing the forgetting recursive least square (FRLS) training technique on feedforward neural networks is studied. We derive our result from the corresponding equations for the expected prediction error and the expected training error. By comparing these error equations with those obtained previously for the weight decay method, we find that the FRLS technique has an effect identical to that of using the simple weight decay method. This new finding suggests that the FRLS technique is another on-line approach to realizing the weight decay effect. Moreover, we show that, under certain conditions, both the model complexity and the expected prediction error of a model trained by the FRLS technique are better than those of one trained by the standard RLS method. Index Terms — Feedforward neural network, forgetting recursive least square, model complexity, prediction error, regularization, weight decay.
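A minimal sketch of recursive least squares with a forgetting factor, the training rule discussed above, applied to a linear-in-parameters model so the recursions are easy to verify. The updates are the standard FRLS equations; the variable names and the initialization constant are illustrative assumptions.

```python
import numpy as np

def frls(X, d, lam=0.99, delta=100.0):
    n, p = X.shape
    w = np.zeros(p)
    P = delta * np.eye(p)              # inverse-correlation estimate
    for t in range(n):
        x = X[t]
        k = P @ x / (lam + x @ P @ x)  # gain vector
        w = w + k * (d[t] - w @ x)     # correct weights on the new error
        P = (P - np.outer(k, x @ P)) / lam
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
w_true = np.array([1.0, -2.0, 0.5])
d = X @ w_true + 0.01 * rng.normal(size=500)
w = frls(X, d)
print(np.round(w, 2))                  # close to w_true
```

The forgetting factor lam < 1 discounts old data exponentially; the paper's point is that, on feedforward networks, this discounting acts like a weight decay penalty rather than merely tracking nonstationarity.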
A Theory of Cross-Validation Error
, 1993
Cited by 8 (0 self)
This paper presents a theory of error in cross-validation testing of algorithms for predicting real-valued attributes. The theory justifies the claim that predicting real-valued attributes requires balancing the conflicting demands of simplicity and accuracy. Furthermore, the theory indicates precisely how these conflicting demands must be balanced, in order to minimize cross-validation error. A general theory is presented, then it is developed in detail for linear regression and instance-based learning.
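The simplicity/accuracy balance this theory formalizes is easy to demonstrate empirically: estimate the cross-validation error of polynomial fits of increasing degree and pick the degree that minimizes it. The data and 5-fold scheme below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 60)
y = 1.0 - 2.0 * x + 3.0 * x**2 + 0.1 * rng.normal(size=60)   # true degree: 2

def cv_error(x, y, degree, folds=5):
    idx = np.arange(len(x))
    errs = []
    for f in range(folds):
        test = idx % folds == f                       # interleaved folds
        coef = np.polyfit(x[~test], y[~test], degree)
        pred = np.polyval(coef, x[test])
        errs.append(np.mean((pred - y[test]) ** 2))
    return float(np.mean(errs))

scores = {deg: cv_error(x, y, deg) for deg in range(9)}
best = min(scores, key=scores.get)
print(best)      # an intermediate degree wins; too simple and too complex lose
```

Models that are too simple underfit (high bias) and models that are too complex overfit (high variance); the cross-validation curve bottoms out in between, which is the balance point the theory characterizes.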