Improving Regression Estimation: Averaging Methods for Variance Reduction with Extensions to General Convex Measure Optimization
, 1993
"... ..."
Learning in Linear Neural Networks: a Survey
 IEEE Transactions on Neural Networks
, 1995
"... Networks of linear units are the simplest kind of networks, where the basic questions related to learning, generalization, and selforganisation can sometimes be answered analytically. We survey most of the known results on linear networks, including: (1) backpropagation learning and the structure ..."
Abstract

Cited by 56 (4 self)
 Add to MetaCart
Networks of linear units are the simplest kind of networks, where the basic questions related to learning, generalization, and self-organisation can sometimes be answered analytically. We survey most of the known results on linear networks, including: (1) backpropagation learning and the structure of the error function landscape; (2) the temporal evolution of generalization; (3) unsupervised learning algorithms and their properties. The connections to classical statistical ideas, such as principal component analysis (PCA), are emphasized, as well as several simple but challenging open questions. A few new results are also spread across the paper, including an analysis of the effect of noise on backpropagation networks and a unified view of all unsupervised algorithms. Keywords: linear networks, supervised and unsupervised learning, Hebbian learning, principal components, generalization, local minima, self-organisation. I. Introduction: This paper addresses the problems of supervise...
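The survey's emphasis on the link between Hebbian learning and PCA can be made concrete with a small sketch. The snippet below uses Oja's rule, a normalized Hebbian update and one of the unsupervised algorithms such surveys cover, whose weight vector converges to the leading principal component of centered data. The data, seed, and step size are illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data whose main variance direction is at 45 degrees.
X = rng.normal(size=(5000, 2)) * np.array([1.5, 0.3])
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X @ R.T
X -= X.mean(axis=0)

w = rng.normal(size=2)
eta = 0.002
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)      # Oja's rule: Hebbian term minus decay

w /= np.linalg.norm(w)

# Compare with the leading eigenvector of the sample covariance matrix.
evals, evecs = np.linalg.eigh(np.cov(X.T))
pc1 = evecs[:, np.argmax(evals)]
print(abs(w @ pc1))                  # near 1.0 when the directions agree
```

The decay term `y * w` is what keeps the weight vector bounded; a plain Hebbian update `eta * y * x` would grow without limit.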
Learning Goal Oriented Bayesian Networks for Telecommunications Risk Management
 In Proceedings of the 13th International Conference on Machine Learning
, 1996
"... This paper discusses issues related to Bayesian network model learning for unbalanced binary classification tasks. In general, the primary focus of current research on Bayesian network learning systems (e.g., K2 and its variants) is on the creation of the Bayesian network structure that fits the dat ..."
Abstract

Cited by 29 (0 self)
 Add to MetaCart
This paper discusses issues related to Bayesian network model learning for unbalanced binary classification tasks. In general, the primary focus of current research on Bayesian network learning systems (e.g., K2 and its variants) is on the creation of the Bayesian network structure that fits the database best. It turns out that when applied with a specific purpose in mind, such as classification, the performance of these network models may be very poor. We demonstrate that Bayesian network models should be created to meet the specific goal or purpose intended for the model. We first present a goal-oriented algorithm for constructing Bayesian networks for predicting uncollectibles in telecommunications risk-management datasets. Second, we argue and demonstrate that current Bayesian network learning methods may fail to perform satisfactorily in real-life applications since they do not learn models tailored to a specific goal or purpose. Third, we discuss the performance of "goal oriented"...
Automatic Early Stopping Using Cross Validation: Quantifying the Criteria
 Neural Networks
, 1997
"... Cross validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for cross validation based early stopping, however, is chosen in an adhoc ..."
Abstract

Cited by 25 (0 self)
 Add to MetaCart
Cross validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for cross-validation-based early stopping, however, is chosen in an ad-hoc fashion by most researchers, or training is stopped interactively. To aid a more well-founded selection of the stopping criterion, 14 different automatic stopping criteria from 3 classes were evaluated empirically for their efficiency and effectiveness in 12 different classification and approximation tasks using multilayer perceptrons with RPROP training. The experiments show that, on average, slower stopping criteria allow for small improvements in generalization (on the order of 4%), but cost about a factor of four more training time. 1 Training for generalization: When training a neural network, one is usually interested in obtaining a network with optimal generalization performance. Genera...
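One of the criterion classes evaluated in this line of work, the generalization-loss criterion GL_alpha, can be sketched in a few lines: stop as soon as the validation error rises more than alpha percent above the best validation error seen so far. The error curve and threshold below are synthetic, illustrative values.

```python
def gl_stop(val_errors, alpha=5.0):
    """Return the epoch at which GL_alpha first fires, or None.

    GL(t) = 100 * (E_va(t) / E_opt(t) - 1), where E_opt(t) is the lowest
    validation error observed up to epoch t; stop when GL(t) > alpha.
    """
    best = float("inf")
    for t, e in enumerate(val_errors):
        best = min(best, e)
        gl = 100.0 * (e / best - 1.0)   # relative increase over the optimum
        if gl > alpha:
            return t
    return None

# Synthetic validation-error curve: improves, then starts to overfit.
val = [1.0, 0.8, 0.6, 0.55, 0.56, 0.6, 0.7, 0.9]
print(gl_stop(val, alpha=5.0))  # -> 5: first epoch more than 5% above 0.55
```

Larger alpha values correspond to the "slower" criteria mentioned in the abstract: they wait longer before stopping, trading extra training time for a chance at slightly better generalization.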
Early Stopping – but when?
 Neural Networks: Tricks of the Trade, volume 1524 of LNCS, chapter 2
, 1997
"... . Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for validationbased early stopping, however, is usually chosen in an adhoc fa ..."
Abstract

Cited by 16 (0 self)
 Add to MetaCart
Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multilayer perceptrons shows that there exists a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures, I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of four longer on average). 1 Early ...
No Free Lunch for Early Stopping
, 1999
"... Introduction Early stopping of training is one of the methods that aim to prevent overtraining due to too powerful a model class, noisy training examples, or a small training set. We study early stopping at a predetermined training error level. If there is no prior information other than the traini ..."
Abstract

Cited by 4 (0 self)
 Add to MetaCart
Introduction: Early stopping of training is one of the methods that aim to prevent overtraining due to too powerful a model class, noisy training examples, or a small training set. We study early stopping at a predetermined training error level. If there is no prior information other than the training examples, all models with the same training error should be equally likely to be chosen as the early stopping solution. When this is the case, we show that for general linear models, early stopping at any training error level above the training error minimum increases the expected generalization error. Moreover, we also show that the generalization error is an increasing function of the training error. Our results are non-asymptotic and independent of the presence or nature of the training data noise, and they hold when instead of generalization error, test e
Geometry of Early Stopping in Linear Networks
 Advances in Neural Information Processing Systems
, 1996
"... A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellip ..."
Abstract

Cited by 2 (0 self)
 Add to MetaCart
A theory of early stopping as applied to linear models is presented. The backpropagation learning algorithm is modeled as gradient descent in continuous time. Given a training set and a validation set, all weight vectors found by early stopping must lie on a certain quadric surface, usually an ellipsoid. Given a training set and a candidate early stopping weight vector, all validation sets have least-squares weights lying on a certain plane. This latter fact can be exploited to estimate the probability of stopping at any given point along the trajectory from the initial weight vector to the least-squares weights derived from the training set, and to estimate the probability that training goes on indefinitely. The prospects for extending this theory to nonlinear models are discussed. 1 INTRODUCTION: `Early stopping' is the following training procedure: Split the available data into a training set and a "validation" set. Start with initial weights close to zero. Apply gradient descent (ba...
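The continuous-time view of gradient descent used here has a simple closed form for linear least squares, which a few lines of code can verify. Starting from w(0) = 0, gradient flow on the squared error gives w(t) = (I - exp(-tH)) w_ls with H = X^T X, so every early-stopped weight vector lies on this one curve from the origin to the least-squares solution. The data below are synthetic stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

H = X.T @ X
w_ls = np.linalg.solve(H, X.T @ y)   # least-squares solution
lam, V = np.linalg.eigh(H)           # H is symmetric positive definite

def w_at(t):
    # w(t) = (I - exp(-tH)) w_ls, with the matrix exponential computed
    # through the eigendecomposition H = V diag(lam) V^T.
    return V @ ((1.0 - np.exp(-t * lam)) * (V.T @ w_ls))

for t in (0.0, 0.05, 5.0):
    print(t, np.round(w_at(t), 3))   # w(0) = 0; w(t) -> w_ls as t grows
```

Each eigendirection of H relaxes toward its least-squares value at its own rate exp(-t * lam_i), which is why early stopping damps the low-eigenvalue (poorly determined) directions most, much like ridge regularization.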
PRINCIPLES OF NEURAL SPATIAL INTERACTION MODELLING
"... The focus of this paper is on the neural network approach to modelling origindestination flows across geographic space. The novelty about neural spatial interaction models lies in their ability to model nonlinear processes between spatial flows and their determinants, with few – if any – a priori ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
The focus of this paper is on the neural network approach to modelling origin-destination flows across geographic space. The novelty of neural spatial interaction models lies in their ability to model nonlinear processes between spatial flows and their determinants, with few – if any – a priori assumptions about the data-generating process. The paper draws attention to models based on the theory of feedforward networks with a single hidden layer, and discusses some important issues that are central for successful application development. The scope is limited to feedforward neural spatial interaction models, which have gained increasing attention in recent years. It is argued that failures in applications can usually be attributed to inadequate learning and/or inadequate complexity of the network model. Parameter estimation and a suitably chosen number of hidden units are, thus, of crucial importance for the success of real-world applications. The paper views network learning as an optimization problem, describes various learning procedures, provides insights into current best practice to optimize complexity, and suggests the use of the bootstrap pairs approach to evaluate the model's generalization performance.
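The "bootstrap pairs" approach mentioned at the end of this abstract can be sketched briefly: resample (input, output) pairs with replacement, refit the model on each resample, and examine the spread of errors on the pairs left out of each resample. In this illustrative sketch a linear least-squares model stands in for the neural spatial interaction model, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.2 * rng.normal(size=100)

errs = []
for _ in range(200):
    # Resample (x_i, y_i) PAIRS with replacement -- the "bootstrap pairs"
    # scheme, which makes no assumption about the noise model.
    idx = rng.integers(0, len(X), size=len(X))
    w = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    # Evaluate on the out-of-bag pairs not drawn into this resample.
    oob = np.setdiff1d(np.arange(len(X)), idx)
    errs.append(np.mean((X[oob] @ w - y[oob]) ** 2))

print(np.mean(errs), np.std(errs))  # bootstrap estimate of generalization error and its spread
```

Resampling whole pairs, rather than residuals, keeps the dependence between inputs and outputs intact, which is why it is the safer choice when the model may be misspecified.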
Bagging and Boosting Negatively Correlated Neural Networks
, 2008
"... In this paper we propose two cooperative ensemble learning algorithms, NegBagg and NegBoost, for designing neural network (NN) ensembles. The proposed algorithms train different individual NNs in an ensemble incrementally using the negative correlation learning algorithm. Bagging and boosting algori ..."
Abstract

Cited by 2 (1 self)
 Add to MetaCart
In this paper we propose two cooperative ensemble learning algorithms, NegBagg and NegBoost, for designing neural network (NN) ensembles. The proposed algorithms train different individual NNs in an ensemble incrementally using the negative correlation learning algorithm. Bagging and boosting algorithms are used in NegBagg and NegBoost, respectively, to create different training sets for different NNs in the ensemble. The idea behind using negative correlation learning in conjunction with the bagging/boosting algorithms is to facilitate interaction and cooperation among NNs during their training. Both NegBagg and NegBoost use a constructive approach to automatically determine the number of hidden neurons for NNs. NegBoost also uses the constructive approach to automatically determine the number of NNs for the ensemble. The two algorithms have been tested on a number of benchmark problems in machine learning and neural networks, including Australian credit card assessment, breast cancer, diabetes, glass, heart disease, letter recognition, satellite, soybean and waveform problems. The experimental results show that NegBagg and NegBoost require a small number of training epochs to produce compact NN ensembles with good generalization.
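The negative correlation learning objective used by both algorithms can be sketched directly. Each member i minimises its own squared error plus a penalty lambda * p_i that couples its output to the ensemble mean; because deviations from the mean sum to zero, p_i simplifies to -(F_i - F_bar)^2, which rewards members for erring in different directions. The numbers below are illustrative, not from the paper.

```python
import numpy as np

def ncl_errors(preds, target, lam=0.5):
    """Per-member NCL errors for one example.

    preds: array of member outputs F_i; target: desired output d.
    Error_i = (F_i - d)^2 + lam * p_i, with
    p_i = (F_i - F_bar) * sum_{j != i} (F_j - F_bar) = -(F_i - F_bar)^2.
    """
    preds = np.asarray(preds, dtype=float)
    f_bar = preds.mean()                 # ensemble output
    p = -(preds - f_bar) ** 2            # deviations from the mean sum to 0
    return (preds - target) ** 2 + lam * p

# Three members straddling the target: the outliers get a penalty rebate.
print(ncl_errors([0.9, 1.1, 1.3], target=1.0))  # -> [-0.01  0.01  0.07]
```

With lam = 0 the members train independently (as in plain bagging); raising lam trades individual accuracy for diversity across the ensemble.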
An adaptive merging and growing algorithm for designing artificial neural networks
 in Proc. Int. Joint Conf. Neural Netw., Hong Kong
"... Abstract—This paper presents a new algorithm, called adaptive merging and growing algorithm (AMGA), in designing artificial neural networks (ANNs). This algorithm merges and adds hidden neurons during the training process of ANNs. The merge operation introduced in AMGA is a kind of a mixed mode oper ..."
Abstract

Cited by 1 (1 self)
 Add to MetaCart
Abstract—This paper presents a new algorithm, called the adaptive merging and growing algorithm (AMGA), for designing artificial neural networks (ANNs). This algorithm merges and adds hidden neurons during the training process of ANNs. The merge operation introduced in AMGA is a kind of mixed-mode operation, which is equivalent to pruning two neurons and adding one neuron. Unlike most previous studies, AMGA puts emphasis on autonomous functioning in the design process of ANNs. This is the main reason why AMGA uses an adaptive rather than a predefined fixed strategy in designing ANNs. The adaptive strategy merges or adds hidden neurons based on the learning ability of hidden neurons or the training progress of ANNs. In order to reduce the amount of retraining after modifying ANN architectures, AMGA prunes hidden neurons by merging correlated hidden neurons and adds hidden neurons by splitting existing hidden neurons. The proposed AMGA has been tested on a number of benchmark problems in machine learning and ANNs, including breast cancer, Australian credit card assessment, and diabetes, gene, glass, heart, iris, and thyroid problems. The experimental results show that AMGA can design compact ANN architectures with good generalization ability compared to other algorithms. Index Terms—Adding neurons, artificial neural network (ANN) design, generalization ability, merging neurons, retraining.
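The merge step described in this abstract, pruning two correlated hidden neurons into one, can be illustrated with a hypothetical sketch. The concrete merge rule used below (average the incoming weights of the pair, sum their outgoing weights) is an assumption chosen for illustration, not AMGA's actual rule; only the overall shape of the operation follows the abstract.

```python
import numpy as np

def merge_most_correlated(W_in, w_out, H):
    """Merge the two most-correlated hidden neurons into one.

    W_in: (n_hidden, n_in) incoming weights; w_out: (n_hidden,) outgoing
    weights; H: (n_samples, n_hidden) recorded hidden activations.
    NOTE: the averaging/summing merge rule is a hypothetical stand-in.
    """
    C = np.corrcoef(H.T)
    np.fill_diagonal(C, 0.0)                      # ignore self-correlation
    i, j = np.unravel_index(np.argmax(np.abs(C)), C.shape)
    merged_in = (W_in[i] + W_in[j]) / 2.0         # assumption: average inputs
    merged_out = w_out[i] + w_out[j]              # assumption: sum outputs
    keep = [k for k in range(W_in.shape[0]) if k not in (i, j)]
    return np.vstack([W_in[keep], merged_in]), np.append(w_out[keep], merged_out)

# Demo: neuron 2's activations nearly duplicate neuron 0's.
rng = np.random.default_rng(3)
H = rng.normal(size=(50, 3))
H[:, 2] = H[:, 0] + 0.01 * rng.normal(size=50)
W2, w2 = merge_most_correlated(rng.normal(size=(3, 4)), rng.normal(size=3), H)
print(W2.shape, w2.shape)   # one neuron fewer: (2, 4) (2,)
```

Merging redundant neurons like this is what lets the algorithm shrink the network with little retraining: the merged neuron approximately reproduces the combined contribution of the pair it replaces.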