Results 11-20 of 81
Structurally Adaptive Modular Networks for Non-Stationary Environments
 IEEE Transactions on Neural Networks
Abstract

Cited by 19 (6 self)
This paper introduces a neural network capable of dynamically adapting its architecture to realize time-variant nonlinear input-output maps. This network has its roots in the mixture-of-experts framework but uses a localized model for the gating network. Modules or experts are grown or pruned depending on the complexity of the modeling problem. The structural adaptation procedure addresses the model selection problem and typically leads to much better parameter estimation. Batch-mode learning equations are extended to obtain online update rules enabling the network to model time-varying environments. Simulation results are presented throughout the paper to support the proposed techniques. This research was supported in part by ARO contracts DAAH0494G0417 and 049510494 and NSF grant ECS 9307632.
Investigation of the CasCor Family of Learning Algorithms
 NEURAL NETWORKS
, 1996
Abstract

Cited by 17 (0 self)
Six learning algorithms are investigated and compared empirically. All of them are based on variants of the candidate training idea of the Cascade Correlation method. The comparison was performed using 42 different datasets from the Proben1 benchmark collection. The results indicate: (1) for these problems it is slightly better not to cascade the hidden units; (2) error minimization candidate training is better than covariance maximization for regression problems but may be a little worse for classification problems; (3) for most learning tasks, considering validation set errors during the selection of the best candidate will not lead to improved networks, but for a few tasks it will.
Early Stopping - but when?
 Neural Networks: Tricks of the Trade, volume 1524 of LNCS, chapter 2
, 1997
Abstract

Cited by 16 (0 self)
Validation can be used to detect when overfitting starts during supervised training of a neural network; training is then stopped before convergence to avoid the overfitting ("early stopping"). The exact criterion used for validation-based early stopping, however, is usually chosen in an ad-hoc fashion, or training is stopped interactively. This trick describes how to select a stopping criterion in a systematic fashion; it is a trick for either speeding learning procedures or improving generalization, whichever is more important in the particular situation. An empirical investigation on multilayer perceptrons shows that there exists a tradeoff between training time and generalization: from the given mix of 1296 training runs using 12 different problems and 24 different network architectures I conclude that slower stopping criteria allow for small improvements in generalization (here: about 4% on average), but cost much more training time (here: about a factor of 4 longer on average).
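The abstract above describes stopping training once validation error stops improving instead of training to convergence. A minimal sketch of one patience-based variant of such a criterion (the chapter discusses a family of criteria; the function and parameter names here are illustrative assumptions, not the chapter's exact definitions):

```python
def early_stopping_epoch(val_errors, patience=5):
    """Return the epoch whose weights a patience-based early-stopping
    rule would keep: the epoch with the lowest validation error seen
    before `patience` epochs pass without any improvement."""
    best_err = float("inf")
    best_epoch = 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_err, best_epoch = err, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch

# Validation error falls, then rises again as overfitting sets in:
errors = [1.00, 0.80, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85]
print(early_stopping_epoch(errors, patience=5))  # epoch 2 held the minimum
```

Slower criteria correspond to larger `patience`: they wait longer before concluding the minimum has passed, trading training time for a chance at a slightly better minimum, which matches the time-versus-generalization tradeoff reported above.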
Robust Combining of Disparate Classifiers through Order Statistics
 Pattern Analysis and Applications
, 2001
Abstract

Cited by 16 (4 self)
Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in the performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum and, in general, the i-th order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real-world data and standard public-domain data sets corroborate these findings.
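The median/maximum combiners analyzed above fit in a few lines. The sketch below (names illustrative, not from the paper) takes one per-class score vector from each classifier and combines them through the i-th order statistic per class:

```python
def order_statistic_combine(classifier_scores, i):
    """Combine per-class scores from several classifiers by taking,
    for each class, the i-th smallest score (0-indexed), then
    predicting the class with the largest combined score."""
    n_classes = len(classifier_scores[0])
    combined = []
    for c in range(n_classes):
        ordered = sorted(scores[c] for scores in classifier_scores)
        combined.append(ordered[i])  # i-th order statistic for class c
    return combined.index(max(combined))

# Three classifiers, two classes; i=1 is the median of three outputs.
scores = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(order_statistic_combine(scores, i=1))  # class 0 (medians 0.6 vs 0.4)
```

Setting `i = len(classifier_scores) - 1` gives the maximum combiner, `i = 0` the minimum; the median is robust to a single badly wrong classifier in exactly the sense the abstract describes.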
Operator Adaptation in Evolutionary Computation and its Application to Structure Optimization of Neural Networks
, 2001
Abstract

Cited by 14 (6 self)
In this study, we give a brief overview of search strategy adaptation in evolutionary computation. The
Decimated Input Ensembles for Improved Generalization
, 1999
Abstract

Cited by 13 (4 self)
Using an ensemble of classifiers instead of a single classifier has been demonstrated to improve generalization performance in many difficult problems. However, for this improvement to take place it is necessary to make the classifiers in an ensemble more complementary. In this paper, we highlight the need to reduce the correlation among the component classifiers and investigate one method for correlation reduction: input decimation. We elaborate on input decimation, a method that uses the discriminating features of the inputs to decouple classifiers. By presenting different parts of the feature set to each individual classifier, input decimation generates a diverse pool of classifiers. Experimental results confirm that input decimation combining improves generalization performance.

I. Introduction

A classification learning task involves constructing a mapping from input instances (normally described by several features) to the (most likely) class to which the instances belong. Superv...
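Input decimation presents each classifier with only the features most discriminating for one class. A minimal sketch of such a feature-selection step, scoring features by the absolute Pearson correlation with a one-vs-rest class indicator (the function name and this exact scoring rule are illustrative assumptions, not necessarily the paper's procedure):

```python
def top_k_features(X, y, label, k):
    """Rank features by |Pearson correlation| with the indicator
    'example belongs to `label`' and return the top-k feature indices."""
    indicator = [1.0 if yi == label else 0.0 for yi in y]
    n = len(y)
    mean_ind = sum(indicator) / n

    def abs_corr(col):
        mean_col = sum(col) / n
        cov = sum((a - mean_col) * (b - mean_ind) for a, b in zip(col, indicator))
        var_col = sum((a - mean_col) ** 2 for a in col)
        var_ind = sum((b - mean_ind) ** 2 for b in indicator)
        if var_col == 0 or var_ind == 0:
            return 0.0  # constant feature or single-class data: uninformative
        return abs(cov / (var_col * var_ind) ** 0.5)

    scores = [abs_corr([row[j] for row in X]) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Feature 0 separates the classes perfectly; feature 1 is noisier.
X = [[1.0, 0.5], [1.0, 0.2], [0.0, 0.9], [0.0, 0.1]]
y = [1, 1, 0, 0]
print(top_k_features(X, y, label=1, k=1))  # [0]
```

Training each ensemble member on a different such subset gives classifiers that err on different inputs, which is the correlation reduction the abstract argues for.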
Some Notes on Neural Learning Algorithm Benchmarking
, 1995
Abstract

Cited by 13 (0 self)
New neural learning algorithms are often benchmarked only poorly. This article gathers some important DOs and DON'Ts for researchers in order to improve on that situation. The essential requirements are (1) Volume: benchmarking has to be broad enough, i.e., must use several problems; (2) Validity: common errors that invalidate the results have to be avoided; (3) Reproducibility: benchmarking has to be documented well enough to be completely reproducible; and (4) Comparability: benchmark results should, if possible, be directly comparable with the results achieved by others using different algorithms.
Robust order statistics based ensemble for distributed data mining
 In Advances in Distributed and Parallel Knowledge Discovery
, 2000
Abstract

Cited by 13 (4 self)
Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In the typical setting investigated until now, each classifier is trained on data taken or resampled from a common data set, or randomly selected partitions thereof, and thus experiences similar quality of training data. However, in distributed data mining involving heterogeneous databases, the nature, quality and quantity of data available to each site/classifier may vary substantially, leading to large discrepancies in their performance. In this chapter we introduce and investigate a family of meta-classifiers based on order statistics, for robust handling of such cases. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when such combiners are used. We show analytically that the selection of the median, the maximum and, in general, the i-th order statistic improves classification performance. Furthermore, we introduce the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and empirically show that they are significantly superior in the presence of outliers or uneven classifier performance. So they can be fruitfully applied to several heterogeneous distributed data mining situations, especially when it is ...
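The trim combiner mentioned above averages the ordered classifier outputs after discarding the extremes, which is what makes it robust to the outlier classifiers that heterogeneous sites can produce. A minimal sketch (function name and parameterization illustrative):

```python
def trimmed_mean_combine(classifier_scores, trim):
    """For each class, sort the scores produced by the classifiers,
    drop the `trim` lowest and `trim` highest, and average the rest;
    predict the class with the largest trimmed mean."""
    n_classes = len(classifier_scores[0])
    combined = []
    for c in range(n_classes):
        ordered = sorted(scores[c] for scores in classifier_scores)
        core = ordered[trim:len(ordered) - trim]  # discard the extremes
        combined.append(sum(core) / len(core))
    return combined.index(max(combined))

# Five classifiers; the outlier scores 0.0 and 1.0 for class 0 are dropped.
scores = [[0.0, 0.9], [0.4, 0.5], [0.5, 0.3], [0.6, 0.2], [1.0, 0.1]]
print(trimmed_mean_combine(scores, trim=1))  # class 0
```

With `trim=0` this reduces to the plain linear (averaging) combiner; larger `trim` interpolates toward the median, trading efficiency on well-behaved sites for robustness to badly trained ones.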
An Evolutionary Method to Find Good Building-Blocks for Architectures of Artificial Neural Networks
, 1996
Abstract

Cited by 12 (1 self)
This paper deals with the combination of Evolutionary Algorithms and Artificial Neural Networks (ANNs). A new method is presented to find good building-blocks for architectures of Artificial Neural Networks. The method is based on Cellular Encoding, a representation scheme by F. Gruau, and on Genetic Programming by J. Koza. First, it will be shown that a modified Cellular Encoding technique is able to find good architectures even for non-boolean networks. Second, with the help of a graph-database and a new graph-rewriting method, it is possible to build architectures from modular structures. The information about building-blocks for architectures is obtained by statistically analyzing the data in the graph-database. Simulation results for two real-world problems are given.

1 Introduction

One of the major problems in using ANNs is the design of their architecture. The architecture of an ANN greatly influences its performance. If the architecture is too small, the net is not able to learn t...
SAProp: Optimization of Multilayer Perceptron Parameters using Simulated Annealing
, 1998
Abstract

Cited by 12 (6 self)
A general problem in model selection is to obtain the right parameters that make a model fit observed data. If the model selected is a Multilayer Perceptron (MLP) trained with Backpropagation (BP), it is necessary to find appropriate initial weights and learning parameters. This paper proposes a method that combines Simulated Annealing (SimAnn) and BP to train MLPs with a single hidden layer, termed SAProp. SimAnn selects the initial weights and the learning rate of the network. SAProp combines the advantages of the stochastic search performed by Simulated Annealing over the MLP parameter space and the local search of the BP algorithm. The application of the proposed method to several real-world benchmark problems shows that MLPs evolved using SAProp achieve a higher level of generalization than other perceptron training algorithms, such as QuickPropagation (QP) or RPROP, and other evolutionary algorithms, such as GLVQ.

1 Introduction

Whatever the application usi...
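The simulated-annealing component SAProp builds on can be illustrated with a generic annealing loop. The sketch below minimizes a scalar cost as a stand-in for the validation error obtained from given initial weights and learning rate; all names and the geometric cooling schedule are assumptions for illustration, not the paper's exact setup:

```python
import math
import random

def simulated_annealing(cost, x0, step=1.0, t0=1.0, cooling=0.95,
                        n_steps=500, seed=0):
    """Minimize `cost` by simulated annealing with geometric cooling:
    worse moves are accepted with probability exp(-delta / T), so the
    search explores widely at high temperature and refines locally as
    the temperature drops."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best_x, best_f = x, fx
    temp = t0
    for _ in range(n_steps):
        candidate = x + rng.uniform(-step, step)  # random neighbor
        fc = cost(candidate)
        if fc < fx or rng.random() < math.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x, fx
        temp *= cooling  # geometric cooling schedule
    return best_x, best_f

# Stand-in cost with a single minimum at x = 3.
best_x, best_f = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
```

In SAProp's setting the search point would be the vector of initial weights plus the learning rate, and the cost a BP training run's resulting error; the annealing then hands its best configuration to BP for local search.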