Linear-Least-Squares Initialization of Multilayer Perceptrons Through Backpropagation of the Desired Response
Cited by 4 (1 self)
Abstract—Training multilayer neural networks is typically carried out using descent techniques such as the gradient-based backpropagation (BP) of error or quasi-Newton approaches including the Levenberg–Marquardt algorithm. This is basically due to the fact that there are no analytical methods to find the optimal weights, so iterative local or global optimization techniques are necessary. The success of iterative optimization procedures depends strongly on the initial conditions; therefore, in this paper, we devise a principled novel method of backpropagating the desired response through the layers of a multilayer perceptron (MLP), which enables us to accurately initialize these neural networks in the minimum mean-square-error sense, using the analytic linear least squares solution. The generated solution can be used as an initial condition for standard iterative optimization algorithms. However, simulations demonstrate that in most cases, the performance achieved through the proposed initialization scheme leaves little room for further improvement in the mean-square error (MSE) over the training set. In addition, the performance of the network optimized with the proposed approach also generalizes well to testing data. A rigorous derivation of the initialization algorithm is presented and its high performance is verified with a number of benchmark training problems including chaotic time-series prediction, classification, and nonlinear system identification with MLPs. Index Terms—Approximate least-squares training of multilayer perceptrons (MLPs), backpropagation (BP) of desired response, neural network initialization.
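The analytic step at the heart of this scheme is easy to illustrate for the output layer alone: because the output units are linear, their weights have an exact minimum-MSE solution by linear least squares once the hidden activations are fixed. The sketch below uses hypothetical toy data and random first-layer weights; it shows only that single analytic step, not the paper's full algorithm, which also backpropagates the desired response through the hidden layers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy regression task: fit one period of a sine wave.
X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
d = np.sin(np.pi * X[:, 0])

# Random first layer (10 tanh units); in the paper's scheme these weights
# would themselves be set by backpropagating the desired response.
n_hidden = 10
W1 = rng.normal(size=(1, n_hidden))
b1 = rng.normal(size=n_hidden)

# The output layer is linear, so given the hidden activations its weights
# have an exact minimum-MSE solution by linear least squares.
H = np.tanh(X @ W1 + b1)
H_aug = np.hstack([H, np.ones((len(X), 1))])   # column of ones = output bias
w2, *_ = np.linalg.lstsq(H_aug, d, rcond=None)

mse = np.mean((H_aug @ w2 - d) ** 2)
```

The least-squares solution here plays the role of the "initial condition" the abstract describes: a gradient-based optimizer would start from `w2` rather than from random output weights.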
Homotopy Approaches For The Analysis And Solution Of Neural Network And Other Nonlinear Systems Of Equations
, 1995
Cited by 3 (2 self)
Increasingly, models, mappings, systems and algorithms used for signal processing need to be nonlinear in order to meet performance specifications in communications, computing and control systems applications. Simple computational models have been developed, including neural networks, which can efficiently implement a variety of nonlinear mappings through appropriate choice of model parameters. However, the design of arbitrary nonlinear mappings using these models and measured data requires both understanding how realizable (finite) systems perform if optimized given finite data, and a method for computing globally optimal system parameters. In this thesis, we use constructive homotopy methods both to geometrically explore the mapping capabilities of finite neural networks, and to rigorously develop a robust method for computing optimal solutions to systems of nonlinear equations which, like neural network equations, have an unknown number of solutions and may have solutions at infinity.
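The basic continuation idea is easy to sketch for a single scalar equation: deform from a trivially solvable system G(x) = x - x0 at t = 0 to the target system F(x) at t = 1, tracking the root along the way with a Newton corrector at each step. This is a minimal illustration on a hypothetical test equation, not the thesis's constructive method.

```python
# Hypothetical target equation (not from the thesis): x^3 - 2x - 5 = 0,
# whose real root is approximately 2.0945515.
def f(x):
    return x**3 - 2.0 * x - 5.0

def df(x):
    return 3.0 * x**2 - 2.0

# Convex homotopy H(x, t) = (1 - t)(x - x0) + t f(x): at t = 0 the root
# is trivially x0; at t = 1 the roots of H are the roots of f.
x0 = 1.0
x = x0
steps = 100
for k in range(1, steps + 1):
    t = k / steps
    for _ in range(20):                      # Newton corrector at this t
        h = (1.0 - t) * (x - x0) + t * f(x)
        dh = (1.0 - t) + t * df(x)
        x -= h / dh
```

Because the start point of each corrector is the converged root for the previous t, the iteration follows one solution path continuously; the thesis's concern is precisely the harder cases where such paths turn back or run to infinity.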
Neural Networks for Signal Processing
Cited by 3 (0 self)
In this thesis, methods for optimization of neural network architectures are examined in order to achieve better generalization ability from the neural networks at tasks within signal processing. The feedforward networks described have one hidden layer of units with tanh activation functions and linear output units. The major topics described in the thesis are:
• Reducing the number of free parameters in the network architecture by pruning of parameters. Pruning is based on estimates (Optimal Brain Damage) of which parameters induce the least increase in the network performance criterion (the cost-function) when they are removed from the network.
• Finding methods for estimation of the generalization ability of the network from the learning data set. A generalization error estimate (Akaike's Final Prediction Error estimate) is used for choosing the optimal network architecture among different pruned network configurations.
• Using methods for online tuning of the...
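Both ingredients above have compact closed forms. Optimal Brain Damage scores a weight w_i by the saliency s_i = h_ii w_i^2 / 2, where h_ii is the diagonal Hessian term of the cost, and Akaike's Final Prediction Error scales the training error by (N + p)/(N - p) for N samples and p parameters. A minimal sketch on a hypothetical linear model (where the Hessian is exact), not on the thesis's networks:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear model y = X @ w with squared-error cost; the third
# input is pure noise, so OBD should nominate its weight for pruning.
N, p = 500, 3
X = rng.normal(size=(N, p))
d = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=N)
w, *_ = np.linalg.lstsq(X, d, rcond=None)      # "trained" weights

# For E = mean((X w - d)^2) the Hessian is 2 X^T X / N; OBD keeps only
# its diagonal and scores each weight by s_i = h_ii * w_i^2 / 2.
h_diag = 2.0 * np.sum(X**2, axis=0) / N
saliency = 0.5 * h_diag * w**2
prune_candidate = int(np.argmin(saliency))

# Akaike's Final Prediction Error estimate of generalization error.
e_train = np.mean((X @ w - d) ** 2)
fpe = e_train * (N + p) / (N - p)
```

In the thesis's setting the Hessian diagonal comes from backpropagation through the nonlinear network rather than from a closed form, but the ranking and selection logic is the same.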
Accelerating the convergence speed of neural networks learning methods using least squares
Software for Data Analysis With Graphical Models
 In Fifth International Artificial Intelligence and Statistics Workshop, Ft Lauderdale, FL
, 1995
Cited by 1 (0 self)
Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software.
1 Introduction
This paper argues that the data analysis tasks of learning and knowledge discovery can be handled using graphical models. This meta-level use of graphical models was first suggested by Spiegelhalter an...
Information Merging in Neural Modelling
The paper addresses the problem of defining a neural system which combines pieces of independent information available in both the data and parameter spaces. The problem is approached in the framework of the probabilistic interpretation of neural modelling: in order to take into account the indeterminacy associated with the training process, a distribution in the weight space is associated with each solution, and the network resulting from the combination is obtained by merging the distributions associated with the different solutions. The effectiveness of the proposed procedure is shown by applying it to feedforward neural networks trained on a classification task.
1 Introduction
The problem of training by taking into account information available in the parameter space can arise in the case of neural modelling: if we have available a neural network net1 trained on a data set t1 (consisting of n1 samples) and a new, independent data set t2 (n2 samples) drawn from the same pro...
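The abstract leaves the merge rule unspecified; one common concrete choice, sketched below under the assumption that each solution's weight distribution is approximated by a Gaussian with diagonal covariance, is to multiply the densities: precisions add, and the merged mean is the precision-weighted average of the means. The numbers are hypothetical.

```python
import numpy as np

# Two independently trained "solutions" for the same 2-weight network,
# each summarized by a diagonal Gaussian in weight space (mean, variance).
m1 = np.array([0.8, -1.2]); v1 = np.array([0.20, 0.50])
m2 = np.array([1.0, -1.0]); v2 = np.array([0.50, 0.10])

# Merging independent Gaussians multiplies the densities: precisions add,
# and the merged mean is the precision-weighted average of the means.
p1, p2 = 1.0 / v1, 1.0 / v2
v = 1.0 / (p1 + p2)                 # merged variance (always smaller)
m = v * (p1 * m1 + p2 * m2)         # merged mean, pulled toward the
                                    # more confident (lower-variance) solution
```

Note how each merged weight leans toward whichever solution determined it more sharply, which is the intuition behind combining the information in net1 and a network trained on t2.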
Characterizing Network Complexity and Learning Efficiency by the Ratio of Weight Interdependence to Sensitivity
We extend previous research on parameter dynamics of digital filters to examine weight sensitivity and interdependence in feedforward networks. Weight sensitivity refers to the effect of small weight perturbations on the network's output, and weight interdependence refers to the degree of collinearity between weights. A combined measure of the weight space (#), defined as the ratio of weight interdependence to sensitivity, is explored in networks with hidden-unit activation functions of different complexity in the contexts of learning (1) a nonlinearly separable bivariate normal classification task, (2) the XOR problem, (3) sigmoidal functions, and (4) sine functions. Simulations show that networks with more complex activation functions give rise to a smaller # and more rapid learning, suggesting that weight sensitivity and interdependence together are indicative of network complexity and are predictive of learning efficiency.
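One concrete way to instantiate the two quantities is via the Jacobian of the network output with respect to each weight: sensitivity as the average norm of these per-weight effect vectors, and interdependence as their mean pairwise collinearity. The finite-difference sketch below on a hypothetical tiny 1-2-1 tanh network illustrates the definitions; it is not claimed to be the paper's exact measure.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny 1-2-1 tanh network; w packs [W1 (2), b1 (2), w2 (2), b2 (1)].
def forward(w, X):
    W1, b1, w2, b2 = w[0:2], w[2:4], w[4:6], w[6]
    return np.tanh(X[:, None] * W1 + b1) @ w2 + b2

X = np.linspace(-1.0, 1.0, 50)
w = rng.normal(size=7)

# Per-weight "effect vectors": central finite differences of the output.
eps = 1e-5
J = np.empty((len(X), len(w)))
for i in range(len(w)):
    wp, wm = w.copy(), w.copy()
    wp[i] += eps
    wm[i] -= eps
    J[:, i] = (forward(wp, X) - forward(wm, X)) / (2.0 * eps)

# Sensitivity: average output change per unit perturbation of one weight.
sensitivity = np.mean(np.linalg.norm(J, axis=0))

# Interdependence: mean |cosine| between distinct weights' effect vectors.
Jn = J / np.linalg.norm(J, axis=0)
C = np.abs(Jn.T @ Jn)
interdependence = (C.sum() - len(w)) / (len(w) * (len(w) - 1))

ratio = interdependence / sensitivity   # the combined measure's shape
```

Under these definitions, highly collinear effect vectors (weights that can trade off against each other) raise the numerator, while output-sensitive weights raise the denominator, matching the abstract's description of the ratio.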
Exact Hessian Calculation in Feedforward FIR Neural Networks
 PROCEEDINGS OF THE IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS
, 1998
FIR neural networks are feedforward neural networks with regular scalar synapses replaced by linear finite impulse response filters. This paper introduces the Second Order Temporal Backpropagation algorithm which enables the exact calculation of the second order error derivatives for a FIR neural network. This method is based on the error gradient calculation method first proposed by Wan and referred to as Temporal Backpropagation. A reduced FIR synapse model obtained by ignoring unnecessary time lags is proposed to reduce the number of network parameters.
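The FIR synapse itself is simple to write down: the scalar product w·x(n) becomes a tapped delay line, s(n) = Σ_k w_k x(n-k). A minimal sketch of a single FIR neuron follows, with hypothetical tap values; it illustrates the network model only, not the paper's second-order algorithm.

```python
import numpy as np

# FIR synapse: the scalar weight is replaced by taps w_0..w_T, so the
# synapse output is s(n) = sum_k w_k * x(n - k), a causal FIR filter.
def fir_synapse(x, taps):
    return np.convolve(x, taps)[: len(x)]

# One FIR neuron: sum the filtered synapse outputs, then apply tanh.
def fir_neuron(inputs, tap_vectors):
    s = sum(fir_synapse(x, t) for x, t in zip(inputs, tap_vectors))
    return np.tanh(s)

# A unit impulse exposes the synapse's impulse response (hypothetical taps).
x = np.array([1.0, 0.0, 0.0, 0.0])
y = fir_neuron([x], [np.array([0.5, 0.25])])
```

The reduced synapse model mentioned in the abstract corresponds to shortening `taps` by dropping lags whose coefficients contribute little.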
Characterizing Network Complexity and Classification Efficiency by the Ratio of Weight Interdependence to Sensitivity
, 1999
We extend previous research on digital filter structures and parameter sensitivity to the relationship between the nature of the hidden-unit activation function, weight sensitivity and interdependence, and classification learning in neural networks. Weight sensitivity indicates the extent of variations in a network's output when reacting to small perturbations in its weights, whereas weight interdependence indicates the degree of collinearity between weights. A combined measure (t), defined as the ratio of weight interdependence to sensitivity, was examined in three feedforward networks employing different hidden-unit activation functions in the context of a nonlinearly separable two-choice classification task. Simulation results show that t reflects the complexity of the hidden-unit activation function and determines the rate of learning a quadratic classification boundary. Networks with a more complex hidden-unit activation function evince a smaller t and more rapid classification learning.
TABLE OF CONTENTS
2002. This work is dedicated to all scientists and researchers, who have lived in pursuit of knowledge, and have dedicated themselves to the advancement of science. ACKNOWLEDGMENTS I would like to start by thanking my supervisor, Dr. Jose C. Principe, for his encouraging and inspiring style that made possible the completion of this work. Without his guidance, imagination, and enthusiasm, which I admire, this dissertation would not have been possible. I also wish to thank the members of my committee, Dr. John G. Harris, Dr. Tan F. Wong, and Dr. Mark C.K. Yang, for their valuable time and interest in serving on my supervisory committee, as well as their comments, which helped improve the quality of this dissertation. Throughout the course of my PhD research, I have been in interaction with many CNEL colleagues and I have benefited from the valuable discussions we had together