Results 11-20 of 32
A generalized learning paradigm exploiting the structure of feedforward neural networks
IEEE Trans. Neural Networks, 1996
Cited by 10 (0 self)
In this paper a general class of fast learning algorithms for feedforward neural networks is introduced and described. The approach exploits the separability of each layer into linear and nonlinear blocks and consists of two steps. The first step is the descent of the error functional in the space of the outputs of the linear blocks (descent in the neuron space), which can be performed using any preferred optimization strategy. In the second step, each linear block is optimized separately using a Least Squares (LS) criterion. To demonstrate the effectiveness of the new approach, gradient descent in the neuron space is treated in detail. The main properties of this approach are a higher speed of convergence than methods that employ ordinary gradient descent in the weight space (Backpropagation, BP), and better numerical conditioning and lower computational cost than techniques based on the Hessian matrix. Numerical stability is ensured by robust LS linear-system solvers that operate directly on the input data of each layer. Experimental results on three problems confirm the effectiveness of the new method.
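The two-step scheme can be sketched for the simplest possible case, a single tanh layer, under assumptions of our own: step 1 takes a plain gradient step on the squared error with respect to the linear-block outputs s = Xw, and step 2 refits the weights by least squares so that Xw matches the updated s. The names `neuron_space_step` and `mse`, the step size `mu`, and the use of `numpy.linalg.lstsq` as the LS solver are hypothetical choices for illustration, not the paper's implementation, which covers multilayer networks and any descent strategy.

```python
import numpy as np

def neuron_space_step(X, t, w, mu=0.5):
    """One hypothetical two-step update for a single tanh layer.

    X: (P, N) inputs (a column of ones acts as bias), t: (P,) targets,
    w: (N,) current weights. Returns the refitted weights.
    """
    s = X @ w                          # outputs of the linear block
    y = np.tanh(s)                     # outputs of the nonlinear block
    # Step 1: gradient descent on E = 0.5 * sum((y - t)**2) in the space of s
    dE_ds = (y - t) * (1.0 - y**2)
    s_target = s - mu * dE_ds
    # Step 2: least-squares fit of the linear block to the new targets
    w_new, *_ = np.linalg.lstsq(X, s_target, rcond=None)
    return w_new

def mse(X, t, w):
    return float(np.mean((np.tanh(X @ w) - t) ** 2))

rng = np.random.default_rng(0)
X = np.hstack([rng.normal(size=(200, 3)), np.ones((200, 1))])
w_true = np.array([0.8, -0.5, 0.3, 0.1])
t = np.tanh(X @ w_true)                # noise-free targets from a known layer
w = np.zeros(4)
start = mse(X, t, w)
for _ in range(50):
    w = neuron_space_step(X, t, w)
print(start, mse(X, t, w))             # the error should drop substantially
```

Because s already lies in the column space of X, the LS refit in this one-layer sketch is equivalent to a weight-space gradient step preconditioned by (XᵀX)⁻¹; in a multilayer network each linear block would instead be refit on its own layer's inputs.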
Variable Selection with Optimal Cell Damage
In Proceedings of ICANN'94, 1994
Cited by 9 (1 self)
In this paper we show how NNs can be used for variable selection with a criterion based on the evaluation of each variable's usefulness. Various methods have been proposed to assess the value of a weight (e.g. saliency [Le Cun et al. 90] in the Optimal Brain Damage (OBD) procedure); along similar lines, we derive a method, called Optimal Cell Damage (OCD), which evaluates the usefulness of input variables in a Multi-Layer Network and prunes the least useful ones. Variable selection is therefore achieved during training of the classifier, which ensures that the selected set of variables is adequate for the classifier complexity. Variable selection is thus viewed here as an extension of weight pruning. One can also use a regularization approach to variable selection, which we will discuss elsewhere [Cibas et al., 94]. We illustrate the behavior of our method on two relatively small problems: prediction of a synthetic time series and classification of waveforms [Breiman et al., 84]. These two problems are representative of relatively hard problems in which the noise level can be controlled, which is known to be an important feature of a data set when assessing the validity of a technique. The paper is organized as follows: section 2 introduces notation and results from the literature; section 3 describes the problems used for testing our methods; and section 4 presents the results of variable selection by OCD.
2. Variable Selection
2.1. Definitions
Consider a random variable pair (X, Y), X ...
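As a rough sketch of the idea, assume OBD-style weight saliencies s_k = ½ h_kk w_k² (the diagonal-Hessian approximation used in OBD) and score each input variable by summing the saliencies of the first-layer weights leaving it; the lowest-scoring inputs are the pruning candidates. The function names and the Hessian values below are hypothetical, and the paper's exact criterion and retraining schedule may differ.

```python
def obd_saliency(weight, hess_diag):
    """OBD saliency of one weight under a diagonal Hessian approximation."""
    return 0.5 * hess_diag * weight ** 2

def input_saliencies(W, H):
    """Score each input by summing the saliencies of its outgoing weights.

    W[j][i] is the weight from input i to hidden unit j and H[j][i] the
    matching diagonal Hessian entry (both hypothetical here).
    """
    n_inputs = len(W[0])
    return [
        sum(obd_saliency(W[j][i], H[j][i]) for j in range(len(W)))
        for i in range(n_inputs)
    ]

def least_useful(W, H, k):
    """Indices of the k inputs with the smallest saliency (pruning candidates)."""
    scores = input_saliencies(W, H)
    return sorted(range(len(scores)), key=lambda i: scores[i])[:k]

# Two hidden units, three inputs; input 2 has only small outgoing weights.
W = [[1.0, -2.0, 0.1],
     [0.5, 1.0, -0.1]]
H = [[2.0, 1.0, 1.0],
     [2.0, 1.0, 1.0]]
print(input_saliencies(W, H))   # approximately [1.25, 2.5, 0.01]
print(least_useful(W, H, 1))    # [2]
```

In a full procedure one would retrain after removing an input and recompute the saliencies, so that the remaining variables stay matched to the classifier.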
Efficient Training of Feed-Forward Neural Networks
1997
Cited by 9 (0 self)
...
A.2 Introduction
  A.2.1 Motivation
A.3 Optimization strategy
A.4 The Backpropagation algorithm
A.5 Conjugate direction methods
  A.5.1 Conjugate gradients
  A.5.2 The CGL algorithm
  A.5.3 The BFGS algorithm
A.6 The SCG algorithm
A.7 Test results
  A.7.1 Comparison metric
...
Linear-Least-Squares Initialization of Multilayer Perceptrons Through Backpropagation of the Desired Response
Cited by 4 (1 self)
Abstract: Training multilayer neural networks is typically carried out using descent techniques such as gradient-based backpropagation (BP) of the error or quasi-Newton approaches, including the Levenberg–Marquardt algorithm. This is because there are no analytical methods for finding the optimal weights, so iterative local or global optimization techniques are necessary. The success of iterative optimization procedures depends critically on the initial conditions; therefore, in this paper, we devise a principled, novel method of backpropagating the desired response through the layers of a multilayer perceptron (MLP), which enables us to accurately initialize these neural networks in the minimum mean-square-error sense using the analytic linear least-squares solution. The generated solution can be used as an initial condition for standard iterative optimization algorithms. However, simulations demonstrate that in most cases the performance achieved through the proposed initialization scheme leaves little room for further improvement in the mean-square error (MSE) over the training set. In addition, networks optimized with the proposed approach also generalize well to test data. A rigorous derivation of the initialization algorithm is presented, and its high performance is verified on a number of benchmark training problems, including chaotic time-series prediction, classification, and nonlinear system identification with MLPs.
Index Terms: Approximate least-squares training of multilayer perceptrons (MLPs), backpropagation (BP) of desired response, neural network initialization.
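For a single layer with an invertible activation, the core idea of carrying the desired response back through the nonlinearity can be sketched as follows: map the targets through the inverse activation (arctanh for tanh) and solve a linear least-squares problem for the weights. This is a simplified sketch under our own assumptions (one layer, tanh, plain unweighted least squares); the paper's derivation for full MLPs is more involved, and `ls_init_single_layer` is a hypothetical name.

```python
import numpy as np

def ls_init_single_layer(X, d, eps=1e-6):
    """Initialize weights of y = tanh(X_aug @ w) from desired outputs d.

    The desired response is mapped back through the nonlinearity with
    arctanh, and the weights (last entry = bias) come from linear least squares.
    """
    d = np.clip(d, -1 + eps, 1 - eps)          # arctanh is finite only on (-1, 1)
    z = np.arctanh(d)                          # desired linear-block outputs
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X_aug, z, rcond=None)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
w_true = np.array([0.7, -0.4, 0.2])            # two weights and a bias
d = np.tanh(np.hstack([X, np.ones((100, 1))]) @ w_true)
w0 = ls_init_single_layer(X, d)
print(np.round(w0, 6))                         # close to w_true for noise-free d
```

With noisy targets the result is only an initial condition, which an iterative method such as BP could then refine, in line with the abstract above.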
Bayesian Non-Linear Modelling with Neural Networks
1995
Cited by 4 (0 self)
... this paper is illustrated in figure 6e. If we give a probabilistic interpretation to the model, then we can evaluate the 'evidence' for alternative values of the control parameters. Over-complex models turn out to be less probable, and the quantity ...
Exact Calculation of the Product of the Hessian Matrix of Feed-Forward Network Error Functions and a Vector in O(N) Time
Cited by 4 (0 self)
Several methods for training feed-forward neural networks require second-order information from the Hessian matrix of the error function. Although it is possible to calculate the Hessian matrix exactly, this is often undesirable because of the computation and memory requirements involved. Some learning techniques, however, only need the Hessian matrix times a vector. This paper presents a method to calculate the Hessian matrix times a vector in O(N) time, where N is the number of variables in the network. This is of the same order as the calculation of the gradient of the error function. The usefulness of this algorithm is demonstrated by improvements to existing learning techniques.
1 Introduction
The second-derivative information of the error function associated with feed-forward neural networks forms an N × N matrix, which is usually referred to as the Hessian matrix. Second-derivative information is needed in several learning algorithms, e.g., in some conjugate gradient a...
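To make the O(N) claim concrete, here is a sketch of an exact Hessian-vector product for a deliberately small model (a single tanh layer with squared error), using the standard "R-operator" idea: one extra forward and backward pass propagates the directional derivative R{·} = d/dr (·)(w + r v) at r = 0, so Hv is obtained without forming the N × N Hessian. The model and the names `grad` and `hess_vec` are assumptions for illustration and may not match the paper's derivation; the result is checked against a central finite difference of the gradient.

```python
import numpy as np

def grad(X, t, w):
    """Gradient of E(w) = 0.5 * sum((tanh(X @ w) - t)**2)."""
    y = np.tanh(X @ w)
    delta = (y - t) * (1.0 - y**2)     # dE/da for a = X @ w
    return X.T @ delta

def hess_vec(X, t, w, v):
    """Exact H @ v via R-propagation; cost is of the same order as grad()."""
    y = np.tanh(X @ w)
    Ra = X @ v                         # R{a}: forward pass of the direction v
    Ry = (1.0 - y**2) * Ra             # R{y}
    # R{delta} by the product rule applied to (y - t) * (1 - y**2)
    Rdelta = Ry * (1.0 - y**2) + (y - t) * (-2.0 * y * Ry)
    return X.T @ Rdelta                # R{grad} = H @ v

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
t = rng.uniform(-0.9, 0.9, size=50)
w = rng.normal(size=4)
v = rng.normal(size=4)

h = 1e-6                               # central-difference check of H @ v
fd = (grad(X, t, w + h * v) - grad(X, t, w - h * v)) / (2 * h)
print(np.max(np.abs(hess_vec(X, t, w, v) - fd)))   # small: finite-difference error only
```

Both `grad` and `hess_vec` use two matrix-vector products with X, which is why the Hessian-vector product costs the same order as the gradient here.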
Homotopy Approaches For The Analysis And Solution Of Neural Network And Other Nonlinear Systems Of Equations
1995
Cited by 3 (2 self)
Increasingly, the models, mappings, systems and algorithms used for signal processing need to be nonlinear in order to meet performance specifications in communications, computing and control systems applications. Simple computational models, including neural networks, have been developed that can efficiently implement a variety of nonlinear mappings through an appropriate choice of model parameters. However, designing arbitrary nonlinear mappings with these models and measured data requires both an understanding of how realizable (finite) systems perform when optimized on finite data and a method for computing globally optimal system parameters. In this thesis, we use constructive homotopy methods both to geometrically explore the mapping capabilities of finite neural networks and to rigorously develop a robust method for computing optimal solutions to systems of nonlinear equations which, like neural network equations, have an unknown number of solutions and may have solutions at infinity.
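The basic mechanics of homotopy continuation can be sketched on a single scalar equation: deform an easy problem x - x0 = 0 into the target f(x) = 0 via the convex homotopy H(x, λ) = (1 - λ)(x - x0) + λ f(x), and follow the solution path from λ = 0 to λ = 1 with small steps and Newton corrections. This minimal sketch assumes the path has no turning points (dH/dx ≠ 0 along it); the thesis's constructive methods, which also address multiple solutions and solutions at infinity, go well beyond it, and `homotopy_solve` is a hypothetical name.

```python
def homotopy_solve(f, df, x0, steps=100, newton_iters=5):
    """Track the root of H(x, lam) = (1 - lam)(x - x0) + lam * f(x) from lam = 0 to 1.

    Assumes the solution path x(lam) is smooth with dH/dx != 0 along it.
    """
    x = x0                                   # exact root of H at lam = 0
    for k in range(1, steps + 1):
        lam = k / steps
        for _ in range(newton_iters):        # Newton corrector at fixed lam
            h = (1 - lam) * (x - x0) + lam * f(x)
            dh = (1 - lam) + lam * df(x)
            x -= h / dh
    return x

def f(x):
    return x**3 - 2 * x - 5

def df(x):
    return 3 * x**2 - 2

root = homotopy_solve(f, df, x0=0.0)
print(root, f(root))                         # root near 2.0945515, residual near 0
```

For this particular f the path from x0 = 0 stays in a region where dH/dx > 0, so plain step-and-correct tracking suffices; harder systems need arc-length parametrization to pass turning points.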
Extended Bayesian Learning
Proceedings of ESANN 97, European Symposium on Artificial Neural Networks, Bruges, 1997
Cited by 3 (2 self)
In Bayesian learning one represents the relative degree of belief in different values of the weight vector (including biases) by considering a probability distribution function over weight space. In general, this prior is assumed to be a Gaussian with zero mean and adjustable variance, which is called a hyperparameter. The hyperparameter can be optimized automatically during training by maximizing the evidence. The extended Bayesian learning (EBL) approach considers a more general form of prior by using several weight classes and by treating the mean of the Gaussian distribution as another hyperparameter. We propose an algorithm that automatically determines the optimal number of weight classes and allows weights to move from one class to another. Our approach is applied to several benchmark problems and outperforms simple Bayesian learning as well as other optimization strategies.
1. Introduction
We begin by considering the problem of ...
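The more general prior can be illustrated by its contribution to the training objective. Assuming each weight w_i belongs to a class c with its own mean μ_c and precision α_c (inverse variance), the negative log prior is, up to a constant, E_W = Σ_c (α_c/2) Σ_{i∈c} (w_i - μ_c)²; with a single class and μ = 0 it reduces to ordinary weight decay. The function below is a hypothetical sketch of this penalty only; it does not implement the paper's evidence-based hyperparameter updates or its class-reassignment algorithm.

```python
def class_prior_penalty(weights, classes, means, precisions):
    """Negative log Gaussian prior (up to a constant) with per-class mean and precision.

    weights[i] belongs to class classes[i]; means[c] and precisions[c] are
    that class's hyperparameters (mu_c and alpha_c).
    """
    total = 0.0
    for w, c in zip(weights, classes):
        total += 0.5 * precisions[c] * (w - means[c]) ** 2
    return total

w = [0.9, 1.1, -0.1, 0.1]
# Simple Bayesian learning: one class with zero mean, i.e. weight decay.
decay = class_prior_penalty(w, [0, 0, 0, 0], means=[0.0], precisions=[1.0])
# Extended prior: the weights near 1 get their own class with mean 1.
ebl = class_prior_penalty(w, [0, 0, 1, 1], means=[1.0, 0.0], precisions=[1.0, 1.0])
print(decay, ebl)   # about 1.02 and 0.02: the two-class prior penalizes these weights far less
```

This shows why a nonzero class mean can matter: weights clustered away from zero are no longer pulled toward zero by the prior.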
Neural Networks: A Pattern Recognition Perspective
1996
Cited by 2 (0 self)
Introduction
Neural networks have been exploited in a wide variety of applications, the majority of which are concerned with pattern recognition in one form or another. However, it has become widely acknowledged that the effective solution of all but the simplest of such problems requires a principled treatment, in other words one based on a sound theoretical framework. From the perspective of pattern recognition, neural networks can be regarded as an extension of the many conventional techniques that have been developed over several decades. A lack of understanding of the basic principles of statistical pattern recognition lies at the heart of many of the common mistakes in the application of neural networks. In this chapter we aim to show that the 'black box' stigma of neural networks is largely unjustified, and that considerable insight is actually available into the way neural networks operate and how to use them effectively. Some of the ke...

