Results 11–20 of 36
Linear-Least-Squares Initialization of Multilayer Perceptrons Through Backpropagation of the Desired Response
Abstract

Cited by 7 (1 self)
Abstract—Training multilayer neural networks is typically carried out using descent techniques such as the gradient-based backpropagation (BP) of error or quasi-Newton approaches, including the Levenberg–Marquardt algorithm. This is basically due to the fact that there are no analytical methods to find the optimal weights, so iterative local or global optimization techniques are necessary. The success of iterative optimization procedures is strictly dependent on the initial conditions; therefore, in this paper, we devise a principled novel method of backpropagating the desired response through the layers of a multilayer perceptron (MLP), which enables us to accurately initialize these neural networks in the minimum mean-square-error sense, using the analytic linear least squares solution. The generated solution can be used as an initial condition to standard iterative optimization algorithms. However, simulations demonstrate that in most cases, the performance achieved through the proposed initialization scheme leaves little room for further improvement in the mean-square error (MSE) over the training set. In addition, the performance of the network optimized with the proposed approach also generalizes well to testing data. A rigorous derivation of the initialization algorithm is presented and its high performance is verified with a number of benchmark training problems including chaotic time-series prediction, classification, and nonlinear system identification with MLPs. Index Terms—Approximate least-squares training of multilayer perceptrons (MLPs), backpropagation (BP) of desired response, neural network initialization.
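The core step of such an initialization — solving a layer's weights analytically once its inputs and desired outputs are known — can be sketched with a plain least-squares solve. The network shape, toy data, and random-feature hidden layer below are illustrative assumptions, not the paper's exact backpropagation-of-desired-response procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem (hypothetical data, not from the paper).
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])

# Stand-in hidden layer: fixed random tanh features.
n_hidden = 20
W1 = rng.normal(scale=1.0, size=(1, n_hidden))
b1 = rng.normal(scale=0.1, size=n_hidden)
H = np.tanh(X @ W1 + b1)

# Analytic least-squares solve for the output weights:
# minimizes ||H_aug w - y||^2 in closed form, no iterative descent.
H_aug = np.hstack([H, np.ones((len(H), 1))])  # append bias column
w, *_ = np.linalg.lstsq(H_aug, y, rcond=None)

mse = np.mean((H_aug @ w - y) ** 2)
```

The resulting `w` (and, in the paper's scheme, analogous solves for the hidden layers against backpropagated targets) would then serve as the starting point for standard iterative training.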
Second-order backpropagation algorithms for a stagewise-partitioned separable Hessian matrix
 In Proc. of 2005 Int'l Joint Conf. on Neural Networks (see WWW.IEOR.BERKELEY.EDU/PEOPLE/FACULTY/DREYFUSPUBS/IJCNNESJ05.PDF)
, 2005
Abstract

Cited by 6 (4 self)
Recent advances in computer technology allow the implementation of some important methods that were assigned lower priority in the past due to their computational burdens. Second-order backpropagation (BP) is such a method that computes the exact Hessian matrix of a given objective function. We describe two algorithms for feedforward neural-network (NN) learning with emphasis on how to organize Hessian elements into a so-called stagewise-partitioned block-arrow matrix form: (1) stagewise BP, an extension of the discrete-time optimal-control stagewise Newton of Dreyfus 1966; and (2) nodewise BP, based on direct implementation of the chain rule for differentiation attributable to Bishop 1992. The former, a more systematic and cost-efficient implementation in both memory and operation, progresses in the same layer-by-layer (i.e., stagewise) fashion as the widely-employed first-order BP computes the gradient vector. We also show intriguing separable structures of each block in the partitioned Hessian, disclosing the rank of blocks.
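The Hessian whose block structure the paper organizes can be inspected on a toy network without the stagewise recursion, e.g. by brute-force finite differences. The 1–2–1 architecture and data below are made up, and real second-order BP computes these entries analytically rather than numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, (50, 1))
y = X[:, 0] ** 2

# Parameter layout of a tiny 1-2-1 tanh network: W1, b1, W2, b2.
sizes = [(1, 2), (2,), (2, 1), (1,)]
n_params = sum(int(np.prod(s)) for s in sizes)

def unpack(theta):
    parts, i = [], 0
    for s in sizes:
        n = int(np.prod(s))
        parts.append(theta[i:i + n].reshape(s))
        i += n
    return parts

def loss(theta):
    W1, b1, W2, b2 = unpack(theta)
    h = np.tanh(X @ W1 + b1)
    return 0.5 * np.mean((h @ W2 + b2 - y[:, None]) ** 2)

def hessian(f, theta, eps=1e-4):
    # Central finite differences: H[i, j] ~ d^2 f / (d theta_i d theta_j).
    n = len(theta)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            t = theta.copy()
            t[i] += eps; t[j] += eps; fpp = f(t)
            t[j] -= 2 * eps; fpm = f(t)
            t[i] -= 2 * eps; fmm = f(t)
            t[j] += 2 * eps; fmp = f(t)
            H[i, j] = (fpp - fpm - fmp + fmm) / (4 * eps ** 2)
    return H

theta0 = rng.normal(scale=0.5, size=n_params)
H = hessian(loss, theta0)
```

Grouping the rows/columns of `H` by layer exposes the per-layer blocks that the paper's stagewise partition makes explicit.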
Neural Networks for Signal Processing
Abstract

Cited by 4 (0 self)
In this thesis, methods for optimization of neural network architectures are examined in order to achieve better generalization ability from the neural networks at tasks within signal processing. The feedforward networks described have one hidden layer of units with tanh activation functions and linear output units. The major topics described in the thesis are: (1) reducing the number of free parameters in the network architecture by pruning of parameters, where pruning is based on estimates (Optimal Brain Damage) of which parameters induce the least increase in the network performance criterion (the cost-function) when they are removed from the network; (2) finding methods for estimation of the generalization ability of the network from the learning data set, where a generalization error estimate (Akaike's Final Prediction Error estimate) is used for choosing the optimal network architecture among different pruned network configurations; and (3) using methods for on-line tuning of the...
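The two estimates named above have simple closed forms. A minimal sketch, assuming a diagonal Hessian at a local minimum for the Optimal Brain Damage saliency, and Akaike's classical FPE formula for the generalization estimate:

```python
import numpy as np

def obd_saliency(weights, hessian_diag):
    # Optimal Brain Damage: estimated increase in the cost-function if
    # weight i is deleted, under a diagonal-Hessian, local-minimum assumption:
    # s_i = 0.5 * H_ii * w_i^2. Prune the weights with the smallest saliency.
    return 0.5 * hessian_diag * weights ** 2

def fpe(train_mse, n_samples, n_params):
    # Akaike's Final Prediction Error: inflates the training MSE to estimate
    # generalization error for a model with n_params free parameters.
    assert n_samples > n_params
    return train_mse * (n_samples + n_params) / (n_samples - n_params)
```

Comparing `fpe` across differently pruned configurations (each with its own `train_mse` and parameter count) selects the architecture, as in the thesis; the exact network and Hessian computation are not reproduced here.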
Chaos Control on Universal Learning Networks
Abstract

Cited by 3 (0 self)
Abstract—A new chaos control method is proposed which is useful both for taking advantage of chaos and for avoiding it. The proposed method is based on the following facts: 1) chaotic phenomena can be generated and eliminated by controlling the maximum Lyapunov exponent of systems, and 2) the maximum Lyapunov exponent can be formulated and calculated by using higher order derivatives of Universal Learning Networks (ULNs). ULNs consist of a number of interconnected nodes, where the nodes may have any continuously differentiable nonlinear functions in them and each pair of nodes can be connected by multiple branches with arbitrary time delays. A generalized learning algorithm has been derived for the ULNs, in which both the first-order derivatives (gradients) and the higher order derivatives are incorporated. In simulations, parameters of ULNs with bounded node outputs are adjusted so that the maximum Lyapunov exponent approaches the target value, and it has been shown that a fully connected ULN with three sigmoidal function nodes is able to generate and eliminate chaotic behaviors by adjusting the parameters. Index Terms—Chaos, high-order derivative calculation, Lyapunov exponent, neural networks, universal learning networks.
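For a one-dimensional map, the maximum Lyapunov exponent being controlled here can be estimated by averaging log-derivatives along a trajectory. This textbook sketch uses the logistic map as a stand-in; it is not the paper's ULN higher-order-derivative formulation:

```python
import numpy as np

def logistic_lyapunov(r, x0=0.3, n_transient=1000, n_steps=20000):
    # Maximum Lyapunov exponent of x' = r * x * (1 - x), estimated as the
    # time average of log|f'(x)| = log|r * (1 - 2x)| along one trajectory.
    x = x0
    for _ in range(n_transient):      # discard transient before averaging
        x = r * x * (1 - x)
    acc = 0.0
    for _ in range(n_steps):
        x = r * x * (1 - x)
        acc += np.log(abs(r * (1 - 2 * x)))
    return acc / n_steps
```

A positive estimate (e.g. at r = 4, where the exact value is ln 2) indicates chaos; a negative one (e.g. at r = 3.2, a stable 2-cycle) indicates its absence — the two regimes the paper's controller steers between.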
Homotopy Approaches For The Analysis And Solution Of Neural Network And Other Nonlinear Systems Of Equations
, 1995
Abstract

Cited by 3 (2 self)
Increasingly, models, mappings, systems and algorithms used for signal processing need to be nonlinear in order to meet performance specifications in communications, computing and control systems applications. Simple computational models have been developed, including neural networks, which can efficiently implement a variety of nonlinear mappings through appropriate choice of model parameters. However, the design of arbitrary nonlinear mappings using these models and measured data requires both understanding how realizable (finite) systems perform if optimized given finite data, and a method for computing globally optimal system parameters. In this thesis, we use constructive homotopy methods both to geometrically explore the mapping capabilities of finite neural networks, and to rigorously develop a robust method for computing optimal solutions to systems of nonlinear equations which, like neural network equations, have an unknown number of solutions and may have solutions at infinity.
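A minimal Newton-homotopy sketch of the underlying idea: deform from a trivial equation whose root is known toward the target equation, tracking the root as the deformation parameter t goes from 0 to 1. The cubic test equation and fixed step count are illustrative assumptions; practical homotopy solvers for multi-root systems use adaptive path tracking:

```python
def newton_homotopy(f, fprime, x0, n_steps=100, newton_iters=5):
    # Track the root of H(x, t) = t * f(x) + (1 - t) * (x - x0) = 0
    # from t = 0 (trivial root x0) to t = 1 (root of the target f).
    x = x0
    for k in range(1, n_steps + 1):
        t = k / n_steps
        for _ in range(newton_iters):       # Newton correction at each t
            h = t * f(x) + (1 - t) * (x - x0)
            dh = t * fprime(x) + (1 - t)
            x = x - h / dh
    return x

# Classic test equation x^3 - 2x - 5 = 0 (root near 2.0946).
root = newton_homotopy(lambda x: x**3 - 2*x - 5,
                       lambda x: 3*x**2 - 2,
                       x0=1.0)
```

The warm start from the previous t-step is what keeps each Newton correction cheap and on the same solution branch.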
Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables
Abstract

Cited by 2 (1 self)
Abstract. We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman–Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman–Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
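The BIC/MDL score mentioned above is the cheapest of these approximations: the maximized log-likelihood minus a (d/2) log N dimension penalty. A sketch on a hypothetical Gaussian model-selection toy problem, not the paper's hidden-variable naive-Bayes setting:

```python
import numpy as np

def bic(loglik, n_params, n_samples):
    # BIC approximation to the log marginal likelihood:
    # log p(D | model) ~ log p(D | theta_hat) - (d / 2) * log N
    return loglik - 0.5 * n_params * np.log(n_samples)

def gaussian_loglik(x, mu, sigma):
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu)**2 / (2 * sigma**2))

# Toy comparison: Gaussian with free variance (2 params) vs. a misspecified
# fixed unit variance (1 param), on data with true scale 2.
rng = np.random.default_rng(2)
data = rng.normal(loc=1.0, scale=2.0, size=500)
mu_hat, sigma_hat = data.mean(), data.std()

bic_free = bic(gaussian_loglik(data, mu_hat, sigma_hat),
               n_params=2, n_samples=len(data))
bic_fixed = bic(gaussian_loglik(data, mu_hat, 1.0),
                n_params=1, n_samples=len(data))
```

Selecting the model with the larger BIC score is the model-selection use that the paper finds unreliable relative to the Laplace and Cheeseman–Stutz approximations.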
Global feedforward neural network learning for classification and regression
 Proceedings of the Energy Minimization Methods in Computer Vision and Pattern Recognition, Sophia Antipolis
Abstract

Cited by 1 (0 self)
Abstract. This paper addresses the issues of global optimality and training of a Feedforward Neural Network (FNN) error function incorporating the weight decay regularizer. A network with a single hidden layer and a single output unit is considered. Explicit vector and matrix canonical forms for the Jacobian and Hessian of the network are presented. Convexity analysis is then performed utilizing the known canonical structure of the Hessian. Next, global optimality characterization of the FNN error function is attempted utilizing the results of convex characterization and a convex monotonic transformation. Based on this global optimality characterization, an iterative algorithm is proposed for global FNN learning. Numerical experiments with benchmark examples show better convergence of our network learning as compared to many existing methods in the literature. The network is also shown to generalize well for a face recognition problem.
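The regularized objective under study has a simple form: the sum-of-squares error plus a weight-decay penalty. A sketch for a single-hidden-layer, single-output network, with made-up sizes and data, and without the paper's canonical Jacobian/Hessian forms:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))
y = np.sin(X @ np.array([1.0, -0.5, 0.2]))

def fnn_error(params, lam=1e-2, n_hidden=4):
    # E(w) = 0.5 * sum((f(x_n) - y_n)^2) + 0.5 * lam * ||w||^2
    # for a 3-input, n_hidden tanh-unit, single linear-output network.
    W1 = params[: 3 * n_hidden].reshape(3, n_hidden)
    w2 = params[3 * n_hidden:]
    out = np.tanh(X @ W1) @ w2
    return 0.5 * np.sum((out - y) ** 2) + 0.5 * lam * np.sum(params ** 2)
```

The paper's convexity analysis and global characterization operate on exactly this kind of error surface, via the Hessian of `fnn_error` in its canonical form.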
Learning in Networks
, 1995
Abstract

Cited by 1 (0 self)
Intelligent systems require software incorporating probabilistic reasoning, and oftentimes learning. Networks provide a framework and methodology for creating this kind of software. This paper introduces network models based on chain graphs with deterministic nodes. Chain graphs are defined as a hierarchical combination of Bayesian and Markov networks. To model learning, plates on chain graphs are introduced to model independent samples. The paper concludes by discussing various operations that can be performed on chain graphs with plates as a simplification process or to generate learning algorithms.
Accelerating the convergence speed of neural networks learning methods using least squares
Software for Data Analysis With Graphical Models
 In Fifth International Artificial Intelligence and Statistics Workshop, Ft Lauderdale, FL
, 1995
Abstract

Cited by 1 (0 self)
Probabilistic graphical models are being used widely in artificial intelligence and statistics, for instance, in diagnosis and expert systems, as a framework for representing and reasoning with probabilities and independencies. They come with corresponding algorithms for performing statistical inference. This offers a unifying framework for prototyping and/or generating data analysis algorithms from graphical specifications. This paper illustrates the framework with an example and then presents some basic techniques for the task: problem decomposition and the calculation of exact Bayes factors. Other tools already developed, such as automatic differentiation, Gibbs sampling, and use of the EM algorithm, make this a broad basis for the generation of data analysis software. 1 Introduction. This paper argues that the data analysis tasks of learning and knowledge discovery can be handled using graphical models. This meta-level use of graphical models was first suggested by Spiegelhalter an...