Results 1–10 of 10
Fast Exact Multiplication by the Hessian
 Neural Computation
, 1994
Cited by 93 (5 self)
Abstract:
Just storing the Hessian H (the matrix of second derivatives d^2 E/dw_i dw_j of the error E with respect to each pair of weights) of a large neural network is difficult. Since a common use of a large matrix like H is to compute its product with various vectors, we derive a technique that directly calculates Hv, where v is an arbitrary vector. This allows H to be treated as a generalized sparse matrix. To calculate Hv, we first define a differential operator R{f(w)} = (d/dr) f(w + rv)|_{r=0}, note that R{grad_w} = Hv and R{w} = v, and then apply R{} to the equations used to compute grad_w. The result is an exact and numerically stable procedure for computing Hv, which takes about as much computation, and is about as local, as a gradient evaluation. We then apply the technique to backpropagation networks, recurrent backpropagation, and stochastic Boltzmann Machines. Finally, we show that this technique can be used at the heart of many iterative techniques for computing various properties of H, obviating the need for direct methods.
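The R-operator identity in the abstract can be sanity-checked numerically. The sketch below is a minimal numpy illustration, not Pearlmutter's exact procedure: it realises R{grad} by a central finite difference rather than an exact forward pass, and the one-layer tanh model with squared error is an assumption made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
y = rng.standard_normal(5)

def grad(w):
    # gradient of E(w) = 0.5 * ||tanh(X w) - y||^2
    a = np.tanh(X @ w)
    return X.T @ ((a - y) * (1 - a ** 2))

def hessian(w):
    # exact Hessian of the same E, for checking: H = X^T diag(s) X
    a = np.tanh(X @ w)
    s = (1 - a ** 2) * ((1 - a ** 2) - 2 * a * (a - y))
    return X.T @ (s[:, None] * X)

def hv(w, v, eps=1e-5):
    # H v via the identity Hv = (d/dr) grad E(w + r v)|_{r=0},
    # approximated here by a central difference in place of R{}
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

w = rng.standard_normal(3)
v = rng.standard_normal(3)
# hv(w, v) and hessian(w) @ v agree to finite-difference accuracy,
# without ever forming the full Hessian in hv itself
```

Note that `hv` costs two gradient evaluations and O(|w|) memory, which is the point of the technique: the exact R{}-based procedure in the paper achieves the same product in a single extra pass.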
Computing Second Derivatives in FeedForward Networks: a Review
 IEEE Transactions on Neural Networks
, 1994
Cited by 36 (4 self)
Abstract:
The calculation of second derivatives is required by recent training and analysis techniques for connectionist networks, such as the elimination of superfluous weights, and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate algorithms for calculating second derivatives. For networks with |w| weights, simply writing the full matrix of second derivatives requires O(|w|^2) operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires of the order of 2h + 2 backward/forward propagation passes, where h is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). Our algorithms apply to arbitrary activation functions, networks, and error functions (for instance, with connections that skip layers, or radial basis functions, or ...
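Two of the three approximations the review compares, numerical differentiation of the gradient and ignoring some components of the second derivative (the outer-product or Gauss-Newton approximation), can be contrasted on a toy problem. The sigmoid model and data below are assumptions made for illustration; near a minimum, where residuals are small, the two estimates agree closely.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
w_true = np.array([1.0, -2.0, 0.5])

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

y = sigmoid(X @ w_true)          # realizable targets: residuals vanish at w_true

def grad(w):
    # gradient of E(w) = 0.5 * ||sigmoid(X w) - y||^2
    a = sigmoid(X @ w)
    return X.T @ ((a - y) * a * (1 - a))

def hessian_numdiff(w, eps=1e-5):
    # approximation 2 of the review: numerical differentiation of the gradient
    n = len(w)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n); e[j] = eps
        H[:, j] = (grad(w + e) - grad(w - e)) / (2 * eps)
    return 0.5 * (H + H.T)       # symmetrize away rounding noise

def hessian_outer(w):
    # approximation 1: drop the residual-weighted term (outer-product form)
    a = sigmoid(X @ w)
    J = X * (a * (1 - a))[:, None]   # Jacobian of the outputs wrt w
    return J.T @ J

w = w_true + 0.01 * rng.standard_normal(3)   # near the minimum
H_num = hessian_numdiff(w)
H_gn = hessian_outer(w)
# near the minimum the dropped term is small, so H_num ~= H_gn;
# far from it the discrepancy grows with the residuals
```

The outer-product form needs only first derivatives (one pass), while the numerical-differentiation route costs 2|w| gradient evaluations, which matches the review's accounting of the trade-offs.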
Adaptive critic learning techniques for engine torque and air–fuel ratio control
 IEEE Trans. Syst., Man, Cybern. B, Cybern
, 2008
Cited by 12 (4 self)
Abstract:
Abstract—A new approach for engine calibration and control is proposed. In this paper, we present our research results on the implementation of adaptive critic designs for self-learning control of automotive engines. A class of adaptive critic designs that can be classified as (model-free) action-dependent heuristic dynamic programming is used in this research project. The goals of the present learning control design for automotive engines include improved performance, reduced emissions, and maintained optimum performance under various operating conditions. Using the data from a test vehicle with a V8 engine, we developed a neural network model of the engine and neural network controllers based on the idea of approximate dynamic programming to achieve optimal control. We have developed and simulated self-learning neural network controllers for both engine torque (TRQ) and exhaust air–fuel ratio (AFR) control. The goal of TRQ control and AFR control is to track the commanded values. For both control problems, excellent neural network controller transient performance has been achieved. Index Terms—Adaptive critic designs (ACDs), adaptive dynamic programming, air–fuel ratio (AFR) control, approximate dynamic programming, automotive engine control, torque control.
New millennium AI and the convergence of history
 Challenges to Computational Intelligence
, 2007
Cited by 8 (4 self)
Abstract:
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a byproduct of the way humans allocate memory space to past events?
Unified Formulation for Training Recurrent Networks with Derivative Adaptive Critics
 in Proceedings of the 1997 International Conference on Neural Networks
, 1997
Parametric CMAC Networks: Fundamentals and Applications of a Fast Convergence Neural Structure
Cited by 1 (1 self)
Abstract:
Abstract—This paper shows fundamentals and applications of the parametric cerebellar model arithmetic computer (PCMAC) network: a neural structure derived from the Albus CMAC algorithm and Takagi–Sugeno–Kang parametric fuzzy inference systems. It resembles the original CMAC proposed by Albus in the sense that it is a local network (i.e., for a given input vector, only a few of the network's nodes—or neurons—will be active and will effectively contribute to the corresponding network output). The internal mapping structure is built in such a way that it implements, for each CMAC memory location, one linear parametric equation of the network input strengths. This mapping corresponds to a hidden layer in a multilayer perceptron (MLP) structure. The outputs of the active equations are then weighted and averaged to generate the actual outputs of the network. A practical comparison between the proposed network and other structures is, thus, accomplished. PCMAC, MLP, and CMAC networks are applied to approximate a nonlinear function. Results show advantages of the proposed algorithm based on the computational effort needed by each network to perform nonlinear function approximation. Also, PCMAC is used to solve a practical problem in mobile telephony, approximating an RF mapping at a given region to help operational people while maintaining service quality. Index Terms—Cerebellar model arithmetic computers (CMACs), communication systems, mobile communication, neural networks.
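A minimal one-dimensional sketch of the structure described above, assuming illustrative choices for cell layout, receptive-field width, and an LMS-style training rule (none of which are specified in the abstract): each active memory cell holds one linear equation of the input, and the outputs of the few active cells are averaged.

```python
import numpy as np

# Assumed layout: 20 cells over [0, 2*pi], 3 neighbouring cells active per input
n_cells, width = 20, 3
lo, hi = 0.0, 2 * np.pi
A = np.zeros(n_cells)            # slope of each cell's linear equation
B = np.zeros(n_cells)            # intercept of each cell's linear equation

def active(x):
    # locality: only `width` adjacent cells fire for a given input
    c = int((x - lo) / (hi - lo) * (n_cells - width))
    return list(range(c, c + width))

def predict(x):
    # average of the active cells' linear equations a_i * x + b_i
    idx = active(x)
    return np.mean([A[i] * x + B[i] for i in idx])

def train_step(x, t, lr=0.1):
    # LMS-style correction shared equally among the active cells (assumed rule)
    idx = active(x)
    e = t - predict(x)
    for i in idx:
        A[i] += lr * e * x / len(idx)
        B[i] += lr * e / len(idx)

# approximate sin(x), a typical nonlinear test function
xs = np.random.default_rng(0).uniform(lo, hi, 2000)
for x in xs:
    train_step(x, np.sin(x))
err = max(abs(predict(x) - np.sin(x)) for x in np.linspace(lo, hi, 50))
```

Because only `width` of the `n_cells` parameters are touched per update, training cost per sample is independent of network size, which is the locality property the abstract emphasizes.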
Learning in Networks
, 1995
Cited by 1 (0 self)
Abstract:
Intelligent systems require software incorporating probabilistic reasoning and, oftentimes, learning. Networks provide a framework and methodology for creating this kind of software. This paper introduces network models based on chain graphs with deterministic nodes. Chain graphs are defined as a hierarchical combination of Bayesian and Markov networks. To model learning, plates on chain graphs are introduced to model independent samples. The paper concludes by discussing various operations that can be performed on chain graphs with plates as a simplification process or to generate learning algorithms.
Determination of the Regularization Parameter for Support Vector Machines via Vasconcelos' Genetic Algorithm
Abstract:
Abstract: This paper presents a genetic algorithm (GA) methodology for training a support vector machine (SVM). The SVM method may be viewed as a quadratic optimization problem with linear constraints, where the objective is to minimize the error learning rate and the Vapnik–Chervonenkis (VC) dimension in order to get an Optimal Separating Hyperplane (OSH) that classifies two sets of data. An SVM is a very good tool for classification problems and displays an excellent generalization ability. In order to test our method we solve the XOR problem, a canonical nonlinearly separable problem. We used a genetic algorithm (GA) called Vasconcelos' GA (VGA). The genome was selected to solve the dual SVM problem, where each individual corresponds to a Lagrange multiplier. Our interest lay in finding the "best" value of C (the so-called "regularization" parameter); C reflects a trade-off between the performance of the trained SVM and its allowed level of misclassification. We solved the problem in two ways: (a) we provided C, as is traditional in the normal treatment of the problem; (b) we implemented a complementary approach, wherein C is also included in the genome. In case (b) VGA finds C's value, freeing the user from having to find it from heuristics. We report an exact solution for case (a) and, importantly, encouraging results in which the error in the solution for case (b) is practically zero.
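A sketch of the case-(a) setup: a simple real-coded GA (not the VGA itself) searches over the Lagrange multipliers of the SVM dual for the XOR problem. The RBF kernel, the quadratic penalty enforcing the dual equality constraint, and all GA hyperparameters are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR, the canonical nonlinearly separable test problem from the abstract
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])

# RBF kernel matrix (the abstract does not fix a kernel; this choice is assumed)
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
C = 10.0                         # case (a): the regularization parameter is given
Q = np.outer(y, y) * K

def fitness(alpha):
    # SVM dual objective, plus a quadratic penalty for the constraint sum(a*y)=0
    return alpha.sum() - 0.5 * alpha @ Q @ alpha - 100.0 * (alpha @ y) ** 2

# minimal real-coded GA: tournament selection, blend crossover,
# Gaussian mutation, elitism; the box constraint 0 <= a_i <= C is kept by clipping
pop = rng.uniform(0, C, size=(60, 4))
for _ in range(300):
    fit = np.array([fitness(p) for p in pop])
    elite = pop[np.argmax(fit)].copy()
    def pick():
        i, j = rng.integers(0, len(pop), 2)
        return pop[i] if fit[i] > fit[j] else pop[j]
    pop = np.clip([0.5 * (pick() + pick()) + rng.normal(0, 0.1, 4)
                   for _ in range(len(pop))], 0, C)
    pop[0] = elite               # elitism: never lose the best individual
best = pop[np.argmax([fitness(p) for p in pop])]

sv = np.argmax(best)             # use a support vector to recover the bias
b = y[sv] - (best * y) @ K[sv]
pred = np.sign(K @ (best * y) + b)   # decision on the four training points
```

Extending this to case (b) would simply append C to each genome so the GA evolves the regularization parameter alongside the multipliers, which is the abstract's complementary approach.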
DRAFT: Invited paper for 50th Session of the International Statistical Institute, Beijing, China, August 1995. LEARNING IN NETWORKS
, 1995
Abstract:
Intelligent systems require software incorporating probabilistic reasoning and, oftentimes, learning. Networks provide a framework and methodology for creating this kind of software. This paper introduces network models based on chain graphs with deterministic nodes. Chain graphs are defined as a hierarchical combination of Bayesian and Markov networks. To model learning, plates on chain graphs are introduced to model independent samples. The paper concludes by discussing various operations that can be performed on chain graphs with plates as a simplification process or to generate learning algorithms.