Results 11-20 of 80
Extracting Comprehensible Models from Trained Neural Networks
, 1996
Cited by 69 (4 self)
To Mom, Dad, and Susan, for their support and encouragement.
Trading Spaces: Computation, Representation and the Limits of Uninformed Learning
 BEHAVIORAL AND BRAIN SCIENCES
, 1997
Cited by 63 (12 self)
It is widely appreciated (e.g. Marr, 1982) that the difficulty of a particular computation varies according to how the input data are presented. What is less well understood is the effect of this computation/representation tradeoff within familiar learning paradigms. We argue that existing learning algorithms are often poorly equipped to solve problems involving a certain type of important and widespread statistical regularity, which we call 'type-2 regularity'. The solution in these cases is to trade achieved representation against computational search. We investigate several ways in which such a tradeoff may be pursued, including simple incremental learning, modular connectionism, and the developmental hypothesis of 'representational redescription'. In addition, the most distinctive features of human cognition, language and culture, may themselves be viewed as adaptations enabling this representation/computation tradeoff to be pursued on an even grander scale.
Local Gain Adaptation in Stochastic Gradient Descent
 In Proc. Intl. Conf. Artificial Neural Networks
, 1999
Cited by 58 (13 self)
Gain adaptation algorithms for neural networks typically adjust learning rates by monitoring the correlation between successive gradients. Here we discuss the limitations of this approach, and develop an alternative by extending Sutton's work on linear systems to the general, nonlinear case. The resulting online algorithms are computationally little more expensive than other acceleration techniques, do not assume statistical independence between successive training patterns, and do not require an arbitrary smoothing parameter. In our benchmark experiments, they consistently outperform other acceleration methods, and show remarkable robustness when faced with non-i.i.d. sampling of the input space.
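The conventional scheme that this abstract takes as its starting point, adjusting each learning rate according to the correlation between successive gradients, can be sketched as follows. This is a minimal illustration of the sign-correlation idea only, not the algorithm the paper develops; the function name, the growth and shrink factors, and the toy quadratic are all assumptions made for the example.

```python
def sign_correlation_gains(grad_fn, w, steps=200, base_gain=0.01, up=1.1, down=0.5):
    """Gradient descent with one gain (learning rate) per weight.

    A gain grows when the current and previous gradient components agree in
    sign and shrinks when they disagree. The factors `up` and `down` are
    illustrative choices, not values taken from the paper.
    """
    w = list(w)
    gains = [base_gain] * len(w)
    prev = [0.0] * len(w)
    for _ in range(steps):
        g = grad_fn(w)
        for i in range(len(w)):
            corr = g[i] * prev[i]  # correlation of successive gradients
            if corr > 0:
                gains[i] *= up
            elif corr < 0:
                gains[i] *= down
            w[i] -= gains[i] * g[i]
        prev = g
    return w, gains


def quad_grad(w):
    # Gradient of the toy loss 0.5 * (1 * w0^2 + 50 * w1^2).
    return [1.0 * w[0], 50.0 * w[1]]


w_final, gains = sign_correlation_gains(quad_grad, [1.0, 1.0])
```

On this toy problem each weight settles on a gain matched to its own curvature, which is the behaviour such schemes aim for; the sketch says nothing about the non-i.i.d. setting the abstract is concerned with.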
Efficient weight learning for Markov logic networks
 In Proceedings of the Eleventh European Conference on Principles and Practice of Knowledge Discovery in Databases
, 2007
Cited by 57 (7 self)
Markov logic networks (MLNs) combine Markov networks and first-order logic, and are a powerful and increasingly popular representation for statistical relational learning. The state-of-the-art method for discriminative learning of MLN weights is the voted perceptron algorithm, which is essentially gradient descent with an MPE approximation to the expected sufficient statistics (true clause counts). Unfortunately, these can vary widely between clauses, causing the learning problem to be highly ill-conditioned, and making gradient descent very slow. In this paper, we explore several alternatives, from per-weight learning rates to second-order methods. In particular, we focus on two approaches that avoid computing the partition function: diagonal Newton and scaled conjugate gradient. In experiments on standard SRL datasets, we obtain order-of-magnitude speedups, or more accurate models given comparable learning times.
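The effect of ill-conditioning on plain gradient descent, and the way a diagonal Newton step counters it, can be illustrated on a toy quadratic. This sketch is not the MLN weight-learning procedure from the paper: the curvatures, the helper names, and the loss are hypothetical stand-ins chosen so that the arithmetic is easy to follow.

```python
def gradient_step(w, grad, lr):
    """Plain gradient descent: one shared learning rate for every weight."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def diagonal_newton_step(w, grad, hess_diag):
    """Diagonal Newton: divide each gradient component by its own curvature."""
    return [wi - gi / hi for wi, gi, hi in zip(w, grad, hess_diag)]


# Toy ill-conditioned loss 0.5 * sum(c_i * w_i^2); the curvatures are made up.
curvatures = [1.0, 1000.0]


def grad(w):
    return [c * wi for c, wi in zip(curvatures, w)]


w0 = [1.0, 1.0]
# The shared rate has to stay small to suit the steepest direction.
w_gd = gradient_step(w0, grad(w0), lr=1.0 / 1000.0)
w_dn = diagonal_newton_step(w0, grad(w0), curvatures)
```

After one step, plain gradient descent has barely moved the low-curvature weight, while the diagonal Newton step reaches the minimum in both coordinates. That only happens exactly because the toy Hessian is diagonal; in general the diagonal is an approximation.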
A tutorial on energy-based learning
 Predicting Structured Data
, 2006
Cited by 42 (6 self)
Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
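Inference as described here, clamping the observed variable and searching for the lowest-energy value of the remaining one, can be sketched in a few lines. The quadratic energy function, its parameter `w`, and the candidate set are hypothetical choices for illustration; the tutorial covers far more general architectures.

```python
def energy(x, y, w=2.0):
    """Hypothetical energy function: low when y is close to w * x."""
    return (y - w * x) ** 2


def infer(x, candidates, energy_fn=energy):
    """Clamp the observed x and return the candidate y with the lowest energy."""
    return min(candidates, key=lambda y: energy_fn(x, y))


candidates = [0.0, 1.0, 2.0, 3.0, 4.0]
best_y = infer(1.5, candidates)  # energy is zero at y = 3.0
```

No normalization over the candidates is computed at any point: only the relative ordering of energies matters, which is the property the abstract highlights.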
Mathematical Programming in Neural Networks
 ORSA Journal on Computing
, 1993
Cited by 40 (13 self)
This paper highlights the role of mathematical programming, particularly linear programming, in training neural networks. A neural network description is given in terms of separating planes in the input space that suggests the use of linear programming for determining these planes. A more standard description in terms of a mean square error in the output space is also given, which leads to the use of unconstrained minimization techniques for training a neural network. The linear programming approach is demonstrated by a brief description of a system for breast cancer diagnosis that has been in use for the last four years at a major medical facility. 1 What is a Neural Network? A neural network is a representation of a map between an input space and an output space. A principal aim of such a map is to discriminate between the elements of a finite number of disjoint sets in the input space. Typically one wishes to discriminate between the elements of two disjoint point sets in the n-dim...
Feedforward Neural Nets as Models for Time Series Forecasting
 ORSA Journal on Computing
, 1993
Cited by 39 (3 self)
We have studied neural networks as models for time series forecasting, and our research compares the Box-Jenkins method against the neural network method for long and short term memory series. Our work was inspired by previously published works that yielded inconsistent results about comparative performance. We have since experimented with 16 time series of differing complexity using neural networks. The performance of the neural networks is compared with that of the Box-Jenkins method. Our experiments indicate that for time series with long memory, both methods produced comparable results. However, for series with short memory, neural networks outperformed the Box-Jenkins model. Because neural networks can be easily built for multiple-step-ahead forecasting, they present a better long term forecast model than the Box-Jenkins method. We discussed the representation ability, the model building process and the applicability of the neural net approach. Neural networks appear to provide a ...
Fast Training Algorithms For Multi-Layer Neural Nets
, 1993
Cited by 29 (0 self)
Training a multi-layer neural net by back-propagation is slow and requires arbitrary choices regarding the number of hidden units and layers. This paper describes an algorithm which is much faster than back-propagation and for which it is not necessary to specify the number of hidden units in advance. The relationship with other fast pattern recognition algorithms, such as algorithms based on k-d trees, is mentioned. The algorithm has been implemented and tested on artificial problems such as the parity problem and on real problems arising in speech recognition. Experimental results, including training times and recognition accuracy, are given. Generally, the algorithm achieves accuracy as good as or better than nets trained using back-propagation, and the training process is much faster than back-propagation. Accuracy is comparable to that for the "nearest neighbour" algorithm, which is slower and requires more storage space. Comments Only the Abstract is given here. The full paper ap...
Improving the Convergence of the Backpropagation Algorithm Using Learning Rate Adaptation Methods
, 1999
Cited by 27 (15 self)
This article focuses on gradient-based backpropagation algorithms that use either a common adaptive learning rate for all weights or an individual adaptive learning rate for each weight and apply the Goldstein/Armijo line search. The learning-rate adaptation is based on descent techniques and estimates of the local Lipschitz constant that are obtained without additional error function and gradient evaluations. The proposed algorithms improve the backpropagation training in terms of both convergence rate and convergence characteristics, such as stable learning and robustness to oscillations. Simulations are conducted to compare and evaluate the convergence behavior of these gradient-based training algorithms with several popular training methods.
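One way to obtain a local Lipschitz estimate without extra evaluations is to reuse the two most recent weight vectors and their gradients. The sketch below assumes a common learning rate of 1/(2L), with L estimated as the ratio of the gradient change to the weight change. Both that formula and the omission of the Goldstein/Armijo line search are assumptions of this example, not details given in the abstract, and the function names are hypothetical.

```python
import math


def lipschitz_rate(w, w_prev, g, g_prev, fallback=0.01):
    """Common learning rate 1 / (2 * L), with L estimated from stored values.

    L is approximated by ||g - g_prev|| / ||w - w_prev||, so no additional
    error-function or gradient evaluations are needed. Assumed formula.
    """
    dw = math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_prev)))
    dg = math.sqrt(sum((a - b) ** 2 for a, b in zip(g, g_prev)))
    if dw == 0.0 or dg == 0.0:
        return fallback  # no usable estimate; fall back to a fixed rate
    return 0.5 * dw / dg


def train(grad_fn, w, steps=100, initial_rate=0.01):
    """Gradient descent whose rate is re-estimated at every step (no line search)."""
    w = list(w)
    g = grad_fn(w)
    w_prev, g_prev = w, g
    w = [wi - initial_rate * gi for wi, gi in zip(w, g)]  # bootstrap step
    for _ in range(steps):
        g = grad_fn(w)
        rate = lipschitz_rate(w, w_prev, g, g_prev)
        w_prev, g_prev = w, g
        w = [wi - rate * gi for wi, gi in zip(w, g)]
    return w


def quad_grad(w):
    # Gradient of the toy loss 0.5 * (w0^2 + 10 * w1^2).
    return [w[0], 10.0 * w[1]]


w_final = train(quad_grad, [1.0, 1.0])
```

Without a line search the error need not decrease at every step, which is presumably one reason the article pairs the adaptation with Goldstein/Armijo conditions.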
Computing Second Derivatives in Feed-Forward Networks: a Review
 IEEE Transactions on Neural Networks
, 1994
Cited by 27 (4 self)
The calculation of second derivatives is required by recent training and analysis techniques of connectionist networks, such as the elimination of superfluous weights, and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate algorithms for calculating second derivatives. For networks with $|w|$ weights, simply writing the full matrix of second derivatives requires $O(|w|^2)$ operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires of the order of $2h + 2$ backward/forward-propagation passes, where $h$ is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). Our algorithms apply to arbitrary activation functions, networks, and error functions (for instance, with connections that skip layers, or radial basis functions, or ...
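Of the three approximations mentioned, numerical differentiation is the simplest to sketch: perturb one weight at a time and take central differences of the gradient. The function below is a generic illustration under that assumption, not the review's own algorithm; the step size `eps` and the two-weight toy error function are hypothetical.

```python
def numerical_hessian(grad_fn, w, eps=1e-5):
    """Approximate the matrix of second derivatives by central differences.

    Column j is obtained by perturbing weight j by +eps and -eps, so the
    whole matrix costs 2 * len(w) gradient evaluations and has len(w)^2
    entries to fill in.
    """
    n = len(w)
    hessian = [[0.0] * n for _ in range(n)]
    for j in range(n):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[j] += eps
        w_minus[j] -= eps
        g_plus = grad_fn(w_plus)
        g_minus = grad_fn(w_minus)
        for i in range(n):
            hessian[i][j] = (g_plus[i] - g_minus[i]) / (2.0 * eps)
    return hessian


def grad(w):
    # Gradient of the toy error E(w) = w0^2 * w1 + 3 * w1^2.
    return [2.0 * w[0] * w[1], w[0] ** 2 + 6.0 * w[1]]


H = numerical_hessian(grad, [1.0, 2.0])
# Analytic second derivatives at (1, 2) are [[4, 2], [2, 6]].
```

The double loop makes the quadratic cost in the number of weights visible: even with the gradient available, every one of the entries has to be written.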