Results 1  10
of
491
Optimal Brain Damage
, 1990
"... We have used informationtheoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved sp ..."
Abstract

Cited by 510 (5 self)
 Add to MetaCart
We have used informationtheoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved
The cascadecorrelation learning architecture
 Advances in Neural Information Processing Systems 2
, 1990
"... CascadeCorrelation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, CascadeCorrelation begins with a minimal network, then automatically trains and adds new hidden units one by one, creatin ..."
Abstract

Cited by 801 (6 self)
 Add to MetaCart
CascadeCorrelation is a new architecture and supervised learning algorithm for artificial neural networks. Instead of just adjusting the weights in a network of fixed topology, CascadeCorrelation begins with a minimal network, then automatically trains and adds new hidden units one by one
Boosting the margin: A new explanation for the effectiveness of voting methods
 IN PROCEEDINGS INTERNATIONAL CONFERENCE ON MACHINE LEARNING
, 1997
"... One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this ..."
Abstract

Cited by 897 (52 self)
 Add to MetaCart
that techniques used in the analysis of Vapnik’s support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins
Loopy belief propagation for approximate inference: An empirical study. In:
 Proceedings of Uncertainty in AI,
, 1999
"... Abstract Recently, researchers have demonstrated that "loopy belief propagation" the use of Pearl's polytree algorithm in a Bayesian network with loops can perform well in the context of errorcorrecting codes. The most dramatic instance of this is the near Shannonlimit performanc ..."
Abstract

Cited by 676 (15 self)
 Add to MetaCart
in a more gen eral setting? We compare the marginals com puted using loopy propagation to the exact ones in four Bayesian network architectures, including two realworld networks: ALARM and QMR. We find that the loopy beliefs of ten converge and when they do, they give a good approximation
The Sample Complexity of Pattern Classification With Neural Networks: The Size of the Weights is More Important Than the Size of the Network
, 1997
"... Sample complexity results from computational learning theory, when applied to neural network learning for pattern classification problems, suggest that for good generalization performance the number of training examples should grow at least linearly with the number of adjustable parameters in the ne ..."
Abstract

Cited by 213 (15 self)
 Add to MetaCart
than the number of weights. For example, consider a twolayer feedforward network of sigmoid units, in which the sum of the magnitudes of the weights associated with each unit is bounded by A and the input dimension is n. We show that the misclassification probability is no more than a certain error
A scaled conjugate gradient algorithm for fast supervised learning
 NEURAL NETWORKS
, 1993
"... A supervised learning algorithm (Scaled Conjugate Gradient, SCG) with superlinear convergence rate is introduced. The algorithm is based upon a class of optimization techniques well known in numerical analysis as the Conjugate Gradient Methods. SCG uses second order information from the neural netwo ..."
Abstract

Cited by 451 (0 self)
 Add to MetaCart
and avoids a time consuming linesearch, which CGB and BFGS uses in each iteration in order to determine an appropriate step size.
Incorporating problem dependent structural information in the architecture of a neural network often lowers the overall complexity. The smaller the complexity of the neural
Policy gradient methods for reinforcement learning with function approximation.
 In NIPS,
, 1999
"... Abstract Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Abstract

Cited by 439 (20 self)
 Add to MetaCart
policy. Large applications of reinforcement learning (RL) require the use of generalizing function approximators such neural networks, decisiontrees, or instancebased methods. The dominant approach for the last decade has been the valuefunction approach, in which all function approximation effort goes
Growing Cell Structures  A Selforganizing Network for Unsupervised and Supervised Learning
 Neural Networks
, 1993
"... We present a new selforganizing neural network model having two variants. The first variant performs unsupervised learning and can be used for data visualization, clustering, and vector quantization. The main advantage over existing approaches, e.g., the Kohonen feature map, is the ability of the m ..."
Abstract

Cited by 300 (11 self)
 Add to MetaCart
We present a new selforganizing neural network model having two variants. The first variant performs unsupervised learning and can be used for data visualization, clustering, and vector quantization. The main advantage over existing approaches, e.g., the Kohonen feature map, is the ability
For Valid Generalization, the Size of the Weights is More Important Than the Size of the Network
, 1997
"... This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than the nu ..."
Abstract

Cited by 51 (1 self)
 Add to MetaCart
This paper shows that if a large neural network is used for a pattern classification problem, and the learning algorithm finds a network with small weights that has small squared error on the training patterns, then the generalization performance depends on the size of the weights rather than
On Neural Networks with Minimal Weights
 In Proc. of Neural Information Processing Systems 8
, 1995
"... Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; actually, they can be very big integers exponential in the number of t ..."
Abstract

Cited by 3 (3 self)
 Add to MetaCart
Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; actually, they can be very big integers exponential in the number
Results 1  10
of
491