Results 11–20 of 50
How to Make Best Use of Evolutionary Learning
 in Complex Systems: From Local Interactions to Global Phenomena
, 1996
Abstract

Cited by 22 (13 self)
Evolutionary learning has been developing rapidly in the last decade. It is a powerful and general learning approach which has been used successfully in both symbolic systems, e.g., rule-based systems, and subsymbolic systems, e.g., artificial neural networks. However, most evolutionary learning systems have paid little attention to the fact that they are population-based. The common practice is to select the best individual in the last generation as the final learned system. Such practice in essence treats these learning systems as optimisation ones. This paper emphasises the difference between a learning system and an optimisation one, and shows that this difference requires a different approach to population-based learning and that the current practice of selecting the best individual as the learned system is not the best choice. The paper then argues that a population contains more information than the best individual and thus should be used as the final learned system. Tw...
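The population-as-learned-system idea can be sketched with a toy example (illustrative only, not from the paper): instead of returning the single fittest individual, combine the outputs of the whole final population by fitness-weighted averaging.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: fit y = 2x + 1 with a population of (slope, intercept) candidates.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.1, size=x.shape)

population = rng.normal(0.0, 2.0, size=(20, 2))   # 20 individuals

def predict(individual, x):
    slope, intercept = individual
    return slope * x + intercept

def mse(individual):
    return float(np.mean((predict(individual, x) - y) ** 2))

errors = np.array([mse(ind) for ind in population])

# Common practice: keep only the single best individual.
best = population[np.argmin(errors)]

# Population-based alternative: fitness-weighted ensemble of ALL individuals.
weights = np.exp(-errors)            # lower error -> larger weight
weights /= weights.sum()
ensemble_pred = np.average(
    np.stack([predict(ind, x) for ind in population]), axis=0, weights=weights
)
```

The ensemble uses information the best-individual rule discards: individuals that are weak overall may still predict well in parts of the input space.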
Bayesian Regularisation and Pruning using a Laplace Prior
 Neural Computation
, 1994
Abstract

Cited by 19 (0 self)
Standard techniques for improved generalisation from neural networks include weight decay and pruning. Weight decay has a Bayesian interpretation with the decay function corresponding to a prior over weights. The method of transformation groups and maximum entropy indicates a Laplace rather than a Gaussian prior. After training, the weights then arrange themselves into two classes: (1) those with a common sensitivity to the data error, and (2) those failing to achieve this sensitivity, which therefore vanish. Since the critical value is determined adaptively during training, pruning, in the sense of setting weights to exactly zero, becomes a consequence of regularisation alone. The count of free parameters is also reduced automatically as weights are pruned. A comparison is made with results of MacKay using the evidence framework and a Gaussian regulariser. 1 Introduction Neural networks designed for regression or classification need to be trained using some form of stabilisation or re...
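The Laplace prior corresponds to an L1 penalty on the weights, and the pruning-as-regularisation effect can be illustrated (a sketch on assumed toy data, not the paper's algorithm) with soft-thresholded gradient descent, which drives irrelevant weights to exactly zero:

```python
import numpy as np

rng = np.random.default_rng(1)

# Linear model with many irrelevant inputs; the L1 (Laplace-prior) penalty
# prunes irrelevant weights to exactly zero via soft-thresholding (ISTA).
n, d = 200, 20
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = [1.5, -2.0, 0.7]           # only 3 of the 20 inputs matter
y = X @ true_w + rng.normal(0.0, 0.1, size=n)

lam, lr = 0.1, 0.01
w = np.zeros(d)
for _ in range(2000):
    grad = X.T @ (X @ w - y) / n        # gradient of the squared-error term
    w -= lr * grad
    # Proximal step for the L1 penalty: shrink and clip to exact zero.
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)

pruned = int(np.sum(w == 0.0))          # weights driven exactly to zero
```

Note the contrast with a Gaussian prior (L2 decay), which shrinks weights but never sets them exactly to zero, so pruning would need a separate step.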
Karhunen-Loève Feature Extraction For Neural Handwritten Character Recognition
, 1992
Abstract

Cited by 16 (3 self)
this paper investigate the effectiveness of Karhunen-Loève transforms as classifiable features for handwritten digit recognition. The issues of interest include: 1. What is the optimal feature length? Generalization on an unseen test database is obtained as a function of the dimensionality of the basis space in which characters are represented;
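The Karhunen-Loève transform here amounts to an eigen-decomposition of the training covariance; a minimal numpy sketch (with random stand-in data rather than real digit images) of extracting k-dimensional features:

```python
import numpy as np

rng = np.random.default_rng(2)

# Karhunen-Loève (PCA) feature extraction: project flattened images onto the
# top-k eigenvectors of the training covariance matrix.
images = rng.normal(size=(500, 64))      # stand-in for 8x8 digit rasters
mean = images.mean(axis=0)
centered = images - mean
cov = centered.T @ centered / len(images)

eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
k = 16                                   # feature length: the tuning knob
basis = eigvecs[:, -k:]                  # top-k principal directions

features = centered @ basis              # k-dimensional KL features
reconstruction = features @ basis.T + mean
```

Sweeping `k` and measuring test-set accuracy is exactly the "optimal feature length" question the abstract poses.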
Comparison of Handprinted Digit Classifiers
, 1993
Abstract

Cited by 15 (3 self)
this report were trained and tested using feature vectors derived from the digit images of NIST Special Database 3 [13]. This database consists of binary 128 by 128 pixel raster images segmented from the sample forms of 2100 writers published on CD as [14]. [Figure 1 (Components of Classification System): Normalized Binary Image → Feature Extractor → Discriminant Functions → Class Finder → Rejector → Hypothesized Class, Accept or Reject.] External results on segmentation and recognition of this database have been reported [15]. The relative difficulties of the NIST OCR databases have been discussed in [16]. For this study samples are drawn randomly from the first 250 writers to yield a training set of 7480 digits with a priori class probabilities all equal to 0.1. Even for digits, depending on the application, certain classes may be more prevalent; in banking tasks, for example, "0" is more common. The test set is similarly constructed from the second 250 writers, yielding 23140 samples. The images are size normalized by pixel deletion, stroke width is bounded by binary erosion and dilation, and consistent orientation is effected by row shearing.
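The stroke-width bounding step mentioned above uses binary erosion and dilation; a minimal 4-neighbour implementation in plain numpy (illustrative only, not the NIST code) looks like:

```python
import numpy as np

def binary_erode(img):
    """4-neighbour erosion: a pixel stays on only if all neighbours are on."""
    p = np.pad(img, 1)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def binary_dilate(img):
    """4-neighbour dilation: a pixel turns on if it or any neighbour is on."""
    p = np.pad(img, 1)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

# A thick vertical stroke; one erosion followed by one dilation (an
# "opening") thins the stroke without shifting its centre.
img = np.zeros((8, 8), dtype=np.uint8)
img[1:7, 2:6] = 1
opened = binary_dilate(binary_erode(img))
```

Repeating the erode/dilate pair bounds the stroke width from above, which is the normalisation effect the report describes.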
Efficient Training of Feed-Forward Neural Networks
, 1997
Abstract

Cited by 15 (0 self)
A.2 Introduction
A.2.1 Motivation
A.3 Optimization strategy
A.4 The Backpropagation algorithm
A.5 Conjugate direction methods
A.5.1 Conjugate gradients
A.5.2 The CGL algorithm
A.5.3 The BFGS algorithm
A.6 The SCG algorithm
A.7 Test results
A.7.1 Comparison metric ...
Products of Gaussians and Probabilistic Minor Component Analysis
 Neural Computation
, 2002
Abstract

Cited by 12 (1 self)
Recently Hinton introduced the Products of Experts (PoE) architecture for density estimation, where individual expert probabilities are multiplied and renormalized. We consider products of Gaussian "pancakes" equally elongated in all directions except one, and prove that the maximum likelihood solution for the model gives rise to a minor component analysis solution. We also discuss the covariance structure of sums and products of Gaussian pancakes or 1-factor probabilistic principal component analysis (PPCA) models.
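The minor component analysis solution the abstract refers to is the direction of smallest variance; a quick numpy check (on synthetic data of my own construction) that the smallest eigenvector of the covariance recovers the "thin" axis of pancake-like data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Data elongated in all directions except one, like a stack of Gaussian
# "pancakes": the minor component is the thin (small-variance) axis.
d = 5
scales = np.array([3.0, 3.0, 3.0, 3.0, 0.2])    # last axis is the thin one
X = rng.normal(size=(2000, d)) * scales

cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
minor_component = eigvecs[:, 0]                 # smallest-variance direction
```

This is the mirror image of PCA: PPCA models keep the largest-eigenvalue directions, while the product-of-pancakes model singles out the smallest.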
Learning Distributed Representations of Concepts from Relational Data
 IEEE Transactions on Knowledge and Data Engineering
, 2000
Abstract

Cited by 11 (2 self)
In this paper we discuss methods for generalizing over relational data. Our approach is to learn distributed representations for the concepts that coincide with their semantic features and then to use these representations to make inferences. We present Linear Relational Embedding (LRE), a method that learns a mapping from the concepts into a feature space by imposing the constraint that relations in this feature space are modeled by linear operations. We then show that this linearity constrains the type of relations that LRE can represent. Finally, we introduce Non-Linear Relational Embedding (NLRE), and show that it can represent any relation.
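The linearity constraint in LRE means each relation acts as a matrix on concept vectors; a toy sketch (synthetic concepts and a hypothetical ground-truth relation, not the paper's training procedure) showing that such a matrix can be recovered from (a, relation, b) triples by least squares:

```python
import numpy as np

rng = np.random.default_rng(4)

# Concepts live in a d-dimensional feature space; a relation is a matrix R
# such that R @ v[a] is (approximately) the embedding of the related item.
d, n_concepts = 4, 6
concepts = rng.normal(size=(n_concepts, d))

# Hypothetical ground-truth relation, used only to generate training triples.
R_true = rng.normal(size=(d, d))
pairs = [(i, (i + 1) % n_concepts) for i in range(n_concepts)]
A = np.stack([concepts[a] for a, _ in pairs])     # relation inputs
B = A @ R_true.T                                  # relation targets

# Least-squares fit of R from the triples: solve A @ X = B, then R = X^T.
X_sol, *_ = np.linalg.lstsq(A, B, rcond=None)
R_hat = X_sol.T
```

With noise-free targets the fit is exact; the paper's point is that some relations simply cannot be expressed by any such linear map, motivating NLRE.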
Binary Decision Clustering for Neural Network Based OCR
, 1994
"... This paper presents a clustering ..."
Combining Conjugate Direction Methods with Stochastic Approximation of Gradients
 Proc. 9th Intl. Workshop Artificial Intelligence and Statistics
, 2002
Abstract

Cited by 10 (3 self)
The method of conjugate directions provides a very effective way to optimize large, deterministic systems by gradient descent. In its standard form, however, it is not amenable to stochastic approximation of the gradient.
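For context, the standard (deterministic) linear conjugate gradient iteration that the abstract says is not amenable to stochastic gradients; every step relies on exact residuals (a textbook sketch, not the paper's stochastic variant):

```python
import numpy as np

rng = np.random.default_rng(5)

# Linear conjugate gradient for f(x) = 0.5 x^T A x - b^T x, i.e. solving
# A x = b. The step size alpha and direction update beta both use the exact
# residual r, which is why plain CG breaks under noisy gradient estimates.
d = 10
M = rng.normal(size=(d, d))
A = M @ M.T + d * np.eye(d)      # symmetric positive definite
b = rng.normal(size=d)

x = np.zeros(d)
r = b - A @ x                    # residual = negative gradient
p = r.copy()
for _ in range(d):               # converges in at most d exact steps
    if r @ r < 1e-20:
        break
    alpha = (r @ r) / (p @ A @ p)
    x += alpha * p
    r_new = r - alpha * (A @ p)
    beta = (r_new @ r_new) / (r @ r)
    p = r_new + beta * p
    r = r_new

solution_error = float(np.linalg.norm(A @ x - b))
```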
JETNET 3.0  A Versatile Artificial Neural Network Package
, 1993
Abstract

Cited by 9 (2 self)
this paper, quantities written in sans-serif denote matrices and quantities written in boldface denote vectors