Evolving Artificial Neural Networks
, 1999
Cited by 411 (6 self)
This paper: 1) reviews different combinations between ANN's and evolutionary algorithms (EA's), including using EA's to evolve ANN connection weights, architectures, learning rules, and input features; 2) discusses different search operators which have been used in various EA's; and 3) points out possible future research directions. It is shown, through a considerably large literature review, that combinations between ANN's and EA's can lead to significantly better intelligent systems than relying on ANN's or EA's alone.
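As an illustration of what "using EA's to evolve ANN connection weights" can look like, here is a minimal sketch, not taken from the paper. It assumes a fixed 2-2-1 tanh network, the XOR task, Gaussian mutation as the only search operator, and (mu + lambda) truncation selection; the names `forward`, `error`, and `evolve` and all parameter values are hypothetical.

```python
import math
import random

# Toy task (an assumption for illustration): XOR with targets 0/1.
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
N_WEIGHTS = 9  # 2 hidden units with 2 inputs + bias each, plus 1 output unit with 2 inputs + bias


def forward(w, x):
    """Output of a fixed 2-2-1 tanh network whose weights are the flat list w."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return math.tanh(w[6] * h1 + w[7] * h2 + w[8])


def error(w):
    """Sum of squared errors over the four XOR patterns (lower is fitter)."""
    return sum((forward(w, x) - y) ** 2 for x, y in XOR)


def evolve(generations=200, pop_size=20, sigma=0.3, seed=0):
    """Evolve weight vectors with Gaussian mutation and (mu + lambda) selection."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Each parent produces one mutated child.
        children = [[wi + rng.gauss(0, sigma) for wi in parent] for parent in pop]
        # Parents and children compete; the best pop_size individuals survive.
        pop = sorted(pop + children, key=error)[:pop_size]
        history.append(error(pop[0]))
    return pop[0], history


best_weights, history = evolve()
```

Because parents compete with their children, the best error recorded in `history` can never increase from one generation to the next. Evolving architectures or learning rules, which the review also covers, would need a different genotype than a flat weight list.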
Using mutual information for selecting features in supervised neural net learning
 IEEE Transactions on Neural Networks
, 1994
Cited by 198 (1 self)
This paper investigates the application of the mutual information criterion to evaluate a set of candidate features and to select an informative subset to be used as input data for a neural network classifier. Because the mutual information measures arbitrary dependencies between random variables, it is suitable for assessing the "information content" of features in complex classification tasks, where methods based on linear relations (like the correlation) are prone to mistakes. The fact that the mutual information is independent of the coordinates chosen permits a robust estimation. Nonetheless, the use of the mutual information for tasks characterized by high input dimensionality requires suitable approximations because of the prohibitive demands on computation and samples. An algorithm is proposed that is based on a "greedy" selection of the features and that takes both the mutual information with respect to the output class and with respect to the already-selected features into account. Finally the results of a series of experiments are discussed. Index Terms: Feature extraction, neural network pruning, dimensionality reduction, mutual information, supervised learning.
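The greedy selection described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: features are discrete, mutual information is estimated by plug-in frequency counts, and redundancy with already-selected features is subtracted with a weight `beta`. The abstract does not give the scoring formula, so the linear penalty, the default `beta`, and the function names are all hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written in terms of raw counts.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def greedy_mi_selection(features, labels, k, beta=0.5):
    """Greedily pick k feature names.

    features: dict mapping a feature name to its list of discrete values.
    beta: assumed weight on redundancy with already-selected features.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            )
            return relevance[name] - beta * redundancy
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the class is determined jointly by x1 and x2; x1_copy duplicates x1.
labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "x1": [0, 0, 0, 0, 1, 1, 1, 1],
    "x1_copy": [0, 0, 0, 0, 1, 1, 1, 1],
    "x2": [0, 0, 1, 1, 0, 0, 1, 1],
}
chosen = greedy_mi_selection(features, labels, k=2)
```

On this toy data all three features carry one bit about the class, so relevance alone cannot separate them. The redundancy term is what makes the second pick `x2` rather than the duplicate `x1_copy`; with `beta=0.0` the duplicate is chosen instead.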
A Review of Evolutionary Artificial Neural Networks
, 1993
Cited by 154 (23 self)
Research on potential interactions between connectionist learning systems, i.e., artificial neural networks (ANNs), and evolutionary search procedures, like genetic algorithms (GAs), has attracted a lot of attention recently. Evolutionary ANNs (EANNs) can be considered as the combination of ANNs and evolutionary search procedures. This paper first distinguishes among three kinds of evolution in EANNs, i.e., the evolution of connection weights, of architectures and of learning rules. Then it reviews each kind of evolution in detail and analyses critical issues related to different evolutions. The review shows that although a lot of work has been done on the evolution of connection weights and of architectures, few attempts have been made to understand the evolution of learning rules. Interactions among different evolutions are seldom mentioned in current research. However, the evolution of learning rules and its interactions with other kinds of evolution play a vital role in EANNs. As t...
Human Face Recognition and the Face Image Set's Topology
Cited by 58 (8 self)
If we consider an n x n image as an n^2-dimensional vector, then images of faces can be considered as points in this n^2-dimensional image space. Our previous studies of physical transformations of the face, including translation, small rotations and illumination changes, showed that the set of face images consists of relatively simple connected subregions in image space [1]. Consequently linear matching techniques can be used to obtain reliable face recognition. However for more general transformations, such as large rotations or scale changes, the face subregions become highly nonconvex. We have therefore developed a scale-space matching technique that allows us to take advantage of knowledge about important geometrical transformations and about the topology of the face subregion in image space. While recognition of faces is the focus of this paper, the algorithm is sufficiently general to be applicable to a large variety of object recognition tasks. List of Symbols ffi: Gr...
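To make the image-space view concrete, the sketch below flattens an n x n image into an n^2-vector and matches a query against labelled gallery images by Euclidean distance. Nearest-neighbour matching is used here only as a simple stand-in; it is not the paper's linear or scale-space matching technique, and the function names are hypothetical.

```python
import math


def flatten(image):
    """Turn an n x n image (a list of n rows of n pixel values) into an n^2-vector."""
    n = len(image)
    if any(len(row) != n for row in image):
        raise ValueError("expected a square n x n image")
    return [pixel for row in image for pixel in row]


def distance(u, v):
    """Euclidean distance between two points in image space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_label(query, gallery):
    """Return the label of the gallery image closest to the query.

    gallery: list of (label, image) pairs, all images the same size as query.
    """
    q = flatten(query)
    return min(gallery, key=lambda item: distance(q, flatten(item[1])))[0]
```

For example, with a gallery of an all-dark and an all-bright 2 x 2 image, a mostly dark query is matched to the dark one. Such direct distance matching is only plausible while the face subregions stay simple; the abstract notes that large rotations or scale changes make them highly nonconvex.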
Fast Training Algorithms For Multi-Layer Neural Nets
, 1993
Cited by 30 (0 self)
Training a multilayer neural net by backpropagation is slow and requires arbitrary choices regarding the number of hidden units and layers. This paper describes an algorithm which is much faster than backpropagation and for which it is not necessary to specify the number of hidden units in advance. The relationship with other fast pattern recognition algorithms, such as algorithms based on k-d trees, is mentioned. The algorithm has been implemented and tested on artificial problems such as the parity problem and on real problems arising in speech recognition. Experimental results, including training times and recognition accuracy, are given. Generally, the algorithm achieves accuracy as good as or better than nets trained using backpropagation, and the training process is much faster than backpropagation. Accuracy is comparable to that for the "nearest neighbour" algorithm, which is slower and requires more storage space. Comments Only the Abstract is given here. The full paper ap...
Combining Exploratory Projection Pursuit And Projection Pursuit Regression With Application To Neural Networks
 Neural Computation
, 1992
Cited by 17 (9 self)
We present a novel classification and regression method that combines exploratory projection pursuit (unsupervised training) with projection pursuit regression (supervised training), to yield a new family of cost/complexity penalty terms. Some improved generalization properties are demonstrated on real world problems. 1 Introduction Parameter estimation becomes difficult in high-dimensional spaces due to the increasing sparseness of the data. Therefore, when a low dimensional representation is embedded in the data, dimensionality reduction methods become useful. One such method, projection pursuit regression (PPR) (Friedman and Stuetzle, 1981), is capable of performing dimensionality reduction by composition, namely, it constructs an approximation to the desired response function using a composition of lower dimensional smooth functions. These functions depend on low dimensional projections through the data. When the dimensionality of the problem is in the thousands, even projection...
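For reference, the "composition of lower dimensional smooth functions" in projection pursuit regression is usually written in the additive form below. The notation is an assumption, not taken from this abstract: \(m\) is the number of terms, \(a_j\) are projection directions, and \(g_j\) are smooth one-dimensional functions fitted to the data.

```latex
\hat{f}(x) = \sum_{j=1}^{m} g_j\!\left(a_j^{\top} x\right),
\qquad x \in \mathbb{R}^{d},\; a_j \in \mathbb{R}^{d},\; g_j : \mathbb{R} \to \mathbb{R}
```

Each term sees the input only through the scalar projection \(a_j^{\top} x\), which is how the method reduces dimensionality.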
Robust Linear Discriminant Trees
 In AI&Statistics95 [7
Cited by 8 (2 self)
We present a new method for the induction of classification trees with linear discriminants as the partitioning function at each internal node. This paper presents two main contributions: first, a novel objective function called soft entropy which is used to identify optimal coefficients for the linear discriminants, and second, a novel method for removing outliers called iterative refiltering which boosts performance on many datasets. These two ideas are presented in the context of a single learning algorithm called DT-SEPIR, which is compared with the CART and OC1 algorithms. 36.1 Introduction Recursive partitioning classifiers, or decision trees, are an important nonparametric function representation in statistics and machine learning (Friedman 1977, Breiman, Friedman, Olshen & Stone 1984, Quinlan 1986, Quinlan 1993). Their wide and successful use in fielded applications and their simple intuitive appeal make decision tree learning algorithms an important area of study. In this p...
Selecting Good Speech Features for Recognition
, 1996
Cited by 2 (0 self)
This paper describes a method to select a suitable feature for speech recognition using an information-theoretic measure. Conventional speech recognition systems heuristically choose a portion of frequency components, cepstrum, mel-cepstrum, energy, and their time differences of speech waveforms as their speech features. However, these systems never have good performance if the selected features are not suitable for speech recognition. Since the recognition rate is the only performance measure of a speech recognition system, it is hard to judge how suitable the selected feature is. To solve this problem, it is essential to analyze the feature itself, and measure how good the feature itself is. Good speech features should contain all of the class-related information and as small an amount of the class-irrelevant variation as possible. In this paper, we suggest a method to measure the class-related information and the amount of the class-irrelevant variation based on the Shannon's information the...
Dynamic Probability Estimator for Machine Learning
 IEEE Trans. on Neural Networks
Cited by 1 (1 self)
An efficient algorithm for dynamic estimation of probabilities without division on an unlimited number of input data is presented. The method estimates probabilities of the sampled data from the raw sample count, while keeping the total count value constant. Accuracy of the estimate depends on the counter size, rather than on the total number of data points. The estimator follows variations of the incoming data probability within a fixed window size, without explicit implementation of the windowing technique. Total design area is very small and all probabilities are estimated concurrently. The dynamic probability estimator was implemented using a programmable gate array from Xilinx. The performance of this implementation is evaluated in terms of the area efficiency and execution time. This method is suitable for the highly integrated design of artificial neural networks where a large number of dynamic probability estimators can work concurrently. Index Terms: Classification, entropy, machine learning, neural network hardware, probability estimator.
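The abstract does not spell out the update rule, so the following is only one possible scheme with the stated properties (no division, constant total count, implicit forgetting of old data). It is an assumption for illustration, not the paper's algorithm: every counter leaks a fraction of itself via a right shift, and the leaked counts are credited to the symbol just observed. The name `update` and the `shift` parameter are hypothetical.

```python
def update(counts, symbol, shift=4):
    """Fold one observed symbol into the counters, in place.

    Every counter gives up counts[key] >> shift (a right shift, so no
    division), and everything given up is credited to the observed symbol.
    The sum of all counters therefore never changes.
    """
    moved = 0
    for key in counts:
        leak = counts[key] >> shift
        counts[key] -= leak
        moved += leak
    counts[symbol] += moved


# Hypothetical setup: two symbols sharing a fixed total of 2**10 counts.
TOTAL_BITS = 10
counts = {"a": 1 << (TOTAL_BITS - 1), "b": 1 << (TOTAL_BITS - 1)}
for s in "aaab" * 100:
    update(counts, s)
# counts[key] now approximates P(key) as a fixed-point fraction
# with TOTAL_BITS fractional bits.
```

Because the total is a fixed power of two, each counter can be read directly as a fixed-point probability, and the leak acts like an exponentially decaying window whose effective length is set by `shift`. Integer truncation means counters below `2**shift` stop leaking, which is one reason accuracy in a scheme like this would depend on counter size.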
On the Combination of Supervised and Unsupervised Learning
Cited by 1 (0 self)
The bias/variance dilemma is addressed in the context of neural networks. A bias constraint based on prior knowledge about the underlying distribution of the data is discussed as a means of reducing the overall error measure of a classifier. 1 Introduction The problem of optimal learning in Artificial Neural Networks is approached through a minimization of some averaged distance between the estimator and the target, on a training sample set. Under the commonly used mean squared error (MSE) distance the error can be decomposed into two terms: bias and variance (see below). Unfortunately, although the bias does go down through this minimization procedure, the variance may go up, thus reducing the overall performance of the estimator. This is due to the fact that the observations contain noise either in the input space X or in the regression space Y, and to the fact that only the empirical risk is minimized based on a small sample space. The problem pointed out above should not be confuse...
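The decomposition referred to in this abstract is, in its standard form, the identity below. The notation is an assumption, since the listing truncates before the paper's own formula: \(f(x; D)\) is the estimator trained on a sample set \(D\), and \(E_D\) averages over training sets.

```latex
E_D\!\left[\big(f(x;D) - E[y \mid x]\big)^2\right]
  = \underbrace{\big(E_D[f(x;D)] - E[y \mid x]\big)^2}_{\text{bias}^2}
  + \underbrace{E_D\!\left[\big(f(x;D) - E_D[f(x;D)]\big)^2\right]}_{\text{variance}}
```

The cross term vanishes because \(E_D\big[f(x;D) - E_D[f(x;D)]\big] = 0\), which is why lowering the bias term does not by itself control the variance term.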