Statistical pattern recognition: A review
- IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 2000
Cited by 487 (20 self)
The primary goal of pattern recognition is supervised or unsupervised classification. Among the various frameworks in which pattern recognition has been traditionally formulated, the statistical approach has been most intensively studied and used in practice. More recently, neural network techniques and methods imported from statistical learning theory have been receiving increasing attention. The design of a recognition system requires careful attention to the following issues: definition of pattern classes, sensing environment, pattern representation, feature extraction and selection, cluster analysis, classifier design and learning, selection of training and test samples, and performance evaluation. In spite of almost 50 years of research and development in this field, the general problem of recognizing complex patterns with arbitrary orientation, location, and scale remains unsolved. New and emerging applications, such as data mining, web searching, retrieval of multimedia data, face recognition, and cursive handwriting recognition, require robust and efficient pattern recognition techniques. The objective of this review paper is to summarize and compare some of the well-known methods used in various stages of a pattern recognition system and identify research topics and applications which are at the forefront of this exciting and challenging field.
FERNN: An Algorithm for Fast Extraction of Rules from Neural Networks
- Applied Intelligence
, 2000
Cited by 10 (2 self)
Before symbolic rules are extracted from a trained neural network, the network is usually pruned so as to obtain more concise rules. Typical pruning algorithms require retraining the network, which incurs additional cost. This paper presents FERNN, a fast method for extracting rules from trained neural networks without network retraining. Given a fully connected trained feedforward network with a single hidden layer, FERNN first identifies the relevant hidden units by computing their information gains. For each relevant hidden unit, its activation values are divided into two subintervals such that the information gain is maximized. FERNN finds the set of relevant network connections from the input units to this hidden unit by checking the magnitudes of their weights. The connections with large weights are identified as relevant. Finally, FERNN generates rules that distinguish the two subintervals of the hidden activation values in terms of the network inputs. Experimental results show th...
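The splitting step described in the FERNN abstract (dividing a hidden unit's activation values into two subintervals so that information gain is maximized) can be illustrated with a minimal sketch. This is not the authors' code: the function names `entropy` and `best_split`, the use of midpoints between adjacent sorted activations as candidate thresholds, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_split(activations, labels):
    """Return (threshold, gain) for the two-way split with maximal information gain.

    Candidate thresholds are midpoints between adjacent distinct activation
    values (an assumption; the abstract does not say how candidates are chosen).
    Returns (None, 0.0) if no split improves on the unsplit entropy.
    """
    pairs = sorted(zip(activations, labels))
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal activation values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_threshold, best_gain = (lo + hi) / 2, gain
    return best_threshold, best_gain


# Hypothetical activations of one hidden unit and the class of each sample.
threshold, gain = best_split([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
```

Under this reading, a hidden unit whose best split yields a high gain would count as relevant, and the threshold defines the two subintervals that the extracted rules must distinguish.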
Classification-Based Objective Functions
- Machine Learning. In
, 2007
Cited by 5 (2 self)
Backpropagation, similar to most learning algorithms that can form complex decision surfaces, is prone to overfitting. This work presents classification-based objective functions, an intuitive approach to training artificial neural networks on classification problems. Classification-based learning attempts to guide the network directly to correct pattern classification rather than using an implicit search of common error minimization heuristics, such as sum-squared-error (SSE) and cross-entropy (CE). CB1 is presented here as a novel objective function for learning classification problems. It seeks to directly minimize classification error by backpropagating error only on misclassified patterns from culprit output nodes. CB1 discourages weight saturation and overfitting and achieves higher accuracy on classification problems than optimizing SSE or CE. Experiments on a large OCR data set have shown CB1 to significantly increase generalization accuracy over SSE or CE optimization, from 97.86% and 98.10%, respectively, to 99.11%. Comparable results are achieved over several data sets from the UC Irvine Machine Learning Database Repository, with an average increase in accuracy from 90.7% and 91.3% using optimized SSE and CE networks, respectively, to 92.1% for CB1. Analysis indicates that CB1 performs a fundamentally different search of the feature space than optimizing SSE or CE and produces significantly different solutions.
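One possible reading of "backpropagating error only on misclassified patterns from culprit output nodes" is sketched below. The abstract does not give the CB1 error terms, so everything specific here is an assumption: the hypothetical function `cb1_error`, the definition of a culprit as the target node plus any non-target node whose output is at least the target's, and the error magnitudes (differences between the competing outputs).

```python
def cb1_error(outputs, target):
    """Per-node error signal under an assumed CB1-style rule.

    outputs: list of output-node activations for one pattern.
    target:  index of the correct class.
    Returns a list of error values, all zero when the pattern is
    classified correctly (target output strictly the highest).
    """
    t_out = outputs[target]
    # Assumed culprits: non-target nodes that tie or beat the target node.
    culprits = [k for k, o in enumerate(outputs) if k != target and o >= t_out]
    errors = [0.0] * len(outputs)
    if not culprits:
        return errors  # correctly classified: no error is backpropagated
    # Push the target output up toward the strongest competitor (assumption).
    errors[target] = max(outputs[k] for k in culprits) - t_out
    # Push each culprit down toward the target output (assumption).
    for k in culprits:
        errors[k] = t_out - outputs[k]
    return errors


# Hypothetical three-class example where class 1 is correct but class 0 wins.
signal = cb1_error([0.6, 0.4, 0.1], target=1)
```

In contrast to SSE or CE, which assign error to every output node on every pattern, this sketch leaves correctly classified patterns and non-competing nodes untouched, which is consistent with the abstract's claim that CB1 discourages weight saturation.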
Partial Retraining: A New Approach to Input Relevance Determination
, 1999
Cited by 4 (1 self)
In this article we introduce partial retraining, an algorithm to determine the relevance of the input variables of a trained neural network. We place this algorithm in the context of other approaches to relevance determination. Numerical experiments on both artificial and real-world problems show that partial retraining outperforms its competitors, which include methods based on constant substitution, analysis of weight magnitudes, and "optimal brain surgeon".
1 Introduction
Feedforward neural networks are able to learn the relationship between input and output variables. Even when knowledge about the problem is limited, as for example in cases where no explicit physical or economical model can be built, neural networks may still capture some of the underlying principles. Especially with a lack of domain knowledge, the usual approach in neural network modeling is to include all input variables that may have an effect on the output. This approach is suboptimal in several aspects. First...
ELeaRNT: Evolutionary learning of rich neural network topologies
, 2002
Cited by 2 (0 self)
In this paper we focus on the problem of using a genetic algorithm for model selection within a Bayesian framework. We propose to reduce the model selection problem to a search problem solved using evolutionary computation to explore a posterior distribution over the model space. As a case study, we introduce ELeaRNT (Evolutionary Learning of Rich Neural Network Topologies), a genetic algorithm which evolves a particular class of models, namely, Rich Neural Networks (RNN), in order to find an optimal domain-specific non-linear function approximator with a good generalization capability. In order to evolve this kind of neural network, ELeaRNT uses a Bayesian fitness function. The experimental results prove that ELeaRNT using a Bayesian fitness function finds, in a completely automated way, networks well-matched to the analysed problem, with acceptable complexity.
Pruning Using Parameter and Neuronal Metrics
- Neural Computation
, 1999
Cited by 2 (1 self)
In this article, we introduce a measure of optimality for architecture selection algorithms for neural networks: the distance from the original network to the new network in a metric that is defined by the probability distributions of all possible networks. We derive two pruning algorithms, one based on a metric in parameter space and another one based on a metric in neuron space, which are closely related to well-known architecture selection algorithms, such as GOBS. Furthermore, our framework extends the theoretical range of validity of GOBS and therefore can explain results observed in previous experiments. In addition, we give some computational improvements for these algorithms.
1 Introduction
A neural network trained on a problem for which its architecture is too small to capture the underlying data structure will not yield satisfactory training and testing performance. On the other hand, a neural network with too large an architecture can even fit the noise in the training d...
The Improvement and Comparison of different Algorithms for Optimizing Neural Networks . . .
, 1997
Cited by 2 (0 self)
A recently published idea is to use the A*-Algorithm to optimize the topology of Neural Networks. In this paper, optimization techniques are investigated that combine the A*-Algorithm with different parallel training algorithms, namely the backpropagation algorithm and several hybrid algorithms. The hybrid algorithms combine the backpropagation's steepest descent method with different sets of genetic operators. The different algorithms are compared with respect to the quality of the solution and the computation time. We show that by using the hybrid algorithms, the topology optimization is significantly improved.
Keywords: Neural Network Optimization, A*-Algorithm, Genetic Training, Competitive Backpropagation, MasPar MP-2
1 Introduction
The optimization of Neural Networks (NNs) consists essentially of two problems. These are the optimization of the network topology and the adjustment of the weights. For optimizing the network topology many investigations have been done. The well kn...
Feature Selection
, 2002
Cited by 1 (0 self)
... patterns inside available data, by using specific statistical techniques [2]. Although they have been around for almost 50 years, pattern recognition approaches have recently gained new popularity, due to emerging applications that are not only challenging but also computationally expensive and very demanding, such as data mining (identifying a pattern or a correlation among data or an outlier in millions of multidimensional patterns), document classification (searching text documents), forecasting, multimedia organization and retrieval in databases, flexible information retrieval (product retrieval in e-commerce applications, solution retrieval in help-desk support), etc. The statistical approach to pattern recognition represents a pattern as a set of d features or attributes, by viewing it as a d-dimensional feature vector. Classical concepts from statistical decision theory [13] are then used to establish decision boundaries among pattern classes. The recognition system operates in two
Improving Speech Recognition Learning through Lazy Training
Cited by 1 (1 self)
Backpropagation, like most high-order learning algorithms, is prone to overfitting. We present a novel approach, called lazy training, for reducing the overfit in multiple-output networks. Lazy training has been shown to reduce the error of optimized neural networks by more than half on a large OCR data set and on several problems from the UCI machine learning database. Here, lazy training is shown to be effective in a multi-layered adaptive learning system, reducing the error of an optimized backpropagation network in a speech recognition system by 55.0% on the TIDIGITS corpus.
AN INCREMENTAL FRAMEWORK BASED ON CROSS-VALIDATION FOR ESTIMATING THE ARCHITECTURE OF A MULTILAYER PERCEPTRON
, 2009
Cited by 1 (1 self)
We define the problem of optimizing the architecture of a multilayer perceptron (MLP) as a state space search and propose the MOST (Multiple Operators using Statistical Tests) framework that incrementally modifies the structure and checks for improvement using cross-validation. We consider five variants that implement forward/backward search, using single/multiple operators and searching depth-first/breadth-first. On 44 classification and 30 regression datasets, we exhaustively search for the optimal and evaluate the goodness based on: (1) Order, the accuracy with respect to the optimal and (2) Rank, the computational complexity. We check for the effect of two resampling methods (5 × 2, ten-fold cv), four statistical tests (5 × 2 cv t, ten-fold cv t, Wilcoxon, sign) and two corrections for multiple comparisons (Bonferroni, Holm). We also compare with Dynamic Node Creation (DNC) and Cascade Correlation (CC). Our results show that: (1) On most datasets, networks with few hidden units are optimal, (2) forward searching finds simpler architectures, (3) variants using single node additions (deletions) generally stop early and get stuck in simple (complex) networks, (4) choosing the best of multiple operators finds networks closer to the optimal, (5) MOST variants generally find simpler networks having lower or comparable error rates than DNC and CC.
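The two corrections for multiple comparisons named in the MOST abstract, Bonferroni and Holm, are standard procedures and can be sketched in a few lines. The function names, the default significance level of 0.05, and the example p-values are illustrative assumptions; the abstract does not say how the corrections are wired into the search.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m, where m is the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Sort p-values in ascending order and compare the p-value at 0-based
    rank r with alpha / (m - r). Stop at the first comparison that fails;
    every hypothesis from that point on is retained.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


# Hypothetical p-values from comparing four candidate architectures.
p = [0.01, 0.02, 0.03, 0.005]
bonferroni_decisions = bonferroni(p)
holm_decisions = holm(p)
```

On these example values Bonferroni rejects only the two smallest p-values, while Holm rejects all four, which illustrates why Holm is never less powerful than Bonferroni at the same family-wise error level.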

