Results 1 – 10 of 79
On The Problem Of Local Minima In Backpropagation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1992
Abstract

Cited by 72 (17 self)
Supervised learning in Multi-Layered Neural Networks (MLNs) has recently been proposed through the well-known Backpropagation algorithm. This is a gradient method which can get stuck in local minima, as simple examples show. In this paper, some conditions on the network architecture and the learning environment are proposed which ensure the convergence of the Backpropagation algorithm. In particular, it is proven that convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that MLNs exceed perceptrons in generalization to new examples. Index Terms — Multi-Layered Networks, learning environment, Backpropagation, pattern recognition, linearly separable classes. I. Introduction Supervised learning in Multi-Layered Networks can be accomplished thanks to Backpropagation (BP) ([19, 25, 31]). Its application to several different subjects [25], and, particularly, to pattern recognition ([3, 6, 8, 20, 27, 29]), has bee...
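The local-minima problem this abstract describes can be illustrated with a one-dimensional toy loss surface (an illustrative function chosen here, not one from the paper): plain gradient descent converges to whichever minimum lies in the basin of its starting point.

```python
# Toy illustration of the local-minimum problem: the surface
# f(w) = (w^2 - 1)^2 + 0.3*w has a global minimum near w = -1 and a
# worse local minimum near w = +1. Gradient descent gets stuck in
# whichever basin it starts in. (Illustrative function, not from the paper.)

def loss(w):
    return (w**2 - 1)**2 + 0.3 * w

def grad(w):
    return 4 * w * (w**2 - 1) + 0.3

def gradient_descent(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * grad(w)
    return w

w_bad = gradient_descent(0.5)    # starts in the basin of the local minimum
w_good = gradient_descent(-0.5)  # starts in the basin of the global minimum
print(loss(w_bad) > loss(w_good))  # prints: True
```

Both runs converge to a stationary point, but the run started at w = 0.5 ends with strictly higher loss, which is exactly the failure mode the paper's architectural conditions are meant to rule out.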
Varieties of Learning Automata: An Overview
 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART B: CYBERNETICS
, 2002
Abstract

Cited by 51 (0 self)
Abstract—Automata models of learning systems introduced in the 1960s were popularized as learning automata (LA) in a survey paper in 1974 [1]. Since then, there have been many fundamental advances in the theory as well as applications of these learning models. In the past few years, the structure of LA has been modified in several directions to suit different applications. Concepts such as parameterized learning automata (PLA), generalized learning automata (GLA), and continuous action-set learning automata (CALA) have been proposed, analyzed, and applied to solve many significant learning problems. Furthermore, groups of LA forming teams and feedforward networks have been shown to converge to desired solutions under appropriate learning algorithms. Modules of LA have been used for parallel operation with a consequent increase in speed of convergence. All of these concepts and results are relatively new and are scattered in the technical literature. An attempt has been made in this paper to bring together the main ideas involved in a unified framework and provide pointers to relevant references. Index Terms—Continuous action-set learning automata (CALA), generalized learning automata (GLA), modules of learning automata, parameterized learning automata (PLA), teams and networks of learning automata. I.
Approximation theory of the MLP model in neural networks
 ACTA NUMERICA
, 1999
Abstract

Cited by 39 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first
Are artificial neural networks black boxes?
 IEEE Trans. Neural Networks
, 1997
Abstract

Cited by 38 (3 self)
Abstract — Artificial neural networks are efficient computing models which have shown their strengths in solving hard problems in artificial intelligence. They have also been shown to be universal approximators. Notwithstanding, one of the major criticisms is that they are black boxes, since no satisfactory explanation of their behavior has been offered. In this paper, we provide such an interpretation of neural networks so that they will no longer be seen as black boxes. This is stated after establishing the equality between a certain class of neural nets and fuzzy rule-based systems. This interpretation is built with fuzzy rules using a new fuzzy logic operator which is defined after introducing the concept of f-duality. In addition, this interpretation offers an automated knowledge acquisition procedure. Index Terms — Equality between neural nets and fuzzy rule-based systems, f-duality, fuzzy additive systems, interpretation of neural nets, i-or operator. I.
Designing a Neural Network for Forecasting Financial and Economic Time Series
, 1996
Abstract

Cited by 29 (0 self)
Artificial neural networks are universal and highly flexible function approximators first used in the fields of cognitive science and engineering. In recent years, neural network applications in finance for such tasks as pattern recognition, classification, and time series forecasting have dramatically increased. However, the large number of parameters that must be selected to develop a neural network forecasting model has meant that the design process still involves much trial and error. The objective of this paper is to provide a practical introductory guide to the design of a neural network for forecasting economic time series data. An eight-step procedure to design a neural network forecasting model is explained, including a discussion of trade-offs in parameter selection, some common pitfalls, and points of disagreement among practitioners.
Degraded Text Recognition Using Visual And Linguistic Context
, 1995
Abstract

Cited by 23 (2 self)
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
GAL: Networks that grow when they learn and shrink when they forget
 INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE
, 1991
Abstract

Cited by 21 (4 self)
Learning when limited to modification of some parameters has a limited scope; the capability to modify the system structure is also needed to get a wider range of the learnable. In the case of artificial neural networks, learning by iterative adjustment of synaptic weights can only succeed if the network designer predefines an appropriate network structure, i.e., the number of hidden layers and units, and the size and shape of their receptive and projective fields. This paper advocates the view that the network structure should not, as usually done, be determined by trial and error but should be computed by the learning algorithm. Incremental learning algorithms can modify the network structure by addition and/or removal of units and/or links. A survey of current connectionist literature is given on this line of thought. "Grow and Learn" (GAL) is a new algorithm that learns an association in one shot due to being incremental and using a local representation. During the so-called...
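The growing idea this abstract describes can be sketched minimally (a hypothetical illustration, not the paper's GAL algorithm): store exemplars as units, classify by the nearest unit, and add a new unit only when the current units misclassify an example, so each association is learned in one shot.

```python
# Minimal sketch of the "grow when you learn" idea (hypothetical, not the
# paper's GAL algorithm): units are stored exemplars; a new unit is added
# only when the existing units misclassify the incoming example.

class GrowingClassifier:
    def __init__(self):
        self.units = []  # list of (feature, label) exemplars

    def predict(self, x):
        if not self.units:
            return None
        # local representation: the nearest stored unit decides the class
        return min(self.units, key=lambda u: abs(u[0] - x))[1]

    def learn(self, x, label):
        if self.predict(x) != label:       # grow only on error
            self.units.append((x, label))  # one-shot storage

gal = GrowingClassifier()
for x, y in [(0.0, "a"), (0.1, "a"), (2.0, "b"), (2.1, "b")]:
    gal.learn(x, y)
print(len(gal.units), gal.predict(0.05), gal.predict(2.05))  # prints: 2 a b
```

Note that only two of the four examples become units: the structure grows where errors occur rather than being fixed in advance, which is the point the abstract argues for.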
Feedforward Neural Networks for Nonparametric Regression
, 1998
Abstract

Cited by 21 (0 self)
Feedforward neural networks (FFNN) with an unconstrained random number of hidden neurons define flexible nonparametric regression models. In Müller and Rios Insua (1998) we have argued that variable-architecture models with a random-size hidden layer significantly reduce the posterior multimodality typical of posterior distributions in neural network models. In this chapter we review the model proposed in Müller and Rios Insua (1998) and extend it to a nonparametric model by allowing an unconstrained size of the hidden layer. This is made possible by introducing a Markov chain Monte Carlo posterior simulation scheme using reversible jump (Green 1995) steps to move between architectures of different sizes.
A Theory Of Classifier Combination: The Neural Network Approach
, 1995
Abstract

Cited by 18 (0 self)
There is a trend in recent OCR development to improve system performance by combining the recognition results of several complementary algorithms. This thesis examines the classifier combination problem under strict separation of the classifier and combinator design. Nothing other than the fact that every classifier has the same input and output specification is assumed about the training, design, or implementation of the classifiers. A general theory of combination should possess the following properties. It must be able to combine any type of classifier regardless of the level of information content in the outputs. In addition, a general combinator must be able to combine any mixture of classifier types and utilize all information available. Since classifier independence is difficult to achieve and to detect, it is essential for a combinator to handle correlated classifiers robustly. Although the performance of a robust (against correlation) combinator can be improved by adding classifiers indiscriminately, it is generally of interest to achieve comparable performance with the minimum number of classifiers. Therefore, the combinator should have the ability to eliminate redundant classifiers. Furthermore, it is desirable to have a complexity control mechanism for the combinator. In the past, simplifications came from assumptions and constraints imposed by the system designers. In the general theory, there should be a mechanism to reduce solution complexity by exercising non-classifier-specific constraints. Finally, a combinator should capture classifier/image dependencies. Nearly all combination methods have ignored the fact that classifier performances (and outputs) depend on various image characteristics, and this dependency is manifested in classifier output patterns in relation to input imag...
A partitioned neural network approach for vowel classification using smoothed time/frequency features
 IEEE Trans. on Speech and Audio Processing
, 1999
Abstract

Cited by 15 (6 self)
A novel pattern classification technique and a new feature extraction method are described and tested for vowel classification. The pattern classification technique partitions an N-way classification task into N*(N-1)/2 two-way classification tasks. Each two-way classification task is performed using a neural network classifier that is trained to discriminate the two members of one pair of categories. Multiple two-way classification decisions are then combined to form an N-way decision. Some of the advantages of the new classification approach include the partitioning of the task, allowing independent feature and classifier optimization for each pair of categories, lowered sensitivity of classification performance to network parameters, a reduction in the amount of training data required, and potential for superior performance relative to a single large network. The features described in this paper, closely related to the cepstral coefficients and delta cepstra commonly used in speech analysis, are developed using a unified mathematical framework which allows arbitrary nonlinear frequency, amplitude, and time scales to compactly represent the spectral/temporal characteristics of speech. This classification approach, combined with a feature-ranking algorithm which selected the 35 most discriminative spectral/temporal features for each vowel pair, resulted in 71.5% accuracy for classification of 16 vowels extracted from the TIMIT database. These results, significantly higher than other published results for the same task, illustrate the potential of the methods presented in this paper. EDICS: SA1.6.3, SA1.6.1
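The partitioning scheme this abstract describes (an N-way decision assembled from N*(N-1)/2 pairwise decisions by voting) can be sketched as follows; the simple threshold classifiers here are hypothetical stand-ins for the paper's trained neural networks.

```python
from itertools import combinations

# One-vs-one decomposition: an N-way decision is assembled from
# N*(N-1)/2 pairwise classifiers by majority vote. The threshold
# classifiers below are hypothetical stand-ins for the paper's
# trained neural network classifiers.

def make_pairwise(classes, train_fn):
    """Build one classifier per unordered pair of classes."""
    return {(a, b): train_fn(a, b) for a, b in combinations(classes, 2)}

def predict(x, pairwise):
    """Combine all pairwise decisions into an N-way decision by voting."""
    votes = {}
    for clf in pairwise.values():
        winner = clf(x)
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)

# Toy 1-D problem: class 0 lies near x < 1, class 1 near 1 <= x < 2,
# class 2 near x >= 2; each pairwise "classifier" is a threshold test.
boundaries = {(0, 1): 1.0, (0, 2): 1.5, (1, 2): 2.0}

def train_fn(a, b):
    t = boundaries[(a, b)]
    return lambda x, a=a, b=b, t=t: a if x < t else b

pairwise = make_pairwise([0, 1, 2], train_fn)
print([predict(x, pairwise) for x in (0.5, 1.5, 2.5)])  # prints: [0, 1, 2]
```

Each pairwise classifier only ever sees a two-class problem, which is what lets the paper optimize features independently per category pair.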