Results 1 - 10 of 60
On The Problem Of Local Minima In Backpropagation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1992
Cited by 60 (16 self)
Supervised Learning in Multi-Layered Neural Networks (MLNs) has recently been proposed through the well-known Backpropagation algorithm. This is a gradient method which can get stuck in local minima, as simple examples can show. In this paper, some conditions on the network architecture and the learning environment are proposed which ensure the convergence of the Backpropagation algorithm. It is proven in particular that the convergence holds if the classes are linearly separable. In this case, the experience gained in several experiments shows that MLNs exceed perceptrons in generalization to new examples.
Index Terms: Multi-Layered Networks, learning environment, Backpropagation, pattern recognition, linearly-separable classes.
I. Introduction
Supervised learning in Multi-Layered Networks can be accomplished thanks to Backpropagation (BP) ([19, 25, 31]). Its application to several different subjects [25], and, particularly, to pattern recognition ([3, 6, 8, 20, 27, 29]), has bee...
Approximation theory of the MLP model in neural networks
Acta Numerica, 1999
Cited by 30 (3 self)
In this survey we discuss various approximation-theoretic problems that arise in the multilayer feedforward perceptron (MLP) model in neural networks. Mathematically it is one of the simpler models. Nonetheless the mathematics of this model is not well understood, and many of these problems are approximation-theoretic in character. Most of the research we will discuss is of very recent vintage. We will report on what has been done and on various unanswered questions. We will not be presenting practical (algorithmic) methods. We will, however, be exploring the capabilities and limitations of this model. In the first
GAL: Networks that grow when they learn and shrink when they forget
International Journal of Pattern Recognition and Artificial Intelligence, 1991
Cited by 20 (4 self)
Learning, when limited to modification of some parameters, has a limited scope; the capability to modify the system structure is also needed to widen the range of what is learnable. In the case of artificial neural networks, learning by iterative adjustment of synaptic weights can only succeed if the network designer predefines an appropriate network structure, i.e., the number of hidden layers and units, and the size and shape of their receptive and projective fields. This paper advocates the view that the network structure should not, as is usually done, be determined by trial and error but should be computed by the learning algorithm. Incremental learning algorithms can modify the network structure by addition and/or removal of units and/or links. A survey of current connectionist literature is given on this line of thought. "Grow and Learn" (GAL) is a new algorithm that learns an association in one shot because it is incremental and uses a local representation. During the so-called...
Degraded Text Recognition Using Visual And Linguistic Context
1995
Cited by 20 (2 self)
Recognition of degraded text is a challenging problem. To improve the performance of an OCR system on degraded images of text, postprocessing techniques are critical. The objective of postprocessing is to correct errors or to resolve ambiguities in OCR results by using contextual information. Depending on the extent of context used, there are different levels of postprocessing. In current commercial OCR systems, word-level postprocessing methods, such as dictionary-lookup, have been applied successfully. However, many OCR errors cannot be corrected by word-level postprocessing. To overcome this limitation, passage-level postprocessing, in which global contextual information is utilized, is necessary. In most current studies on passage-level postprocessing, linguistic context is the major resource to be exploited. This thesis addresses problems in degraded text recognition and discusses potential solutions through passage-level postprocessing. The objective is to develop a postprocessin...
Feedforward Neural Networks for Nonparametric Regression
1998
Cited by 17 (0 self)
Feed forward neural networks (FFNN) with an unconstrained random number of hidden neurons define flexible non-parametric regression models. In Müller and Rios Insua (1998) we have argued that variable architecture models with random size hidden layer significantly reduce posterior multimodality typical for posterior distributions in neural network models. In this chapter we review the model proposed in Müller and Rios Insua (1998) and extend it to a non-parametric model by allowing unconstrained size of the hidden layer. This is made possible by introducing a Markov chain Monte Carlo posterior simulation scheme using reversible jump (Green 1995) steps to move between different size architectures.
A Theory Of Classifier Combination: The Neural Network Approach
1995
Cited by 17 (0 self)
There is a trend in recent OCR development to improve system performance by combining recognition results of several complementary algorithms. This thesis examines the classifier combination problem under strict separation of the classifier and combinator design. Nothing other than the fact that every classifier has the same input and output specification is assumed about the training, design or implementation of the classifiers. A general theory of combination should possess the following properties. It must be able to combine any type of classifier regardless of the level of information content in the outputs. In addition, a general combinator must be able to combine any mixture of classifier types and utilize all information available. Since classifier independence is difficult to achieve and to detect, it is essential for a combinator to handle correlated classifiers robustly. Although the performance of a robust (against correlation) combinator can be improved by adding classifiers indiscriminately, it is generally of interest to achieve comparable performance with the minimum number of classifiers. Therefore, the combinator should have the ability to eliminate redundant classifiers. Furthermore, it is desirable to have a complexity control mechanism for the combinator. In the past, simplifications came from assumptions and constraints imposed by the system designers. In the general theory, there should be a mechanism to reduce solution complexity by exercising non-classifier-specific constraints. Finally, a combinator should capture classifier/image dependencies. Nearly all combination methods have ignored the fact that classifier performances (and outputs) depend on various image characteristics, and this dependency is manifested in classifier output patterns in relation to input imag...
Designing a Neural Network for Forecasting Financial and Economic Time Series
1996
Cited by 16 (0 self)
Artificial neural networks are universal and highly flexible function approximators first used in the fields of cognitive science and engineering. In recent years, neural network applications in finance for such tasks as pattern recognition, classification, and time series forecasting have dramatically increased. However, the large number of parameters that must be selected to develop a neural network forecasting model has meant that the design process still involves much trial and error. The objective of this paper is to provide a practical introductory guide to the design of a neural network for forecasting economic time series data. An eight-step procedure to design a neural network forecasting model is explained, including a discussion of tradeoffs in parameter selection, some common pitfalls, and points of disagreement among practitioners.
A partitioned neural network approach for vowel classification using smoothed time/frequency features
IEEE Trans. on Speech and Audio Processing, 1999
Cited by 13 (5 self)
A novel pattern classification technique and a new feature extraction method are described and tested for vowel classification. The pattern classification technique partitions an N-way classification task into N*(N-1)/2 two-way classification tasks. Each two-way classification task is performed using a neural network classifier that is trained to discriminate the two members of one pair of categories. Multiple two-way classification decisions are then combined to form an N-way decision. Some of the advantages of the new classification approach include the partitioning of the task, which allows independent feature and classifier optimization for each pair of categories, lowered sensitivity of classification performance to network parameters, a reduction in the amount of training data required, and potential for superior performance relative to a single large network. The features described in this paper, closely related to the cepstral coefficients and delta cepstra commonly used in speech analysis, are developed using a unified mathematical framework which allows arbitrary nonlinear frequency, amplitude, and time scales to compactly represent the spectral/temporal characteristics of speech. This classification approach, combined with a feature-ranking algorithm which selected the 35 most discriminative spectral/temporal features for each vowel pair, resulted in 71.5% accuracy for classification of 16 vowels extracted from the TIMIT database. These results, significantly higher than other published results for the same task, illustrate the potential for the methods presented in this paper. EDICS: SA1.6.3, SA1.6.1
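The partitioning described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the two-way decisions are combined, so the simple vote counting below is an assumption, and the names `pairwise_tasks` and `combine_by_vote` are hypothetical. The trained pairwise neural networks are stood in for by a plain dictionary of winners.

```python
from collections import Counter
from itertools import combinations


def pairwise_tasks(classes):
    """Enumerate the N*(N-1)/2 unordered class pairs of an N-way task."""
    return list(combinations(classes, 2))


def combine_by_vote(pair_winners):
    """Combine two-way decisions into one N-way decision.

    pair_winners maps each (a, b) pair to the class its two-way
    classifier chose. Vote counting is an assumed combination rule,
    not necessarily the one used in the paper; ties are broken by
    sorted class name.
    """
    votes = Counter(pair_winners.values())
    best = max(votes.values())
    return sorted(c for c, v in votes.items() if v == best)[0]


# 16 vowel categories, as in the abstract (labels are placeholders).
vowels = [f"v{i}" for i in range(16)]
tasks = pairwise_tasks(vowels)
print(len(tasks))  # 16 * 15 / 2 = 120 two-way tasks

# Toy three-class example with hand-written pairwise winners.
winners = {("a", "b"): "a", ("a", "c"): "a", ("b", "c"): "b"}
print(combine_by_vote(winners))  # a
```

With 16 vowels this gives 120 pairwise classifiers, each of which could use its own feature subset, which is the independence the abstract highlights.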
A Brief History of Connectionism
Neural Computing Surveys, 1998
Cited by 11 (0 self)
Connectionist research is firmly established within the scientific community, especially within the multi-disciplinary field of cognitive science. This diversity, however, has created an environment which makes it difficult for connectionist researchers to remain aware of recent advances in the field, let alone understand how the field has developed. This paper attempts to address this problem by providing a brief guide to connectionist research. The paper begins by defining the basic tenets of connectionism. Next, the development of connectionist research is traced, commencing with connectionism's philosophical predecessors, moving to early psychological and neuropsychological influences, followed by the mathematical and computing contributions to connectionist research. Current research is then reviewed, focusing specifically on the different types of network architectures and learning rules in use. The paper concludes by suggesting that neural network research---at least in cognitiv...
Seismic discrimination with artificial neural networks: Preliminary results with regional spectral data
Bulletin of the Seismological Society of America, 1990
Cited by 10 (0 self)
An application of artificial neural networks (ANN) for discrimination between natural earthquakes and underground nuclear explosions has been studied using distance-corrected spectral data of regional seismic phases. Pn, Pg, and Lg spectra have been analyzed from 83 western U.S. earthquakes and 87 Nevada Test Site explosions recorded at the four broadband seismic stations operated by Lawrence Livermore National Laboratory. Distance corrections are applied to the raw spectra using existing frequency-dependent Q models for the Basin and Range. The spectra are sampled logarithmically at 41 points between 0.1 and 10 Hz for each phase and checked for adequate signal-to-noise ratios (S/N > 2). The ANN was implemented on a SUN 4/110 workstation using a backpropagation-feedforward architecture. We find that, using even simple ANN architectures (82 input units, 1 hidden unit, and 2 output units), powerful discrimination systems can be designed. In order to regionalize the data characteristics, a separate neural network was assigned to each station. For this data set, the rate of correct recognition for untrained data is over 93 per cent for both earthquakes and explosions at any single station. Using a majority voting scheme with a network of four stations, the rate of correct recognition is over 97 per cent. Although the performance of the ANN is similar to that of the Fisher linear discriminant, the ANN exhibits a number of computational advantages over the conventional method. Finally, examination of the network weights suggests that, in addition to spectral shape, a criterion that the ANN utilized to discriminate between the two populations was the Lg/Pg spectral amplitude ratios.
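Two steps in this abstract, the logarithmic sampling of each spectrum and the majority vote across stations, can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the sample points are taken to be evenly spaced in log10 frequency with both endpoints included, and a tied vote returns `None` because the abstract does not say how ties among four stations are resolved.

```python
import math
from collections import Counter


def log_spaced_frequencies(f_min=0.1, f_max=10.0, n_points=41):
    """Frequencies evenly spaced in log10 between f_min and f_max (Hz).

    Assumes both endpoints are included; the abstract gives only the
    range and the number of points.
    """
    lo, hi = math.log10(f_min), math.log10(f_max)
    step = (hi - lo) / (n_points - 1)
    return [10 ** (lo + k * step) for k in range(n_points)]


def majority_vote(station_labels):
    """Return the label chosen by most stations, or None on a tie.

    Tie handling is an assumption made for this sketch.
    """
    counts = Counter(station_labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


freqs = log_spaced_frequencies()
print(len(freqs))  # 41 sample points per phase

# Hypothetical per-station decisions for one event.
print(majority_vote(["explosion", "explosion", "earthquake", "explosion"]))
```

With two decades covered by 41 points, adjacent samples differ by a constant factor of 10^(1/20), so the low-frequency end is sampled as densely (per octave) as the high-frequency end.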

