Results 1 - 7 of 7
Computing Second Derivatives in Feed-Forward Networks: a Review
- IEEE Transactions on Neural Networks, 1994
Cited by 22 (4 self)
The calculation of second derivatives is required by recent training and analyses techniques of connectionist networks, such as the elimination of superfluous weights, and the estimation of confidence intervals both for weights and network outputs. We here review and develop exact and approximate algorithms for calculating second derivatives. For networks with |w| weights, simply writing the full matrix of second derivatives requires O(|w|^2) operations. For networks of radial basis units or sigmoid units, exact calculation of the necessary intermediate terms requires of the order of 2h + 2 backward/forward-propagation passes where h is the number of hidden units in the network. We also review and compare three approximations (ignoring some components of the second derivative, numerical differentiation, and scoring). Our algorithms apply to arbitrary activation functions, networks, and error functions (for instance, with connections that skip layers, or radial basis functions, or ...
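As a rough illustration of the numerical-differentiation approximation named in this abstract, the sketch below estimates the full matrix of second derivatives of an error function by central differences. It is not the paper's algorithm: the function name `numerical_hessian`, the step size `eps`, and the toy quadratic error are all assumptions chosen for the example. The double loop over weight pairs makes the O(|w|^2) cost of filling the matrix visible.

```python
def numerical_hessian(error_fn, w, eps=1e-4):
    """Approximate the Hessian of error_fn at w by central differences.

    error_fn takes a list of weights and returns a scalar error.
    Both names are hypothetical; they are not taken from the paper.
    """
    n = len(w)

    def shifted(i, sign_i, j, sign_j):
        # Evaluate the error with weights i and j perturbed by +/- eps.
        v = list(w)
        v[i] += sign_i * eps
        v[j] += sign_j * eps
        return error_fn(v)

    hessian = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # the matrix is symmetric, so fill both halves
            value = (
                shifted(i, 1, j, 1)
                - shifted(i, 1, j, -1)
                - shifted(i, -1, j, 1)
                + shifted(i, -1, j, -1)
            ) / (4 * eps * eps)
            hessian[i][j] = hessian[j][i] = value
    return hessian


def toy_error(w):
    # Toy quadratic error whose exact Hessian is [[2, 1], [1, 3]].
    return w[0] ** 2 + w[0] * w[1] + 1.5 * w[1] ** 2


H = numerical_hessian(toy_error, [0.3, -0.7])
```

For a real network, `error_fn` would run a forward pass over the training set, so each matrix entry costs four forward passes under this scheme.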
A partitioned neural network approach for vowel classification using smoothed time/frequency features
- IEEE Trans. on Speech and Audio Processing, 1999
Cited by 13 (5 self)
A novel pattern classification technique and a new feature extraction method are described and tested for vowel classification. The pattern classification technique partitions an N-way classification task into N*(N-1)/2 two-way classification tasks. Each two-way classification task is performed using a neural network classifier that is trained to discriminate the two members of one pair of categories. Multiple two-way classification decisions are then combined to form an N-way decision. Some of the advantages of the new classification approach include the partitioning of the task allowing independent feature and classifier optimization for each pair of categories, lowered sensitivity of classification performance on network parameters, a reduction in the amount of training data required, and potential for superior performance relative to a single large network. The features described in this paper, closely related to the cepstral coefficients and delta cepstra commonly used in speech analysis, are developed using a unified mathematical framework which allows arbitrary nonlinear frequency, amplitude, and time scales to compactly represent the spectral/temporal characteristics of speech. This classification approach, combined with a feature-ranking algorithm which selected the 35 most discriminative spectral/temporal features for each vowel pair, resulted in 71.5 % accuracy for classification of 16 vowels extracted from the TIMIT database. These results, significantly higher than other published results for the same task, illustrate the potential for the methods presented in this paper. EDICS: SA1.6.3, SA1.6.1
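The partitioning described in this abstract can be sketched as follows. The abstract does not say how the two-way decisions are combined, so the majority vote used here is an assumption, and the nearest-center pair classifiers are hypothetical stand-ins for the trained neural networks. The names `pairwise_tasks` and `classify_by_voting` are likewise invented for the example.

```python
from collections import Counter
from itertools import combinations


def pairwise_tasks(labels):
    """List the N*(N-1)/2 unordered category pairs for N labels."""
    return list(combinations(labels, 2))


def classify_by_voting(x, labels, pair_classifiers):
    """Combine two-way decisions into an N-way decision by majority vote.

    pair_classifiers maps each (a, b) pair to a function that returns
    either a or b for input x. Voting is an assumed combination rule.
    """
    votes = Counter()
    for pair in combinations(labels, 2):
        votes[pair_classifiers[pair](x)] += 1
    return votes.most_common(1)[0][0]


# Toy stand-in for trained networks: each pair classifier picks the
# category whose (made-up) one-dimensional center is nearer to x.
centers = {"a": 0.0, "b": 1.0, "c": 2.0}
labels = list(centers)


def make_pair_classifier(first, second):
    def decide(x):
        near_first = abs(x - centers[first]) <= abs(x - centers[second])
        return first if near_first else second
    return decide


pair_classifiers = {
    (first, second): make_pair_classifier(first, second)
    for first, second in pairwise_tasks(labels)
}

decision = classify_by_voting(0.9, labels, pair_classifiers)
task_count_16 = len(pairwise_tasks(range(16)))
```

With the 16 vowel categories mentioned in the abstract, `pairwise_tasks` yields 120 two-way tasks, each of which could use its own feature subset.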
Cost functions to estimate a posteriori probabilities in multiclass problems
- IEEE Trans. Neural Networks, 1999
Cited by 1 (0 self)
Abstract—The problem of designing cost functions to estimate a posteriori probabilities in multiclass problems is addressed in this paper. We establish necessary and sufficient conditions that these costs must satisfy in one-class one-output networks whose outputs are consistent with probability laws. We focus our attention on a particular subset of the corresponding cost functions; those which verify two usually interesting properties: symmetry and separability (well-known cost functions, such as the quadratic cost or the cross entropy are particular cases in this subset). Finally, we present a universal stochastic gradient learning rule for single-layer networks, in the sense of minimizing a general version of these cost functions for a wide family of nonlinear activation functions. Index Terms — Neural networks, pattern classification, probability estimation.
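To make concrete why the quadratic cost and the cross entropy are suitable for estimating a posteriori probabilities, the sketch below checks numerically that the expected value of each cost is minimized when the output equals the class probability. It is a simplified single-output, two-class case rather than the multiclass setting of the paper; the probability `p = 0.3`, the grid resolution, and all function names are assumptions made for the example.

```python
import math


def expected_quadratic(y, p):
    # Expected squared error when the target is 1 with probability p.
    return p * (1.0 - y) ** 2 + (1.0 - p) * y ** 2


def expected_cross_entropy(y, p):
    # Expected cross entropy when the target is 1 with probability p.
    return -p * math.log(y) - (1.0 - p) * math.log(1.0 - y)


def best_output(expected_cost, p, steps=1000):
    """Grid-search the output y in (0, 1) that minimizes the expected cost."""
    grid = [k / steps for k in range(1, steps)]
    return min(grid, key=lambda y: expected_cost(y, p))


p = 0.3  # assumed posterior probability of the class
best_quadratic = best_output(expected_quadratic, p)
best_cross_entropy = best_output(expected_cross_entropy, p)
```

Both searches return the grid point at `p`, which is the property that lets a network trained on either cost be read as a probability estimator.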
Differential Learning Leads to Efficient Neural Network Classifiers
- In IEEE Proceedings of the 1993 International Conference on Acoustics, Speech, and Signal Processing, 1992
Cited by 1 (0 self)
We outline a differential theory of learning for statistical pattern classification. When applied to neural networks, the theory leads to an efficient differential learning strategy based on classification figure-of-merit (CFM) objective functions [5]. Differential learning guarantees the highest probability of generalization for a classifier with limited functional complexity, trained with a limited number of examples. The theory is significant for this and two other reasons: (1) It proves that the current probabilistic learning strategy for neural network classifiers (employing error measure objective functions such as mean-squared error and the Kullback-Leibler distance measure) is inefficient, and therefore sub-optimal. (2) It explains why current theoretical estimates of the training sample size and classifier functional complexity needed for generalization are often orders of magnitude higher than the true information/computational resources needed for the classification task. W...
Two Papers on Feed-Forward Networks
- 1991
REPORT DOCUMENTATION PAGE, OMB No. 0704-0188. Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Wave Solder Process Control Modeling Using A Neural Network Approach
- In Intelligent Engineering Systems Through Artificial Neural Networks, 1994
We discuss the formulation and results of a simple backpropagation approach to the control of wave soldering of printed circuit cards. Small lot sizes and a large number of different circuit card designs have complicated selection of the tunable process settings at the large manufacturer we worked with. Use of a neural network predictive model results in improved precision relative to the currently used multivariate linear model. INTRODUCTION: The wave solder process involves (1) preheating, (2) fluxing, (3) soldering using a wave of solder, (4) cleaning, and (5) quality control. The process must be adapted according to the design (mass, size, component density, component type, etc.) of the circuit card to optimize quality, i.e. minimize solder connection defects. Process parameters which are controllable are the preheat temperatures and the line speed. Circuit card manufacturers produce products of great diversity in small lot sizes, compounding the selection of good process...
parameters such as cr and A, that is Pr(lw , or) = Pr(). Posterior probabilities of network weights are as follows. For regression with Gaussian error and unknown a,
This paper has covered Bayesian theory relevant to the problem of training feed-forward connectionist networks. We now sketch out how this might be put together in practice, assuming a standard gradient descent algorithm as used during search

