### Table 2: Comparison of General Network, General Networkh, ParallelNetwork1, ParallelNetwork2, ParallelNetwork3, and ParallelNetwork4 for data set 1 when the confidence level c is 0.7. Among the sub-networks, ParallelNetwork3 and ParallelNetwork4 have fewer parameters than ParallelNetwork1 and ParallelNetwork2. The training data set is 4,094 characters (748 words) and the test data set is 934 characters.

in Applying Parallel Learning Models of Artificial Neural Networks To Letters Recognition From Phonemes

"... In PAGE 3: ...7% for the general multilayered neural network. Table 2 compares the General Network, General Networkh, Parallel Network1, Parallel Network2, Parallel Network3, and Parallel Network4 for data set 1. General Networkh has the same number of weights as the parallel networks, Parallel Network1 and Parallel Network2.... ..."

### Table 1: Average out-of-sample negative log-likelihood obtained with the various models on four data sets (standard deviations of the average in parentheses, and p-values to test the null hypothesis that a model has the same true generalization error as the pruned neural network). The pruned neural network was better than all the other models in all cases, and the pair-wise difference is always statistically significant (except with respect to the pruned LARC on Audiology).

2000

"... In PAGE 6: ... 307 cases are used for training and 376 for testing. Table 1 clearly shows that the proposed model yields promising results since the pruned neural network was superior to all the other models in all 4 cases, and the pairwise differences with the other models are statistically significant in all 4 cases (except Audiology, where the difference with the network without hidden units, LARC, is not significant). 4 Conclusion In this paper we have proposed a new application of multi-layer neural networks to the modelization of high-dimensional distributions, in particular for discrete data (but the model could also be applied to continuous or mixed discrete/continuous data).... ..."

Cited by 5
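The evaluation protocol in the caption above (per-example negative log-likelihood averaged over the test set, with a paired test of the null hypothesis that two models generalize equally well) can be sketched as follows. The function names are my own, and a normal-approximation z-test on the paired differences stands in for whichever exact test the paper used:

```python
import math

def avg_nll(probs_true_class):
    """Average out-of-sample negative log-likelihood and its standard error.
    `probs_true_class` holds each test example's predicted probability of its
    true outcome under the model."""
    nll = [-math.log(p) for p in probs_true_class]
    mean = sum(nll) / len(nll)
    var = sum((x - mean) ** 2 for x in nll) / (len(nll) - 1)
    return mean, math.sqrt(var / len(nll))  # (average, std dev of the average)

def paired_z_pvalue(probs_a, probs_b):
    """Two-sided p-value (normal approximation) for the null hypothesis that
    models A and B have the same true generalization error, computed from
    per-example paired differences in negative log-likelihood."""
    # diff_i = NLL_A(i) - NLL_B(i) = log(b_i) - log(a_i)
    diffs = [math.log(b) - math.log(a) for a, b in zip(probs_a, probs_b)]
    mean = sum(diffs) / len(diffs)
    var = sum((x - mean) ** 2 for x in diffs) / (len(diffs) - 1)
    z = mean / math.sqrt(var / len(diffs))
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
```

A small p-value under this scheme rejects equal generalization error, which is the reading the caption gives to its significance claims.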

### Table 1: Mean square errors of the predictions of the Multi-Layer Perceptron (MLP), Radial Basis Functions network (RBF), AppART and GasART using the Mackey-Glass problem test set. Columns: Neural model | Mean Square Error | Nodes needed
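The comparison metric in the caption above is the test-set mean square error; as a minimal sketch (the function name is my own):

```python
def mse(targets, predictions):
    """Mean square error of a model's predictions over a test set."""
    assert len(targets) == len(predictions)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)
```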

### Table 1 summarizes the results on the test set of different approaches before using AdaBoost. The Diabolo classifier (even without hand-selected sub-classes in the training set) performs quite well with respect to the multi-layer perceptrons. The experiments suggest that fully connected neural networks are not well suited for this task: small nets do poorly on both training and test sets, while large nets overfit.

1997

"... In PAGE 5: ...the classification is invariant), therefore incorporating prior knowledge on the task. Table 1. Error rates for different unboosted classifiers: Diabolo classifier (no subclasses / hand-selected) versus fully connected MLP (22-10-10 / 22-30-10 / 22-80-10); train: 2.... ..."

Cited by 4

### Table 4: Weight discretization in other neural network models.

"... In PAGE 5: ...2 Quantization Effects in Other Neural Network Models Also for other neural network models, the effects of a coarse quantization of the weight values on recall and learning have been investigated. The small number of weight discretization algorithms proposed can be partly explained by the fact that the required accuracy for successful learning in these models is lower than for gradient descent learning in multilayer networks (Table 4). An interesting example of a hardware implementation is Bellcore's implementation of a Boltzmann machine and Mean-Field learning, which allows on-chip learning with only 5-bit weights [Alspector-92].... ..."
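As an illustration of the kind of coarse weight discretization the excerpt discusses, here is a minimal sketch of uniform signed fixed-point quantization (my own stand-in, not any of the specific algorithms surveyed); `bits=5` matches the Bellcore figure quoted above:

```python
def quantize_weights(weights, bits=5):
    """Round each weight to the nearest level of a uniform signed grid with
    2**(bits-1) - 1 positive levels, scaled to the largest |weight|."""
    levels = 2 ** (bits - 1) - 1
    w_max = max(abs(w) for w in weights) or 1.0  # avoid dividing by zero
    step = w_max / levels
    return [round(w / step) * step for w in weights]
```

The symmetric range keeps zero exactly representable, a property that matters when many weights are pruned to zero.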

### Table 7.7: The recognition rates on the test set, the number of weights, and the feed-forward CPU time for a single pattern presentation are compared for a Single Layer Perceptron (SLP), a fully connected Multi-Layer Perceptron (MLP), a Radial Basis Function (RBF) network, a sparsely connected Multi-Layer Perceptron (sMLP), and a sparsely connected Higher Order Neural Network (HONN). The standard networks are evaluated as a replacement of the entire Hierarchical Neural Network (HNN) architecture. All networks received as input binary contour images of sizes between 32×32 and 128×128 and classified the ventricle patterns into 4 classes. The configurations for the fully connected MLP and RBF network were optimised. The recognition results are averaged over 6 networks, except for the RBF network, which is based on a single experiment only. The networks were not regularised; thus, overtraining may be present. The CPU times are measured on a SUN-SPARC 10 workstation.

### Table 2 Comparison of Results With Ordinary Least Squares (OLS) and Tobit Regression Models

"... In PAGE 9: ...LP producing an R2 of .22 with training data and .04 with test data (see Table 1). OLS and Tobit Regression Models OLS and Tobit regression models had lower levels of predictive accuracy than did the best performing neural networks. As indicated in Table 2, R2 for OLS regression was .... In PAGE 10: ...training data and an R2 of .059 on test data (see Table 2). Although not statistically significant (χ² = 11.24, p > .05), the Tobit regression model generated greater predictive accuracy than did OLS regression but lagged the best performing ANN by a considerable margin (R2 of .... ..."

### Table 4 shows the confusion matrix that results from training on the same data used to generate Figure 5. This table

2000

"... In PAGE 9: ... Details of the neural network architectures and the complete set of results are given in [25]. Figure 5 - Figure 7 and Table 4 - Table 6 present a representative sample of the results pertaining to the... In PAGE 10: ...Table 4... In PAGE 21: ...Page 21 of 23 Table 4: Neural network results for the case of no rank-order filtering. The top value in each cell is the multi-layer perceptron result; the bottom value is the radial-basis function neural network result.... ..."

Cited by 1
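The confusion matrix referred to in this last caption can be computed with a short sketch (the function name is my own); rows index the true class and columns the class the network predicted:

```python
def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Entry [i][j] counts test patterns of true class i that the classifier
    assigned to class j; the diagonal holds the correctly recognised patterns."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix
```

Per-class recognition rates, as reported in such tables, are the diagonal entries divided by their row sums.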