### Table 4 Leave-one-out classification performances for the LDA, Logit and LS-SVM models using the optimized input sets.

2004

"... In PAGE 27: ... Notice the similarities between both curves during the input removal process. Table4 reports the performances of all classifiers using the optimally pruned set of inputs. Again it can be observed that the LS-SVM and LS- SVMBay classifiers yield very good performances when compared to the LDA and Logit classifiers.... ..."

### Table 1: Results of the comparison of k-NN, LS-SVM and KDE, using leave-one-out validation.

### Table 1: This table shows the mse performances (Boston housing data problem) on a test set. We use ensemble models based on collections of learning algorithms of either size 8 or 16 (the number of models is indicated by the second number in Table 1; for example, NOX16 is an ensemble model based on 16 individual models on the NOX prediction). We show the mean (mea), median (med) and variance (var) of these distributions after 100 randomizations. In the last two columns the results of a Wilcoxon ranksum test are given: h indicates whether the test finds the two distributions different and p is the corresponding p-value. In the last two rows the mse performance of a standard LS-SVM is given. Again the mean (mea), median (med) and variance (var) of these distributions are given after 100 randomizations.

"... In PAGE 15: ... This indicates a statistically significantly difference between the averages of the two groups. The results are summarized in Table1 . In this table we show the mse performances on a test set.... In PAGE 15: ... In this table we show the mse performances on a test set. We use ensemble models based on collections or learning algorithms of either size 8 or 16 (the number of models is indicated by the second number in Table1 , example NOX16 is a ensemble model based on 16 individual models on the NOX prediction). We give the mean (mea), median (mea) and variance (var) of these distributions after randomization.... In PAGE 16: ... Notice that besides the fact that the mean mse of the coupled ensemble is always better, also the variance is smaller. We observe that this property holds for the majority of experiments in Table1 . An intuitive explanation of this behavior is the following.... In PAGE 20: ...7e-3 Table 2: Misclassification rates on a test set (Tic-Tac-Toe (TTT), Australian Credit Card Data Set (ACR) and the Adult Data Set (ADULT)). The number of models is indicated by the second number in Table1 , example TTT11 is an ensemble model based on 11 individual models on, the TTT prediction. We give the mean (mea), median (mea) and variance (var) of these distributions after 100 randomization.... ..."

### Table 2 Benelux data set: description of the 40 candidate inputs. The inputs include various liquidity (L), solvency (S), profitability (P) and size (V) measures. Trends (Tr) are used to describe the evolution of the ratios (R). The results of backward input selection are presented by reporting the number of remaining inputs in the LDA, LOGIT and LS-SVM model when an input is removed. These ranking numbers are underlined when the corresponding input is used in the model having optimal leave-one-out cross-validation performance. Hence, inputs with low importance have a high number, while the most important input has rank 1.

2004

"... In PAGE 24: ...selected from financial statement data, using standard liquidity, profitability and solvency measures. As can be seen from Table2 , both ratios as well as trends of ratios are considered. The data were preprocessed as follows.... ..."

### Table 3 Leave-one-out classification performances (percentages) for the LDA, Logit and LS-SVM model using the full candidate input set. The corresponding p-values (percentages) are denoted between parentheses.

2004

"... In PAGE 25: ... The generalization performance is assessed by means of the leave-one-out cross-validation error, which is a common measure in the bankruptcy prediction literature [22]. In Table3 , we have contrasted the PCC, PCCp, PCCn and AUROC performance of the LS-SVM (26) and the Bayesian LS-SVM decision rule (59) classifier with the performance of the... ..."

### Table 5: Comparison of the test set performance of the COSSO SVM with those of SVM, LS-SVM, LDA, QDA, Logit, C4.5, oneR, IB, Naive Bayes, and the Majority Rule. The results of the other algorithms are taken from Gestel et al. (2004).

"... In PAGE 14: ...Hao Helen Zhang the Wisconsin Breast Cancer data. The basic features of the datasets and the perfor- mances of di erent algorithms are summarized in Table5 . Following Gestel, Suykens, Baesens, Viaene, Vantheienen, Dedene, Moor, and Vandewalle (2004), for each dataset we randomly select 2/3 of the data for training and tuning, and test on the remaining 1/3 of the data.... In PAGE 14: ... We do this randomization 10 times and report the average test set performance and sample standard deviation for the COSSO SVM. The best average test set performances are denoted in bold face for each dataset in Table5 . The ad- ditive COSSO SVM is tted and its performances on these benchmark datasets are very competitive to other algorithms.... ..."

### Table 1 Mean and variances of obtained distances between estimated and true nonlinearities in a SISO example.

"... In PAGE 9: ...to those obtained from the Hermite- and Gaussian ap- proach using different values for nf. The results are dis- played in Table1 . Note that the LS-SVM technique... ..."

### Table 2: Technology Mapping results

"... In PAGE 8: ... The results show that the Boolean approach reduces the number of matching algorithm calls, nd smaller area circuits in better CPU time, and reduces the initial network graph because generic 2-input base function are used. Table2 presents a comparison between SIS and Land for the library 44-2.genlib, which is distributed with the SIS package.... ..."

### Table 1. Parameters used in the MPIPE algorithm for architecture optimization of the neural tree.

2003

"... In PAGE 4: ... De ne the probability and the target probability of the best program as P(PROGb) = Y Id;w:used to generate PROGb P(Id;w) (8) and PTARGET = P(PROGb) +(1 P(PROGb)) lr quot; + FIT(PROGel) quot; + FIT(PROGb) (9) where FIT(PROGb) and FIT(PROGel) denote the tness of the best and elitist program. The lr is the learning rate and quot; is the tness constant (see Table1 ). In order to increase the probability P(PROGb), repeat the following process until P(PROGb) PTARGET : P(Id;w)=P(Id;w)+clr lr (1 P(Id;w)) (10) where clr is a constant in uencing the number of iterations.... ..."