### Table 2: Results are averaged over 25 realizations of 2100 training and 900 testing documents for RCV1 C, E, G, M categories using kernel PCA with expected linear kernel (and regular PCA).


"... In PAGE 6: ... in any kernel machine. Tables 2 and 3 attempt to quantify the performance of the kernel PCA under expected linear and RBF kernels. To this end, we formed four different multiple-label tasks, C, E, G, and M (Table 1).... ..."

### Table 3: Comparative performance of the symbolic kernel PCA method using optimal parameters

### Table 5: Average accuracies for an SVM classifier with different kernels, with and without PCA on the test data (columns: Features, SVM Kernel Type, % Overall, Male, Female)

"... In PAGE 11: ...stimated. These coefficients were then used for classification. The SVM was used for classifying the data, and all four kernel functions were tried. The results are shown in Table 5. Here, we show the results both with and without PCA for comparison.... ..."
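The comparison this snippet describes, the same SVM trained with each of the four standard kernel functions, with and without a PCA preprocessing step, can be sketched roughly as follows. This is a hypothetical scikit-learn sketch: the paper's features and settings are not given here, so synthetic data and an arbitrary component count stand in for them.

```python
# Hypothetical sketch of the Table 5 comparison; synthetic data stands in
# for the paper's (non-public) features, and n_components=10 is arbitrary.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=40, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Try the four standard SVM kernels, with and without a PCA step.
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    plain = SVC(kernel=kernel).fit(X_tr, y_tr)
    with_pca = make_pipeline(PCA(n_components=10), SVC(kernel=kernel))
    with_pca.fit(X_tr, y_tr)
    print(f"{kernel:8s} plain={plain.score(X_te, y_te):.3f} "
          f"pca={with_pca.score(X_te, y_te):.3f}")
```

Putting PCA inside a pipeline, as above, ensures the projection is fit on the training split only and then applied to the test split, which is the usual way to realize "PCA on the test data" without leakage.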

### Table 6: The overall accuracy (%) using kernel PCA subspace majority voting for the first-stage estimation and FDA subspace majority voting as the second-stage refinement. The accuracy is 47.3%.

### Table 1: Summary of the kernels used in the experiments

"... In PAGE 3: ...Table 1: Summary of the kernels used in the experiments. In order to get the initialisations for the sources, kernel PCA was applied to the data. A number of different types of kernels and parameters were used, as listed in Table 1. These were then all used for brief simulations with the NFA algorithm to see which provided the best results.... ..."

### Table 7.10: Accuracy, expected performance loss (L) and gain (G) for classification of iterative methods using kernel density estimation and a PCA-transformed feature space. The first 25 principal components account for 90% of the information. (Columns: Using all PCs, Using first PCs)

2007

### Table 1: Test error rates on the USPS handwritten digit database for linear Support Vector machines trained on nonlinear principal components extracted by PCA with kernel (20), for degrees 1 through 7. In the case of degree 1, we are doing standard PCA, with the number of nonzero eigenvalues being at most the dimensionality of the space, 256. Clearly, nonlinear principal components afford test error rates which are superior to the linear case (degree 1).

1998

"... In PAGE 11: ... It simply tries to separate the training data by a hyperplane with large margin. Table 1 illustrates two advantages of using nonlinear kernels: first, the performance of a linear classifier trained on nonlinear principal components is better than for the same number of linear components; second, the performance for nonlinear components can be further improved by using more components than possible in the linear case.... ..."
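The experiment behind these captions, extracting polynomial-kernel principal components and then training a linear SVM on them, can be sketched as below. This is an illustrative scikit-learn sketch only: the small built-in digits set stands in for USPS, and the degrees and component count are not the paper's settings.

```python
# Illustrative sketch: load_digits stands in for USPS; the degree and
# component counts are placeholders, not the original experiment's values.
from sklearn.datasets import load_digits
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# Extract nonlinear principal components with a polynomial kernel, then
# train a *linear* SVM on those components, as in the caption's setup.
for degree in (1, 3):  # degree 1 is close to standard linear PCA
    kpca = KernelPCA(n_components=32, kernel="poly", degree=degree)
    Z_tr = kpca.fit_transform(X_tr)
    Z_te = kpca.transform(X_te)
    clf = SVC(kernel="linear").fit(Z_tr, y_tr)
    print(f"degree={degree}: test accuracy {clf.score(Z_te, y_te):.3f}")
```

Note that in the linear case the number of components is capped by the input dimensionality, whereas higher-degree kernels implicitly map into a much larger feature space, which is the second advantage the snippet mentions.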

Cited by 567

### Table 1: Test error rates on the MPI chair database for linear Support Vector machines trained on nonlinear principal components extracted by PCA with kernel (22), for degrees 1 through 7. In the case of degree 1, we are doing standard PCA, with the number of nonzero eigenvalues being at most the dimensionality of the space, 256; thus, we can extract at most 256 principal components. The performance for the nonlinear cases (degree > 1) is significantly better than for the linear case, illustrating the utility of the extracted nonlinear components for classification.

"... In PAGE 8: ... To assess the utility of the components, we trained a linear Support Vector classifier (Vapnik & Chervonenkis, 1979; Cortes & Vapnik, 1995) on the classification task. Table 1 summarizes our findings: in all cases, nonlinear components as extracted by polynomial kernels (cf. Eq.... ..."

### Table 3.1: Test error rates on the MPI chair database for linear Support Vector machines trained on nonlinear principal components extracted by PCA with polynomial kernel (2.26), for degrees 1 through 7. In the case of degree 1, we are doing standard PCA, with the number of nonzero eigenvalues being at most the dimensionality of the space, 256; thus, we can extract at most 256 principal components. The performance for the nonlinear cases (degree > 1) is significantly better than for the linear case, illustrating the utility of the extracted nonlinear components for classification.

1997