
## Input Space Versus Feature Space in Kernel-Based Methods (1999)

Venue: IEEE Transactions on Neural Networks

Citations: 131 (3 self)

### Citations

13302 | Statistical Learning Theory,
- Vapnik
- 1995
Citation Context: ...carried out in terms of dot products can be made nonlinear by substituting an a priori chosen kernel. Examples of such algorithms include the potential function method, SV Machines, and kernel PCA [1], [34], [29]. The price that one has to pay for this elegance, however, is that the solutions are only obtained as expansions in terms of input patterns mapped into feature space. For instance, the normal ve...
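The kernel substitution described in this context can be sketched as follows; a minimal illustration in NumPy (all function names and parameters here are ours, not the paper's): any algorithm expressed purely through dot products can be made nonlinear by replacing the dot-product Gram matrix with a kernel Gram matrix.

```python
import numpy as np

def linear_gram(X):
    # Plain dot-product Gram matrix: G_ij = <x_i, x_j>
    return X @ X.T

def gaussian_gram(X, sigma=1.0):
    # Gaussian (RBF) kernel Gram matrix: K_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)),
    # i.e., dot products of the inputs after an implicit map into feature space.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
G = linear_gram(X)            # what a linear algorithm would use
K = gaussian_gram(X, 1.0)     # drop-in nonlinear replacement
```

Any downstream algorithm (SV machines, kernel PCA) that only reads `G` can be handed `K` instead, which is the substitution this context refers to.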

2464 | Generalized Additive Models.
- HASTIE, TIBSHIRANI
- 1990
Citation Context: ...less to first map the inputs into functions, i.e., into infinite-dimensional objects. However, for a given dataset, it is possible to approximate from (11) by only evaluating it on these points (cf., [15], [18], [20], [33]). Definition 3: For given patterns , we call (13) the empirical kernel map with regard to . Example: Consid...
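The empirical kernel map mentioned in this context (Definition 3) can be sketched in a few lines; a hedged NumPy illustration in which the kernel choice and function names are ours: instead of the possibly infinite-dimensional feature map, a point x is represented by its kernel values against the m given patterns.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y)**2) / (2 * sigma**2))

def empirical_kernel_map(patterns, x, sigma=1.0):
    # m-dimensional surrogate for the feature-space image of x:
    # (k(x_1, x), ..., k(x_m, x))
    return np.array([gaussian_kernel(xi, x, sigma) for xi in patterns])

patterns = np.array([[0.0], [1.0], [2.0]])
phi = empirical_kernel_map(patterns, np.array([1.0]))
# phi[1] = k(x_2, x) = 1 because x coincides with the second pattern
```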

1868 | A training algorithm for optimal margin classifiers.
- Boser, Guyon, et al.
- 1992
Citation Context: ..., and taking the dot product there, i.e., [4]. By virtue of this property, we shall call a feature map associated with . Any linear algorithm which can be carried out in terms of dot products can be made nonlinear by substituting an a priori chosen...

1866 | Spline Models for Observational Data,
- Wahba
- 1990
Citation Context: ...mated within accuracy as a dot product in , between images of . B. The Reproducing Kernel Map: We can also think of the feature space as a reproducing kernel Hilbert space (RKHS). To see this [2], [23], [36], [24], [13], recall that a RKHS is a Hilbert space of functions on some set such that all evaluation functionals, i.e., the maps , are continuous. In that case, by the Riesz...

1578 | Nonlinear component analysis as a kernel eigenvalue problem.
- Schölkopf, Smola, et al.
- 1998
Citation Context: ...out in terms of dot products can be made nonlinear by substituting an a priori chosen kernel. Examples of such algorithms include the potential function method, SV Machines, and kernel PCA [1], [34], [29]. The price that one has to pay for this elegance, however, is that the solutions are only obtained as expansions in terms of input patterns mapped into feature space. For instance, the normal vector ...

1299 | Theory of reproducing kernels.
- Aronszajn
- 1950
Citation Context: ...be approximated within accuracy as a dot product in , between images of . B. The Reproducing Kernel Map: We can also think of the feature space as a reproducing kernel Hilbert space (RKHS). To see this [2], [23], [36], [24], [13], recall that a RKHS is a Hilbert space of functions on some set such that all evaluation functionals, i.e., the maps , are continuous. In that case, by the Riesz...

476 | Backpropagation applied to handwritten zip code recognition,
- LeCun, Boser, et al.
- 1989
Citation Context: ...e obtain a four-dimensional parameterization). 2) Handwritten Digit Denoising: To test our approach on real-world data, we also applied the algorithm to the USPS database of handwritten digits (e.g., [16], [24]) of 7291 training patterns and 2007 test patterns (size 16×16). For each of the ten digits, we randomly chose 300 examples from the training set and 50 examples from the test set. We used the m...

398 | Theoretical foundations of the potential function method in pattern recognition,
- Aizerman, Braverman, et al.
- 1964
Citation Context: ...be carried out in terms of dot products can be made nonlinear by substituting an a priori chosen kernel. Examples of such algorithms include the potential function method, SV Machines, and kernel PCA [1], [34], [29]. The price that one has to pay for this elegance, however, is that the solutions are only obtained as expansions in terms of input patterns mapped into feature space. For instance, the nor...

394 | Principal curves,
- Hastie, Stuetzle
- 1989
Citation Context: ...onal case, Fig. 7 depicts the results of denoising a half circle and a square in the plane, using kernel PCA, a nonlinear autoencoder, principal curves, and linear PCA. The principal curves algorithm [14] iteratively estimates a curve capturing the structure of the data. The data are projected to the closest point on a curve which the algorithm tries to construct such that each point is the average of...

360 | Interpolation of scatter data: distance matrices and conditionally positive definite functions,
- Micchelli
- 1986
Citation Context: ...mage under this map. To characterize this set of points in a specific example, consider Gaussian kernels (41). In this case, maps each input into a Gaussian sitting on that point. However, it is known [19] that no Gaussian can be written as a linear combination of Gaussians centered at other points. Therefore, in the Gaussian case, none of the expansions (3), excluding trivial cases with only one term,...

243 | An equivalence between sparse approximation and support vector machines,
- Girosi
- 1998
Citation Context: ...accuracy as a dot product in , between images of . B. The Reproducing Kernel Map: We can also think of the feature space as a reproducing kernel Hilbert space (RKHS). To see this [2], [23], [36], [24], [13], recall that a RKHS is a Hilbert space of functions on some set such that all evaluation functionals, i.e., the maps , are continuous. In that case, by the Riesz...

192 | Improving the accuracy and speed of support vector machines,
- Burges, Schölkopf
- 1997
Citation Context: ...from input space into feature space, we now study the way back. There has been a fair amount of work on aspects of this problem in the context of developing so-called reduced set methods (e.g., [6], [9], [25], [12], [22]). For pedagogical reasons, we shall postpone reduced set methods to Section V, as they focus on a problem that is already more complex than the one we would like to start with. A. T...

184 | Simplified support vector decision rules,
- Burges
- 1996
Citation Context: ...o get from input space into feature space, we now study the way back. There has been a fair amount of work on aspects of this problem in the context of developing so-called reduced set methods (e.g., [6], [9], [25], [12], [22]). For pedagogical reasons, we shall postpone reduced set methods to Section V, as they focus on a problem that is already more complex than the one we would like to start with...

183 | Comparing support vector machines with Gaussian kernels to radial basis function classifiers,
- Schölkopf, Kah-Kay, et al.
- 1997
Citation Context: ...apply to the case where . B. An Algorithm for Approximate Preimages: The present section [25] gives an analysis for the case of the Gaussian kernel, which has proven to perform very well in applications [30], and proposes an iteration procedure for computing preimages of kernel expansions. We start by considering a problem slightly more general than the preimage problem: we are seeking to approximate by ...

178 | Matching Pursuit in a time-frequency dictionary,
- Mallat, Zhang
- 1993
Citation Context: ...ion with the regularization properties of the kernel (e.g., [38]), the application of the approach to compression, and the comparison (and connection) to alternative nonlinear denoising methods (cf., [17]). Speeding Up SV Machines: We have shown experimentally that our approximation algorithms can be used to speed up SV machines significantly. Note that in the Gaussian RBF case, the approximation can ...

116 | Prior Knowledge in Support Vector Kernels
- Schölkopf, Simard, et al.
- 1998
Citation Context: ...need to endow with a dot product such that (14). To this end, we use the ansatz with being a positive matrix. Enforcing (14) on the training patterns, this yields the self-consistency condition (cf., [28], [31]) (15). Here, we have used to denote the kernel Gram matrix (16). The condition (15) can be satisfied for instance by the pseudoinverse. Equivalently, we could have incorporated this rescaling oper...
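The self-consistency condition sketched in this context can be made concrete; a NumPy illustration under the assumption that the positive matrix in the ansatz is taken as the pseudoinverse square root of the kernel Gram matrix K (the kernel and variable names are ours): whitening the empirical kernel map by K^(-1/2) makes dot products of the mapped training patterns reproduce K itself.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

X = np.array([[0.0], [1.0], [3.0]])
K = gaussian_gram(X)

# Pseudoinverse square root K^(-1/2) via the eigendecomposition of K,
# zeroing directions with (numerically) vanishing eigenvalues.
w, V = np.linalg.eigh(K)
inv_sqrt = V @ np.diag([1/np.sqrt(v) if v > 1e-12 else 0.0 for v in w]) @ V.T

Phi = inv_sqrt @ K        # columns: rescaled empirical-map images of the patterns
reproduced = Phi.T @ Phi  # equals K K^(-1) K = K up to numerical error
```

The check `reproduced ≈ K` is exactly condition (14) enforced on the training patterns, satisfied here via the pseudoinverse as the context states.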

94 | Theory for Reproducing Kernels and its Applications.
- Saitoh
- 1989
Citation Context: ...ng, kernel methods, PCA, reduced set method, sparse representation, support vector machines. I. INTRODUCTION: Reproducing kernels are functions which, for all pattern sets, give rise to positive matrices [23]. Here, is some compact set in which the data lives, typically (but not necessarily) a subset of . In the support vector (SV) community, reproducing kernels are often referred to as Mercer kernels (Sect...

88 | Entropy, compactness and the approximation of operators, - Carl, Stephani - 1990 |

66 | Reducing the run-time complexity of Support Vector Machines,
- Osuna, Girosi
- 1998
Citation Context: ...into feature space, we now study the way back. There has been a fair amount of work on aspects of this problem in the context of developing so-called reduced set methods (e.g., [6], [9], [25], [12], [22]). For pedagogical reasons, we shall postpone reduced set methods to Section V, as they focus on a problem that is already more complex than the one we would like to start with. A. The Preimage Proble...

47 | Kernel PCA pattern reconstruction via approximate pre-images,
- Schölkopf, Mika, et al.
- 1998
Citation Context: ...ccuracies in both OCR and object recognition have been obtained using SV machines [24], [7]. A generalization to the case of regression estimation, leading to similar function expansion, exists [34], [26]. Kernel Principal Component Analysis [29] carries out a linear PCA in the feature space. The extracted features take the nonlinear form (32), where, up to a normalization, the are the components of the...
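Kernel PCA as summarized in this context can be sketched compactly; a hedged NumPy illustration (kernel choice, centering details, and names are ours): linear PCA is carried out on the centered kernel Gram matrix, and the extracted features are kernel expansions over the training patterns.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def kernel_pca(X, n_components=2, sigma=1.0):
    m = len(X)
    K = gaussian_gram(X, sigma)
    # Center in feature space: K <- HKH with H = I - (1/m) 11^T
    H = np.eye(m) - np.ones((m, m)) / m
    Kc = H @ K @ H
    w, V = np.linalg.eigh(Kc)          # eigenvalues ascending
    w, V = w[::-1], V[:, ::-1]         # reorder to descending
    # Normalize expansion coefficients so feature-space eigenvectors have unit norm
    alphas = V[:, :n_components] / np.sqrt(np.maximum(w[:n_components], 1e-12))
    # Nonlinear features of the training patterns: kernel expansions, cf. (32)
    return Kc @ alphas

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
features = kernel_pca(X, n_components=2)
```

The extracted feature columns are mutually orthogonal and have zero mean in feature space, mirroring ordinary PCA properties.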

33 | From regularization operators to support vector kernels, in:
- Smola, Schölkopf
- 1998
Citation Context: ...endow with a dot product such that (14). To this end, we use the ansatz with being a positive matrix. Enforcing (14) on the training patterns, this yields the self-consistency condition (cf., [28], [31]) (15). Here, we have used to denote the kernel Gram matrix (16). The condition (15) can be satisfied for instance by the pseudoinverse. Equivalently, we could have incorporated this rescaling operation,...

28 | Fast approximation of support vector kernel expansions, and an interpretation of clustering as approximation in feature spaces
- Schölkopf, Knirsch, et al.
- 1998
Citation Context: ...the next section, we shall develop a method for minimizing (42), which we will later, in the experimental section, apply to the case where . B. An Algorithm for Approximate Preimages: The present section [25] gives an analysis for the case of the Gaussian kernel, which has proven to perform very well in applications [30], and proposes an iteration procedure for computing preimages of kernel expansions. We...
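The iteration procedure this context refers to can be sketched for the Gaussian-kernel case; a NumPy illustration under our own choice of starting point, step count, and tolerance: a feature-space expansion sum_i alpha_i Phi(x_i) is approximated by the image of a single point z, which is repeatedly updated as a kernel-weighted combination of the patterns x_i.

```python
import numpy as np

def preimage_iteration(alphas, X, sigma=1.0, steps=100):
    # Fixed-point iteration for an approximate preimage under a Gaussian kernel:
    # z <- sum_i alpha_i k(z, x_i) x_i / sum_i alpha_i k(z, x_i)
    z = X[np.argmax(alphas)].astype(float)   # start at the strongest pattern
    for _ in range(steps):
        w = alphas * np.exp(-np.sum((X - z)**2, axis=1) / (2 * sigma**2))
        denom = w.sum()
        if abs(denom) < 1e-12:   # near-zero denominator: restart-worthy instability
            break
        z = (w[:, None] * X).sum(axis=0) / denom
    return z

# Toy expansion: equal weights on two points -> preimage converges to the midpoint
X = np.array([[0.0, 0.0], [1.0, 0.0]])
z = preimage_iteration(np.array([0.5, 0.5]), X, sigma=2.0)
```

By symmetry the midpoint is a fixed point of the update, and the iteration contracts toward it; the restart-on-instability caveat mirrors the remark in the Buhmann citation context below.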

25 | Support vector learning. Oldenbourg Verlag
- Schölkopf
- 1997
Citation Context: ...within accuracy as a dot product in , between images of . B. The Reproducing Kernel Map: We can also think of the feature space as a reproducing kernel Hilbert space (RKHS). To see this [2], [23], [36], [24], [13], recall that a RKHS is a Hilbert space of functions on some set such that all evaluation functionals, i.e., the maps , are continuous. In that case, by the Riesz...

22 | Support vector classifier with asymmetric kernel function
- Tsuda
Citation Context: ...the inputs into functions, i.e., into infinite-dimensional objects. However, for a given dataset, it is possible to approximate from (11) by only evaluating it on these points (cf., [15], [18], [20], [33]). Definition 3: For given patterns , we call (13) the empirical kernel map with regard to . Example: Consider first the case ...

19 | Kernel-dependent support vector error bounds
- Schölkopf, Shawe-Taylor, et al.
- 1999
Citation Context: ...depending on the eigenvalues of the kernel. Using similar entropy number methods, it is also possible to give rather precise data-dependent bounds in terms of the eigenvalues of the kernel Gram matrix [27]. Consider two normed spaces E and F. For n ∈ N, the nth entropy number of a set M ⊂ E is defined as ε_n(M) := inf{ε > 0 : there exists an ε-cover for M in E containing 2^(n−1) or fewer points}. (Recall that the ...

13 | Entropy Numbers, Operators and Support Vector Kernels," submitted to EuroCOLT '99. See also "Generalization Performance of Regularization Networks and Support Vector Machines via Entropy Numbers of Compact Operators, " http://spigot.anu.edu
- Williamson, Smola, et al.
- 1998
Citation Context: ...be fairly precise. If, however, the distribution of the data is such that it does not fill the sphere, then (18) is wasteful. The argument in the remainder of the section, which is summarized from [37], shows that using a kernel typically entails that the data in fact lies in some box with rapidly decaying sidelengths, which can be much smaller than the above sphere. From statement 2 of Theorem 1, ...

9 | Linear programming support vector machines for pattern classification and regression estimation; and the SR algorithm: improving speed and tightness
- Frieß, Harrison, et al.
- 1998
Citation Context: ...space into feature space, we now study the way back. There has been a fair amount of work on aspects of this problem in the context of developing so-called reduced set methods (e.g., [6], [9], [25], [12], [22]). For pedagogical reasons, we shall postpone reduced set methods to Section V, as they focus on a problem that is already more complex than the one we would like to start with. A. The Preimage ...

8 | Data clustering and learning. In: The handbook of brain theory and neural networks (Arbib MA, ed), pp 278–282. Cambridge
- Buhmann
- 1998
Citation Context: ...Numerical instabilities related to being small can thus be approached by restarting the iteration with different starting values. Interestingly, (51) can be interpreted in the context of clustering (e.g., [5]). It determines the center of a single Gaussian cluster, trying to capture as many of the with positive as possible, and simultaneously avoids those with negative . For SV classifiers, the sign of the ...

5 | A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery - Burges - 1998 |

2 | Decision region approximation by polynomials or neural networks
- Blackmore, Williamson, et al.
- 1997
Citation Context: ...other factor two–three). We conjecture that this is due to the following: in classification, we are not interested in , but in , where is the underlying probability distribution of the patterns (cf., [3]). This is consistent with the fact that the performance of a RS SV classifier can be improved by recomputing an optimal threshold. The previous RS construction method [6], [9] can be used for any SV k...

2 | Geometry and invariance in kernel based methods
- Burges
- 1999
Citation Context: ...an dimensional submanifold embedded in . For simplicity, we assume here that is sufficiently smooth that structures such as a Riemannian metric can be defined on it. Here we will follow the analysis of [8], to which the reader is referred for more details, although the application to the class of inhomogeneous polynomial kernels is new. We first note that all intrinsic geometrical properties of can be ...

2 | Nichtlineare Signalverarbeitung in Feature-Räumen
- Mika
- 1998
Citation Context: ...t map the inputs into functions, i.e., into infinite-dimensional objects. However, for a given dataset, it is possible to approximate from (11) by only evaluating it on these points (cf., [15], [18], [20], [33]). Definition 3: For given patterns , we call (13) the empirical kernel map with regard to . Example: Consider first the...

2 | A Tutorial on Support Vector Regression - Smola, Schölkopf |

1 | Personal communication
- Mattera
- 1998
Citation Context: ...to first map the inputs into functions, i.e., into infinite-dimensional objects. However, for a given dataset, it is possible to approximate from (11) by only evaluating it on these points (cf., [15], [18], [20], [33]). Definition 3: For given patterns , we call (13) the empirical kernel map with regard to . Example: Consider fir...