## A Mathematical Programming Approach to the Kernel Fisher Algorithm (2001)

### Download Links

- [ida.first.gmd.de]
- [www.cs.cmu.edu]
- [ida.first.fhg.de]
- [iesk.et.uni-magdeburg.de]
- [ml.cs.tu-berlin.de]
- DBLP

### Other Repositories/Bibliography

Citations: 56 (14 self)

### BibTeX

```bibtex
@MISC{Mika01amathematical,
  author = {Sebastian Mika and Gunnar Rätsch and Klaus-Robert Müller},
  title  = {A Mathematical Programming Approach to the Kernel Fisher Algorithm},
  year   = {2001}
}
```

### Abstract

We investigate a new kernel-based classifier: the Kernel Fisher Discriminant (KFD). A mathematical programming formulation, based on the observation that KFD maximizes the average margin, permits an interesting modification of the original KFD algorithm, yielding the sparse KFD. We find that both KFD and the proposed sparse KFD can be understood in a unifying probabilistic context. Furthermore, we show connections to Support Vector Machines and Relevance Vector Machines. From this understanding, we outline an interesting kernel regression technique based upon the KFD algorithm. Simulations support the usefulness of our approach.

### Citations

8980 | Statistical Learning Theory
- Vapnik
- 1998

Citation Context: ... success of SVMs seems to be triggered by (i) their good generalization performance, (ii) the existence of a unique solution, and (iii) the strong theoretical background: structural risk minimization [12], supporting the good empirical results. One of the key ingredients responsible for this success is the use of Mercer kernels, allowing for nonlinear decision surfaces which even might incorporate som...

2028 | Learning with Kernels
- Scholkopf, Smola
- 2002

Citation Context: ...ad to convex linear or quadratic programming problems in the KFD framework. Table 1: Loss functions for the slack variables ξ and their corresponding density/noise models in a probabilistic framework [10]:

- ε-insensitive: loss |ξ|_ε, density 1/(2(1+ε)) exp(−|ξ|_ε)
- Laplacian: loss |ξ|, density (1/2) exp(−|ξ|)
- Gaussian: loss ξ²/2, density (1/√(2π)) exp(−ξ²/2)
- Huber's: loss ξ²/(2σ) if |ξ| ≤ σ, else |ξ| − σ/2; density exp(−ξ²/(2σ)) if |ξ| ≤ σ, else exp(σ/2 − |ξ|)
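
The loss/density correspondence quoted above (Table 1) can be spot-checked in code. This is an illustrative sketch, not the authors' code; the function names and the default `eps`/`sigma` values are my own choices:

```python
import numpy as np

# Losses for a slack variable xi; each equals -log(density) up to an
# additive constant, which is the probabilistic reading cited above.
# eps and sigma defaults are illustrative, not values from the paper.

def eps_insensitive(xi, eps=0.1):
    return np.maximum(np.abs(xi) - eps, 0.0)

def laplacian(xi):
    return np.abs(xi)

def gaussian(xi):
    return 0.5 * xi ** 2

def huber(xi, sigma=1.0):
    a = np.abs(xi)
    # quadratic near zero, linear in the tails; continuous at |xi| = sigma
    return np.where(a <= sigma, xi ** 2 / (2 * sigma), a - sigma / 2)
```

Huber's loss matches the Gaussian branch for |ξ| ≤ σ and the Laplacian slope in the tails, which is why its density model in the table switches between the two forms.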

1291 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.

Citation Context: ... Simulations support the usefulness of our approach. 1 Introduction Recent years have shown an enormous interest in kernel-based classification algorithms, primarily in Support Vector Machines (SVM) [2]. The success of SVMs seems to be triggered by (i) their good generalization performance, (ii) the existence of a unique solution, and (iii) the strong theoretical background: structural risk minimiza...

1048 | Nonlinear component analysis as a kernel eigenvalue problem
- Schölkopf, Smola, et al.
- 1998

Citation Context: ... new algorithms: take any (usually) linear method and reformulate it using training samples only in dot products, which are then replaced by the kernel. Examples thereof, among others, are Kernel PCA [9] and the Kernel Fisher Discriminant (KFD [4]; see also [8, 1]). In this article we consider algorithmic ideas for KFD. Interestingly KFD, although exhibiting a similarly good performance as SVMs, ha...

315 | Fisher's discriminant analysis with kernels
- Mika, Rätsch, et al.
- 1999

Citation Context: ...ethod and reformulate it using training samples only in dot products, which are then replaced by the kernel. Examples thereof, among others, are Kernel PCA [9] and the Kernel Fisher Discriminant (KFD [4]; see also [8, 1]). In this article we consider algorithmic ideas for KFD. Interestingly KFD, although exhibiting a similarly good performance as SVMs, has no explicit concept of a margin. This is n...

303 | Regularized discriminant analysis
- Friedman
- 1989

Citation Context: ...he number of training samples ℓ some form of regularization is necessary. In [4] it was proposed to add e.g. the identity or the kernel matrix K to N, penalizing ‖α‖² or ‖w‖², respectively (see also [3]). There are several equivalent ways to optimize (2). One could either solve the generalized eigenproblem Mα = λNα, selecting the eigenvector α with maximal eigenvalue λ, or compute α ≡ N⁻¹(µ₂ − µ₁)...
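
The closed-form route quoted in this context (α ∝ N⁻¹(µ₂ − µ₁), with the identity added to N for regularization) can be sketched numerically. This is a minimal illustration, not the authors' implementation; the names `kfd_direction` and `reg` are my own, and `reg` plays the role of the identity-regularizer constant:

```python
import numpy as np

def kfd_direction(K, y, reg=1e-3):
    """Regularized KFD expansion coefficients alpha (a sketch).

    K   : (l, l) kernel matrix, K[i, j] = k(x_i, x_j)
    y   : labels in {1, 2}
    reg : constant c in N + c*I, the identity regularizer
    """
    l = len(y)
    N = K @ K.T                                # start from K K^T
    mus = []
    for cls in (1, 2):
        idx = (y == cls)
        l_c = idx.sum()
        mu_c = K[:, idx].sum(axis=1) / l_c     # mu_c = (1/l_c) K 1_c
        N -= l_c * np.outer(mu_c, mu_c)        # N = K K^T - sum_c l_c mu_c mu_c^T
        mus.append(mu_c)
    mu = mus[1] - mus[0]                       # mu = mu_2 - mu_1
    # alpha proportional to (N + c I)^{-1} (mu_2 - mu_1);
    # overall scaling is irrelevant for the discriminant direction
    return np.linalg.solve(N + reg * np.eye(l), mu)
```

Projections of training points onto the discriminant are then simply `K @ alpha`; with this sign convention, class 2 projects to larger values on average than class 1.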

254 | Soft margins for AdaBoost
- Rätsch, Onoda, et al.
- 2001

Citation Context: ...s. To estimate the necessary parameters, we ran 5-fold cross validation on the first five realizations of the training sets and took the model parameters to be the median over the five estimates (see [7] for details of the experimental setup). From Table 2 it can be seen that both SVM and the KFD variants on average perform equally well. In terms of (4), KFD denotes the formulation with quadratic reg...

218 | Generalized discriminant analysis using a kernel approach
- Baudat, Anouar
- 2000

Citation Context: ...rmulate it using training samples only in dot products, which are then replaced by the kernel. Examples thereof, among others, are Kernel PCA [9] and the Kernel Fisher Discriminant (KFD [4]; see also [8, 1]). In this article we consider algorithmic ideas for KFD. Interestingly KFD, although exhibiting a similarly good performance as SVMs, has no explicit concept of a margin. This is noteworthy since t...

215 | The Relevance Vector Machine
- Tipping
- 2000

Citation Context: ...((w · Φ(xi)) + b − yi)²) = exp(−‖ξ‖²/(2σ²)). Now, assume some prior p(α|C) over the weights with hyper-parameters C. Computing the posterior we would end up with the Relevance Vector Machine (RVM) [11]. An advantage of the RVM approach is that all hyper-parameters σ and C are estimated automatically. The drawback however is that one has to solve a hard, computationally expensive optimization proble...

85 | Nonlinear discriminant analysis using kernel functions
- Roth, Steinhage
- 2000

Citation Context: ...rmulate it using training samples only in dot products, which are then replaced by the kernel. Examples thereof, among others, are Kernel PCA [9] and the Kernel Fisher Discriminant (KFD [4]; see also [8, 1]). In this article we consider algorithmic ideas for KFD. Interestingly KFD, although exhibiting a similarly good performance as SVMs, has no explicit concept of a margin. This is noteworthy since t...

42 | Invariant feature extraction and classification in kernel space
- Mika, Rätsch, et al.
- 2000

Citation Context: ...at there exists an expansion for w ∈ F in terms of mapped training patterns, i.e. w = Σ_i α_i Φ(x_i). (1) Using some straightforward algebra, the optimization problem for the KFD can then be written as [5]: J(α) = (α⊤µ)² / (α⊤Nα) = (α⊤Mα) / (α⊤Nα), (2) where µ_i = (1/ℓ_i) K 1_i, N = KK⊤ − Σ_{i=1,2} ℓ_i µ_i µ_i⊤, µ = µ_2 − µ_1, M = µµ⊤, and K_ij = (Φ(x_i) · Φ(x_j)) = k(x_i, x_j). The projection of a test point onto the ...
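
Equation (2) in this context is a Rayleigh quotient, and since M = µµ⊤ its numerator collapses to (α⊤µ)². A quick numeric reading, as a sketch (the names `rayleigh_J`, `mu`, `N`, and the example values are illustrative, not from the paper):

```python
import numpy as np

def rayleigh_J(alpha, mu, N):
    """J(alpha) = (alpha^T mu)^2 / (alpha^T N alpha), eq. (2) with M = mu mu^T."""
    numerator = float(alpha @ mu) ** 2     # equals alpha^T M alpha for M = mu mu^T
    denominator = float(alpha @ N @ alpha)
    return numerator / denominator

# J depends only on the direction of alpha, not its length:
mu = np.array([1.0, -2.0, 0.5])
N = np.diag([2.0, 1.0, 3.0])               # any positive definite N will do here
alpha = np.array([0.3, 0.1, -0.7])
assert np.isclose(rayleigh_J(alpha, mu, N), rayleigh_J(10.0 * alpha, mu, N))
```

This scale invariance is why the solution of (2) is only determined up to a scalar, and why either the generalized eigenproblem or the closed-form N⁻¹(µ₂ − µ₁) route yields the same discriminant direction.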

32 | An improved training algorithm for kernel Fisher discriminants
- Mika, Smola, et al.

Citation Context: ...y be solved by column generation techniques known from mathematical programming. A final possibility to optimize (4) for the standard KFD problem (i.e. quadratic loss and regularizer) is described in [6]. Here one uses a greedy approximation scheme which iteratively constructs a (sparse) solution to the full problem. Such an approach is straightforward to implement and much faster than solving a qua...

1 | Fisher discriminant analysis with kernels
- Mika, Rätsch, et al.
- 1999