## Margin maximizing discriminant analysis (2004)

### Download Links

- [www.sztaki.hu]
- [www.ualberta.ca]
- [papersdb.cs.ualberta.ca]
- [www.inf.u-szeged.hu]
- DBLP

### Other Repositories/Bibliography

Venue: Proceedings of the 15th European Conference on Machine Learning

Citations: 7 (2 self)

### BibTeX

@INPROCEEDINGS{Kocsor04marginmaximizing,
  author    = {András Kocsor and Kornél Kovács and Csaba Szepesvári},
  title     = {Margin maximizing discriminant analysis},
  booktitle = {Proceedings of the 15th European Conference on Machine Learning},
  year      = {2004},
  pages     = {227--238}
}

### Abstract

We propose a new feature extraction method called Margin Maximizing Discriminant Analysis (MMDA), which seeks to extract features suitable for classification tasks. MMDA is based on the principle that an ideal feature should convey the maximum information about the class labels, and should depend only on the geometry of the optimal decision boundary, not on those parts of the distribution of the input data that do not participate in shaping this boundary. Further, distinct feature components should convey unrelated information about the data. Two feature extraction methods are proposed for calculating the parameters of such a projection, and they are shown to yield equivalent results. The kernel mapping idea is used to derive non-linear versions. Experiments with several real-world, publicly available data sets demonstrate that the new method yields competitive results.
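The projection MMDA computes is not spelled out in this abstract, but the core idea it states — use a margin-maximizing hyperplane's normal as a feature direction, then make successive components convey unrelated information — can be sketched roughly. The code below is an illustrative reconstruction using scikit-learn's `LinearSVC`, not the authors' algorithm; the deflation step and all parameter choices are assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

def margin_feature_directions(X, y, n_features=2):
    """Illustrative sketch (NOT the paper's algorithm): take the normal of
    a margin-maximizing hyperplane as a feature direction, then deflate
    the data onto the orthogonal complement so that successive directions
    convey unrelated information."""
    Xd = X.copy()
    dirs = []
    for _ in range(n_features):
        w = LinearSVC(C=1.0, dual=False).fit(Xd, y).coef_.ravel()
        w = w / np.linalg.norm(w)
        dirs.append(w)
        Xd = Xd - np.outer(Xd @ w, w)   # deflation step (our assumption)
    return np.array(dirs)

# two separable Gaussian blobs in 5 dimensions
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 5)) + 2, rng.normal(0, 1, (50, 5)) - 2])
y = np.array([0] * 50 + [1] * 50)

W = margin_feature_directions(X, y)   # (2, 5): two unit-norm directions
Z = X @ W.T                           # extracted 2-D features
```

Because each subsequent hyperplane is fit on deflated data, the extracted directions come out approximately orthogonal, matching the abstract's requirement that distinct components convey unrelated information.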

### Citations

9805 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ...are applied to non-linearly transformed data, the full method becomes non-linear. One important case is when the linear method uses only dot-products of the data. In this case the kernel mapping idea [1, 15, 19] can be used to obtain an efficient implementation whose run time does not depend on the dimensionality of the non-linear map's image space. This 'kernel mapping' idea applies to many well-known featu...
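The kernel mapping idea referenced here can be made concrete with the degree-2 homogeneous polynomial kernel, whose explicit feature map is small enough to write down: evaluating k(x, z) = (x·z)² in input space costs O(d), while the equivalent explicit feature space has O(d²) dimensions. The kernel choice below is ours, for illustration.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel
    k(x, z) = (x . z)**2: squared coordinates plus sqrt(2)-weighted cross
    terms, chosen so that phi(x) . phi(z) = k(x, z)."""
    d = len(x)
    square = [x[i] * x[i] for i in range(d)]
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.array(square + cross)

x = np.array([1.0, 2.0, 3.0])
z = np.array([0.5, -1.0, 2.0])

explicit = phi(x) @ phi(z)   # O(d**2) work in feature space
kernel = (x @ z) ** 2        # O(d) work in input space -> 4.5**2 = 20.25
```

The two quantities agree exactly, which is why a method written purely in terms of dot products can be kernelized with run time independent of the feature space's dimensionality.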

3056 | UCI repository of machine learning databases
- Blake, Merz
- 1998
Citation Context: ...=1 α(i)j 〈φ(xi), φ(x)〉. [5 Experimental Results] In the first experiment we sought to demonstrate the visualization capability of MMDA. We used the Wine dataset from the UCI machine learning repository [4] which has 13 continuous attributes, 3 classes and 178 instances. We applied PCA, LDA and MMDA to these data sets. Two dimensional projections of the data are shown in Figure 2. In the case of PCA and...
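The Wine experiment described in this snippet is easy to approximate, since scikit-learn bundles a copy of the same UCI dataset. MMDA itself is not available there, so the sketch below only reproduces the PCA and LDA projections; standardizing the attributes first is our assumption, not stated in the snippet.

```python
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Wine: 13 continuous attributes, 3 classes, 178 instances (as in the text)
X, y = load_wine(return_X_y=True)
X = StandardScaler().fit_transform(X)   # standardization is our choice

Z_pca = PCA(n_components=2).fit_transform(X)                            # unsupervised
Z_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # supervised
```

Plotting `Z_pca` and `Z_lda` colored by `y` reproduces the kind of two-dimensional projections the snippet refers to in Figure 2.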

2884 | Introduction to Statistical Pattern Recognition
- Fukunaga
- 1990
Citation Context: ...ng “small noise” which corrupts the input patterns regardless of the class labels. PCA has been generalized to KPCA [18] by using the kernel mapping idea. Classical linear discriminant analysis (LDA) [7] searches for directions that allow optimal discrimination between the classes provided that the input patterns are normally distributed for all classes j = 1,...,m and share the same covariance matri...
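The KPCA generalization mentioned here can be illustrated with the standard concentric-circles example: linear PCA only rotates the data, while kernel PCA with an RBF kernel (our choice of kernel and γ, for illustration) produces components in which the two classes become linearly separable.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# concentric circles: not linearly separable in the input space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

Z_lin = PCA(n_components=2).fit_transform(X)                 # a rotation only
Z_rbf = KernelPCA(n_components=2, kernel="rbf", gamma=10.0).fit_transform(X)

# in the KPCA feature space a linear classifier separates the classes
acc = LogisticRegression().fit(Z_rbf, y).score(Z_rbf, y)
```

The same trick underlies the kernelized discriminant analysis variants this section discusses: the linear method runs unchanged, but on implicitly mapped data.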

2263 | Principal Component Analysis
- Jolliffe
- 2002
Citation Context: ...incipal component analysis and linear discriminant analysis. In classification, the best known example utilizing this idea is the support vector machine (SVM) [19]. Principal component analysis (PCA) [9] is one of the most widely-known linear feature extraction methods used. It is an unsupervised method that seeks to represent the input patterns in a lower dimensional subspace such that the expected ...
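The PCA description in this snippet — an unsupervised projection minimizing expected reconstruction error — corresponds to taking the top eigenvectors of the sample covariance. A minimal from-scratch check, on synthetic data of our choosing, verifies that the mean squared reconstruction error of a rank-1 projection equals the discarded eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic 2-D data with most of its variance along one direction
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 0.5]])
Xc = X - X.mean(axis=0)

# PCA directions = eigenvectors of the sample covariance, largest first
cov = Xc.T @ Xc / (len(Xc) - 1)
evals, evecs = np.linalg.eigh(cov)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

Z = Xc @ evecs[:, :1]        # project onto the first principal component
Xrec = Z @ evecs[:, :1].T    # rank-1 reconstruction
# reconstruction error = sum of the discarded eigenvalues
err = np.sum((Xc - Xrec) ** 2) / (len(Xc) - 1)
```

This identity is exactly why truncating to the top eigendirections is the variance-optimal linear compression the snippet describes — and also why, being unsupervised, PCA can discard directions that matter for the class labels.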

1401 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
- 1992
Citation Context: ...ions on the data, whereas we make no such assumptions, but propose to employ margin maximizing hyperplanes instead. This choice was motivated by the following desirable properties of such hyperplanes [5, 3, 8]: (i) without any additional information they are likely to provide good generalization on future data; (ii) these hyperplanes are insensitive to small perturbations of correctly classified patterns l...
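Property (ii) — insensitivity to small perturbations of correctly classified patterns lying away from the boundary — can be checked numerically: perturbing a point that is not a support vector leaves the maximum-margin hyperplane essentially unchanged, because the solution depends only on the support vectors. The data and perturbation below are our illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

svm = SVC(kernel="linear", C=10.0).fit(X, y)
w0 = svm.coef_.ravel().copy()

# perturb a correctly classified point that is NOT a support vector
sv = set(svm.support_)
idx = next(i for i in range(len(X)) if i not in sv)
X2 = X.copy()
X2[idx] += 0.1 * rng.normal(size=2)   # small shift, stays on its own side

w1 = SVC(kernel="linear", C=10.0).fit(X2, y).coef_.ravel()
# the hyperplane is (numerically) unchanged: the point never touched the margin
```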

846 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context: ...ined this way are quite good. In this case we tested the interaction of MMDA with several classifiers. These were the ANN introduced earlier, support vector machines with the linear kernel and C4.5 [16]. The results for the three datasets are shown in Figure 4. For comparison we also included the results obtained with ‘no feature extraction’, PCA and LDA. For the datasets DNA and Optdigits we got c...

355 | Fisher discriminant analysis with kernels
- Mika, Ratsch, et al.
- 1999
Citation Context: ...nd share the same covariance matrix. If these assumptions are violated LDA becomes suboptimal and a filtering disaster may occur. Recently, LDA has been generalized using the kernel mapping technique [2, 14, 17] as well. Discriminant analysis as a broader subject addresses the problem of finding a transformation of the input patterns such that classification using the transformed data set becomes easier (e.g...

297 | Theoretical foundations of the potential function method in pattern recognition learning
- Aizerman, Braverman, et al.
- 1964
Citation Context: ...are applied to non-linearly transformed data, the full method becomes non-linear. One important case is when the linear method uses only dot-products of the data. In this case the kernel mapping idea [1, 15, 19] can be used to obtain an efficient implementation whose run time does not depend on the dimensionality of the non-linear map's image space. This 'kernel mapping' idea applies to many well-known featu...

273 | Functions of positive and negative type, and their connection with the theory of integral equations, Philosophical Transactions of the Royal Society of London
- Mercer
- 1909
Citation Context: ...e of the linear span of the set {k(x, ·) | x ∈ R^d} gives rise to a Hilbert space H where the inner product is defined such that it satisfies 〈k(x1, ·), k(x2, ·)〉 = k(x1, x2) for all points x1, x2 ∈ R^d [13]. The choice of k automatically gives rise to the mapping φ : R^d → H defined by φ(x) = k(x, ·). This is called the kernel mapping idea [1, 15, 19]. It is clear that the kernel mapping idea can be use...
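The construction in this snippet rests on k being a positive-definite kernel, so that 〈k(x1, ·), k(x2, ·)〉 = k(x1, x2) defines a valid inner product on the span of {k(x, ·)}. A quick numerical sanity check, with a Gaussian kernel of our choosing, is that the Gram matrix of any point set is symmetric positive semidefinite:

```python
import numpy as np

def k(x, z, gamma=0.5):
    """Gaussian (RBF) kernel -- our illustrative choice of k."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(2)
xs = rng.normal(size=(20, 3))

# Gram matrix K[i, j] = k(x_i, x_j) = <k(x_i, .), k(x_j, .)>
K = np.array([[k(a, b) for b in xs] for a in xs])

# Mercer's condition: K is symmetric positive semidefinite for any point
# set, which is what makes the inner product above well defined
eigs = np.linalg.eigvalsh(K)
```

A kernel whose Gram matrices could have negative eigenvalues would not induce a Hilbert space, and the kernel mapping φ(x) = k(x, ·) would break down.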

237 | Generalized discriminant analysis using a kernel approach
- Baudat, Anouar
- 2000
Citation Context: ...nd share the same covariance matrix. If these assumptions are violated LDA becomes suboptimal and a filtering disaster may occur. Recently, LDA has been generalized using the kernel mapping technique [2, 14, 17] as well. Discriminant analysis as a broader subject addresses the problem of finding a transformation of the input patterns such that classification using the transformed data set becomes easier (e.g...

188 | Kernel principal component analysis
- Schölkopf, Smola, et al.
- 1999
Citation Context: ...his phenomenon a filtering disaster. PCA can still be very useful e.g. for suppressing “small noise” which corrupts the input patterns regardless of the class labels. PCA has been generalized to KPCA [18] by using the kernel mapping idea. Classical linear discriminant analysis (LDA) [7] searches for directions that allow optimal discrimination between the classes provided that the input patterns are n...

122 | Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces
- Fukumizu, Bach, et al.
- 2004
Citation Context: ...nical analogy [10, 11] or, in a special case, as a method for maximizing the between-class average margin itself averaged for all pairs of distinct classes [12]. The goal of the algorithm proposed in [6] is to find a linear transformation of the input patterns such that the statistical relationship between the input and output variables is preserved. The authors of this article use reproducing kernel...

93 | Nonlinear discriminant analysis using kernel functions
- Roth, Steinhage
- 1999
Citation Context: ...nd share the same covariance matrix. If these assumptions are violated LDA becomes suboptimal and a filtering disaster may occur. Recently, LDA has been generalized using the kernel mapping technique [2, 14, 17] as well. Discriminant analysis as a broader subject addresses the problem of finding a transformation of the input patterns such that classification using the transformed data set becomes easier (e.g...

90 | Support vector machines: Hype or hallelujah
- Bennett, Campbell
Citation Context: ...ions on the data, whereas we make no such assumptions, but propose to employ margin maximizing hyperplanes instead. This choice was motivated by the following desirable properties of such hyperplanes [5, 3, 8]: (i) without any additional information they are likely to provide good generalization on future data; (ii) these hyperplanes are insensitive to small perturbations of correctly classified patterns l...

51 | On optimal nonlinear associative recall
- Poggio
- 1975
Citation Context: ...are applied to non-linearly transformed data, the full method becomes non-linear. One important case is when the linear method uses only dot-products of the data. In this case the kernel mapping idea [1, 15, 19] can be used to obtain an efficient implementation whose run time does not depend on the dimensionality of the non-linear map's image space. This 'kernel mapping' idea applies to many well-known featu...

46 | Efficient and Robust Feature Extraction by Maximum Margin Criterion
- Li, Jiang, et al.
Citation Context: ...part, KSDA), which was derived using a mechanical analogy [10, 11] or, in a special case, as a method for maximizing the between-class average margin itself averaged for all pairs of distinct classes [12]. The goal of the algorithm proposed in [6] is to find a linear transformation of the input patterns such that the statistical relationship between the input and output variables is preserved. The aut...

31 | A PAC-Bayesian margin bound for linear classifiers
- Herbrich, Graepel
- 2002
Citation Context: ...ions on the data, whereas we make no such assumptions, but propose to employ margin maximizing hyperplanes instead. This choice was motivated by the following desirable properties of such hyperplanes [5, 3, 8]: (i) without any additional information they are likely to provide good generalization on future data; (ii) these hyperplanes are insensitive to small perturbations of correctly classified patterns l...

6 | Kernel-Based Feature Extraction with a Speech Technology Application
- Kocsor, Toth
Citation Context: ...noise). More recent methods in discriminant analysis include the “Springy Discriminant Analysis” (SDA) (and its non-linear kernelized counterpart, KSDA), which was derived using a mechanical analogy [10, 11] or, in a special case, as a method for maximizing the between-class average margin itself averaged for all pairs of distinct classes [12]. The goal of the algorithm proposed in [6] is to find a linea...

4 | Kernel Springy Discriminant Analysis and Its Application to a Phonological Awareness Teaching System
- Kocsor, Kovács
- 2002
Citation Context: ...noise). More recent methods in discriminant analysis include the “Springy Discriminant Analysis” (SDA) (and its non-linear kernelized counterpart, KSDA), which was derived using a mechanical analogy [10, 11] or, in a special case, as a method for maximizing the between-class average margin itself averaged for all pairs of distinct classes [12]. The goal of the algorithm proposed in [6] is to find a linea...