## Large margin multi-category discriminant models and scale-sensitive Ψ-dimensions (2006)

Citations: 6 (3 self)

### BibTeX

```bibtex
@MISC{Guermeur06largemargin,
  author = {Yann Guermeur},
  title  = {Large margin multi-category discriminant models and scale-sensitive Ψ-dimensions},
  year   = {2006}
}
```

### Citations

8962 | Statistical Learning Theory
- Vapnik
- 1998
Citation Context: ... 1 Introduction: One of the central domains of Vapnik's statistical learning theory [73] is the theory of bounds, which is at the origin of the structural risk minimization (SRM) inductive principle [71, 64] and, as such, has not only a theoretical interest, but also a practical one. Thi... |

2162 | Support-vector networks
- Cortes, Vapnik
- 1995
Citation Context: ... 6 Margin Natarajan Dimension of the Multi-class SVMs: Support vector machines (SVMs) are learning systems which have been introduced by Vapnik and co-workers [14, 20] as a nonlinear extension of the maximal margin hyperplane [71]. Originally, they were designed to perform pattern recognition (compute dichotomies). In this context, the principle on which they are b... |

1488 | Probability Inequalities for Sums of Bounded Random Variables
- Hoeffding
- 1963
Citation Context: ...hosen independently and uniformly on {−1, 1}. To bound from above the right-hand side of (20), an exponential bound can be applied. 3.4 Exponential bound: Hoeffding's inequality (see for example [38, 55]) is a consequence of Chernoff's inequality [50]. Theorem 2 (Hoeffding's inequality) Let X_1, X_2, ..., X_n be n independent random variables with zero means and bounded ranges: a_i ≤ X_i ≤ b_i. ... |
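The inequality quoted in this excerpt bounds the tail of a sum of bounded, zero-mean variables: P(Σᵢ Xᵢ ≥ t) ≤ exp(−2t² / Σᵢ(bᵢ − aᵢ)²). A minimal numerical sanity check, purely illustrative; the Rademacher choice and the constants n, t, and the trial count below are ours, not the paper's:

```python
import math
import random

def hoeffding_bound(t, ranges):
    # P(sum X_i >= t) <= exp(-2 t^2 / sum (b_i - a_i)^2) for independent,
    # zero-mean X_i with a_i <= X_i <= b_i (Hoeffding's inequality).
    return math.exp(-2.0 * t * t / sum((b - a) ** 2 for a, b in ranges))

random.seed(0)
n, t, trials = 100, 15.0, 20000
ranges = [(-1.0, 1.0)] * n  # Rademacher variables: zero mean, range [-1, 1]
empirical = sum(
    sum(random.choice((-1.0, 1.0)) for _ in range(n)) >= t for _ in range(trials)
) / trials
bound = hoeffding_bound(t, ranges)
print(empirical, bound)  # the empirical tail frequency stays below the bound
```

The simulated tail frequency is far below the exponential bound, as the theorem guarantees in expectation.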

1285 | A Training Algorithm for Optimal Margin Classifiers
- Boser, Guyon, et al.
- 1992
Citation Context: ... 6 Margin Natarajan Dimension of the Multi-class SVMs: Support vector machines (SVMs) are learning systems which have been introduced by Vapnik and co-workers [14, 20] as a nonlinear extension of the maximal margin hyperplane [71]. Originally, they were designed to perform pattern recognition (compute dichotomies). In this context, the principle on which they are b... |

1272 | Spline Models for Observational Data
- Wahba
- 1990
Citation Context: ... κ(x, x′) = ⟨Φ(x), Φ(x′)⟩_{H_κ} (34). Hence, the "linear part" of each component function of the model is a function of x belonging to the Reproducing Kernel Hilbert Space (RKHS) (see for instance [6, 59, 75, 76, 12]) (H_κ, ⟨·, ·⟩_{H_κ}), and H ⊂ ((H_κ, ⟨·, ·⟩_{H_κ}) + {1})^Q. Furthermore, κ is supposed to satisfy Mercer's conditions [1]. We now introduce the standard hypotheses... |

992 | A Probabilistic Theory of Pattern Recognition
- Devroye, Györfi, et al.
- 1996
Citation Context: ...lves covering numbers as capacity measure. Their definition is the subject of the following subsection. Introductions to the basic notions of functional analysis used in this document can be found in [18, 19, 23, 70]. 2.3 Capacity measure: covering numbers. The notion of covering number is based on the notion of ε-cover. Definition 7 (ε-cover or ε-net) Let (E, ρ) be a pseudo-metric space, and B(e, r) the op... |

802 | Estimation of Dependences Based on Empirical Data
- Vapnik
- 1982
Citation Context: ... 1 Introduction: One of the central domains of Vapnik's statistical learning theory [73] is the theory of bounds, which is at the origin of the structural risk minimization (SRM) inductive principle [71, 64] and, as such, has not only a theoretical interest, but also a practical one. This theory has been developed for discriminant analysis, regression and density estimation. The first results in the fiel... |

777 | Theory of Reproducing Kernels
- Aronszajn
- 1950
Citation Context: ... κ(x, x′) = ⟨Φ(x), Φ(x′)⟩_{H_κ} (34). Hence, the "linear part" of each component function of the model is a function of x belonging to the Reproducing Kernel Hilbert Space (RKHS) (see for instance [6, 59, 75, 76, 12]) (H_κ, ⟨·, ·⟩_{H_κ}), and H ⊂ ((H_κ, ⟨·, ·⟩_{H_κ}) + {1})^Q. Furthermore, κ is supposed to satisfy Mercer's conditions [1]. We now introduce the standard hypotheses... |

567 | Convergence of Stochastic Processes
- Pollard
- 1984
Citation Context: ... + ln(2/(γδ)) + 1/m (9). This theorem can be seen as an extension of Corollary 9 in [8], Theorem 4.1 in [73], and more generally an extension of the Glivenko-Cantelli theorem (see for instance [55, 23, 73, 69]). Its proof is divided into several steps, following the structure proposed in [24, 55, 62]. 3.1 First symmetrization: In this first step of the proof, standard techniques are used to replace the prob... |

565 | A comparison of methods for multiclass support vector machines
- Hsu, Lin
- 2002
Citation Context: ...es [61, 52, 73, 44, 54, 2, 4, 58]. The multi-class SVMs are globally more recent. They are all obtained by combining a multivariate affine model with the nonlinear mapping Φ into the feature space [73, 77, 17, 32, 21, 22, 30, 39, 47, 48]. Formally, the functions h = (h_k)_{1≤k≤Q} that a Q-category M-SVM can implement have the general form: ∀x ∈ X, ∀k ∈ {1, ..., Q}, h_k(x) = ⟨w_k, Φ(x)⟩ + b_k (33), where the values of the v... |
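Equation (33) in this excerpt is just a Q-vector of affine functions evaluated in feature space, with the predicted category taken by argmax. A sketch with entirely made-up parameters (the feature map phi, weights W, and biases b below are illustrative stand-ins, not from the paper):

```python
def msvm_outputs(x, W, b, phi):
    # Q-category scores of the form h_k(x) = <w_k, phi(x)> + b_k, as in
    # Eq. (33) of the excerpt above.
    z = phi(x)
    return [sum(w_i * z_i for w_i, z_i in zip(wk, z)) + bk
            for wk, bk in zip(W, b)]

# Toy quadratic-interaction feature map and 3-category parameters:
phi = lambda x: (x[0], x[1], x[0] * x[1])
W = [(1.0, 0.0, 0.5), (-1.0, 1.0, 0.0), (0.0, -1.0, 0.2)]
b = [0.0, 0.1, -0.2]

scores = msvm_outputs((2.0, 1.0), W, b, phi)
decision = max(range(3), key=lambda k: scores[k])  # argmax decision rule
print(scores, decision)
```

In a trained M-SVM the w_k live in the RKHS and are expanded over training points via the kernel; the explicit phi here only makes the affine structure visible.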

419 | Reducing multiclass to binary: A unifying approach for margin classifiers
- Allwein, Schapire, et al.
Citation Context: ...appear central to measure the quality of the discrimination is the value of a multi-class margin. This notion of margin has been studied independently by different groups of authors (see for instance [27, 2]). Definition 3 (Multi-class margin) Let h be a function from X into R^Q and (x, y) an element of X × Y. Then the margin of h on (x, y), M(h, x, y), is given by: M(h, x, y) = ½ (h_y(x) − ... |
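The formula in the excerpt's Definition 3 is truncated. A commonly used completion of this multi-class margin is half the gap between the score of the correct category and the best rival score; treat the completion as an assumption, sketched here:

```python
def multiclass_margin(scores, y):
    # M(h, x, y) = 1/2 * (h_y(x) - max_{k != y} h_k(x)); assumed completion
    # of the truncated definition quoted above.
    rival = max(s for k, s in enumerate(scores) if k != y)
    return 0.5 * (scores[y] - rival)

print(multiclass_margin([2.0, 0.5, -1.0], 0))  # correct class wins: positive margin
print(multiclass_margin([0.2, 1.0, 0.4], 0))   # correct class loses: negative margin
```

With this convention, the sign of the margin encodes correctness of the argmax decision and its magnitude the confidence of the separation.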

372 | Decision theoretic generalizations of the PAC model for neural net and other learning applications
- Haussler
- 1992
Citation Context: ...H, if it is finite, or infinity otherwise. The Vapnik dimension is a uniform variant of Pollard's pseudo-dimension. Definition 18 (Pollard's pseudo-dimension [56, 36]) Let H be a class of real-valued functions on a set X. A subset s_{X_m} = {x_i : 1 ≤ i ≤ m} of X is said to be P-shattered by H if there is a vector v_b = (b_i) ∈ R^m such that, for each binary vect... |

365 | On the algorithmic implementation of multiclass kernel-based vector machines
- Crammer, Singer
- 2001
Citation Context: ...es [61, 52, 73, 44, 54, 2, 4, 58]. The multi-class SVMs are globally more recent. They are all obtained by combining a multivariate affine model with the nonlinear mapping Φ into the feature space [73, 77, 17, 32, 21, 22, 30, 39, 47, 48]. Formally, the functions h = (h_k)_{1≤k≤Q} that a Q-category M-SVM can implement have the general form: ∀x ∈ X, ∀k ∈ {1, ..., Q}, h_k(x) = ⟨w_k, Φ(x)⟩ + b_k (33), where the values of the v... |

310 | Neural Network Learning: Theoretical Foundations
- Anthony, Bartlett
- 1999
Citation Context: ...of F|_D are separated. 5.2 Lemmas: There is a close connection between covering and packing properties of bounded subsets in metric spaces. The following well-known lemma, introduced in [43] (see also [23, 3, 5] for more recent references), will prove useful in what follows. Lemma 4 For every pseudo-metric space (E, ρ), every totally bounded subset E′ of E and ε > 0, M(2ε, E′, ρ) ≤ N(ε, E′, ρ)... |
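Lemma 4 in this excerpt relates packing to covering: M(2ε, E′, ρ) ≤ N(ε, E′, ρ). A brute-force check on a tiny subset of the real line with the absolute-value metric (the point set and ε are arbitrary choices for illustration):

```python
from itertools import combinations

def covering_number(points, eps):
    # Smallest number of closed eps-balls centered at points of the set
    # needed to cover it (internal covering number N(eps)).
    for size in range(1, len(points) + 1):
        for centers in combinations(points, size):
            if all(min(abs(p - c) for c in centers) <= eps for p in points):
                return size
    return len(points)

def packing_number(points, eps):
    # Largest subset whose points are pairwise more than eps apart (M(eps)).
    best = 1
    for size in range(1, len(points) + 1):
        for subset in combinations(points, size):
            if all(abs(a - b) > eps for a, b in combinations(subset, 2)):
                best = size
    return best

E = [0.0, 0.3, 1.0, 1.4, 2.5, 2.6]
eps = 0.5
print(packing_number(E, 2 * eps), covering_number(E, eps))  # M(2e) <= N(e)
```

Here both numbers come out equal, which is consistent with the lemma; in general the packing number at scale 2ε can be strictly smaller than the covering number at scale ε.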

282 | Theoretical foundations of the potential function method in pattern recognition learning
- Aizerman, Braverman, et al.
- 1964
Citation Context: ... (RKHS) (see for instance [6, 59, 75, 76, 12]) (H_κ, ⟨·, ·⟩_{H_κ}), and H ⊂ ((H_κ, ⟨·, ·⟩_{H_κ}) + {1})^Q. Furthermore, κ is supposed to satisfy Mercer's conditions [1]. We now introduce the standard hypotheses on X and H which will allow us to formulate the upper bound on the margin Natarajan dimension of interest. We suppose that Φ(X) is included in the ball ... |

267 | Neural network classifiers estimate Bayesian a posteriori probabilities
- Richard, Lippmann
- 1991
Citation Context: ...e class posterior probabilities, which happens for instance when H is the set of functions computed by a multi-layer perceptron and the training criterion has been adequately chosen (see for instance [57]), applying this decision function is especially natural since it simply amounts to implementing Bayes' estimated decision rule. The class H is supposed to satisfy some mild measurability conditions w... |

267 | Concentration of measure and isoperimetric inequalities in product spaces
- Talagrand
- 1995
Citation Context: ... sources of inspiration; even in that case, some results exposed here could still prove useful. An obvious possibility is represented by new tools of concentration theory and empirical processes [66, 67, 46, 51, 50]. They make it possible, for instance, to work with data-dependent capacity measures such as the empirical VC entropy. A great survey of the recent advances in this field, especially focusing on Radem... |

253 | Structural risk minimization over data-dependent hierarchies
- Shawe-Taylor, Bartlett, et al.
- 1998
Citation Context: ... 1 Introduction: One of the central domains of Vapnik's statistical learning theory [73] is the theory of bounds, which is at the origin of the structural risk minimization (SRM) inductive principle [71, 64] and, as such, has not only a theoretical interest, but also a practical one. This theory has been developed for discriminant analysis, regression and density estimation. The first results in the fiel... |

250 | Analyse Fonctionnelle, Théorie et Applications
- Brezis
- 2006
Citation Context: ...lves covering numbers as capacity measure. Their definition is the subject of the following subsection. Introductions to the basic notions of functional analysis used in this document can be found in [18, 19, 23, 70]. 2.3 Capacity measure: covering numbers. The notion of covering number is based on the notion of ε-cover. Definition 7 (ε-cover or ε-net) Let (E, ρ) be a pseudo-metric space, and B(e, r) the op... |

243 | Weak Convergence and Empirical Processes: With Applications to Statistics
- van der Vaart, Wellner
- 1996
Citation Context: ...lves covering numbers as capacity measure. Their definition is the subject of the following subsection. Introductions to the basic notions of functional analysis used in this document can be found in [18, 19, 23, 70]. 2.3 Capacity measure: covering numbers. The notion of covering number is based on the notion of ε-cover. Definition 7 (ε-cover or ε-net) Let (E, ρ) be a pseudo-metric space, and B(e, r) the op... |

236 | On the density of families of sets
- Sauer
- 1972
Citation Context: ...onenkis (VC) dimension [74], for which an upper bound is computed afterwards. The basic result relating a covering number (precisely the growth function) to the VC dimension is the Sauer-Shelah lemma [74, 60, 65]. As stated in the introduction, extensions of the standard VC theory, which only deals with the computation of dichotomies with indicator functions, have mainly been proposed for large margin bi-clas... |

205 | Scale-sensitive dimensions, uniform convergence, and learnability
- Alon, Ben-David, et al.
- 1997
Citation Context: ...f dichotomies with binary-valued functions. Later on, several studies were devoted to the case of multi-class {1, ..., Q}-valued classifiers [11], and large margin classifiers computing dichotomies [3, 8, 10] (see also [9] for the case of regression). However, the case of large margin classifiers computing polychotomies (models taking their values in R^Q) has seldom been tackled independently, although i... |

203 | In Defense of One-Vs-All Classification
- Rifkin, Klautau
- 2004
Citation Context: ..., to separate the two categories. 6.1 Architecture and training of the M-SVMs: The problem of performing multi-class discriminant analysis with SVMs was initially tackled through decomposition schemes [61, 52, 73, 44, 54, 2, 4, 58]. The multi-class SVMs are globally more recent. They are all obtained by combining a multivariate affine model with the nonlinear mapping Φ into the feature space [73, 77, 17, 32, 21, 22, 30, 39, ... |

197 | Efficient Distribution-free Learning of Probabilistic Concepts
- Kearns, Schapire
- 1994
Citation Context: ...n bi-class discriminant models, the generalization of the VC dimension which has given birth to the richest set of theoretical results is a scale-sensitive variant called the fat-shattering dimension [41, 42]. In the multi-class case, several alternative solutions were proposed by different authors, such as the graph dimension [25, 53], or the Natarajan dimension [53]. It was proved in [11] that most of t... |

185 | Extracting support data for a given task
- Schölkopf, Burges, et al.
- 1995
Citation Context: ..., to separate the two categories. 6.1 Architecture and training of the M-SVMs: The problem of performing multi-class discriminant analysis with SVMs was initially tackled through decomposition schemes [61, 52, 73, 44, 54, 2, 4, 58]. The multi-class SVMs are globally more recent. They are all obtained by combining a multivariate affine model with the nonlinear mapping Φ into the feature space [73, 77, 17, 32, 21, 22, 30, 39, ... |

176 | Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data
- Lee, Lin, et al.
- 2004
Citation Context: ...es [61, 52, 73, 44, 54, 2, 4, 58]. The multi-class SVMs are globally more recent. They are all obtained by combining a multivariate affine model with the nonlinear mapping Φ into the feature space [73, 77, 17, 32, 21, 22, 30, 39, 47, 48]. Formally, the functions h = (h_k)_{1≤k≤Q} that a Q-category M-SVM can implement have the general form: ∀x ∈ X, ∀k ∈ {1, ..., Q}, h_k(x) = ⟨w_k, Φ(x)⟩ + b_k (33), where the values of the v... |

162 | On the learnability and design of output codes for multiclass problems
- Crammer, Singer
- 2000

150 | Support vector machines, reproducing kernel Hilbert spaces and randomized GACV
- Wahba
- 1998
Citation Context: ... κ(x, x′) = ⟨Φ(x), Φ(x′)⟩_{H_κ} (34). Hence, the "linear part" of each component function of the model is a function of x belonging to the Reproducing Kernel Hilbert Space (RKHS) (see for instance [6, 59, 75, 76, 12]) (H_κ, ⟨·, ·⟩_{H_κ}), and H ⊂ ((H_κ, ⟨·, ·⟩_{H_κ}) + {1})^Q. Furthermore, κ is supposed to satisfy Mercer's conditions [1]. We now introduce the standard hypotheses... |

135 | Central limit theorems for empirical measures
- Dudley
- 1978

129 | Uniform Central Limit Theorems
- Dudley
- 1999
Citation Context: ...easurability conditions which will appear implicitly in the sequel. A suitable such condition could for instance result from slightly adapting the "image admissible Suslin" property (see for instance [26], Section 5.3, or [29]). Hereafter, S will designate the product space X × Y. 2.2 Multi-class margin risk: The uniform convergence result established in the following section is based on an extende... |

118 | Generalization performance of support vector machines and other pattern classifiers
- Bartlett, Shawe-Taylor
- 1998
Citation Context: ...f dichotomies with binary-valued functions. Later on, several studies were devoted to the case of multi-class {1, ..., Q}-valued classifiers [11], and large margin classifiers computing dichotomies [3, 8, 10] (see also [9] for the case of regression). However, the case of large margin classifiers computing polychotomies (models taking their values in R^Q) has seldom been tackled independently, although i... |

98 | Reproducing Kernel Hilbert Spaces in Probability and Statistics
- Berlinet, Thomas-Agnan
- 2003
Citation Context: ...training set. As usual, the mapping Φ does not appear explicitly in the computations. Thanks to the "kernel trick", it is replaced with the reproducing kernel function κ, a positive type function [12] which computes the ℓ₂ dot product in the feature space. Let (H_κ, ⟨·, ·⟩_{H_κ}) denote this space. We thus have: ∀(x, x′) ∈ X², κ(x, x′) = ⟨Φ(x), Φ(x′)⟩_{H_κ} (34). Hence, ... |
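The excerpt describes replacing the explicit feature map Φ with a positive type kernel κ that computes the feature-space inner product directly. A small illustration using a Gaussian RBF kernel (our choice of κ for the sketch; the quoted text only requires κ to satisfy Mercer's conditions), checking two properties that follow immediately from the inner-product definition:

```python
import math

def rbf_kernel(x, xp, gamma=1.0):
    # kappa(x, x') = exp(-gamma * ||x - x'||^2): a standard positive type
    # function, i.e. an inner product in an implicit feature space.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
G = [[rbf_kernel(x, xp) for xp in X] for x in X]  # Gram matrix

# Symmetry and unit diagonal follow from kappa being an inner product of
# unit-norm feature vectors.
print(all(abs(G[i][j] - G[j][i]) < 1e-12 for i in range(4) for j in range(4)))
print(all(abs(G[i][i] - 1.0) < 1e-12 for i in range(4)))
```

Mercer's conditions additionally guarantee that every such Gram matrix is positive semi-definite, which is what licenses treating κ(x, x′) as ⟨Φ(x), Φ(x′)⟩ without ever constructing Φ.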

96 | A combinatorial problem: Stability and order for models and theories in infinitary languages
- Shelah
- 1972
Citation Context: ...onenkis (VC) dimension [74], for which an upper bound is computed afterwards. The basic result relating a covering number (precisely the growth function) to the VC dimension is the Sauer-Shelah lemma [74, 60, 65]. As stated in the introduction, extensions of the standard VC theory, which only deals with the computation of dichotomies with indicator functions, have mainly been proposed for large margin bi-clas... |

85 | Some applications of concentration inequalities in statistics
- Massart
- 2000
Citation Context: ... sources of inspiration; even in that case, some results exposed here could still prove useful. An obvious possibility is represented by new tools of concentration theory and empirical processes [66, 67, 46, 51, 50]. They make it possible, for instance, to work with data-dependent capacity measures such as the empirical VC entropy. A great survey of the recent advances in this field, especially focusing on Radem... |

74 | Theory of reproducing kernels and its applications
- Saitoh
- 1988

73 | Generalization performance of regularization networks and support vector machines via entropy numbers of compact operators
- Williamson, Smola, et al.
- 2001
Citation Context: ...and parameters. In that sense, it completes previous works on the same subject [30, 33], which had followed another path, namely the computation of a bound on the entropy numbers of a linear operator [78, 79, 34]. Readers primarily interested in computing sample complexities should be aware of the fact that sharper bounds should result from using different (more recent) sources of inspiration; even in th... |

61 | Fat-shattering and the learnability of real-valued functions
- Bartlett, Long, et al.
- 1996
Citation Context: ...nary-valued functions. Later on, several studies were devoted to the case of multi-class {1, ..., Q}-valued classifiers [11], and large margin classifiers computing dichotomies [3, 8, 10] (see also [9] for the case of regression). However, the case of large margin classifiers computing polychotomies (models taking their values in R^Q) has seldom been tackled independently, although it cannot be co... |

56 | Multicategory Classification by Support Vector Machines
- Bredensteiner, Bennett
- 1999

45 | On learning sets and functions
- Natarajan
- 1989
Citation Context: ...sults is a scale-sensitive variant called the fat-shattering dimension [41, 42]. In the multi-class case, several alternative solutions were proposed by different authors, such as the graph dimension [25, 53], or the Natarajan dimension [53]. It was proved in [11] that most of these extensions could be gathered in a general scheme, which makes it possible to derive necessary and sufficient conditions for ... |

43 | Statistical performance of Support Vector Machines
- Blanchard, Bousquet, et al.
- 2004
Citation Context: ... especially focusing on Rademacher averages, is provided by [15]. Regarding more specifically pattern recognition SVMs, the results whose extension appears most promising are those reported in [16, 63, 13]. Performing these extensions is the subject of ongoing work. We also intend to study the connection between the finiteness of the margin Ψ-dimensions (for all strictly positive values of their... |

40 | Pairwise classification and support vector machines
- Kressel
- 1999
Citation Context: ..., to separate the two categories. 6.1 Architecture and training of the M-SVMs: The problem of performing multi-class discriminant analysis with SVMs was initially tackled through decomposition schemes [61, 52, 73, 44, 54, 2, 4, 58]. The multi-class SVMs are globally more recent. They are all obtained by combining a multivariate affine model with the nonlinear mapping Φ into the feature space [73, 77, 17, 32, 21, 22, 30, 39, ... |

39 | Combining discriminant models with new multi-class SVMs
- Guermeur
Citation Context: ...vial extension of the three former ones [31]. In this report, we extend some of our previous works on the statistical theory of large margin multi-class discriminant systems, reported for instance in [27, 30, 33]. The main idea is to unify two complementary and well established theories: the theory of large margin (bi-class) classifiers and the theory of multi-class {1, ..., Q}-valued classifiers. To that e... |

39 | Support Vector Machines for Multi-Class Classification
- Mayoraz, Alpaydin
- 1998

38 | Universal Donsker classes and metric entropy
- Dudley
- 1987
Citation Context: ...sults is a scale-sensitive variant called the fat-shattering dimension [41, 42]. In the multi-class case, several alternative solutions were proposed by different authors, such as the graph dimension [25, 53], or the Natarajan dimension [53]. It was proved in [11] that most of these extensions could be gathered in a general scheme, which makes it possible to derive necessary and sufficient conditions for ... |

36 | Monotone convergence of binomial probabilities and a generalization of Ramanujan's equation
- Jogdeo, Samuels
- 1968
Citation Context: ...tion with parameters n and p (X ∼ B(n, p)). Then its median is either ⌊np⌋ or ⌊np⌋ + 1. Moreover, if np is an integer, the median is simply np. The proof of this result can for instance be found in [40] (see also Appendix B in [49]). It springs from Lemma 2 that mR(h*) − 1 is less than or equal to the median of mR̃_m(h*), and thus, by definition of the median, that the right-h... |
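The quoted result says the median of a binomial B(n, p) is ⌊np⌋ or ⌊np⌋ + 1, and exactly np when np is an integer. A brute-force verification over a few arbitrarily chosen parameter pairs:

```python
from math import comb, floor

def binomial_median(n, p):
    # Smallest m with P(X <= m) >= 1/2 for X ~ B(n, p), i.e. a median.
    cdf = 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)
        if cdf >= 0.5:
            return k

for n, p in [(10, 0.3), (7, 0.5), (20, 0.13), (12, 0.25)]:
    m = binomial_median(n, p)
    assert m in (floor(n * p), floor(n * p) + 1)  # the quoted property
    print(n, p, m)
```

For (10, 0.3) and (12, 0.25), np is an integer and the computed median equals np, matching the "simply np" clause of the excerpt.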

33 | A generalization of Sauer's lemma
- Haussler, Long
- 1995
Citation Context: ...or large margin bi-class discriminant models and multi-class discriminant models taking their values in finite sets. In both cases, generalized Sauer-Shelah lemmas have been derived (see for instance [37, 3]), which involve extended notions of VC dimension. For large margin bi-class discriminant models, the generalization of the VC dimension which has given birth to the richest set of theoretical results... |

32 | Fast Rates for Support Vector Machines
- Scovel, Steinwart
- 2004
Citation Context: ... especially focusing on Rademacher averages, is provided by [15]. Regarding more specifically pattern recognition SVMs, the results whose extension appears most promising are those reported in [16, 63, 13]. Performing these extensions is the subject of ongoing work. We also intend to study the connection between the finiteness of the margin Ψ-dimensions (for all strictly positive values of their... |

31 | Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms
- Bousquet
- 2002
Citation Context: ... especially focusing on Rademacher averages, is provided by [15]. Regarding more specifically pattern recognition SVMs, the results whose extension appears most promising are those reported in [16, 63, 13]. Performing these extensions is the subject of ongoing work. We also intend to study the connection between the finiteness of the margin Ψ-dimensions (for all strictly positive values of their... |

29 | A note on a scale-sensitive dimension of linear bounded functionals
- Gurvits
- 1997
Citation Context: ...e maximal cardinality of a subset of X P-shattered by H, if it is finite, or infinity otherwise. The V_γ dimension is a scale-sensitive variant of the Vapnik dimension. Definition 19 (V_γ dimension [3, 35]) Let H be a class of real-valued functions on a set X. For γ > 0, a subset s_{X_m} = {x_i : 1 ≤ i ≤ m} of X is said to be V_γ-shattered by H if there is a scalar b such that, for each binary vector... |

28 | Inductive principles of the search for empirical dependences
- Vapnik
- 1989
Citation Context: ...ng dimensions can alternatively be seen as multivariate extensions of the fat-shattering dimension. We introduce the definition of this latter dimension progressively. Definition 17 (Vapnik dimension [72]) Let H be a class of real-valued functions on a set X. A subset s_{X_m} = {x_i : 1 ≤ i ≤ m} of X is said to be V-shattered by H if there is a scalar b such that, for each binary vector v_y = (y_i) ∈ ... |