## Simplified PAC-Bayesian margin bounds (2003)

Venue: COLT

Citations: 48 (3 self)

### BibTeX

```bibtex
@inproceedings{McAllester03simplifiedpac-bayesian,
  author    = {David McAllester},
  title     = {Simplified {PAC-Bayesian} margin bounds},
  booktitle = {COLT},
  year      = {2003},
  pages     = {203--215}
}
```

### Abstract

The theoretical understanding of support vector machines is largely based on margin bounds for linear classifiers with unit-norm weight vectors and unit-norm feature vectors. Unit-norm margin bounds have previously been proved using fat-shattering arguments and Rademacher complexity. Recently Langford and Shawe-Taylor proved a dimension-independent unit-norm margin bound using a relatively simple PAC-Bayesian argument. Unfortunately, the Langford-Shawe-Taylor bound is stated in a variational form, making direct comparison to fat-shattering bounds difficult. This paper provides an explicit solution to the variational problem implicit in the Langford-Shawe-Taylor bound and shows that the PAC-Bayesian margin bounds are significantly tighter. Because a PAC-Bayesian bound is derived from a particular prior distribution over hypotheses, a PAC-Bayesian margin bound also seems to provide insight into the nature of the learning bias underlying the bound.

### Citations

1488 | Probability Inequalities for Sums of Bounded Random Variables
- Hoeffding
- 1963
Citation Context: ...Lemma 2. For γ > 0 we have ln(1/Φ(µ(γ))) ≤ ln⁺(mγ²)/γ² + (1/2) ln m + 3. Proof. First, if µ(γ) ≤ 3/2 we have ln(1/Φ(µ(γ))) ≤ ln(1/Φ(3/2)) ≤ 3 (7). In this case ln(mγ²) might be negative, but the lemma still follows. Now suppose µ(γ) ≥ 3/2. For µ ≥ 0 we have the following well-known lower bound on Φ(µ) (see [14]): Φ(µ) ≥ (1 − ...

721 | Boosting the margin: A new explanation for the effectiveness of voting methods
- Schapire, Freund, et al.
- 1998
Citation Context: ...Margin bounds play a central role in learning theory. Margin bounds for convex combination weight vectors (unit ℓ1-norm weight vectors) provide a theoretical foundation for boosting algorithms [15,9,8]. Margin bounds for unit-norm weight vectors provide a theoretical foundation for support vector machines [3,17,2]. This paper concerns the unit-norm margin bounds underlying support vector machines. ...

257 | Rademacher and Gaussian complexities: Risk bounds and structural results
- Bartlett, Mendelson
Citation Context: ...(number of features and corresponding weights). Intuitively the quantity 1/γ² acts like the complexity of the weight vector. Bound (1) has been recently improved using Rademacher complexity: Theorem 21 of [4] implies the following, where k is mγ²: ℓ0(w, D) ≤ ℓγ(w, S) + 8√(ln(4/δ)/k) + 4/√k + √(ln(4/δ)/m). Bound (2) has the nice scaling property that the bound remains meaningful in a limit where k is held con...
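The scaling claim in this excerpt can be checked numerically: with k = mγ² held fixed as m → ∞, only the √(ln(4/δ)/m) term vanishes and the k-dependent excess stays constant. The sketch below evaluates the excess terms; the constants follow one plausible reading of the garbled excerpt and should be treated as assumptions, not the paper's exact bound:

```python
import math

def margin_bound_excess(m: int, gamma: float, delta: float = 0.05) -> float:
    """Excess over the empirical margin loss, with k = m * gamma**2.
    Constants are assumptions reconstructed from the excerpt, not authoritative."""
    k = m * gamma**2
    return (8.0 * math.sqrt(math.log(4.0 / delta) / k)
            + 4.0 / math.sqrt(k)
            + math.sqrt(math.log(4.0 / delta) / m))

# Hold k fixed at 100 while m grows: the excess converges to a constant,
# so the bound stays meaningful in this limit.
k_fixed = 100.0
for m in (1_000, 100_000, 10_000_000):
    gamma = math.sqrt(k_fixed / m)
    print(f"m={m:>9}, gamma={gamma:.5f}: excess = {margin_bound_excess(m, gamma):.4f}")
```

Only the last term depends on m alone; the first two depend only on k, which is why the excess approaches a positive constant rather than zero.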

178 | The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network
- Bartlett
- 1998
Citation Context: ...(unit ℓ1-norm weight vectors) provide a theoretical foundation for boosting algorithms [15,9,8]. Margin bounds for unit-norm weight vectors provide a theoretical foundation for support vector machines [3,17,2]. This paper concerns the unit-norm margin bounds underlying support vector machines. Earlier unit-norm margin bounds were proved using fat-shattering dimension. This paper, building on results by Lan...

118 | Generalization performance of support vector machines and other pattern classifiers
- Bartlett, Shawe-Taylor
- 1998
Citation Context: ...(unit ℓ1-norm weight vectors) provide a theoretical foundation for boosting algorithms [15,9,8]. Margin bounds for unit-norm weight vectors provide a theoretical foundation for support vector machines [3,17,2]. This paper concerns the unit-norm margin bounds underlying support vector machines. Earlier unit-norm margin bounds were proved using fat-shattering dimension. This paper, building on results by Lan...

113 | Empirical margin distributions and bounding the generalization error of combined classifiers
- Koltchinskii, Panchenko
- 2002
Citation Context: ...Margin bounds play a central role in learning theory. Margin bounds for convex combination weight vectors (unit ℓ1-norm weight vectors) provide a theoretical foundation for boosting algorithms [15,9,8]. Margin bounds for unit-norm weight vectors provide a theoretical foundation for support vector machines [3,17,2]. This paper concerns the unit-norm margin bounds underlying support vector machines. ...

59 | PAC-Bayesian stochastic model selection
- McAllester
Citation Context: ...es as well, PAC-Bayesian derivations present the bias of the algorithm in the familiar form of a prior distribution. 2 The PAC-Bayesian Theorem. A first version of the PAC-Bayesian theorem appeared in [12]. The improved statement of the theorem given here is due to Langford and the simplified proof in the appendix is due to Seeger [10,16]. Let D be a distribution on a set Z, let P be a distribution on ...

58 | PAC-Bayes & margins
- Langford, Shawe-Taylor
Citation Context: ...the unit-norm margin bounds underlying support vector machines. Earlier unit-norm margin bounds were proved using fat-shattering dimension. This paper, building on results by Langford and Shawe-Taylor [11], gives a PAC-Bayesian unit-norm margin bound that is tighter than known unit-norm margin bounds derived from fat-shattering arguments. Consider a fixed distribution D on pairs 〈x, y〉 with x ∈ Rᵈ sati...

27 | A Framework for Structural Risk Minimization
- Shawe-Taylor, Bartlett, et al.
- 1996
Citation Context: ...(unit ℓ1-norm weight vectors) provide a theoretical foundation for boosting algorithms [15,9,8]. Margin bounds for unit-norm weight vectors provide a theoretical foundation for support vector machines [3,17,2]. This paper concerns the unit-norm margin bounds underlying support vector machines. Earlier unit-norm margin bounds were proved using fat-shattering dimension. This paper, building on results by Lan...

24 | An improved predictive accuracy bound for averaging classifiers
- Langford, Seeger, et al.
- 2001
Citation Context: ...Margin bounds play a central role in learning theory. Margin bounds for convex combination weight vectors (unit ℓ1-norm weight vectors) provide a theoretical foundation for boosting algorithms [15,9,8]. Margin bounds for unit-norm weight vectors provide a theoretical foundation for support vector machines [3,17,2]. This paper concerns the unit-norm margin bounds underlying support vector machines. ...

16 | Concentration inequalities for the missing mass and for histogram rule error
- McAllester, Ortiz
- 2003
Citation Context: ...nd on the error rate of the majority classifier together with (11) yields the following: ℓ0(w, D) ≤ 2ℓγ(w, S) + 2√(2(ℓγ(w, S) + 1/k) · ln⁺(k)/k) + (2/k)(1 + 2 ln⁺(k)) + O((ln m + ln(1/δ))/m) (13). Again it is interesting to consider the thermodynamic limit where ℓγ(w, S) and k are held constant as m → ∞. Note that (3) is tighter than (13) in the regime where 1/k is small compared to ℓγ(w, S). ...

14 | Bounds for averaging classifiers
- Langford, Seeger
- 2002
Citation Context: ...ian Theorem. A first version of the PAC-Bayesian theorem appeared in [12]. The improved statement of the theorem given here is due to Langford and the simplified proof in the appendix is due to Seeger [10,16]. Let D be a distribution on a set Z, let P be a distribution on a set H, and let ℓ be a "loss function" from H × Z to [0, 1]. For any distribution W on Z and h ∈ H let ℓ(h, W) be E_{z∼W}[ℓ(h, z)]. Let S...
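The theorem sketched in this excerpt is often stated in a "kl" form: with probability at least 1 − δ, kl(ℓ(Q, S) ‖ ℓ(Q, D)) ≤ (KL(Q‖P) + ln((m+1)/δ))/m for all posteriors Q, where kl is the binary KL divergence. A minimal sketch of how such a bound is used in practice, inverting the binary KL by bisection (the numeric values are hypothetical, and the exact logarithmic term varies between statements of the theorem):

```python
import math

def binary_kl(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), with 0*log(0) = 0."""
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    out = 0.0
    if q > 0.0:
        out += q * math.log(q / p)
    if q < 1.0:
        out += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return out

def kl_inverse_upper(emp_loss: float, rhs: float, tol: float = 1e-10) -> float:
    """Largest p in [emp_loss, 1] with kl(emp_loss || p) <= rhs, by bisection.
    binary_kl(emp_loss, p) is increasing in p on this interval, so bisection works."""
    lo, hi = emp_loss, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_kl(emp_loss, mid) <= rhs:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical numbers: empirical Gibbs loss 0.05, KL(Q||P) = 5, m = 1000, delta = 0.05.
m, delta, kl_qp, emp = 1000, 0.05, 5.0, 0.05
rhs = (kl_qp + math.log((m + 1) / delta)) / m
print(f"true Gibbs loss <= {kl_inverse_upper(emp, rhs):.4f}")
```

The variational flavor of the Langford-Shawe-Taylor bound discussed in the abstract comes from exactly this kind of implicit inversion: the bound on the true loss is defined as the solution of an equation rather than given in closed form.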

14 | PAC-Bayesian generalization bounds for Gaussian processes
- Seeger
Citation Context: ...f (3) is based on an isotropic Gaussian prior over the weight vectors. PAC-Bayesian arguments have also been used to give what appears to be the tightest known bounds for Gaussian process classifiers [16] and useful bounds for convex weight vector linear threshold classifiers [9]. In these cases as well, PAC-Bayesian derivations present the bias of the algorithm in the familiar form of a prior distrib...

8 | A PAC-Bayesian margin bound for linear classifiers

- Herbrich, Graepel
Citation Context: ...improvements on (2) are possible within the Rademacher complexity framework [1]. Initial attempts to use PAC-Bayesian arguments to derive unit-norm margin bounds resulted in bounds that depended on d [6]. Here, building on the work of Langford and Shawe-Taylor [11], we use a PAC-Bayesian argument to show that with probability at least 1 − δ over the choice of the sample S we have the following simult...

2 | A New Asymptotic Expansion for the Normal Probability Integral and Mill's Ratio
- Ruben
- 1962
Citation Context: ...ln(1/Φ(µ(γ))) ≤ ln(1/Φ(3/2)) ≤ 3 (7). In this case ln(mγ²) might be negative, but the lemma still follows. Now suppose µ(γ) ≥ 3/2. For µ ≥ 0 we have the following well-known lower bound on Φ(µ) (see [14]): Φ(µ) ≥ (1 − 1/µ²) · (1/√(2π)) · (1/µ) · exp(−µ²/2). For µ(γ) ≥ 3/2, formula (8) yields Φ(µ(γ)) ≥ (5/9) · (1/√(2π)) · (1/µ(γ)) · exp(−µ²(γ)/2). This yields ln(1/Φ(µ(γ))) ≤ 2 + ln µ(γ) + ln(m...
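The Mills-ratio lower bound on the Gaussian tail quoted in this excerpt is easy to sanity-check numerically. Here Φ is read as the upper tail of a standard normal, which can be computed exactly via the complementary error function; the function names below are illustrative:

```python
import math

def gaussian_upper_tail(mu: float) -> float:
    """Exact upper tail P(N(0,1) >= mu) via the complementary error function."""
    return 0.5 * math.erfc(mu / math.sqrt(2.0))

def mills_lower_bound(mu: float) -> float:
    """Lower bound (1 - 1/mu^2) * (1/(sqrt(2*pi)*mu)) * exp(-mu^2/2), valid for mu > 1."""
    return (1.0 - 1.0 / mu**2) * math.exp(-mu**2 / 2.0) / (math.sqrt(2.0 * math.pi) * mu)

# The bound sits below the exact tail and tightens as mu grows.
for mu in (1.5, 2.0, 3.0, 5.0):
    lower, exact = mills_lower_bound(mu), gaussian_upper_tail(mu)
    assert lower <= exact
    print(f"mu={mu}: {lower:.3e} <= {exact:.3e}")
```

At µ = 3/2 the prefactor 1 − 1/µ² equals 5/9, which is exactly the constant appearing in the excerpt's bound on Φ(µ(γ)).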