Logistic Regression, AdaBoost and Bregman Distances (2000) [126 citations — 32 self]

Abstract:

We give a unified account of boosting and logistic regression in which each learning problem is cast in terms of optimization of Bregman distances. The striking similarity of the two problems in this framework allows us to design and analyze algorithms for both simultaneously, and to easily adapt algorithms designed for one problem to the other. For both problems, we give new algorithms and explain their potential advantages over existing methods. These algorithms can be divided into two types based on whether the parameters are iteratively updated sequentially (one at a time) or in parallel (all at once). We also describe a parameterized family of algorithms which interpolates smoothly between these two extremes. For all of the algorithms, we give convergence proofs using a general formalization of the auxiliary-function proof technique. As one of our sequential-update algorithms is equivalent to AdaBoost, this provides the first general proof of convergence for AdaBoost. We show th...
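As a rough illustration of the quantity the abstract's framework is built on, the sketch below evaluates a Bregman distance from its standard definition, B_F(p || q) = F(p) - F(q) - <grad F(q), p - q>, for a convex function F. All function names and the sample vectors are assumptions chosen for illustration; none are taken from the paper. Two textbook choices of F are shown: half the squared Euclidean norm, which yields half the squared Euclidean distance, and F(p) = sum_i (p_i ln p_i - p_i), which yields the unnormalized relative entropy.

```python
import math


def bregman_distance(F, grad_F, p, q):
    """Standard Bregman distance: F(p) - F(q) - <grad F(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_F(q), p, q))
    return F(p) - F(q) - inner


# Choice 1 (illustrative): F(p) = 0.5 * ||p||^2, whose gradient is p itself.
def half_sq_norm(p):
    return 0.5 * sum(pi * pi for pi in p)


def half_sq_norm_grad(p):
    return list(p)


# Choice 2 (illustrative): F(p) = sum_i (p_i ln p_i - p_i), gradient ln p_i.
# Requires strictly positive entries.
def neg_entropy(p):
    return sum(pi * math.log(pi) - pi for pi in p)


def neg_entropy_grad(p):
    return [math.log(pi) for pi in p]


def unnormalized_relative_entropy(p, q):
    """Closed form: sum_i (p_i ln(p_i / q_i) - p_i + q_i)."""
    return sum(pi * math.log(pi / qi) - pi + qi for pi, qi in zip(p, q))


# Hypothetical positive vectors; they need not sum to one.
p = [0.2, 0.5, 1.3]
q = [0.4, 0.4, 0.9]

d_euclid = bregman_distance(half_sq_norm, half_sq_norm_grad, p, q)
d_entropy = bregman_distance(neg_entropy, neg_entropy_grad, p, q)
```

Here `d_euclid` agrees with half the squared Euclidean distance between `p` and `q`, and `d_entropy` agrees with `unnormalized_relative_entropy(p, q)` up to floating-point error. Both distances are zero when `p` equals `q`.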

Citations

1239 A decision-theoretic generalization of on-line learning and an application to boosting – Freund, Schapire - 1997
642 A maximum entropy approach to natural language processing – Berger, Pietra, et al. - 1996
623 Additive logistic regression: a statistical view of boosting. The Annals of Statistics – Friedman, Hastie, et al. - 2000
408 Improved boosting algorithms using confidence-rated predictions. Machine Learning 37 – Schapire, Singer - 1999
370 Inducing features of random fields – Pietra, Pietra, et al. - 1997
298 Generalized iterative scaling for log-linear models – Darroch, Ratcliff - 1972
278 BoosTexter: A boosting-based system for text categorization – Schapire, Singer - 2000
163 Soft margins for AdaBoost – Rätsch, Onoda, et al. - 2001
156 I-divergence geometry of probability distributions and minimization problems. The Annals of Probability – Csiszár - 1975
126 Parallel optimization: Theory, algorithms, and applications – Censor, Zenios - 1997
110 Additive versus exponentiated gradient updates for linear prediction – Kivinen, Warmuth - 1997
103 A relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming – Brègman - 1967
95 Functional gradient techniques for combining hypotheses – Mason, Baxter, et al. - 2000
67 A simple, fast, and effective rule learner – Cohen, Singer - 1999
66 The alternating decision tree learning algorithm – Freund, Mason
65 Robust trainability of single neurons – Höffgen, Horn, et al. - 1995
53 Arcing the edge – Breiman - 1997
47 Improved boosting algorithms using confidence-rated predictions – Schapire, Singer - 1999
41 Boosting as Entropy Projection – Kivinen, Warmuth - 1999
40 On-line learning of linear functions – Littlestone, Long, et al. - 1988
35 Sanov property, generalized I-projection and a conditional limit theorem. The Annals of Probability – Csiszár - 1984
32 Additive models, boosting, and inference for generalized divergences – Lafferty - 1999
31 An iterative row-action method for interval convex programming – Censor, Lent - 1981
24 Prediction games and arcing classifiers – Breiman - 1997
18 Potential boosters – Duffy, Helmbold - 1999
18 Duality and auxiliary functions for Bregman distances – Pietra, Pietra - 2001
14 Statistical learning algorithms based on Bregman distances – Lafferty, Pietra, et al. - 1997
12 Information Theory – Csiszár, Körner - 1981
11 Scaling up a boosting-based learner via adaptive sampling – Domingo, Watanabe - 2000
9 Information theoretical optimization techniques – Topsøe - 1979
8 Bounds on approximate steepest descent for likelihood maximization in exponential families – Cesa-Bianchi, Krogh, et al. - 1994
5 From computational learning theory to discovery science – Watanabe - 1999
3 Robust trainability of single neurons – Höffgen, Simon - 1992
2 Why least squares and maximum entropy? An axiomatic approach to inference for linear inverse problems. The Annals of Statistics – Csiszár - 1991
2 I-divergence geometry of probability distributions and minimization problems. The Annals of Probability – Csiszár - 1975
1 Generalized projections for non-negative functions – Csiszár - 1995
1 Sanov property, generalized I-projection and a conditional limit theorem – Csiszár - 1984