Results 1  10
of
95
A DecisionTheoretic Generalization of onLine Learning and an Application to Boosting
, 1997
"... In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worstcase online framework. The model we study can be interpreted as a broad, abstract extension of the wellstudied online prediction model to a general decisiontheoretic set ..."
Abstract

Cited by 2307 (59 self)
 Add to MetaCart
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worstcase online framework. The model we study can be interpreted as a broad, abstract extension of the wellstudied online prediction model to a general decisiontheoretic setting. We show that the multiplicative weightupdate rule of Littlestone and Warmuth [20] can be adapted to this model yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multipleoutcome prediction, repeated games and prediction of points in R n . In the second part of the paper we apply the multiplicative weightupdate technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of...
Boosting the margin: A new explanation for the effectiveness of voting methods
 In Proceedings International Conference on Machine Learning
, 1997
"... Abstract. One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show ..."
Abstract

Cited by 721 (52 self)
 Add to MetaCart
Abstract. One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik’s support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the biasvariance decomposition. 1
An Efficient Boosting Algorithm for Combining Preferences
, 1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Abstract

Cited by 515 (18 self)
 Add to MetaCart
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called RankBoost. We also describe an efficient implementation of the algorithm for certain natural cases. We discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different WWW search strategies, each of which is a query expansion for a given domain. For this task, we compare the performance of RankBoost to the individual search strategies. The second experiment is a collaborativefiltering task for making movie recommendations. Here, we present results comparing RankBoost to nearestneighbor and regression algorithms.
Soft Margins for AdaBoost
, 1998
"... Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this ..."
Abstract

Cited by 256 (22 self)
 Add to MetaCart
Recently ensemble methods like AdaBoost were successfully applied to character recognition tasks, seemingly defying the problems of overfitting. This paper shows that although AdaBoost rarely overfits in the low noise regime it clearly does so for higher noise levels. Central for understanding this fact is the margin distribution and we find that AdaBoost achieves  doing gradient descent in an error function with respect to the margin  asymptotically a hard margin distribution, i.e. the algorithm concentrates its resources on a few hardtolearn patterns (here an interesting overlap emerge to Support Vectors). This is clearly a suboptimal strategy in the noisy case, and regularization, i.e. a mistrust in the data, must be introduced in the algorithm to alleviate the distortions that a difficult pattern (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original AdaBoost algorithm to achieve a soft margin  a ...
Universal prediction
 IEEE Transactions on Information Theory
, 1998
"... Abstract — This paper consists of an overview on universal prediction from an informationtheoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. Both th ..."
Abstract

Cited by 136 (11 self)
 Add to MetaCart
Abstract — This paper consists of an overview on universal prediction from an informationtheoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. Both the probabilistic setting and the deterministic setting of the universal prediction problem are described with emphasis on the analogy and the differences between results in the two settings. Index Terms — Bayes envelope, entropy, finitestate machine, linear prediction, loss function, probability assignment, redundancycapacity, stochastic complexity, universal coding, universal prediction. I.
SCHAPIRE: Adaptive game playing using multiplicative weights
 Games and Economic Behavior
, 1999
"... We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the mult ..."
Abstract

Cited by 134 (14 self)
 Add to MetaCart
We present a simple algorithm for playing a repeated game. We show that a player using this algorithm suffers average loss that is guaranteed to come close to the minimum loss achievable by any fixed strategy. Our bounds are nonasymptotic and hold for any opponent. The algorithm, which uses the multiplicativeweight methods of Littlestone and Warmuth, is analyzed using the Kullback–Liebler divergence. This analysis yields a new, simple proof of the min–max theorem, as well as a provable method of approximately solving a game. A variant of our gameplaying algorithm is proved to be optimal in a very strong sense. Journal of Economic Literature
An introduction to boosting and leveraging
 Advanced Lectures on Machine Learning, LNCS
, 2003
"... ..."
Fast Binary Feature Selection with Conditional Mutual Information
 Journal of Machine Learning Research
, 2004
"... We propose in this paper a very fast feature selection technique based on conditional mutual information. ..."
Abstract

Cited by 101 (1 self)
 Add to MetaCart
We propose in this paper a very fast feature selection technique based on conditional mutual information.
Adwords and generalized online matching
 In FOCS ’05: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science
, 2005
"... How does a search engine company decide what ads to display with each query so as to maximize its revenue? This turns out to be a generalization of the online bipartite matching problem. We introduce the notion of a tradeoff revealing LP and use it to derive two optimal algorithms achieving competit ..."
Abstract

Cited by 99 (5 self)
 Add to MetaCart
How does a search engine company decide what ads to display with each query so as to maximize its revenue? This turns out to be a generalization of the online bipartite matching problem. We introduce the notion of a tradeoff revealing LP and use it to derive two optimal algorithms achieving competitive ratios of 1 − 1/e for this problem. 1