A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, 1997
"... In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic set ..."
Abstract
-
Cited by 1714 (53 self)
In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update rule of Littlestone and Warmuth [20] can be adapted to this model, yielding bounds that are slightly weaker in some cases but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in $\mathbb{R}^n$. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of...
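A minimal Python sketch of the multiplicative weight-update idea described above, assuming losses in [0, 1] and the update $w_i \leftarrow w_i \beta^{\ell_i}$ with a fixed parameter $\beta \in (0, 1)$. The function name `hedge` and its interface are illustrative rather than taken from the paper, and the sketch leaves out both the tuning of $\beta$ and the boosting part.

```python
from typing import List, Sequence, Tuple


def hedge(loss_rounds: Sequence[Sequence[float]], beta: float) -> Tuple[List[float], float]:
    """Run a Hedge(beta)-style multiplicative weight update (illustrative sketch).

    loss_rounds[t][i] is the loss in [0, 1] of option i at trial t.
    Returns the final probability vector and the learner's total expected loss.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    n = len(loss_rounds[0])
    weights = [1.0] * n  # uniform initial weights
    total_loss = 0.0
    for losses in loss_rounds:
        z = sum(weights)
        p = [w / z for w in weights]  # how resources are apportioned this trial
        total_loss += sum(pi * li for pi, li in zip(p, losses))
        # Multiplicative update: shrink each weight by beta ** loss.
        weights = [w * beta ** l for w, l in zip(weights, losses)]
    z = sum(weights)
    return [w / z for w in weights], total_loss
```

For example, `hedge([[0, 1, 1], [0, 1, 0]], beta=0.5)` returns the probabilities 4/7, 1/7 and 2/7 and a total loss of 11/12.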
An Efficient Boosting Algorithm for Combining Preferences, 1999
"... The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting ..."
Abstract
-
Cited by 383 (13 self)
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called RankBoost. We also describe an efficient implementation of the algorithm for certain natural cases. We discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different WWW search strategies, each of which is a query expansion for a given domain. For this task, we compare the performance of RankBoost to the individual search strategies. The second experiment is a collaborative-filtering task for making movie recommendations. Here, we present results comparing RankBoost to nearest-neighbor and regression algorithms.
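The reweighting at the heart of a RankBoost-style round can be sketched as follows. This assumes the weak ranker `h` takes values in [0, 1], that each crucial pair `(x0, x1)` means `x1` should be ranked above `x0`, and that $\alpha = \tfrac12 \ln\frac{1+r}{1-r}$ with $r = \sum D(x_0, x_1)\,(h(x_1) - h(x_0))$. The helper name `rankboost_round` is hypothetical, and selecting the weak ranker itself is out of scope.

```python
import math
from typing import Dict, Hashable, Tuple

Pair = Tuple[Hashable, Hashable]  # (x0, x1): x1 should be ranked above x0


def rankboost_round(dist: Dict[Pair, float], h: Dict[Hashable, float]) -> Tuple[float, Dict[Pair, float]]:
    """One RankBoost-style reweighting step for a weak ranker h with values in [0, 1]."""
    # r measures how well h orders the crucial pairs under the current distribution.
    r = sum(d * (h[x1] - h[x0]) for (x0, x1), d in dist.items())
    if abs(r) >= 1.0:
        raise ValueError("weak ranker is perfect or perfectly inverted; alpha is unbounded")
    alpha = 0.5 * math.log((1 + r) / (1 - r))
    # Correctly ordered pairs (h(x1) > h(x0)) lose weight; misordered pairs gain weight.
    unnorm = {pair: d * math.exp(alpha * (h[pair[0]] - h[pair[1]])) for pair, d in dist.items()}
    z = sum(unnorm.values())
    return alpha, {pair: v / z for pair, v in unnorm.items()}
```

With three equally weighted pairs of which `h` orders two correctly, $r = 1/3$, so the misordered pair's weight rises to 1/2 while the other two drop to 1/4 each.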
Tracking the best expert - In Proceedings of the 12th International Conference on Machine Learning, 1995
"... Abstract. We generalize the recent relative loss bounds for on-line algorithms where the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert is bounded. The generalization allows the sequence to be partitioned into segments, and the goal is to bound th ..."
Abstract
-
Cited by 157 (17 self)
We generalize the recent relative loss bounds for on-line algorithms, which bound the additional loss of the algorithm on the whole sequence of examples over the loss of the best expert. The generalization allows the sequence to be partitioned into segments, and the goal is to bound the additional loss of the algorithm over the sum of the losses of the best experts for each segment. This models situations in which the examples change and different experts are best for certain segments of the sequence of examples. In the single-segment case, the additional loss is proportional to $\log n$, where $n$ is the number of experts and the constant of proportionality depends on the loss function. Our algorithms do not produce the best partition; however, the loss bound shows that our predictions are close to those of the best partition. When the number of segments is $k+1$ and the sequence is of length $\ell$, we can bound the additional loss of our algorithm over the best partition by $O(k \log n + k \log(\ell/k))$. For the case when the loss per trial is bounded by one, we obtain an algorithm whose additional loss over the loss of the best partition is independent of the length of the sequence. The additional loss becomes $O(k \log n + k \log(L/k))$, where $L$ is the loss of the best partition with $k+1$ segments. Our algorithms for tracking the predictions of the best expert are simple adaptations of Vovk's original algorithm for the single-best-expert case. As in the original algorithms, we keep one weight per expert and spend $O(1)$ time per weight in each trial.
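As an illustration of the kind of simple per-weight update the abstract describes, here is a sketch of one Fixed-Share-style trial. It assumes exponential weights with learning rate `eta` for the loss update (a simplified stand-in for Vovk's algorithm) and a sharing rate `alpha` that hands a fraction of each expert's weight evenly to the others; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def fixed_share_update(weights: Sequence[float], losses: Sequence[float],
                       eta: float, alpha: float) -> List[float]:
    """One trial of a Fixed-Share-style update on a normalized weight vector (n >= 2)."""
    n = len(weights)
    # Loss update: exponential weighting, then renormalize.
    vm = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(vm)
    vm = [v / z for v in vm]
    # Share update: each expert keeps (1 - alpha) of its weight and receives an
    # equal cut of the alpha fraction given up by the other experts.
    return [(1 - alpha) * v + alpha * (1 - v) / (n - 1) for v in vm]
```

Because the weights are normalized before sharing, each new weight depends only on its own value, so the work per weight per trial is constant. Setting `alpha = 0` recovers the plain single-best-expert update.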
Adaptive Game Playing Using Multiplicative Weights
"... this paper, we present a simple algorithm for solving this problem, and give a simple analysis of the algorithm. The bounds we obtain are not asymptotic and hold for any finite number of rounds. The algorithm and its analysis are based directly on the "on-line prediction" methods of Littlestone and ..."
Abstract
-
Cited by 106 (14 self)
In this paper, we present a simple algorithm for solving this problem, and give a simple analysis of the algorithm. The bounds we obtain are not asymptotic and hold for any finite number of rounds. The algorithm and its analysis are based directly on the "on-line prediction" methods of Littlestone and Warmuth [24]. The analysis of this algorithm yields a new (as far as we know) and simple proof of von Neumann's minmax theorem, as well as a provable method of approximately solving a game. We also give more refined variants of the algorithm for this purpose, and we show that one of these is optimal in a very strong sense. The paper is organized as follows. In Section 2 we define the mathematical setup and notation. In Section 3 we introduce the basic multiplicative weights algorithm, whose average performance is guaranteed to be almost as good as that of the best fixed mixed strategy. In Section 4 we outline the relationship between our work and some of the extensive existing work on the use of multiplicative weights algorithms for on-line prediction. In Section 5 we show how the algorithm can be used to give a simple proof of von Neumann's minmax theorem. In Section 6 we give a version of the algorithm whose distributions are guaranteed to converge to an optimal mixed strategy. We note the possible application of this algorithm to solving linear programming problems and reference other work that has used multiplicative weights to this end. Finally, in Section 7 we show that the convergence rate of the second version of the algorithm is asymptotically optimal.
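The repeated-game setting can be sketched with a row player who runs the multiplicative update $P_{t+1}(i) \propto P_t(i)\,\beta^{M(i, j_t)}$ against a column player who best-responds each round. The loss matrix `M` with entries in [0, 1], the tie-breaking rule, and the name `mw_game` are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple


def mw_game(M: Sequence[Sequence[float]], rounds: int, beta: float) -> Tuple[List[float], float]:
    """Row player runs multiplicative weights against a best-responding column player.

    M[i][j] in [0, 1] is the row player's loss. Returns the row player's average
    mixed strategy and its worst-case loss max_j (P_avg^T M)_j.
    """
    n_rows, n_cols = len(M), len(M[0])
    p = [1.0 / n_rows] * n_rows
    p_sum = [0.0] * n_rows
    for _ in range(rounds):
        p_sum = [s + pi for s, pi in zip(p_sum, p)]
        # Column player best-responds; Python's max keeps the first maximal index on ties.
        col_losses = [sum(p[i] * M[i][j] for i in range(n_rows)) for j in range(n_cols)]
        j = max(range(n_cols), key=lambda c: col_losses[c])
        # Multiplicative update with loss M[i][j], renormalized to avoid underflow.
        w = [p[i] * beta ** M[i][j] for i in range(n_rows)]
        z = sum(w)
        p = [wi / z for wi in w]
    p_avg = [s / rounds for s in p_sum]
    worst = max(sum(p_avg[i] * M[i][j] for i in range(n_rows)) for j in range(n_cols))
    return p_avg, worst
```

On matching pennies, `mw_game([[1, 0], [0, 1]], 1000, 0.9)` yields an average strategy close to (0.5, 0.5) whose worst-case loss is only slightly above the game value of 1/2, which is the approximate game solving the abstract refers to.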
Universal Prediction - IEEE Transactions on Information Theory, 1998
"... This paper consists of an overview on universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the selfinformation loss function, which is directly related to the theory of universal data compression. ..."
Abstract
-
Cited by 99 (6 self)
This paper consists of an overview of universal prediction from an information-theoretic perspective. Special attention is given to the notion of probability assignment under the self-information loss function, which is directly related to the theory of universal data compression.
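To make the link between self-information loss and compression concrete, the sketch below assigns sequential probabilities with the Krichevsky-Trofimov estimator, one classical universal probability assignment for binary sequences (chosen here as an example, not necessarily the one the overview emphasizes), and sums the losses $-\log_2 p$. The total is the ideal code length in bits; the function name is hypothetical.

```python
import math


def kt_code_length(bits: str) -> float:
    """Total self-information loss (in bits) of the Krichevsky-Trofimov estimator."""
    counts = {"0": 0, "1": 0}
    total = 0.0
    for b in bits:
        n = counts["0"] + counts["1"]
        # KT sequential probability of the next symbol: (count of b + 1/2) / (n + 1).
        prob = (counts[b] + 0.5) / (n + 1)
        total += -math.log2(prob)  # self-information loss of this prediction
        counts[b] += 1
    return total
```

For instance, `kt_code_length("01")` returns 3.0: one bit for the first symbol (probability 1/2) and two bits for the second (probability 1/4).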
Using and Combining Predictors That Specialize, 1997
"... . We study online learning algorithms that predict by combining the predictions of several subordinate prediction algorithms, sometimes called "experts." These simple algorithms belong to the multiplicative weights family of algorithms. The performance of these algorithms degrades only logarithmical ..."
Abstract
-
Cited by 76 (11 self)
We study online learning algorithms that predict by combining the predictions of several subordinate prediction algorithms, sometimes called "experts." These simple algorithms belong to the multiplicative weights family of algorithms. The performance of these algorithms degrades only logarithmically with the number of experts, making them particularly useful in applications where the number of experts is very large. However, in applications such as text categorization, it is often natural for some of the experts to abstain from making predictions on some of the instances. We show how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption. We also show how to derive corresponding loss bounds. Our method is very general, and can be applied to a large family of online learning algorithms. We also give applications to various prediction models including decision graphs and "switching" experts.
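A sketch of the "sleeping experts" transformation applied to a multiplicative update, under the assumption that each awake expert's weight is multiplied by $\beta^{\ell_i}$ and the awake weights are then rescaled to keep their total unchanged, while abstaining experts keep their weights. The name `specialists_update` is hypothetical, and the prediction step (a weighted average over awake experts only) is left out.

```python
from typing import List, Sequence


def specialists_update(weights: Sequence[float], awake: Sequence[bool],
                       losses: Sequence[float], beta: float) -> List[float]:
    """One trial of a sleeping-experts multiplicative update (illustrative sketch).

    Only awake experts are updated; their weights are rescaled so the total weight
    of the awake set is unchanged. Sleeping experts keep their weights.
    """
    awake_idx = [i for i, a in enumerate(awake) if a]
    if not awake_idx:
        return list(weights)  # nobody predicted, so there is nothing to learn
    old_mass = sum(weights[i] for i in awake_idx)
    new = list(weights)
    for i in awake_idx:
        new[i] = weights[i] * beta ** losses[i]  # losses of sleeping experts are ignored
    new_mass = sum(new[i] for i in awake_idx)
    scale = old_mass / new_mass
    for i in awake_idx:
        new[i] *= scale
    return new
```

Preserving the awake set's total mass is what lets an abstaining expert neither gain nor lose ground on trials where it makes no prediction.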
Sequential Prediction of Individual Sequences Under General Loss Functions - IEEE Transactions on Information Theory, 1998
"... We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction st ..."
Abstract
-
Cited by 58 (7 self)
We consider adaptive sequential prediction of arbitrary binary sequences when the performance is evaluated using a general loss function. The goal is to predict on each individual sequence nearly as well as the best prediction strategy in a given comparison class of (possibly adaptive) prediction strategies, called experts. By using a general loss function, we generalize previous work on universal prediction, forecasting, and data compression. However, here we restrict ourselves to the case when the comparison class is finite. For a given sequence, we define the regret as the total loss on the entire sequence suffered by the adaptive sequential predictor, minus the total loss suffered by the predictor in the comparison class that performs best on that particular sequence. We show that for a large class of loss functions, the minimax regret is either $\Theta(\log N)$ or $\Omega(\sqrt{\ell \log N})$, depending on the loss function, where $N$ is the number of predictors in the comparison class a...
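The regret defined above is straightforward to compute once a loss function is fixed. The sketch below uses absolute loss and a hypothetical `regret` helper; both are illustrative choices rather than anything specified in the paper.

```python
from typing import Callable, Sequence


def absolute_loss(prediction: float, outcome: float) -> float:
    return abs(prediction - outcome)


def regret(learner_preds: Sequence[float], expert_preds: Sequence[Sequence[float]],
           outcomes: Sequence[float], loss: Callable[[float, float], float]) -> float:
    """Learner's total loss minus the total loss of the best expert on this sequence."""
    learner_total = sum(loss(p, y) for p, y in zip(learner_preds, outcomes))
    best_expert_total = min(
        sum(loss(p, y) for p, y in zip(preds, outcomes)) for preds in expert_preds
    )
    return learner_total - best_expert_total
```

For outcomes 1, 0, 1, a learner that always predicts 0.5 loses 1.5 in total, the expert that always predicts 1 loses 1, and the regret is 0.5.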
Adaptive and Self-Confident On-Line Learning Algorithms, 2000
"... We study on-line learning in the linear regression framework. Most of the performance bounds for on-line algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the wh ..."
Abstract
-
Cited by 50 (4 self)
We study on-line learning in the linear regression framework. Most of the performance bounds for on-line algorithms in this framework assume a constant learning rate. To achieve these bounds the learning rate must be optimized based on a posteriori information. This information depends on the whole sequence of examples and thus it is not available to any strictly on-line algorithm. We introduce new techniques for adaptively tuning the learning rate as the data sequence is progressively revealed. Our techniques allow us to prove essentially the same bounds as if we knew the optimal learning rate in advance. Moreover, such techniques apply to a wide class of on-line algorithms, including p-norm algorithms for generalized linear regression and Weighted Majority for linear regression with absolute loss. Our adaptive tunings are radically different from previous techniques, such as the so-called doubling trick. Whereas the doubling trick restarts the on-line algorithm several ti...
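To contrast with a constant learning rate, the sketch below runs an exponentially weighted forecaster over a finite set of experts with the time-varying rate $\eta_t = \sqrt{8 \ln N / t}$, which needs no knowledge of the sequence length and no restarts. This is a standard schedule used here as a stand-in; it is not the self-confident tuning of the paper, which adapts to observed losses and also covers p-norm regression algorithms. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def ewa_time_varying(loss_rounds: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Exponentially weighted forecaster with eta_t = sqrt(8 ln N / t), losses in [0, 1].

    Weights are recomputed from cumulative losses each trial, so changing eta
    requires no restart. Returns (learner's total loss, regret).
    """
    n = len(loss_rounds[0])
    cum = [0.0] * n  # cumulative loss of each expert
    learner = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        eta = math.sqrt(8 * math.log(n) / t)
        best = min(cum)  # shift by the minimum for numerical stability
        w = [math.exp(-eta * (c - best)) for c in cum]
        z = sum(w)
        learner += sum(wi / z * li for wi, li in zip(w, losses))
        cum = [c + l for c, l in zip(cum, losses)]
    return learner, learner - min(cum)
```

On 100 trials where one of two experts never errs and the other always does, the regret stays below 1 even though the rate was never tuned to the horizon.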
Tracking a Small Set of Experts by Mixing Past Posteriors - Journal of Machine Learning Research, 2002
"... In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of n experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partit ..."
Abstract
-
Cited by 49 (8 self)
In this paper, we examine on-line learning problems in which the target concept is allowed to change over time. In each trial a master algorithm receives predictions from a large set of $n$ experts. Its goal is to predict almost as well as the best sequence of such experts chosen off-line by partitioning the training sequence into $k+1$ sections and then choosing the best expert for each section. We build on methods developed by Herbster and Warmuth and consider an open problem posed by Freund, where the experts in the best partition are from a small pool of size $m$. Since $k \gg m$, the best expert shifts back and forth between the experts of the small pool. We propose algorithms that solve this open problem by mixing the past posteriors maintained by the master algorithm. We relate the number of bits needed for encoding the best partition to the loss bounds of the algorithms. Instead of paying $\log n$ for choosing the best expert in each section, we first pay $\log\binom{n}{m}$ bits in the bounds for identifying the pool of $m$ experts and then $\log m$ bits per new section. In the bounds we also pay twice for encoding the boundaries of the sections.
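A sketch of one mixing scheme in this spirit ("uniform past"): after the usual exponential loss update, the next weight vector puts $1-\alpha$ on the latest posterior and spreads $\alpha$ evenly over all earlier posteriors, including the uniform start vector. The learning rate `eta`, mixing rate `alpha`, and the function name are illustrative assumptions; a running sum of past posteriors keeps the work per trial linear in $n$.

```python
import math
from typing import List, Sequence


def mixing_past_posteriors(loss_rounds: Sequence[Sequence[float]],
                           eta: float, alpha: float) -> List[float]:
    """Exponential weights with uniform-past mixing; returns the weights for the next trial."""
    n = len(loss_rounds[0])
    v = [1.0 / n] * n
    past_sum = list(v)  # sum of all earlier posteriors, starting with the uniform vector
    count = 1           # number of posteriors in past_sum
    for losses in loss_rounds:
        # Loss update: the posterior after this trial.
        vm = [vi * math.exp(-eta * l) for vi, l in zip(v, losses)]
        z = sum(vm)
        vm = [x / z for x in vm]
        # Mix: (1 - alpha) on the newest posterior, alpha on the average of earlier ones.
        v = [(1 - alpha) * x + alpha * s / count for x, s in zip(vm, past_sum)]
        past_sum = [s + x for s, x in zip(past_sum, vm)]
        count += 1
    return v
```

When an expert that was good earlier becomes good again, the mass it keeps in the averaged past posteriors lets it regain the lead within a few trials, which is the behavior a small pool of recurring experts calls for.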
On-line algorithms in machine learning - In Fiat and Woeginger, eds., Online Algorithms: The State of the Art, 1998
"... The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there are a collection of results in Computation ..."
Abstract
-
Cited by 46 (2 self)
The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there is a collection of results in Computational Learning Theory that fit nicely into the "on-line algorithms" framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.

