## On-line algorithms in machine learning (1998)

Venue: In Fiat and Woeginger, eds., Online Algorithms: The State of the Art (LNCS 1442), 1998

Citations: 61 (2 self)

### BibTeX

@INPROCEEDINGS{Blum98on-linealgorithms,
  author    = {Avrim Blum},
  title     = {On-line algorithms in machine learning},
  booktitle = {Online Algorithms: The State of the Art},
  editor    = {Fiat and Woeginger},
  series    = {LNCS},
  volume    = {1442},
  year      = {1998}
}

### Abstract

The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in their emphasis and in the problems typically studied, there is a collection of results in Computational Learning Theory that fits nicely into the "on-line algorithms" framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms. The emphasis is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety; pointers to the literature are given for more sophisticated versions of these algorithms.

### Citations

1761 | A theory of the learnable
- Valiant
- 1984
Citation Context: ...The Mistake Bound model is equivalent to the "extended equivalence query" model of Angluin [1], and is known to be strictly harder for polynomial-time algorithms than the PAC learning model of Valiant [34, 22], in which (among other differences) the adversary is required to select examples from a fixed distribution [6]. Agnostic learning is discussed in [23]. Littlestone [26] gives a variety of results on th...

715 | The weighted majority algorithm
- Littlestone, Warmuth
- 1994
Citation Context: ...by using M = Σ_{i=1}^t F_i. □ 2.3 History and Extensions: Within the Computational Learning Theory community, the problem of predicting from expert advice was first studied by Littlestone and Warmuth [28], DeSantis, Markowsky and Wegman [15], and Vovk [35]. The algorithms described above as well as Theorems 1 and 2 are from Littlestone and Warmuth [28], and Corollary 3, as well as a number of refineme...
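The expert-advice setting behind this entry can be illustrated with a small sketch. This is not Littlestone and Warmuth's exact presentation, just a minimal deterministic Weighted Majority in their style; the penalty parameter `beta` and all names are mine:

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    """Deterministic Weighted Majority (minimal sketch).

    expert_preds[t][i] is expert i's {0,1} prediction in round t;
    outcomes[t] is the true label.  Predict by weighted vote, then
    multiply the weight of every wrong expert by beta."""
    n = len(expert_preds[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote_one = sum(wi for wi, p in zip(w, preds) if p == 1)
        vote_zero = sum(wi for wi, p in zip(w, preds) if p == 0)
        guess = 1 if vote_one >= vote_zero else 0
        if guess != y:
            mistakes += 1
        # demote the experts that were wrong this round
        w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
    return mistakes, w
```

With beta = 1/2, the standard analysis bounds the algorithm's mistakes by roughly 2.41(m + lg N), where m is the best expert's mistake count and N the number of experts.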

696 | Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
- Littlestone
- 1988
Citation Context: ...reverse direction, one can fix values of some of the variables and still have a legal concept. See [9] for details. 3.5 History: The Winnow algorithm was developed by Littlestone in his seminal paper [24], which also gives a variety of extensions and introduces the Mistake-Bound learning model. The Mistake Bound model is equivalent to the "extended equivalence query" model of Angluin [1], and is known...
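For readers unfamiliar with Winnow, here is a minimal sketch of the promotion/demotion idea (the variant with multiplicative demotion; the parameter names are mine, not Littlestone's notation):

```python
def winnow(examples, n, alpha=2.0):
    """Winnow sketch for monotone disjunctions over {0,1}^n.

    Predict 1 iff w . x >= n.  On a false negative, multiply the
    weights of active attributes by alpha (promotion); on a false
    positive, divide them by alpha (demotion).  Mistakes scale with
    r log n for an r-literal target, not with n."""
    w = [1.0] * n
    mistakes = 0
    for x, y in examples:
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= n else 0
        if pred != y:
            mistakes += 1
            scale = alpha if y == 1 else 1.0 / alpha
            w = [wi * scale if xi else wi for wi, xi in zip(w, x)]
    return mistakes, w
```

Only the weights of attributes that are active (x_i = 1) ever move, which is what makes the mistake bound logarithmic in the number of irrelevant attributes.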

683 | Queries and concept learning
- Angluin
- 1988
Citation Context: ...seminal paper [24], which also gives a variety of extensions and introduces the Mistake-Bound learning model. The Mistake Bound model is equivalent to the "extended equivalence query" model of Angluin [1], and is known to be strictly harder for polynomial-time algorithms than the PAC learning model of Valiant [34, 22], in which (among other differences) the adversary is required to select examples from...

395 | Learning decision lists
- Rivest
- 1987
Citation Context: ...such as predicting links followed by users on the Web [2], and a calendar scheduling application [7]. The algorithm presented for learning decision lists is based on Rivest's algorithm for the PAC model [31], adapted to the Mistake Bound model by Littlestone [25] and Helmbold, Sloan and Warmuth [20]. The Infinite-Attribute model is defined in Blum [5] and Theorem 9 is from Blum, Hellerstein, and Littlest...

328 | How to use expert advice
- Cesa-Bianchi, Freund, et al.
- 1997
Citation Context: ...[15], and Vovk [35]. The algorithms described above as well as Theorems 1 and 2 are from Littlestone and Warmuth [28], and Corollary 3, as well as a number of refinements, are from Cesa-Bianchi et al. [12]. Perhaps one of the key lessons of this work in comparison to work of a more statistical nature is that one can remove all statistical assumptions about the data and still achieve extremely tight bou...

327 | Webwatcher: A learning apprentice for the world wide web
- Armstrong, Freitag, et al.
- 1995
Citation Context: ...stronger result than Theorem 8, in the style of Theorem 7. The Winnow algorithm has been shown to be quite successful in practical tasks as well, such as predicting links followed by users on the Web [2], and a calendar scheduling application [7]. The algorithm presented for learning decision lists is based on Rivest's algorithm for the PAC model [31], adapted to the Mistake Bound model by Littleston...

303 | Efficient Noise-Tolerant Learning from Statistical Queries
- Kearns
- 1993
Citation Context: ...nomial p after seeing polynomially many examples. Does this imply that there must exist a polynomial-time algorithm B that succeeds in the same sense for all constant noise rates η < 1/2? (See Kearns [21] for related issues.) 7. What Competitive Ratio can be achieved for learning with respect to the best Disjunction? Is there a polynomial time algorithm that given any sequence of examples over {0, 1}...

269 | Aggregating strategies
- Vovk
- 1990
Citation Context: ...sions Within the Computational Learning Theory community, the problem of predicting from expert advice was first studied by Littlestone and Warmuth [28], DeSantis, Markowsky and Wegman [15], and Vovk [35]. The algorithms described above as well as Theorems 1 and 2 are from Littlestone and Warmuth [28], and Corollary 3, as well as a number of refinements, are from Cesa-Bianchi et al. [12]. Perhaps one...

201 | Towards efficient agnostic learning
- Kearns, Schapire, et al.
- 1992
Citation Context: ...orithms than the PAC learning model of Valiant [34, 22], in which (among other differences) the adversary is required to select examples from a fixed distribution [6]. Agnostic learning is discussed in [23]. Littlestone [26] gives a variety of results on the behavior of Winnow in the presence of various kinds of noise. The improved bounds of Theorem 7 are from Auer and Warmuth [3]. The use of Winnow for...

168 | Learning Boolean formulas
- Kearns, Li, et al.
- 1994
Citation Context: ...The Mistake Bound model is equivalent to the "extended equivalence query" model of Angluin [1], and is known to be strictly harder for polynomial-time algorithms than the PAC learning model of Valiant [34, 22], in which (among other differences) the adversary is required to select examples from a fixed distribution [6]. Agnostic learning is discussed in [23]. Littlestone [26] gives a variety of results on th...

166 | An analog of the minimax theorem for vector payoffs
- Blackwell
- 1956
Citation Context: ...bounds (see Freund [18]). This problem and many variations and extensions have been addressed in a number of different communities, under names such as the "sequential compound decision problem" [32] [4], "universal prediction" [16], "universal coding" [33], "universal portfolios" [13], and "prediction of individual sequences"; the notion of competitiveness is also called the "min-max regret" of...

165 | Universal portfolios
- Cover
- 1991
Citation Context: ...n addressed in a number of different communities, under names such as the "sequential compound decision problem" [32] [4], "universal prediction" [16], "universal coding" [33], "universal portfolios" [13], and "prediction of individual sequences"; the notion of competitiveness is also called the "min-max regret" of an algorithm. A web page uniting some of these communities and with a discussion of...

163 | Universal prediction of individual sequences
- Feder, Merhav, et al.
- 1992
Citation Context: ...is problem and many variations and extensions have been addressed in a number of different communities, under names such as the "sequential compound decision problem" [32] [4], "universal prediction" [16], "universal coding" [33], "universal portfolios" [13], and "prediction of individual sequences"; the notion of competitiveness is also called the "min-max regret" of an algorithm. A web page unit...

140 | Game theory, on-line prediction and boosting
- Freund, Schapire
- 1996
Citation Context: ...abilistically select an expert to use and so forth. Freund and Schapire show that extensions of the randomized Weighted Majority Algorithm discussed above can be made to fit nicely into this scenario [19] (see also the classic work of Blackwell [4]). Another scenario fitting this framework would be a case where each expert is a page-replacement algorithm, and an operating system needs to decide which...

130 | Empirical support for winnow and weighted-majority algorithms: Results on a calendar scheduling domain
- Blum
- 1997
Citation Context: ...e of Theorem 7. The Winnow algorithm has been shown to be quite successful in practical tasks as well, such as predicting links followed by users on the Web [2], and a calendar scheduling application [7]. The algorithm presented for learning decision lists is based on Rivest's algorithm for the PAC model [31], adapted to the Mistake Bound model by Littlestone [25] and Helmbold, Sloan and Warmuth [20]...

112 | A game of prediction with expert advice
- Vovk
- 1995
Citation Context: ...ss functions appropriate to different settings are the absolute loss: |p − x|, the square loss: (p − x)^2, and the log loss: −x ln p − (1 − x) ln(1 − p). Papers of Vovk [35, 36], Cesa-Bianchi et al. [12, 11], and Foster and Vohra [17] describe optimal algorithms both for these specific loss functions and for a wide variety of general loss functions. A second extension of thi...
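The three loss functions quoted in this context are simple to state in code; a quick reference sketch (p is the algorithm's prediction in (0, 1), x the outcome in {0, 1}):

```python
import math

def absolute_loss(p, x):
    return abs(p - x)

def square_loss(p, x):
    return (p - x) ** 2

def log_loss(p, x):
    # -x ln p - (1 - x) ln(1 - p); unbounded as p approaches
    # the wrong endpoint, which is why it rewards calibration
    return -x * math.log(p) - (1 - x) * math.log(1 - p)
```

Note how differently the losses penalize a confident wrong prediction: at p = 0.9 with outcome x = 0, the absolute loss is 0.9, the square loss 0.81, and the log loss ln 10 ≈ 2.3.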

108 | Redundant noisy attributes, attribute errors, and linear-threshold learning using Winnow
- Littlestone
- 1991
Citation Context: ...AC learning model of Valiant [34, 22], in which (among other differences) the adversary is required to select examples from a fixed distribution [6]. Agnostic learning is discussed in [23]. Littlestone [26] gives a variety of results on the behavior of Winnow in the presence of various kinds of noise. The improved bounds of Theorem 7 are from Auer and Warmuth [3]. The use of Winnow for learning changing...

95 | Universal portfolios with side information
- Cover, Ordentlich
- 1996
Citation Context: ...hm to use. Periodically the operating system computes losses for the various algorithms that it could have used and based on this information decides which algorithm to use next. Ordentlich and Cover [14] [30] describe strategies related to the randomized Weighted Majority algorithm for a problem of on-line portfolio selection. They give an on-line algorithm that is optimally competitive against the b...

78 | Tracking the best disjunction
- Auer, Warmuth
- 1998
Citation Context: ...arning is discussed in [23]. Littlestone [26] gives a variety of results on the behavior of Winnow in the presence of various kinds of noise. The improved bounds of Theorem 7 are from Auer and Warmuth [3]. The use of Winnow for learning changing concepts is folklore (and makes a good homework problem); Auer and Warmuth [3] provide a more sophisticated algorithm and analysis, achieving a stronger resul...

76 | Learning boolean functions in an infinite attribute space
- Blum
- 1992
Citation Context: ...sts is based on Rivest's algorithm for the PAC model [31], adapted to the Mistake Bound model by Littlestone [25] and Helmbold, Sloan and Warmuth [20]. The Infinite-Attribute model is defined in Blum [5] and Theorem 9 is from Blum, Hellerstein, and Littlestone [9]. 4 Open Problems: 1. Can the bounds of Corollary 3 be achieved and improved with a smooth algorithm? The bound of Corollary 3 is achieved u...

61 | Universal portfolios with and without transaction costs
- Blum, Kalai
- 1999
Citation Context: ...he algorithm itself just initially divides its funds equally among all infinitely-many CRPs and then lets it sit. A simple analysis of their algorithm with extensions to transaction costs is given in [10]. 3 On-Line Learning from Examples: The previous section considered the problem of "learning from expert advice". We now broaden our focus to consider the more general scenario of on-line learning from...
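The "divide funds among CRPs and let them sit" idea described in this context can be illustrated with a two-asset sketch. Cover's actual algorithm averages over all constant-rebalanced portfolios (CRPs); the finite grid here is my simplification:

```python
def crp_wealth(b, price_relatives):
    """Wealth of a constant-rebalanced portfolio b: each period the
    wealth is multiplied by b . x_t, where x_t holds the assets'
    price relatives, and the split b is restored by rebalancing."""
    wealth = 1.0
    for x in price_relatives:
        wealth *= sum(bi * xi for bi, xi in zip(b, x))
    return wealth

def universal_wealth(price_relatives, grid=100):
    """Discretized universal-portfolio sketch for two assets: divide
    initial wealth equally among the CRPs b = (k/grid, 1 - k/grid),
    let each run untouched, and report the average final wealth."""
    wealths = [crp_wealth((k / grid, 1 - k / grid), price_relatives)
               for k in range(grid + 1)]
    return sum(wealths) / len(wealths)
```

On a market that alternates price relatives (2, 1/2) and (1/2, 2), either single asset ends flat, the best CRP (1/2, 1/2) grows by a factor 1.25 per period, and the universal mixture captures a constant fraction of that growth.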

53 | Learning in the presence of finitely or infinitely many irrelevant attributes
- Blum, Hellerstein, et al.
- 1991
Citation Context: ...ables and embed it into a space with n2 > n1 variables and still stay within the class C, and in the reverse direction, one can fix values of some of the variables and still have a legal concept. See [9] for details. 3.5 History: The Winnow algorithm was developed by Littlestone in his seminal paper [24], which also gives a variety of extensions and introduces the Mistake-Bound learning model. The Mis...

50 | Online prediction and conversion strategies
- Cesa-Bianchi, Freund, et al.
- 1996
Citation Context: ...fferent settings are the absolute loss: |p − x|, the square loss: (p − x)^2, and the log loss: −x ln p − (1 − x) ln(1 − p). Papers of Vovk [35, 36], Cesa-Bianchi et al. [12, 11], and Foster and Vohra [17] describe optimal algorithms both for these specific loss functions and for a wide variety of general loss functions. A second extension of this framework is to broaden the...

44 | Predicting a binary sequence almost as well as the optimal biased coin
- Freund
- 1996
Citation Context: ...the key lessons of this work in comparison to work of a more statistical nature is that one can remove all statistical assumptions about the data and still achieve extremely tight bounds (see Freund [18]). This problem and many variations and extensions have been addressed in a number of different communities, under names such as the "sequential compound decision problem" [32] [4], "universal predict...

44 | On-line learning of linear functions
- Littlestone, Long, et al.
- 1991
Citation Context: ...variety of general loss functions. A second extension of this framework is to broaden the class of algorithms against which the algorithm is competitive. For instance, Littlestone, Long, and Warmuth [27] show that modifications of the algorithms described above are constant-competitive with respect to the best linear combination of experts, when the squared loss measure is used. Merhav and Feder [29]...

42 | Asymptotically subminimax solutions of compound statistical decision problems
- Robbins
- 1951
Citation Context: ...ight bounds (see Freund [18]). This problem and many variations and extensions have been addressed in a number of different communities, under names such as the "sequential compound decision problem" [32] [4], "universal prediction" [16], "universal coding" [33], "universal portfolios" [13], and "prediction of individual sequences"; the notion of competitiveness is also called the "min-max regret"...

36 | On-line learning and the metrical task system problem
- Blum, Burch
- 2000
Citation Context: ...al sorts of learning problems, it seems inevitable that the notion of state will begin to play a larger role, and ideas from On-Line Algorithms will be crucial. Some work in this direction appears in [8]. Limiting the power of the adversary: In the On-Line Algorithms literature, it is usually assumed that the adversary has unlimited power to choose a worst-case sequence for the algorithm. In the mach...

36 | Learning probabilistic prediction functions
- DeSantis, Markowsky, et al.
- 1988
Citation Context: ...story and Extensions: Within the Computational Learning Theory community, the problem of predicting from expert advice was first studied by Littlestone and Warmuth [28], DeSantis, Markowsky and Wegman [15], and Vovk [35]. The algorithms described above as well as Theorems 1 and 2 are from Littlestone and Warmuth [28], and Corollary 3, as well as a number of refinements, are from Cesa-Bianchi et al. [12...

34 | A randomization rule for selecting forecasts
- Foster, Vohra
- 1993
Citation Context: ...te loss: |p − x|, the square loss: (p − x)^2, and the log loss: −x ln p − (1 − x) ln(1 − p). Papers of Vovk [35, 36], Cesa-Bianchi et al. [12, 11], and Foster and Vohra [17] describe optimal algorithms both for these specific loss functions and for a wide variety of general loss functions. A second extension of this framework is to broaden the class of algorithms against...

24 | Portfolio Selection
- Markowitz
- 1959
Citation Context: ...use. Periodically the operating system computes losses for the various algorithms that it could have used and based on this information decides which algorithm to use next. Ordentlich and Cover [14] [30] describe strategies related to the randomized Weighted Majority algorithm for a problem of on-line portfolio selection. They give an on-line algorithm that is optimally competitive against the best...

21 | Separating distribution-free and mistake-bound learning models over the boolean domain
- Blum
- 1994
Citation Context: ...strictly harder for polynomial-time algorithms than the PAC learning model of Valiant [34, 22], in which (among other differences) the adversary is required to select examples from a fixed distribution [6]. Agnostic learning is discussed in [23]. Littlestone [26] gives a variety of results on the behavior of Winnow in the presence of various kinds of noise. The improved bounds of Theorem 7 are from Auer...

14 | Universal sequential learning and decisions from individual data sequences
- Merhav, Feder
- 1992
Citation Context: ...[27] show that modifications of the algorithms described above are constant-competitive with respect to the best linear combination of experts, when the squared loss measure is used. Merhav and Feder [29] show that one can be competitive with respect to the best off-line strategy that can be implemented by a finite state machine. Another variation on this problem is to remove all semantics associated...

11 | On-line portfolio selection
- Ordentlich, Cover
- 1996
Citation Context: ...use. Periodically the operating system computes losses for the various algorithms that it could have used and based on this information decides which algorithm to use next. Ordentlich and Cover [14] [30] describe strategies related to the randomized Weighted Majority algorithm for a problem of on-line portfolio selection. They give an on-line algorithm that is optimally competitive against the best...

10 |
Learning Nested Differences of Intersection Closed Concept Classes
- Haussler, Sloan, et al.
- 1989
Citation Context: ...[7]. The algorithm presented for learning decision lists is based on Rivest's algorithm for the PAC model [31], adapted to the Mistake Bound model by Littlestone [25] and Helmbold, Sloan and Warmuth [20]. The Infinite-Attribute model is defined in Blum [5] and Theorem 9 is from Blum, Hellerstein, and Littlestone [9]. 4 Open Problems: 1. Can the bounds of Corollary 3 be achieved and improved with a smo...

5 | Universal sequential coding of single measures
- Shtarkov
- 1987
Citation Context: ...tions and extensions have been addressed in a number of different communities, under names such as the "sequential compound decision problem" [32] [4], "universal prediction" [16], "universal coding" [33], "universal portfolios" [13], and "prediction of individual sequences"; the notion of competitiveness is also called the "min-max regret" of an algorithm. A web page uniting some of these communi...

1 | Personal communication (a mistake-bound version of Rivest's decision-list algorithm)
- Littlestone
- 1989
Citation Context: ...and a calendar scheduling application [7]. The algorithm presented for learning decision lists is based on Rivest's algorithm for the PAC model [31], adapted to the Mistake Bound model by Littlestone [25] and Helmbold, Sloan and Warmuth [20]. The Infinite-Attribute model is defined in Blum [5] and Theorem 9 is from Blum, Hellerstein, and Littlestone [9]. 4 Open Problems: 1. Can the bounds of Corollary...