## Regret minimization under partial monitoring (2004)

Venue: | MATHEMATICS OF OPERATIONS RESEARCH |

Citations: | 34 - 7 self |

### BibTeX

@ARTICLE{Cesa-Bianchi04regretminimization,

author = {Nicolò Cesa-Bianchi and Gábor Lugosi and Gilles Stoltz},

title = {Regret minimization under partial monitoring},

journal = {MATHEMATICS OF OPERATIONS RESEARCH},

year = {2004},

volume = {31},

pages = {562--580}

}

### Abstract

We consider repeated games in which the player, instead of observing the action chosen by the opponent in each game round, receives feedback generated by the combined choice of the two players. We study Hannan consistent players for these games; that is, randomized playing strategies whose per-round regret vanishes with probability one as the number n of game rounds goes to infinity. We prove a general lower bound of Ω(n^{−1/3}) on the convergence rate of the regret, and exhibit a specific strategy that attains this rate on any game for which a Hannan consistent player exists.
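The per-round regret named in the abstract can be illustrated with a minimal sketch. It assumes a loss matrix indexed as `loss[action][outcome]` and compares the player's cumulative loss with that of the best fixed action in hindsight; the function name, the 2×2 matrix, and the sample sequences are hypothetical and are not taken from the paper.

```python
def per_round_regret(loss, actions, outcomes):
    """Average regret against the best fixed action in hindsight.

    loss[i][j] is the player's loss for action i when the opponent plays j.
    This layout is an assumption made for the illustration.
    """
    n = len(actions)
    # Loss actually incurred by the played sequence of actions.
    incurred = sum(loss[a][y] for a, y in zip(actions, outcomes))
    # Loss of the single best action, chosen with knowledge of all outcomes.
    best_fixed = min(sum(row[y] for y in outcomes) for row in loss)
    return (incurred - best_fixed) / n


# Hypothetical 2x2 game: the player loses 1 when its action differs from the outcome.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]

outcomes = [0, 0, 1, 0]
print(per_round_regret(LOSS, [1, 1, 1, 1], outcomes))  # always plays action 1
print(per_round_regret(LOSS, [1, 0, 1, 0], outcomes))  # adapts after round one
```

A strategy is Hannan consistent when this quantity tends to zero (or below) with probability one, whatever the opponent does. Under partial monitoring the player cannot evaluate `loss[a][y]` directly, which is why the attainable rate degrades to the order of n^{−1/3}.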

### Citations

8615 | Elements of Information Theory - Cover, Thomas - 1991 |

1495 | Probability inequalities for sums of bounded random variables
- Hoeffding
- 1963
Citation Context ...Mathematics of Operations Research 31(3), pp. 562–580, © 2006 INFORMS The next lemma is an easy consequence of the Hoeffding-Azuma inequality for sums of bounded martingale differences (see Hoeffding =-=[29]-=-, Azuma [4]). Lemma 3.5. With probability at least $1 - \delta'$, $\sum_{t=1}^{n} \ell(I_t, y_t) \le \sum_{t=1}^{n} \sum_{i=1}^{N} p_{i,t}\,\ell(i, y_t) + \sqrt{\frac{n}{2} \ln \frac{1}{\delta'}}$. The proof of the main result now follows from a combination of Lemmas 3.1 t... |

784 |
The Theory of Learning in Games
- Fudenberg, Levine
- 1998
Citation Context ... Megiddo [30]). Hannan consistent strategies were constructed by Foster and Vohra [16], Auer, Cesa-Bianchi, Freund, and Schapire [2], and Hart and Mas-Colell [21, 23] (see also Fudenberg and Levine =-=[19]-=-). Auer, Cesa-Bianchi, Freund, and Schapire [2] (see also Auer [1]) define a strategy that guarantees a rate of convergence of the order $O(\sqrt{N \log(nN)/n})$ for the regret, which is optimal up to the lo... |

671 | The weighted majority algorithm
- Littlestone, Warmuth
- 1994
Citation Context ...w key references and surveys include Blackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth =-=[31]-=-, Merhav and Feder [35], and Vovk [40, 41]. A natural question one might ask is under what conditions on the loss and feedback matrices it is possible to achieve Hannan consistency, that is, to guaran... |

314 | The Nonstochastic Multiarmed Bandit Problem
- Auer, Cesa-Bianchi, et al.
- 2002
Citation Context ...case, or adversarial, setting considered in this paper was first investigated by Baños [5] (see also Megiddo [34]). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. =-=[3]-=-, and Hart and Mas-Colell [24, 26] (see also Fudenberg and Levine [22]). Auer et al. [3] (see also Auer [1] and the refined analysis of Cesa-Bianchi and Lugosi [12]) define a strategy that guarantees ... |

314 | How to use expert advice
- Cesa-Bianchi, Freund, et al.
- 1997
Citation Context ...e has been studied extensively in the theory of repeated games and in the fields of learning theory and information theory. A few key references and surveys include Blackwell [6], Cesa-Bianchi et al. =-=[14]-=-, Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A natural question one might... |

249 |
Aggregating strategies
- Vovk
- 1990
Citation Context ...lackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk =-=[40, 41]-=-. A natural question one might ask is under what conditions on the loss and feedback matrices it is possible to achieve Hannan consistency, that is, to guarantee that, asymptotically, the cumulative l... |

236 |
Weighted sums of certain dependent random variables
- Azuma
- 1967
Citation Context ... (16) We then use the Hoeffding-Azuma inequality (see Hoeffding [29], Azuma =-=[4]-=-) $N(N-1)$ times to show that for every pair $i \ne j$, with probability at least $1 - \delta'$, $\sum_{t=1}^{n} p_{i,t}\,(\ell(i, y_t) - \ell(j, y_t)) \ge \sum_{t=1}^{n} \mathbb{I}_{I_t = i}\,(\ell(i, y_t) - \ell(j, y_t)) - \sqrt{2n \ln \frac{1}{\delta'}}$ (17). Finally, we substitu... |

221 | A simple adaptive procedure leading to correlated equilibrium
- Hart, Mas-Colell
Citation Context ... considered in this paper was first investigated by Baños [5] (see also Megiddo [34]). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. [3], and Hart and Mas-Colell =-=[24, 26]-=- (see also Fudenberg and Levine [22]). Auer et al. [3] (see also Auer [1] and the refined analysis of Cesa-Bianchi and Lugosi [12]) define a strategy that guarantees a rate of convergence of the order... |

157 | Universal prediction of individual sequences
- Merhav, Feder, et al.
- 1992
Citation Context ...epeated games and in the fields of learning theory and information theory. A few key references and surveys include Blackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. =-=[16]-=-, Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A natural question one might ask is under what conditions on the loss and fee... |

153 |
An analog of the minimax theorem for vector payoffs
- Blackwell
- 1956
Citation Context ... The full-information case has been studied extensively in the theory of repeated games and in the fields of learning theory and information theory. A few key references and surveys include Blackwell =-=[6]-=-, Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A ... |

142 |
Approximation to bayes risk in repeated plays
- Hannan
- 1957
Citation Context ...ely for all possible strategies of the environment are called Hannan consistent after James Hannan, who first proved the existence of a Hannan-consistent strategy in the full-information case (Hannan =-=[23]-=-) when $h(i, j) = j$ for all $i, j$ (i.e., when the true outcome $y_t$ is revealed to the forecaster after taking an action). The full-information case has been studied extensively in the theory of repeated... |

136 | Universal prediction
- Merhav, Feder
- 1998
Citation Context ...rveys include Blackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder =-=[35]-=-, and Vovk [40, 41]. A natural question one might ask is under what conditions on the loss and feedback matrices it is possible to achieve Hannan consistency, that is, to guarantee that, asymptoticall... |

112 | Regret in the on-line decision problem
- Foster, Vohra
- 1999
Citation Context ...e fields of learning theory and information theory. A few key references and surveys include Blackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra =-=[19]-=-, Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A natural question one might ask is under what conditions on the loss and feedback matrices it is po... |

110 | Using confidence bounds for exploitation-exploration trade-offs
- Auer
Citation Context ...o [34]). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. [3], and Hart and Mas-Colell [24, 26] (see also Fudenberg and Levine [22]). Auer et al. [3] (see also Auer =-=[1]-=- and the refined analysis of Cesa-Bianchi and Lugosi [12]) define a strategy that guarantees a rate of convergence of the order $O(\sqrt{N (\log N)/n})$ for the regret, which is optimal up to the logarithmic... |

99 |
Approximation to Bayes risk in repeated play. Contributions to the Theory of Games
- Hannan
- 1957
Citation Context ...most surely for all possible strategies of the environment are called Hannan consistent after James Hannan, who first proved the existence of a Hannan consistent strategy in the full information case =-=[20]-=- when h(i, j) = j for all i, j (i.e., when the true outcome yt is revealed to the forecaster after taking an action). The full information case has been studied extensively in the theory of repeated g... |

98 |
Consistency and Cautious Fictitious Play
- Fudenberg, Levine
- 1995
Citation Context ...egy that is Hannan consistent with respect to the internal regret, then the joint empirical frequencies of play converge to the set of correlated equilibria of the game (see also Fudenberg and Levine =-=[21]-=-, Hart and Mas-Colell [24]). Foster and Vohra [17, 19] proposed internal regret-minimizing strategies for the full-information case; see also Cesa-Bianchi and Lugosi [11]. Here we design such a proced... |

86 | Calibrated learning and correlated equilibrium
- Foster, Vohra
- 1997
Citation Context ... requiring that the above average regret vanishes with probability 1 as $n \to \infty$. The notion of internal regret has been shown to be useful in the theory of equilibria of repeated games. Foster and Vohra =-=[17, 19]-=- showed that if all players of a finite game choose a strategy that is Hannan consistent with respect to the internal regret, then the joint empirical frequencies of play converge to the set of correl... |

72 | A General Class of Adaptive Strategies
- Hart, Mas-Colell
- 2001
Citation Context ...y and information theory. A few key references and surveys include Blackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell =-=[25]-=-, Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A natural question one might ask is under what conditions on the loss and feedback matrices it is possible to achieve Hannan c... |

71 | Asymptotic calibration
- Foster, Vohra
- 1998
Citation Context ...etting. The worst-case, or adversarial, setting considered in this paper was first investigated by Baños [5] (see also Megiddo [34]). Hannan-consistent strategies were constructed by Foster and Vohra =-=[18]-=-, Auer et al. [3], and Hart and Mas-Colell [24, 26] (see also Fudenberg and Levine [22]). Auer et al. [3] (see also Auer [1] and the refined analysis of Cesa-Bianchi and Lugosi [12]) define a strategy... |

63 | Competitive on-line statistics
- Vovk
Citation Context ...lackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi [10], Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk =-=[40, 41]-=-. A natural question one might ask is under what conditions on the loss and feedback matrices it is possible to achieve Hannan consistency, that is, to guarantee that, asymptotically, the cumulative l... |

62 | Adaptive and self-confident on-line learning algorithms
- Auer, Cesa-Bianchi, et al.
Citation Context ... Piccolboni and Schindelhauer [37]. This strategy is based on the exponentially weighted average forecaster, a thoroughly studied predictor in the full information case; see, for example, Auer et al. =-=[2]-=-, Cesa-Bianchi et al. [14], Littlestone and Warmuth [31], Vovk [40, 41]. In the special case of the multiarmed bandit problem, the forecaster reduces to the strategy of Auer et al. [3] (see also Hart ... |

59 | From external to internal regret
- Blum, Mansour
- 2007
Citation Context ...o Cesa-Bianchi and Lugosi [11]. Here we design such a procedure in the setting of partial monitoring. The key tool is a conversion trick described in Stoltz and Lugosi [39] (see also Blum and Mansour =-=[8]-=- for a similar procedure). This trick essentially converts external regret-minimizing strategies into internal regret-minimizing strategies, under full information. We extend it here to prediction und... |

59 | Online learning in online auctions
- Blum, Kumar, et al.
- 2004
Citation Context ...that all terms depending on $y_t$ cancel out when considering the regret, and we obtain the bandit setting referred to as online posted price mechanism in, e.g., Kleinberg and Leighton [30], Blum et al. =-=[9]-=-, Blum and Hartline [7]; see below.) In either case, if the seller knew in advance the empirical distribution of the $y_t$'s, then he could set a constant price $q \in [0, 1]$, which minimizes his overall loss... |

57 |
On tail probabilities for martingales
- Freedman
- 1975
Citation Context ...ow of $H'$, $1 \le k_1 \le N$, $1 \le k_2 \le m$, is $H(k_1, s_{k_2})$. All the other details of the construction and the proofs go through. Appendix A. Bernstein’s inequality. Bernstein’s inequality (see, e.g., Freedman =-=[20]-=-, Massart [33]) is used several times in the proofs. Lemma A.1 (Bernstein’s Inequality). Let $X_1, X_2, \ldots, X_n$ be a bounded martingale difference sequence (with respect to the filtration $\mathcal{F} = (\mathcal{F}_t)_{1 \le t \le n}$), wi... |

45 | Near-optimal online auctions
- Blum, Hartline
- 2005
Citation Context ...g on $y_t$ cancel out when considering the regret, and we obtain the bandit setting referred to as online posted price mechanism in, e.g., Kleinberg and Leighton [30], Blum et al. [9], Blum and Hartline =-=[7]-=-; see below.) In either case, if the seller knew in advance the empirical distribution of the $y_t$'s, then he could set a constant price $q \in [0, 1]$, which minimizes his overall loss. A natural question is... |

44 |
The value of knowing a demand curve: Bounds on regret for online posted-price auctions
- Kleinberg, Leighton
- 2003
Citation Context ...ore general framework which we describe next. The dynamic pricing problem described above, which is a special case of this more general framework, has also been investigated by Kleinberg and Leighton =-=[27]-=- in a simpler setting where the reward of the seller is defined as $\rho(p_t, y_t) = p_t\,\mathbb{I}_{p_t \le y_t}$. Note that, by using the feedback information (i.e., whether the customer bought the product or not), here the ... |

43 |
Concentration Inequalities and Model Selection
- Massart
- 2007
Citation Context ...$k_1 \le N$, $1 \le k_2 \le m$, is $H(k_1, s_{k_2})$. All the other details of the construction and the proofs go through. Appendix A. Bernstein’s inequality. Bernstein’s inequality (see, e.g., Freedman [20], Massart =-=[33]-=-) is used several times in the proofs. Lemma A.1 (Bernstein’s Inequality). Let $X_1, X_2, \ldots, X_n$ be a bounded martingale difference sequence (with respect to the filtration $\mathcal{F} = (\mathcal{F}_t)_{1 \le t \le n}$), with increments ... |

40 | Minimizing regret with label efficient prediction
- Cesa-Bianchi, Lugosi, et al.
- 2005
Citation Context ... apple classified “good for sale” is never checked.) Example 2.4 (Label-Efficient Prediction). In the problem of label-efficient prediction (see Helmbold and Panizza [27] and also Cesa-Bianchi et al. =-=[13]-=-), the forecaster, after choosing its prediction for round t, decides whether to query the outcome yt, which he can do only a limited number of times. In Cesa-Bianchi et al. [13], matching upper and l... |

36 |
Repeated games
- Mertens, Sorin, et al.
- 1994
Citation Context ... be chosen such that it has $m \le M$ elements, although after numerical encoding the matrix might have as many as $MN$ distinct elements. The problem of partial monitoring was considered by Mertens et al. =-=[36]-=-, Rustichini [38], Piccolboni and Schindelhauer [37], and Mannor and Shimkin [32]. The forecaster strategy studied in §3 is first introduced in Piccolboni and Schindelhauer [37], where its expected re... |

32 | Potential-based algorithms in on-line prediction and game theory
- Cesa-Bianchi, Lugosi
- 2003
Citation Context ...see also Fudenberg and Levine [21], Hart and Mas-Colell [24]). Foster and Vohra [17, 19] proposed internal regret-minimizing strategies for the full-information case; see also Cesa-Bianchi and Lugosi =-=[11]-=-. Here we design such a procedure in the setting of partial monitoring. The key tool is a conversion trick described in Stoltz and Lugosi [39] (see also Blum and Mansour [8] for a similar procedure). ... |

26 |
Discrete prediction games with arbitrary feedback and loss. Computational Learning Theory
- Piccolboni, Schindelhauer
- 2001
Citation Context ... probability one. Naturally, this depends on the relationship between the loss and feedback functions. An initial answer to this question has been provided by the work of Piccolboni and Schindelhauer =-=[37]-=-. However, because they are concerned only with expected performance, their results do not imply Hannan consistency. In addition, their bounds have suboptimal rates of convergence. Below, we extend th... |

26 | Minimizing regret: The general case
- Rustichini
- 1999
Citation Context ...ove inequality is at most $15\,(k^* N)^{2/3} (\ln N)^{1/3} (n + 1)^{2/3}$, as desired. 7. Random feedback. Several authors consider an extended setup in which the feedbacks are random variables. See Rustichini =-=[38]-=-, Mannor and Shimkin [32], Weissman and Merhav [42], and Weissman et al. [43] for examples. In this section we briefly point out that most of the results of this paper extend effortlessly to this more... |

25 | On prediction of individual sequences
- Cesa-Bianchi, Lugosi
- 1999
Citation Context ... in the theory of repeated games and in the fields of learning theory and information theory. A few key references and surveys include Blackwell [6], Cesa-Bianchi et al. [14], Cesa-Bianchi and Lugosi =-=[10]-=-, Feder et al. [16], Foster and Vohra [19], Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A natural question one might ask is under what conditions ... |

25 | A reinforcement procedure leading to correlated equilibrium. In Economics Essays: A Festschrift for Werner Hildebrand
- Hart, Mas-Colell
- 2001
Citation Context ... considered in this paper was first investigated by Baños [5] (see also Megiddo [34]). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. [3], and Hart and Mas-Colell =-=[24, 26]-=- (see also Fudenberg and Levine [22]). Auer et al. [3] (see also Auer [1] and the refined analysis of Cesa-Bianchi and Lugosi [12]) define a strategy that guarantees a rate of convergence of the order... |

25 | On repeated games with incomplete information played by non-Bayesian players
- Megiddo
- 1980
Citation Context ...roblem has been widely studied both in a stochastic and in a worst-case setting. The worst-case, or adversarial, setting considered in this paper was first investigated by Baños [5] (see also Megiddo =-=[34]-=-). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. [3], and Hart and Mas-Colell [24, 26] (see also Fudenberg and Levine [22]). Auer et al. [3] (see also Auer [1] an... |

23 | Some label efficient learning results
- Helmbold, Panizza
- 1997
Citation Context ... checked cannot be put on sale, an apple classified “good for sale” is never checked.) Example 2.4 (Label-Efficient Prediction). In the problem of label-efficient prediction (see Helmbold and Panizza =-=[27]-=- and also Cesa-Bianchi et al. [13]), the forecaster, after choosing its prediction for round t, decides whether to query the outcome yt, which he can do only a limited number of times. In Cesa-Bianchi... |

19 | On pseudo-games
- Banos
- 1968
Citation Context ...t his own loss. This problem has been widely studied both in a stochastic and in a worst-case setting. The worst-case, or adversarial, setting considered in this paper was first investigated by Baños =-=[5]-=- (see also Megiddo [34]). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. [3], and Hart and Mas-Colell [24, 26] (see also Fudenberg and Levine [22]). Auer et al. [3... |

16 |
Twofold universal prediction schemes for achieving the finite-state predictability of a noisy individual binary sequence
- Weissman, Merhav, et al.
- 2001
Citation Context ...7. Random feedback. Several authors consider an extended setup in which the feedbacks are random variables. See Rustichini [38], Mannor and Shimkin [32], Weissman and Merhav [42], and Weissman et al. =-=[43]-=- for examples. In this section we briefly point out that most of the results of this paper extend effortlessly to this more general case. To describe the model, denote by ��� � the set of all probabil... |

16 | Apple tasting
- Helmbold, Littlestone, et al.
Citation Context ... $+\, b\,\mathbb{I}_{i>j}$, $i, j = 1, \ldots, N$, where $a$ and $b$ are constants chosen by the forecaster satisfying $a, b \in [-1, 1]$. Example 3 (Apple tasting). This problem was considered by Helmbold, Littlestone, and Long =-=[25]-=- in a somewhat more restrictive setting. In this example $N = M = 2$ and the loss and feedback matrices are given by $L = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $H = \begin{pmatrix} a & a \\ b & c \end{pmatrix}$. Thus, the forecaster only receives feedback about t... |

15 | Internal regret in on-line portfolio selection
- Stoltz, Lugosi
Citation Context ...e full-information case; see also Cesa-Bianchi and Lugosi [11]. Here we design such a procedure in the setting of partial monitoring. The key tool is a conversion trick described in Stoltz and Lugosi =-=[39]-=- (see also Blum and Mansour [8] for a similar procedure). This trick essentially converts external regret-minimizing strategies into internal regret-minimizing strategies, under full information. We e... |

12 | Elements of Information Theory - Thomas - 1991 |

8 | On-line learning with imperfect monitoring
- Mannor, Shimkin
- 2003
Citation Context ... matrix might have as many as MN distinct elements. The problem of partial monitoring was considered by Mertens et al. [36], Rustichini [38], Piccolboni and Schindelhauer [37], and Mannor and Shimkin =-=[32]-=-. The forecaster strategy studied in §3 is first introduced in Piccolboni and Schindelhauer [37], where its expected regret is shown to have a sublinear growth. Rustichini [38]... |

7 |
Universal prediction of binary individual sequences in the presence of noise
- Weissman, Merhav
Citation Context .../3 $(\ln N)^{1/3} (n + 1)^{2/3}$ 7. Random feedback. Several authors consider an extended setup in which the feedbacks are random variables. See Rustichini [38], Mannor and Shimkin [32], Weissman and Merhav =-=[42]-=-, and Weissman et al. [43] for examples. In this section we briefly point out that most of the results of this paper extend effortlessly to this more general case. To describe the model, denote by ... |

5 |
On tail probabilities for martingales, Ann
- Freedman
- 1975
Citation Context ...$\le k_1 \le N$, $1 \le k_2 \le m$, is $H(k_1, s_{k_2})$. All the other details of the construction and the proofs go through. Appendix A. Bernstein’s inequality. Bernstein’s inequality (see, e.g., =-=[20, 33]-=-) is used several times in the proofs. Lemma A.1 (Bernstein’s inequality) Let $X_1, X_2, \ldots, X_n$ be a bounded martingale difference sequence (with respect to the filtration $\mathcal{F} = (\mathcal{F}_t)_{1 \le t \le n}$), with increments bounded from abo... |

2 |
Online learning in online auctions, Theoret
- Blum, Kumar, et al.
Citation Context ...that all terms depending on yt cancel out when considering the regret, and we obtain the bandit setting referred to as online posted price mechanism in, e.g., Kleinberg and Leighton [30], Blum et al. =-=[8]-=-, Blum and Hartline [7]—see below.) In either case, if the seller knew in advance the empirical distribution of the yt’s then he could set a constant price q ∈ [0, 1] which minimizes his overall loss.... |

1 |
The Theory of Learning in Games
- Fudenberg, Levine
- 1998
Citation Context ...stigated by Baños [5] (see also Megiddo [34]). Hannan-consistent strategies were constructed by Foster and Vohra [18], Auer et al. [3], and Hart and Mas-Colell [24, 26] (see also Fudenberg and Levine =-=[22]-=-). Auer et al. [3] (see also Auer [1] and the refined analysis of Cesa-Bianchi and Lugosi [12]) define a strategy that guarantees a rate of convergence of the order $O(\sqrt{N (\log N)/n})$ for the regret, w... |

1 | Potential-based algorithms in on-line prediction and game theory
- Cesa-Bianchi, Lugosi
Citation Context ...see also Fudenberg and Levine [21], Hart and Mas-Colell [24]). Foster and Vohra [17, 19] proposed internal regret minimizing strategies for the full-information case, see also Cesa-Bianchi and Lugosi =-=[12]-=-. We design here such a procedure in the setting of partial monitoring. The key tool is a conversion trick described in Stoltz and Lugosi [39] (see also Blum and Mansour [9] for a similar procedure). ... |

1 | Regret in the on-line decision problem
- Foster, Vohra
- 1999
Citation Context ...e fields of learning theory and information theory. A few key references and surveys include Blackwell [6], Cesa-Bianchi et al. [10], Cesa-Bianchi and Lugosi [11], Feder et al. [16], Foster and Vohra =-=[19]-=-, Hart and Mas-Colell [25], Littlestone and Warmuth [31], Merhav and Feder [35], and Vovk [40, 41]. A natural question one may ask is under what conditions on the loss and feedback matrices it is poss... |

1 |
Repeated games, Discussion paper 9420
- Mertens, Sorin, et al.
- 1994
Citation Context ... may be chosen such that it has $m \le M$ elements, though after numerical encoding the matrix may have as many as $MN$ distinct elements. The problem of partial monitoring was considered by Mertens et al. =-=[36]-=-, Rustichini [38], Piccolboni and Schindelhauer [37], and Mannor and Shimkin [32]. The forecaster strategy studied in Section 3 is first introduced in [37], where its expected regret is shown to have ... |
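Several citation contexts above refer to the exponentially weighted average forecaster and to its multiarmed bandit special case. The sketch below shows one way such a forecaster might look when only the chosen action's loss is observed. It is an illustration under stated assumptions, not the paper's strategy: the function names and the learning rate `eta` are hypothetical, and the exploration term and the feedback-matrix-based loss estimates used under general partial monitoring are left out.

```python
import math


def exp_weights(cum_losses, eta):
    """Probabilities proportional to exp(-eta * cumulative estimated loss)."""
    base = min(cum_losses)  # shift for numerical stability; cancels on normalizing
    weights = [math.exp(-eta * (c - base)) for c in cum_losses]
    total = sum(weights)
    return [w / total for w in weights]


def importance_weighted_loss(observed_loss, chosen, probs):
    """Loss estimate when only the chosen action's loss is revealed.

    Dividing by the probability of the chosen action makes the estimate
    unbiased for every action (bandit-style feedback, assumed here).
    """
    return [observed_loss / probs[i] if i == chosen else 0.0
            for i in range(len(probs))]


def update(cum_losses, estimate):
    """Add the latest loss estimates to the running totals."""
    return [c + e for c, e in zip(cum_losses, estimate)]


# One round with two actions and a hypothetical learning rate.
cum = [0.0, 0.0]
probs = exp_weights(cum, eta=0.1)                 # uniform at the start
estimate = importance_weighted_loss(0.5, 1, probs)
cum = update(cum, estimate)
print(exp_weights(cum, eta=0.1))                  # mass shifts toward action 0
```

The importance weighting is what replaces the unobserved losses; its variance grows as the sampling probabilities shrink, which is one reason practical variants mix in uniform exploration.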