Results 21–30 of 214
Potential-based Algorithms in Online Prediction and Game Theory
Abstract

Cited by 42 (4 self)
In this paper we show that several known algorithms for sequential prediction problems (including Weighted Majority and the quasi-additive family of Grove, Littlestone, and Schuurmans), for playing iterated games (including Freund and Schapire's Hedge and MW, as well as the strategies of Hart and Mas-Colell), and for boosting (including AdaBoost) are special cases of a general decision strategy based on the notion of potential. By analyzing this strategy we derive known performance bounds, as well as new bounds, as simple corollaries of a single general theorem. Besides offering a new and unified view of a large family of algorithms, we establish a connection between potential-based analysis in learning and its counterparts independently developed in game theory. By exploiting this connection, we show that certain learning problems are instances of more general game-theoretic problems. In particular, we describe a notion of generalized regret and show its applications in learning theory.
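For concreteness, the Hedge strategy mentioned in this abstract admits a very short sketch. This is a minimal illustration of the multiplicative-weights idea, not the paper's general potential-based framework; the function name `hedge` and the learning rate `eta` are our own choices.

```python
import math

def hedge(losses, eta=0.5):
    """Hedge: keep one weight per expert and shrink each weight
    multiplicatively in proportion to the loss that expert suffers.

    losses[t][i] is the loss of expert i at round t (assumed in [0, 1]).
    Returns the final probability distribution over experts.
    """
    n = len(losses[0])
    weights = [1.0] * n
    for round_losses in losses:
        # Multiplicative update: experts with higher loss lose weight faster.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, round_losses)]
        total = sum(weights)  # renormalize back to a distribution
        weights = [w / total for w in weights]
    return weights

# Expert 0 is always right (loss 0), expert 1 always wrong (loss 1):
# after 10 rounds nearly all mass sits on expert 0.
final = hedge([[0.0, 1.0]] * 10)
```

The standard regret bound for this update is what the paper recovers as one corollary of its single potential-based theorem.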
On No-Regret Learning, Fictitious Play, and Nash Equilibrium
 In Proceedings of the Eighteenth International Conference on Machine Learning
, 2001
Abstract

Cited by 36 (1 self)
This paper addresses the question: what is the outcome of multi-agent learning via no-regret algorithms in repeated games? Specifically, can the outcome of no-regret learning be characterized by traditional game-theoretic solution concepts, such as Nash equilibrium? The conclusion of this study is that no-regret learning is reminiscent of fictitious play: play converges to Nash equilibrium in dominance-solvable, constant-sum, and general-sum 2×2 games, but cycles exponentially in the Shapley game. Notably, however, the information required by fictitious play far exceeds that required by no-regret learning.
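The fictitious-play dynamic that the abstract compares against can be sketched for 2×2 games as follows. This is our own illustration, not the paper's code; the function name `fictitious_play` and the smoothed initial counts are assumptions.

```python
def fictitious_play(payoff_row, payoff_col, rounds=2000):
    """Fictitious play in a 2x2 bimatrix game: each round, every player
    best-responds to the empirical frequency of the opponent's past actions.

    payoff_row[a][b] / payoff_col[a][b]: payoffs when the row player
    plays a and the column player plays b.
    Returns the empirical action frequencies of both players.
    """
    counts = [[1, 1], [1, 1]]  # counts[player][action], smoothed start
    for _ in range(rounds):
        # Row player's best response to the column player's history.
        q = [c / sum(counts[1]) for c in counts[1]]
        row_vals = [sum(payoff_row[a][b] * q[b] for b in range(2)) for a in range(2)]
        a_row = max(range(2), key=lambda a: row_vals[a])
        # Column player's best response to the row player's history.
        p = [c / sum(counts[0]) for c in counts[0]]
        col_vals = [sum(payoff_col[a][b] * p[a] for a in range(2)) for b in range(2)]
        a_col = max(range(2), key=lambda b: col_vals[b])
        counts[0][a_row] += 1
        counts[1][a_col] += 1
    return ([c / sum(counts[0]) for c in counts[0]],
            [c / sum(counts[1]) for c in counts[1]])

# Matching pennies: a constant-sum game whose unique Nash equilibrium
# mixes 50/50; empirical frequencies drift toward it.
mp_row = [[1, -1], [-1, 1]]
mp_col = [[-1, 1], [1, -1]]
p, q = fictitious_play(mp_row, mp_col)
```

As in the paper's comparison, note that this dynamic requires knowing the opponent's full action history, whereas no-regret learners need only their own payoffs.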
Universal schemes for sequential decision from individual data sequences
, 1993
Abstract

Cited by 35 (14 self)
Sequential decision algorithms are investigated, under a family of additive performance criteria, for individual data sequences, with various application areas in information theory and signal processing. Simple universal sequential schemes are known, under certain conditions, to approach optimality uniformly as fast as n^{-1} log n, where n is the sample size. For the case of finite-alphabet observations, the class of schemes that can be implemented by finite-state machines (FSMs) is studied. It is shown that Markovian machines with sufficiently long memory exist that are asymptotically nearly as good as any given FSM (deterministic or randomized) for the purpose of sequential decision. For the continuous-valued observation case, a useful class of parametric schemes is discussed with special attention to the recursive least squares (RLS) algorithm.
Monte Carlo Sampling for Regret Minimization in Extensive Games
Abstract

Cited by 34 (13 self)
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent sample-based CFR algorithms called Monte Carlo counterfactual regret minimization (MCCFR), of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes, outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new, tighter bound on the regret of the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
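The regret-matching rule at the core of CFR-style algorithms can be sketched in a few lines. This is a simplified single-decision illustration, not the extensive-game MCCFR algorithm itself; the function names are our own.

```python
def regret_matching(cum_regret):
    """Play each action with probability proportional to its positive
    cumulative regret; fall back to uniform when no regret is positive."""
    positives = [max(r, 0.0) for r in cum_regret]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    return [1.0 / len(cum_regret)] * len(cum_regret)

def update_regrets(cum_regret, action_utils, strategy):
    """Add, for every action, the utility it would have gained had it
    been played instead of following the current mixed strategy."""
    expected = sum(u * p for u, p in zip(action_utils, strategy))
    return [r + (u - expected) for r, u in zip(cum_regret, action_utils)]

# Rock-Paper-Scissors against an opponent who always plays Rock:
# utilities for (Rock, Paper, Scissors) are (0, 1, -1), so regret
# matching quickly concentrates all probability on Paper.
regrets = [0.0, 0.0, 0.0]
for _ in range(1000):
    strat = regret_matching(regrets)
    regrets = update_regrets(regrets, [0.0, 1.0, -1.0], strat)
final_strategy = regret_matching(regrets)
```

In CFR proper, one such regret vector is kept per information set, and the counterfactual utilities are computed (or, in MCCFR, sampled) by traversing the game tree.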
Admission Control and Scheduling for QoS Guarantees for Variable-Bit-Rate Applications on Wireless Channels
Abstract

Cited by 30 (11 self)
Providing differentiated Quality of Service (QoS) over unreliable wireless channels is an important challenge for supporting future applications. We analyze a model that has been proposed to describe QoS requirements by four criteria: traffic pattern, channel reliability, delay bound, and throughput bound. We study this mathematical model and extend it to handle variable-bit-rate applications, obtaining a sharp characterization of schedulability vis-à-vis latencies and timely throughput. Our extensions make the results general enough to apply to a wide range of wireless applications, including MPEG Variable-Bit-Rate (VBR) video streaming, VoIP with differentiated quality, and wireless sensor networks (WSNs). Two major issues concerning QoS over wireless are admission control and scheduling. Based on the model incorporating the QoS criteria, we analytically derive a necessary and sufficient condition for a set of variable-bit-rate clients to be feasible; admission control is then reduced to evaluating this condition. We further analyze two previously proposed scheduling policies and show that both are optimal in the sense that they can fulfill every set of clients that is feasible under some scheduling algorithm. The policies are easily implemented on the IEEE 802.11 standard. Simulation results under various settings support the theoretical study.
Optimal Prediction for Prefetching in the Worst Case
, 1998
Abstract

Cited by 29 (5 self)
Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [J. Assoc. Comput. Mach., 43 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm, for every page request sequence, with the important class of finite-state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely, for every page request sequence, to the fault rate of the optimal finite-state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM ...
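A toy finite-state prefetcher, of the kind the paper's algorithm is compared against, might look like the following. This is our own illustration (an order-1 Markov frequency predictor); the class name and API are assumptions, not from the paper.

```python
from collections import Counter, defaultdict

class Order1Prefetcher:
    """Predict the next page as the most frequent successor of the
    current page seen so far: effectively one FSM state per page."""

    def __init__(self):
        self.successors = defaultdict(Counter)  # page -> Counter of next pages
        self.prev = None

    def predict(self):
        """Return the predicted next page, or None with no history."""
        if self.prev is None or not self.successors[self.prev]:
            return None
        return self.successors[self.prev].most_common(1)[0][0]

    def observe(self, page):
        """Record the page that was actually requested."""
        if self.prev is not None:
            self.successors[self.prev][page] += 1
        self.prev = page

# On a strictly alternating request sequence, only the first few
# requests fault; afterwards the predictor is always correct.
pf = Order1Prefetcher()
faults = 0
for page in ["A", "B", "A", "B", "A", "B", "A", "B"]:
    if pf.predict() != page:
        faults += 1
    pf.observe(page)
```

The paper's contribution is a randomized predictor whose fault rate provably approaches that of the best such finite-state machine on every individual sequence.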
Agnostic Online Learning
Abstract

Cited by 26 (2 self)
We study learnability of hypothesis classes in agnostic online prediction models. The analogous question in the PAC learning model [Valiant, 1984] was addressed by Haussler [1992] and others, who showed that the VC dimension characterization of the sample complexity of learnability extends to the agnostic (or "unrealizable") setting. In his influential work, Littlestone [1988] described a combinatorial characterization of hypothesis classes that are learnable in the online model. We extend Littlestone's results in two respects. First, while Littlestone dealt only with the realizable case, namely, assuming there exists a hypothesis in the class that perfectly explains the entire data, we derive results for the non-realizable (agnostic) case as well. In particular, we describe several models of non-realizable data and derive upper and lower bounds on the achievable regret. Second, we extend the theory to include margin-based hypothesis classes, in which the prediction of each hypothesis is accompanied by a confidence value. We demonstrate how the newly developed theory seamlessly yields novel online regret bounds for the important class of large-margin linear separators.
Drifting Games
, 1999
Abstract

Cited by 26 (7 self)
We introduce and study a general, abstract game played between two players called the shepherd and the adversary. The game is played in a series of rounds using a finite set of "chips" which are moved about in R^n. On each round, the shepherd assigns a desired direction of movement and an importance weight to each of the chips. The adversary then moves the chips in any way that need only be weakly correlated with the desired directions assigned by the shepherd. The shepherd's goal is to cause the chips to be moved to low-loss positions, where the loss of each chip at its final position is measured by a given loss function. We present a shepherd algorithm for this game and prove an upper bound on its performance. We also prove a lower bound showing that the algorithm is essentially optimal for a large number of chips. We discuss computational methods for efficiently implementing our algorithm. We show that our general drifting-game algorithm subsumes some well-studied boosting and ...
Internal regret in online portfolio selection
 Machine Learning
, 2005
Abstract

Cited by 24 (7 self)
This paper extends the game-theoretic notion of internal regret to the case of online portfolio selection problems. New sequential investment strategies are designed to minimize the cumulative internal regret for all possible market behaviors. Some of the introduced strategies, apart from achieving a small internal regret, achieve an accumulated wealth almost as large as that of the best constantly rebalanced portfolio. It is argued that the low-internal-regret property is related to stability, and experiments on real stock exchange data demonstrate that the new strategies achieve better returns compared to some known algorithms.
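The notion of internal regret used in this abstract can be made concrete with a short sketch: it measures the maximal gain achievable by retrospectively replacing every play of one action with another fixed action. This is our own illustration; `internal_regret` is a hypothetical helper, not from the paper.

```python
def internal_regret(actions, losses):
    """Internal regret of a play history.

    actions: list of chosen action indices, one per round.
    losses: losses[t][a] is the loss of action a at round t.
    Returns the largest total loss reduction obtainable by swapping
    every occurrence of some action i for some other action j.
    """
    n = len(losses[0])
    best = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Gain from replacing every play of i with j, in hindsight.
            gain = sum(losses[t][i] - losses[t][j]
                       for t, a in enumerate(actions) if a == i)
            best = max(best, gain)
    return best

# Action 0 was played twice while action 1 was strictly better both
# times, so the (0 -> 1) swap yields the internal regret.
regret = internal_regret([0, 0, 1], [[1, 0], [1, 0], [0, 1]])
```

External (standard) regret compares against the single best fixed action instead; internal regret is the stronger notion, and minimizing it for all action pairs is what drives the portfolio strategies described above.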