Results 21-30 of 119
Universal schemes for sequential decision from individual data sequences
, 1993
Cited by 28 (11 self)
Sequential decision algorithms are investigated, under a family of additive performance criteria, for individual data sequences, with various application areas in information theory and signal processing. Simple universal sequential schemes are known, under certain conditions, to approach optimality uniformly as fast as n^{-1} log n, where n is the sample size. For the case of finite-alphabet observations, the class of schemes that can be implemented by finite-state machines (FSMs) is studied. It is shown that Markovian machines with sufficiently long memory exist that are asymptotically nearly as good as any given FSM (deterministic or randomized) for the purpose of sequential decision. For the continuous-valued observation case, a useful class of parametric schemes is discussed, with special attention to the recursive least squares (RLS) algorithm.
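As a rough illustration of the recursive least squares idea mentioned in the abstract above, the following is a minimal sketch of an RLS one-step-ahead linear predictor. It is not taken from the paper: the function name `rls_predict` and the parameters `order`, `lam` (forgetting factor), and `delta` (initial scale of the inverse correlation matrix) are assumptions chosen for this example.

```python
def rls_predict(xs, order=2, lam=1.0, delta=100.0):
    """Predict each sample from the previous `order` samples with RLS.

    Hypothetical helper for illustration; returns one prediction per sample
    (the first `order` predictions are 0.0 because no history exists yet).
    """
    w = [0.0] * order  # linear predictor weights
    # P approximates the inverse of the regressor correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    preds = []
    for t, x in enumerate(xs):
        if t < order:
            preds.append(0.0)
            continue
        u = [xs[t - 1 - k] for k in range(order)]  # most recent sample first
        y_hat = sum(wi * ui for wi, ui in zip(w, u))
        preds.append(y_hat)
        # Gain vector: k = P u / (lam + u^T P u)
        Pu = [sum(P[i][j] * u[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(ui * pui for ui, pui in zip(u, Pu))
        k = [pui / denom for pui in Pu]
        err = x - y_hat  # prediction error revealed after the decision
        w = [wi + ki * err for wi, ki in zip(w, k)]
        # P <- (P - k u^T P) / lam; P is symmetric, so u^T P equals (P u)^T.
        P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(order)]
             for i in range(order)]
    return preds
```

On a sequence that exactly obeys a linear recursion (for example an alternating +1, -1 pattern with `order=1`), the prediction error of this sketch shrinks as more samples are seen.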
Optimal Prediction for Prefetching in the Worst Case
, 1998
Cited by 27 (7 self)
Response time delays caused by I/O are a major problem in many systems and database applications. Prefetching and cache replacement methods are attracting renewed attention because of their success in avoiding costly I/Os. Prefetching can be looked upon as a type of online sequential prediction, where the predictions must be accurate as well as made in a computationally efficient way. Unlike other online problems, prefetching cannot admit a competitive analysis, since the optimal offline prefetcher incurs no cost when it knows the future page requests. Previous analytical work on prefetching [J. Assoc. Comput. Mach., 43 (1996), pp. 771–793] consisted of modeling the user as a probabilistic Markov source. In this paper, we look at the much stronger form of worst-case analysis and derive a randomized algorithm for pure prefetching. We compare our algorithm for every page request sequence with the important class of finite state prefetchers, making no assumptions as to how the sequence of page requests is generated. We prove analytically that the fault rate of our online prefetching algorithm converges almost surely for every page request sequence to the fault rate of the optimal finite state prefetcher for the sequence. This analysis model can be looked upon as a generalization of the competitive framework, in that it compares an online algorithm in a worst-case manner over all sequences with a powerful yet nonclairvoyant opponent. We simultaneously achieve the computational goal of implementing our prefetcher in optimal constant expected time per prefetched page using the optimal dynamic discrete random variate generator of Matias, Vitter, and Ni [Proc. 4th Annual SIAM/ACM
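To make the notions of "prefetching as sequential prediction" and "fault rate" concrete, here is a small sketch of a deterministic first-order frequency-count prefetcher. This is explicitly not the randomized algorithm of the paper; it is a simple stand-in, and the function name `fault_count` and the single-page prefetch model are assumptions for illustration.

```python
from collections import Counter, defaultdict


def fault_count(requests):
    """Count faults of a toy prefetcher that fetches the most frequent successor.

    Hypothetical model: before each request, exactly one page is prefetched,
    namely the page that most often followed the current page so far.
    A fault occurs whenever the requested page was not the prefetched one.
    """
    successors = defaultdict(Counter)  # page -> counts of pages that followed it
    faults = 0
    prev = None
    for page in requests:
        prefetched = None
        if prev is not None and successors[prev]:
            prefetched = successors[prev].most_common(1)[0][0]
        if prefetched != page:
            faults += 1
        if prev is not None:
            successors[prev][page] += 1  # learn from the revealed request
        prev = page
    return faults
```

Dividing the returned count by the sequence length gives the fault rate; on a strictly periodic request sequence this toy predictor only faults while it is still seeing each transition for the first time.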
On No-Regret Learning, Fictitious Play, and Nash Equilibrium
 In Proceedings of the Eighteenth International Conference on Machine Learning
, 2001
Cited by 26 (0 self)
This paper addresses the question: what is the outcome of multiagent learning via no-regret algorithms in repeated games? Specifically, can the outcome of no-regret learning be characterized by traditional game-theoretic solution concepts, such as Nash equilibrium? The conclusion of this study is that no-regret learning is reminiscent of fictitious play: play converges to Nash equilibrium in dominance-solvable, constant-sum, and general-sum 2 × 2 games, but cycles exponentially in the Shapley game. Notably, however, the information required of fictitious play far exceeds that of no-regret learning.
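One way to see the constant-sum case in action is self-play with regret matching, a standard no-regret procedure. The sketch below is not the paper's experimental setup: it assumes a two-player zero-sum matrix game, uses expected (rather than sampled) payoffs so that the run is deterministic, and the names `regret_matching_strategy` and `self_play` are hypothetical.

```python
def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def self_play(payoff, rounds=50000):
    """Run regret matching for both players of a zero-sum matrix game.

    `payoff[i][j]` is the row player's payoff; the column player receives
    the negative. Returns the time-averaged strategies of both players.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(rounds):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)
        # Expected payoff of every pure action against the opponent's mix.
        row_u = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                 for i in range(n_rows)]
        col_u = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                 for j in range(n_cols)]
        row_val = sum(x[i] * row_u[i] for i in range(n_rows))
        col_val = sum(y[j] * col_u[j] for j in range(n_cols))
        for i in range(n_rows):
            row_regret[i] += row_u[i] - row_val
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_u[j] - col_val
            col_sum[j] += y[j]
    return ([s / rounds for s in row_sum], [s / rounds for s in col_sum])
```

For the example game `[[2, -1], [-1, 1]]`, whose unique mixed equilibrium puts probability 0.4 on the first action for both players, the averaged strategies returned by `self_play` should end up close to that equilibrium.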
Drifting Games
, 1999
Cited by 23 (6 self)
We introduce and study a general, abstract game played between two players called the shepherd and the adversary. The game is played in a series of rounds using a finite set of "chips" which are moved about in R^n. On each round, the shepherd assigns a desired direction of movement and an importance weight to each of the chips. The adversary then moves the chips in any way that need only be weakly correlated with the desired directions assigned by the shepherd. The shepherd's goal is to cause the chips to be moved to low-loss positions, where the loss of each chip at its final position is measured by a given loss function. We present a shepherd algorithm for this game and prove an upper bound on its performance. We also prove a lower bound showing that the algorithm is essentially optimal for a large number of chips. We discuss computational methods for efficiently implementing our algorithm. We show that our general drifting-game algorithm subsumes some well-studied boosting and...
Guaranteed performance regions in Markovian systems with competing decision makers
 IEEE Trans. Autom. Control
, 1993
Cited by 19 (14 self)
The paper addresses the problem of (long-term) multiobjective control under dynamic uncertainty, using a game-theoretic framework. A decision maker faces a dynamic system, which is also affected by other decision makers (these may stand for other controllers, system users, or dynamic disturbances). He or she considers a vector of time-averaged performance measures. Acceptable performance is defined through a set in the space of performance vectors. Can this decision maker guarantee a performance vector which asymptotically approaches this desired set? We consider the worst-case scenario, where other decision makers may try to exclude his or her vector from the desired set. For a controlled Markov model of the system, we give a sufficient condition for approachability, and construct appropriate control strategies. Under certain recurrence conditions, a complete characterization of approachability is then provided for convex sets. The mathematical formulation leads to a theory of approachability for "stochastic games with vector payoffs." A simple queueing example is analyzed to illustrate this approach.
Monte Carlo Sampling for Regret Minimization in Extensive Games
Cited by 18 (7 self)
Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR in expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR's bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.
The online shortest path problem under partial monitoring
 Journal of Machine Learning Research
, 2007
Cited by 17 (4 self)
The online shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph whose edge weights can change in an arbitrary (adversarial) way, so that the loss of the chosen path (defined as the sum of the weights of its composing edges) is small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched offline to the entire sequence of the edge weights, by a quantity that is proportional to 1/√n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. This result improves earlier bandit algorithms, which have performance bounds that either depend exponentially on the number of edges or converge to zero at a slower rate than O(1/√n). An extension to the so-called label-efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ε < 1. Applications to routing in packet-switched networks along with simulation results are also presented.
The Empirical Bayes Envelope and Regret Minimization in Competitive Markov Decision Processes
 Mathematics of Operations Research
, 2002
Cited by 17 (12 self)
This paper proposes an extension of the regret minimizing framework from repeated matrix games to stochastic game models, under appropriate recurrence conditions. A decision maker (P1) who wishes to maximize his long-term average reward is facing a Markovian environment, which may also be affected by arbitrary actions of other agents. The latter are collectively modeled as a second player, P2, whose strategy is arbitrary. Both states and actions are fully observed by both players. While P1 may obviously secure the minmax value of the game, he may wish to improve on that when the opponent is not playing a worst-case strategy. For repeated matrix games, an achievable goal is presented by the Bayes envelope, which traces P1's best-response payoff against the observable frequencies of P2's actions. We propose a generalization to the stochastic game framework, under recurrence conditions that amount to fixed-state reachability. The empirical Bayes envelope (EBE) is defined as P1's best-response payoff against the stationary strategies of P2 that agree with the observed state-action frequencies. As the EBE may not be attainable in general, we consider its lower convex hull, the CBE, which is proved to be achievable by P1. The analysis relies on Blackwell's approachability theory. The CBE is lower bounded by the value of the game, and for irreducible games turns out to be strictly above the value whenever P2's frequencies deviate from a worst-case strategy. In the special case of single-controller games where P2 alone affects the state transitions, the EBE itself is shown to be attainable.
Online Prediction Algorithms for Databases and Operating Systems
, 1995
Cited by 16 (1 self)
In making online decisions, computer systems are inherently trying to predict future events. Typical decision problems in computer systems translate to three prediction scenarios: predicting what event is going to happen in the future, when a specific event will take place, or how much of something is going to happen. In this thesis, we develop practical algorithms for specific instances of these three prediction scenarios, and prove the goodness of our algorithms via analytical and experimental methods. We study each of the three prediction scenarios via motivating systems problems. The problem of prefetching requires a prediction of which page is going to be next requested by a user. The problem of disk spindown in mobile machines, modeled by the rent-to-buy framework, requires an estimate of when the next disk access is going to happen. Query optimizers choose a database access strategy by predicting or estimating selectivity, i.e., by estimating the size of a query result. We an...
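The rent-to-buy view of disk spindown mentioned above can be sketched with the classical deterministic break-even rule: keep the disk spinning ("rent") until the accumulated idle cost equals the spin-up cost ("buy"), then spin down. This is the textbook baseline, not necessarily the algorithm developed in the thesis; the function names and the cost parameters `buy_cost` and `rent_rate` are assumptions for illustration.

```python
def offline_cost(idle, buy_cost, rent_rate=1.0):
    """Cost of the optimal decision when the idle length is known in advance."""
    return min(idle * rent_rate, buy_cost)


def break_even_cost(idle, buy_cost, rent_rate=1.0):
    """Cost of the break-even rule: rent until rent paid equals buy_cost, then buy.

    Hypothetical cost model: `rent_rate` is the cost per unit of idle time with
    the disk spinning, and `buy_cost` is the one-off cost of spinning down and
    back up.
    """
    threshold = buy_cost / rent_rate  # idle time at which renting has cost buy_cost
    if idle < threshold:
        return idle * rent_rate  # the next access arrived before spinning down
    return threshold * rent_rate + buy_cost  # rented up to threshold, then bought
```

Under this cost model the break-even rule never pays more than twice the offline optimum, whatever the actual idle period turns out to be.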