Results 11-20 of 137
Online Decision Problems with Large Strategy Sets
2005
Cited by 24 (2 self)
In an online decision problem, an algorithm performs a sequence of trials, each of which involves selecting one element from a fixed set of alternatives (the “strategy set”) whose costs vary over time. After T trials, the combined cost of the algorithm’s choices is compared with that of the single strategy whose combined cost is minimum. Their difference is called regret, and one seeks algorithms which are efficient in that their regret is sublinear in T and polynomial in the problem size. We study an important class of online decision problems called generalized multi-armed bandit problems. In the past such problems have found applications in areas as diverse as statistics, computer science, economic theory, and medical decision-making. Most existing algorithms were efficient only in the case of a small (i.e. polynomial-sized) strategy set. We extend the theory by supplying nontrivial algorithms and lower bounds for cases in which the strategy set is much larger (exponential or infinite) and ...
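The regret defined in this abstract can be computed directly once the full cost table is known. The sketch below is only an illustration of that definition, not an algorithm from the paper; the function name `regret` and the example data are hypothetical.

```python
from typing import Sequence


def regret(costs: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Regret of a sequence of choices against the best single strategy.

    costs[t][i] is the cost of strategy i in trial t; choices[t] is the
    index of the strategy the algorithm selected in trial t.
    """
    if len(costs) != len(choices):
        raise ValueError("need exactly one choice per trial")
    if not costs:
        return 0.0
    num_strategies = len(costs[0])
    # Combined cost actually paid by the algorithm over all T trials.
    algorithm_cost = sum(row[i] for row, i in zip(costs, choices))
    # Combined cost of each fixed strategy, had it been played in every trial.
    fixed_costs = [sum(row[i] for row in costs) for i in range(num_strategies)]
    return algorithm_cost - min(fixed_costs)


# Hypothetical data: 3 trials, 2 strategies.
example_costs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
example_choices = [0, 1, 0]
example_regret = regret(example_costs, example_choices)
```

Enumerating every fixed strategy, as done here, is feasible only for small strategy sets, which is the limitation the abstract sets out to remove.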
Self-improving algorithms
In SODA ’06: Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm
Cited by 24 (4 self)
We investigate ways in which an algorithm can improve its expected performance by fine-tuning itself automatically with respect to an arbitrary, unknown input distribution. We give such self-improving algorithms for sorting and computing Delaunay triangulations. The highlights of this work: (i) an algorithm to sort a list of numbers with optimal expected limiting complexity; and (ii) an algorithm to compute the Delaunay triangulation of a set of points with optimal expected limiting complexity. In both cases, the algorithm begins with a training phase during which it adjusts itself to the input distribution, followed by a stationary regime in which the algorithm settles to its optimized incarnation.
Playing games with approximation algorithms
In Proceedings of the 39th annual ACM Symposium on Theory of Computing, 2007
Cited by 21 (2 self)
In an online linear optimization problem, on each period t, an online algorithm chooses s_t ∈ S from a fixed (possibly infinite) set S of feasible decisions. Nature (who may be adversarial) chooses a weight vector w_t ∈ R^n, and the algorithm incurs cost c(s_t, w_t), where c is a fixed cost function that is linear in the weight vector. In the full-information setting, the vector w_t is then revealed to the algorithm, and in the bandit setting, only the cost experienced, c(s_t, w_t), is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed s ∈ S in hindsight. Many repeated decision-making problems with weights fit naturally into this framework, such as online shortest-path, online TSP, online clustering, and online weighted set cover. Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online algorithm in both the full-information and the bandit settings, with average cost nearly as good as that of the best fixed s ∈ S in hindsight. However, in the case where the offline algorithm is an approximation algorithm with ratio α > 1, the previous approach only worked for special types of approximation algorithms. We show how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime. If the offline algorithm has an α-approximation guarantee, then the expected cost of the online algorithm on any sequence is not much larger than α times that of the best s ∈ S, where the best is chosen with the benefit of hindsight. Our main innovation is combining Zinkevich’s algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm. Standard techniques generalize the above result to the bandit setting, except that a “Barycentric Spanner” for the problem is also (provably) necessary as input. Our algorithm can also be viewed as a method for playing large repeated games, where one can only compute approximate best responses, rather than best responses.
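As a rough illustration of the full-information setting, the sketch below runs projected online gradient descent (the technique usually credited to Zinkevich) on linear costs. It covers only that one ingredient and does not attempt the paper's geometric transformation for approximation algorithms. The decision set (the Euclidean unit ball), the cost c(s, w) = ⟨s, w⟩, the fixed step size, and all function names are assumptions made for the example.

```python
import math
from typing import List, Sequence

Vector = List[float]


def project_to_unit_ball(x: Vector) -> Vector:
    """Euclidean projection onto the unit ball (the assumed decision set S)."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]


def online_gradient_descent(weights: Sequence[Vector], eta: float) -> List[Vector]:
    """Return decisions s_1..s_T; s_t depends only on w_1..w_{t-1}."""
    s = [0.0] * len(weights[0])
    decisions = []
    for w in weights:
        decisions.append(s)
        # The gradient of the linear cost <s, w> with respect to s is w itself.
        s = project_to_unit_ball([si - eta * wi for si, wi in zip(s, w)])
    return decisions


def linear_regret(weights: Sequence[Vector], decisions: Sequence[Vector]) -> float:
    """Cost paid minus the cost of the best fixed point of the unit ball."""
    paid = sum(sum(si * wi for si, wi in zip(s, w))
               for s, w in zip(decisions, weights))
    total = [sum(column) for column in zip(*weights)]
    # Over the unit ball, the best fixed decision is -total / ||total||.
    best = -math.sqrt(sum(v * v for v in total))
    return paid - best


# Hypothetical weight sequence with unit-norm vectors, T = 100 rounds.
example_weights = [[1.0, 0.0] if t % 2 == 0 else [0.0, -1.0] for t in range(100)]
example_decisions = online_gradient_descent(example_weights, eta=1.0 / math.sqrt(100))
example_regret = linear_regret(example_weights, example_decisions)
```

With unit-norm weight vectors, a start at the origin, and step size 1/√T, the standard analysis bounds the regret by √T, so the average regret per round shrinks as T grows.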
Regret based dynamics: Convergence in weakly acyclic games
In Proceedings of the 2007 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2007
Cited by 20 (9 self)
Regret based algorithms have been proposed to control a wide variety of multiagent systems. The appeal of regret-based algorithms is that (1) these algorithms are easily implementable in large scale multiagent systems and (2) there are existing results proving that the behavior will asymptotically converge to a set of points of “no-regret” in any game. We illustrate, through a simple example, that no-regret points need not reflect desirable operating conditions for a multiagent system. Multiagent systems often exhibit an additional structure (i.e. being “weakly acyclic”) that has not been exploited in the context of regret based algorithms. In this paper, we introduce a modification of regret based algorithms by (1) exponentially discounting the memory and (2) bringing in a notion of inertia in players’ decision process. We show how these modifications can lead to an entire class of regret based algorithms that provide almost sure convergence to a pure Nash equilibrium in any weakly acyclic game.
Combinatorial Bandits
Cited by 19 (6 self)
We study sequential prediction problems in which, at each time instance, the forecaster chooses a binary vector from a certain fixed set S ⊆ {0, 1}^d and suffers a loss that is the sum of the losses of those vector components that are equal to one. The goal of the forecaster is to achieve that, in the long run, the accumulated loss is not much larger than that of the best possible vector in the class. We consider the “bandit” setting in which the forecaster has only access to the losses of the chosen vectors. We introduce a new general forecaster achieving a regret bound that, for a variety of concrete choices of S, is of order √(nd ln |S|), where n is the time horizon. This is not improvable in general and is better than previously known bounds. We also point out that computationally efficient implementations for various interesting choices of S exist.
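The loss model in this abstract is easy to state in code. The sketch below shows how the loss of a chosen binary vector is formed and evaluates the quantity √(nd ln |S|) for one hypothetical choice of S, namely all vectors with exactly k ones. It does not implement the forecaster itself; the function names and the example numbers are assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def vector_loss(v: Sequence[int], component_losses: Sequence[float]) -> float:
    """Sum of the losses of the components of v that are equal to one."""
    return sum(loss for bit, loss in zip(v, component_losses) if bit == 1)


def k_subsets(d: int, k: int) -> List[Tuple[int, ...]]:
    """All binary vectors in {0, 1}^d with exactly k ones (a hypothetical S)."""
    vectors = []
    for ones in combinations(range(d), k):
        vectors.append(tuple(1 if i in ones else 0 for i in range(d)))
    return vectors


def regret_order(n: int, d: int, size_of_s: int) -> float:
    """The quantity sqrt(n * d * ln|S|) from the abstract, constants omitted."""
    return math.sqrt(n * d * math.log(size_of_s))


# Hypothetical round with d = 4 components and S = all vectors with two ones.
example_s = k_subsets(4, 2)
example_losses = [0.2, 0.5, 0.3, 0.9]
chosen = (1, 0, 1, 0)
# In the bandit setting the forecaster observes only this single number.
observed = vector_loss(chosen, example_losses)
best_vector = min(example_s, key=lambda v: vector_loss(v, example_losses))
```

The enumeration in `k_subsets` grows combinatorially with d, which is why the abstract's remark about computationally efficient implementations for structured choices of S matters.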
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2010
Cited by 18 (7 self)
The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multiagent coordination, estimation in sensor networks, and large-scale machine learning. We develop and analyze distributed algorithms based on dual subgradient averaging, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis allows us to clearly separate the convergence of the optimization algorithm itself and the effects of communication dependent on the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network and confirm this prediction’s sharpness both by theoretical lower bounds and simulations for various networks. Our approach includes the cases of deterministic optimization and communication as well as problems with stochastic optimization and/or communication.
A survey: The convex optimization approach to regret minimization
2009
Cited by 18 (3 self)
A well-studied and general setting for prediction and decision making is regret minimization in games. Originating independently in several disciplines, algorithms for regret minimization have proven to be empirically successful for a wide range of applications. Recently the design of algorithms for regret minimization in a wide array of settings has been influenced by tools from convex optimization. In this survey we describe two general methods for deriving algorithms and analyzing them, with a “convex optimization flavor”. The methods we describe are general enough to capture most existing algorithms with a single, simple and generic analysis, and lie at the heart of several recent advancements in prediction theory.
The online shortest path problem under partial monitoring
JOURNAL OF MACHINE LEARNING RESEARCH, 2007
Cited by 18 (5 self)
The online shortest path problem is considered under partial monitoring scenarios. At each round, a decision maker has to choose a path between two distinguished vertices of a weighted directed acyclic graph, whose edge weights can change in an arbitrary (adversarial) way, such that the loss of the chosen path (defined as the sum of the weights of its composing edges) is small. In the multi-armed bandit setting, after choosing a path, the decision maker learns only the weights of those edges that belong to the chosen path. For this scenario, an algorithm is given whose average cumulative loss in n rounds exceeds that of the best path, matched offline to the entire sequence of the edge weights, by a quantity that is proportional to 1/√n and depends only polynomially on the number of edges of the graph. The algorithm can be implemented with linear complexity in the number of rounds n and in the number of edges. This result improves earlier bandit algorithms which have performance bounds that either depend exponentially on the number of edges or converge to zero at a slower rate than O(1/√n). An extension to the so-called label efficient setting is also given, where the decision maker is informed about the weight of the chosen path only with probability ε < 1. Applications to routing in packet-switched networks along with simulation results are also presented.
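The comparator in this abstract, the best path matched offline to the whole weight sequence, can be found with an ordinary shortest-path computation: a fixed path's cumulative loss is the sum of the cumulative weights of its edges. The sketch below illustrates only that offline benchmark, not the bandit algorithm. It assumes vertices are numbered in topological order (every edge goes from a lower to a higher number), and all names and data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def path_loss(path: Sequence[int], weights: Dict[Edge, float]) -> float:
    """Loss of a path in one round: the sum of the weights of its edges."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))


def best_fixed_path(num_vertices: int, rounds: Sequence[Dict[Edge, float]],
                    source: int, target: int) -> Tuple[List[int], float]:
    """Best single path in hindsight and its cumulative loss."""
    total: Dict[Edge, float] = {}
    for weights in rounds:
        for edge, w in weights.items():
            total[edge] = total.get(edge, 0.0) + w
    dist = [float("inf")] * num_vertices
    parent = [-1] * num_vertices
    dist[source] = 0.0
    # Sorting edges by tail vertex is a valid relaxation order because
    # vertices are assumed to be topologically numbered.
    for u, v in sorted(total):
        if dist[u] + total[(u, v)] < dist[v]:
            dist[v] = dist[u] + total[(u, v)]
            parent[v] = u
    if dist[target] == float("inf"):
        raise ValueError("target is not reachable from source")
    path = [target]
    while path[-1] != source:
        path.append(parent[path[-1]])
    path.reverse()
    return path, dist[target]


# Hypothetical two-round instance on a 4-vertex DAG.
example_rounds = [
    {(0, 1): 0.1, (0, 2): 0.5, (1, 3): 0.9, (2, 3): 0.1},
    {(0, 1): 0.2, (0, 2): 0.4, (1, 3): 0.1, (2, 3): 0.5},
]
example_path, example_loss = best_fixed_path(4, example_rounds, source=0, target=3)
```

In the bandit setting described above, a decision maker who plays the path 0, 2, 3 in the first round would learn only the weights of edges (0, 2) and (2, 3), so this offline computation is available only as a benchmark.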
Extracting certainty from uncertainty: Regret bounded by variation in costs
In COLT, 2008
Cited by 18 (4 self)
Prediction from expert advice is a fundamental problem in machine learning. A major pillar of the field is the existence of learning algorithms whose average loss approaches that of the best expert in hindsight (in other words, whose average regret approaches zero). Traditionally the regret of online algorithms was bounded in terms of the number of prediction rounds. Cesa-Bianchi, Mansour and Stoltz [4] posed the question whether it is possible to bound the regret of an online algorithm by the variation of the observed costs. In this paper we resolve this question, and prove such bounds in the fully adversarial setting, in two important online learning scenarios: prediction from expert advice, and online linear optimization.
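For context, the traditional kind of guarantee that this abstract contrasts with can be seen in the classical exponential-weights (Hedge) forecaster, whose regret is bounded in terms of the number of rounds T. The sketch below is that classical baseline, not the variation-based algorithm of the paper. It assumes losses in [0, 1], and the function name and example data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def hedge(losses: Sequence[Sequence[float]], eta: float) -> Tuple[float, float]:
    """Run exponential weights; return (expected loss, regret to best expert).

    losses[t][i] is the loss of expert i in round t, assumed to lie in [0, 1].
    """
    num_experts = len(losses[0])
    cumulative: List[float] = [0.0] * num_experts
    expected_loss = 0.0
    for round_losses in losses:
        # Subtract the minimum before exponentiating for numerical stability;
        # this does not change the normalized weights.
        low = min(cumulative)
        weights = [math.exp(-eta * (c - low)) for c in cumulative]
        norm = sum(weights)
        expected_loss += sum(w / norm * l for w, l in zip(weights, round_losses))
        cumulative = [c + l for c, l in zip(cumulative, round_losses)]
    return expected_loss, expected_loss - min(cumulative)


# Hypothetical losses: two alternating experts and one steady expert.
T, N = 200, 3
example_losses = [[float(t % 2), 0.4, float((t + 1) % 2)] for t in range(T)]
eta = math.sqrt(8.0 * math.log(N) / T)
example_expected, example_regret = hedge(example_losses, eta)
round_count_bound = math.sqrt(T * math.log(N) / 2.0)
```

With this step size the standard analysis gives regret at most √(T ln N / 2), a bound that depends on T regardless of how little the losses actually vary, which is the gap the abstract addresses.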