Results 1 - 10 of 204

Multi-Armed Recommendation Bandits for Selecting State Machine Policies for Robotic Systems

by Pyry Matikainen, P. Michael Furlong, Rahul Sukthankar, Martial Hebert
"... Abstract — We investigate the problem of selecting a statemachine from a library to control a robot. We are particularly interested in this problem when evaluating such state machines on a particular robotics task is expensive. As a motivating example, we consider a problem where a simulated vacuumi ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
vacuuming robot must select a driving state machine well-suited for a particular (unknown) room layout. By borrowing concepts from collaborative filtering (recommender systems such as Netflix and Amazon.com), we present a multi-armed bandit formulation that incorporates recommendation techniques
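
The snippet describes choosing among expensive-to-evaluate state-machine policies as a bandit problem seeded by recommendations. As a rough illustration only (not the authors' method), the Python sketch below runs UCB1 over a library of policies, with a hypothetical prior_means vector standing in for collaborative-filtering predictions; evaluate(i) is assumed to run policy i once and return a reward in [0, 1].

import math
import random

def ucb1_select(n_arms, evaluate, budget, prior_means=None):
    # Seed each arm with one pseudo-observation at its prior mean, so the
    # recommendation prior biases early play without being irreversible.
    if prior_means is not None:
        counts, means = [1] * n_arms, list(prior_means)
    else:
        counts, means = [0] * n_arms, [0.0] * n_arms
    for t in range(1, budget + 1):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]
        else:
            total = sum(counts)
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(total) / counts[a]))
        r = evaluate(arm)                        # one expensive rollout
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]
    return max(range(n_arms), key=lambda a: means[a])

# Toy usage: three "policies" behave as Bernoulli arms with hidden success rates.
rates = [0.2, 0.5, 0.8]
best = ucb1_select(3, lambda i: float(random.random() < rates[i]), budget=200,
                   prior_means=[0.4, 0.4, 0.6])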

Pure exploration in multi-armed bandits problems

by Sébastien Bubeck, Rémi Munos, Gilles Stoltz - In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory (ALT 2009), 2009
"... We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The strategies are assessed not in terms of their cumulative regrets, as is usually the case, but through quantities referred to as simpl ..."
Abstract - Cited by 80 (13 self) - Add to MetaCart
to as simple regrets. The latter are related to the (expected) gains of the decisions that the strategies would recommend for a new one-shot instance of the same multi-armed bandit problem. Here, exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast
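
Since the abstract defines simple regret as the gap incurred by the single arm a strategy recommends after exploring, a minimal sketch of the protocol may help. The uniform-allocation strategy and Bernoulli rewards below are illustrative assumptions, not the paper's strategies.

import random

def uniform_then_recommend(means, n_rounds, seed=0):
    # Pure-exploration protocol: explore for n_rounds, then output a single
    # recommended arm. Simple regret = mu_star - mu_recommended.
    rng = random.Random(seed)
    k = len(means)
    counts, est = [0] * k, [0.0] * k
    for t in range(n_rounds):
        a = t % k                                  # uniform allocation
        r = float(rng.random() < means[a])         # Bernoulli reward
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]
    rec = max(range(k), key=lambda a: est[a])      # one-shot recommendation
    return rec, max(means) - means[rec]

rec, simple_regret = uniform_then_recommend([0.3, 0.5, 0.55], n_rounds=300)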

Multi-armed bandits with limited exploration

by Sudipto Guha, Kamesh Munagala - In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2007
"... A central problem to decision making under uncertainty is the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current best-knowledge about the system. A fundamental decision-theoretic model that captures this trade-off is t ..."
Abstract - Cited by 2 (0 self) - Add to MetaCart
-off is the celebrated stochastic Multi-arm Bandit Problem. In this paper, we consider scenarios where the exploration phase corresponds to designing experiments, and the exploration phase has the following restrictions: (1) it must necessarily precede the exploitation phase; (2) it is expensive in terms of some
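
The snippet describes a setting where a costly exploration phase must fully precede exploitation. A minimal explore-then-commit sketch of that structure (my assumption of the protocol's shape, not the paper's algorithm) looks like this:

import random

def explore_then_commit(means, n_explore, n_exploit, seed=1):
    # Phase 1: a fixed budget of experiments, uniformly allocated.
    rng = random.Random(seed)
    k = len(means)
    counts, est = [0] * k, [0.0] * k
    for t in range(n_explore):
        a = t % k
        r = float(rng.random() < means[a])
        counts[a] += 1
        est[a] += (r - est[a]) / counts[a]
    # Phase 2: commit to the empirically best arm; no further learning.
    best = max(range(k), key=lambda a: est[a])
    earned = sum(float(rng.random() < means[best]) for _ in range(n_exploit))
    return best, earned

best, earned = explore_then_commit([0.4, 0.6, 0.5], n_explore=90, n_exploit=1000)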

Using Multi-armed Bandit to Solve Cold-start Problems in Recommender Systems at Telco

by Hai Thanh Nguyen, Anders Kofod-Petersen
"... Abstract. Recommending best-fit rate-plans for new users is a chal-lenge for the Telco industry. Rate-plans differ from most traditional products in the way that a user normally only have one product at any given time. This, combined with no background knowledge on new users hinders traditional reco ..."
Abstract - Add to MetaCart
recommender systems. Many Telcos today use either trivial approaches, such as picking a random plan or the most common plan in use. The work presented here shows that these methods perform poorly. We propose a new approach based on the multi-armed bandit algorithms to automatically recommend rate
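
To make the cold-start idea concrete, here is a hedged sketch of one standard bandit recommender, Thompson sampling with Beta-Bernoulli arms. The plan names, acceptance rates, and the choice of Thompson sampling are all assumptions for illustration; the abstract does not specify which bandit algorithm the authors use.

import random

plans = ["basic", "family", "unlimited"]        # hypothetical rate-plans
alpha = {p: 1.0 for p in plans}                 # Beta(1, 1) priors: cold start
beta = {p: 1.0 for p in plans}

def recommend():
    # Sample a plausible acceptance rate for each plan, pick the argmax.
    return max(plans, key=lambda p: random.betavariate(alpha[p], beta[p]))

def update(plan, accepted):
    # Bernoulli feedback: did the new user accept / keep the plan?
    if accepted:
        alpha[plan] += 1.0
    else:
        beta[plan] += 1.0

true_rate = {"basic": 0.2, "family": 0.4, "unlimited": 0.6}   # invented
for _ in range(1000):
    p = recommend()
    update(p, random.random() < true_rate[p])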

Decentralized Multi-Armed Bandit with Multiple Distributed Players

by Keqin Liu, Qing Zhao
"... Abstract—We formulate and study a decentralized multiarmed bandit (MAB) problem, where M distributed players compete for N independent arms with unknown reward statistics. At each time, each player chooses one arm to play without exchanging information with other players. Players choosing the same a ..."
Abstract - Cited by 18 (2 self) - Add to MetaCart
Abstract—We formulate and study a decentralized multiarmed bandit (MAB) problem, where M distributed players compete for N independent arms with unknown reward statistics. At each time, each player chooses one arm to play without exchanging information with other players. Players choosing the same
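
A small simulation can illustrate the decentralized setting. The sketch below assumes a particular collision model (colliding players earn nothing) and a simple epsilon-greedy rule per player; both are assumptions, as the snippet truncates before stating the paper's model. Independent greedy players tend to pile onto the same good arm and keep colliding, which is exactly what decentralized MAB policies are designed to avoid; in the imperfect-observation variant listed further down, a zero reward is additionally ambiguous between a bad draw and a collision.

import random

def decentralized_round(players, arm_means, eps, rng):
    # Each player picks an arm privately (no communication); under the
    # collision model assumed here, players sharing an arm get zero reward.
    choices = []
    for p in players:
        if rng.random() < eps or sum(p["counts"]) == 0:
            choices.append(rng.randrange(len(arm_means)))
        else:
            choices.append(max(range(len(arm_means)), key=lambda a: p["means"][a]))
    for p, a in zip(players, choices):
        collided = choices.count(a) > 1
        r = 0.0 if collided else float(rng.random() < arm_means[a])
        p["counts"][a] += 1
        p["means"][a] += (r - p["means"][a]) / p["counts"][a]

rng = random.Random(2)
arms = [0.9, 0.8, 0.3, 0.2]
players = [{"counts": [0] * len(arms), "means": [0.0] * len(arms)} for _ in range(2)]
for _ in range(2000):
    decentralized_round(players, arms, eps=0.1, rng=rng)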

Tighter Bounds for Multi-Armed Bandits with Expert Advice

by H. Brendan McMahan, Matthew Streeter
"... Bandit problems are a classic way of formulating exploration versus exploitation tradeoffs. Auer et al. [ACBFS02] introduced the EXP4 algorithm, which explicitly decouples the set of A actions which can be taken in the world from the set of M experts (general strategies for selecting actions) with w ..."
Abstract - Cited by 9 (0 self) - Add to MetaCart
. In this paper we introduce a new algorithm, similar in spirit to EXP4, which has a bound of O ( √ T S log M). The S parameter measures the extent to which expert recommendations agree; we always have S ≤ min {A, M}. We discuss practical applications that arise in the contextual bandits
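
EXP4 itself is standard enough to sketch. The code below follows the usual formulation (experts supply distributions over actions; the learner mixes their advice and reweights them by importance-weighted gains), corresponding to the classic O(√(T A log M))-style guarantee rather than the tighter O(√(T S log M)) bound this paper proves; get_advice and play are assumed callbacks.

import math
import random

def exp4(get_advice, play, A, M, T, gamma=0.1, rng=random.Random(3)):
    # get_advice() -> M probability vectors over the A actions (expert advice)
    # play(a)      -> observed reward in [0, 1] for the chosen action a
    w = [1.0] * M                                  # one weight per expert
    for _ in range(T):
        advice = get_advice()
        W = sum(w)
        # Mix expert advice, then smooth with uniform exploration (gamma).
        p = [(1 - gamma) * sum(w[m] * advice[m][a] for m in range(M)) / W
             + gamma / A for a in range(A)]
        a = rng.choices(range(A), weights=p)[0]
        xhat = play(a) / p[a]                      # importance-weighted reward
        for m in range(M):
            yhat = advice[m][a] * xhat             # expert m's estimated gain
            w[m] *= math.exp(gamma * yhat / A)     # exponential update
    return w

# Toy usage: two experts, each always recommending a fixed action.
weights = exp4(lambda: [[1.0, 0.0], [0.0, 1.0]],
               lambda a: float(random.random() < [0.2, 0.7][a]),
               A=2, M=2, T=1000)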

Decentralized Multi-Armed Bandit with Imperfect Observations

by Keqin Liu, Qing Zhao, Bhaskar Krishnamachari
"... Abstract — We consider decentralized multi-armed bandit problems with multiple distributed players. At each time, each player chooses one of the N independent arms with unknown reward statistics to play. Players do not exchange information regarding their observations or actions. A collision occurs ..."
Abstract - Cited by 5 (2 self) - Add to MetaCart
Abstract — We consider decentralized multi-armed bandit problems with multiple distributed players. At each time, each player chooses one of the N independent arms with unknown reward statistics to play. Players do not exchange information regarding their observations or actions. A collision occurs

Optimizing Adaptive Marketing Experiments with the Multi-Armed Bandit

by Eric Michael Schwartz
"... Sequential decision making is central to a range of marketing problems. Both firms and consumers aim to maximize their objectives over time, yet they remain uncertain about the best course of action. So they allocate resources to both explore to reduce uncertainty (learning) and exploit their curren ..."
Abstract - Add to MetaCart
their current information for immediate reward (earning). This explore/exploit tradeoff is best captured by the multi-armed bandit, the conceptual and methodological backbone of this dissertation. We focus on this class of marketing problems and aim to make the following substantive and methodological
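
As a toy rendering of the "earning while learning" point, the sketch below contrasts a fixed 50/50 A/B test with an adaptive epsilon-greedy experiment on two hypothetical ad variants; the conversion rates are invented and this is not the dissertation's methodology.

import random

def run_experiment(adaptive, rates, T, eps=0.1, seed=4):
    rng = random.Random(seed)
    counts, means, earned = [0, 0], [0.0, 0.0], 0.0
    for t in range(T):
        if adaptive and rng.random() > eps and max(counts) > 0:
            a = max((0, 1), key=lambda i: means[i])   # exploit current best
        else:
            a = t % 2                                 # 50/50 split / explore
        r = float(rng.random() < rates[a])
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]
        earned += r
    return earned

rates = [0.03, 0.05]                                  # hypothetical conversions
ab_total = run_experiment(False, rates, T=10000)
bandit_total = run_experiment(True, rates, T=10000)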

Online Optimization of Teaching Sequences with Multi-Armed Bandits

by Benjamin Clement, Didier Roy, Pierre-Yves Oudeyer, Manuel Lopes, Inria Bordeaux Sud-Ouest
"... We present an approach to Intelligent Tutoring Systems which adaptively personalizes sequences of learning activi-ties to maximize skills acquired by each student, taking into account limited time and motivational resources. At a given point in time, the system tries to propose to the student the ac ..."
Abstract - Cited by 2 (1 self) - Add to MetaCart
on the combination of three approaches. First, it leverages recent models of intrinsically motivated learning by transposing them to active teaching, relying on empirical estimation of learning progress provided by spe-cific activities to particular students. Second, it uses state-of-the-art Multi-Arm Bandit (MAB
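
One way to read "empirical estimation of learning progress" as a bandit reward is sketched below: each activity is an arm whose score is the recent change in the student's success rate, so the tutor favors activities where progress is fastest. The windowing scheme and class design are my assumptions, not the paper's algorithm.

import random

class ProgressBandit:
    # Each teaching activity is an arm; its score is the absolute change in
    # recent success rate (a crude proxy for empirical learning progress).
    def __init__(self, n_activities, window=5):
        self.history = [[] for _ in range(n_activities)]
        self.window = window

    def progress(self, a):
        h = self.history[a]
        if len(h) < 2 * self.window:
            return float("inf")                    # force initial sampling
        recent = sum(h[-self.window:]) / self.window
        older = sum(h[-2 * self.window:-self.window]) / self.window
        return abs(recent - older)

    def choose(self, rng):
        scores = [self.progress(a) for a in range(len(self.history))]
        top = max(scores)
        return rng.choice([a for a, s in enumerate(scores) if s == top])

    def record(self, activity, success):
        self.history[activity].append(float(success))

rng = random.Random(6)
tutor = ProgressBandit(n_activities=4)
# In use: a = tutor.choose(rng); observe the student; tutor.record(a, success)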

On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards

by Yi Gai, Bhaskar Krishnamachari, Mingyan Liu, 2011
"... Abstract—We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of M users and N ≥ M resources. For each user-resource pair (i, j), there is an associated state that evolves as an aperiodic irreducible finit ..."
Abstract - Cited by 4 (3 self) - Add to MetaCart
Abstract—We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of M users and N ≥ M resources. For each user-resource pair (i, j), there is an associated state that evolves as an aperiodic irreducible
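
The combinatorial structure (one estimate per user-resource pair, a matching played each round) can be sketched as follows. Two simplifications are assumed: rewards are drawn i.i.d. Bernoulli rather than from Markov chains, and the matching is greedy rather than optimal.

import math
import random

def greedy_matching(index, M, N):
    # Take pairs in decreasing index order, respecting one-to-one constraints.
    pairs = sorted(((index[i][j], i, j) for i in range(M) for j in range(N)),
                   reverse=True)
    used_u, used_r, match = set(), set(), {}
    for _, i, j in pairs:
        if i not in used_u and j not in used_r:
            match[i] = j
            used_u.add(i)
            used_r.add(j)
    return match

M, N, T = 2, 3, 2000
rng = random.Random(5)
mu = [[rng.random() for _ in range(N)] for _ in range(M)]  # unknown to learner
counts = [[0] * N for _ in range(M)]
means = [[0.0] * N for _ in range(M)]
for t in range(1, T + 1):
    ucb = [[means[i][j] + math.sqrt(2 * math.log(t) / counts[i][j])
            if counts[i][j] else float("inf")
            for j in range(N)]
           for i in range(M)]
    for i, j in greedy_matching(ucb, M, N).items():
        r = float(rng.random() < mu[i][j])
        counts[i][j] += 1
        means[i][j] += (r - means[i][j]) / counts[i][j]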