Results 1 - 10 of 204
Multi-Armed Recommendation Bandits for Selecting State Machine Policies for Robotic Systems
"... Abstract — We investigate the problem of selecting a statemachine from a library to control a robot. We are particularly interested in this problem when evaluating such state machines on a particular robotics task is expensive. As a motivating example, we consider a problem where a simulated vacuumi ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
vacuuming robot must select a driving state machine well-suited for a particular (unknown) room layout. By borrowing concepts from collaborative filtering (recommender systems such as Netflix and Amazon.com), we present a multi-armed bandit formulation that incorporates recommendation techniques
Pure exploration in multi-armed bandits problems
In Proceedings of the Twentieth International Conference on Algorithmic Learning Theory (ALT 2009), 2009
"... We consider the framework of stochastic multi-armed bandit problems and study the possibilities and limitations of strategies that explore sequentially the arms. The strategies are assessed not in terms of their cumulative regrets, as is usually the case, but through quantities referred to as simpl ..."
Abstract
-
Cited by 80 (13 self)
- Add to MetaCart
to as simple regrets. The latter are related to the (expected) gains of the decisions that the strategies would recommend for a new one-shot instance of the same multi-armed bandit problem. Here, exploration is only constrained by the number of available rounds (not necessarily known in advance), in contrast
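The cumulative-regret/simple-regret distinction in this entry is easiest to see in code. Below is a minimal pure-exploration sketch, not taken from the paper: the arm means, the round-robin allocation, and the Bernoulli rewards are all illustrative assumptions. After the exploration rounds, the strategy recommends the empirically best arm, and simple regret is the gap between the best true mean and the recommended arm's mean — rewards spent during exploration do not count.

```python
import random

def simple_regret_uniform(means, rounds, seed=0):
    """Pure exploration: sample arms uniformly, then recommend the
    empirically best arm; simple regret ignores exploration rewards."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    for t in range(rounds):
        i = t % k                                     # round-robin allocation
        r = 1.0 if rng.random() < means[i] else 0.0   # Bernoulli reward (assumed)
        counts[i] += 1
        sums[i] += r
    best_emp = max(range(k), key=lambda i: sums[i] / counts[i])
    return max(means) - means[best_emp]               # simple regret of the recommendation

print(simple_regret_uniform([0.3, 0.5, 0.55], rounds=3000))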
Multi-armed bandits with limited exploration
In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2007
"... A central problem to decision making under uncertainty is the trade-off between exploration and exploitation: between learning from and adapting to a stochastic system and exploiting the current best-knowledge about the system. A fundamental decision-theoretic model that captures this trade-off is t ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
-off is the celebrated stochastic Multi-arm Bandit Problem. In this paper, we consider scenarios where the exploration phase corresponds to designing experiments, and the exploration phase has the following restrictions: (1) it must necessarily precede the exploitation phase; (2) it is expensive in terms of some
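The two-phase structure this abstract describes — a costly exploration phase that must finish before any exploitation begins — can be sketched as an explore-then-commit loop. Everything concrete below (the round-robin budget spend, Bernoulli rewards, the commit-to-best rule) is an assumption for illustration, not the paper's model.

```python
import random

def explore_then_exploit(means, budget, horizon, seed=1):
    """Phase 1: spend a costly exploration budget round-robin.
    Phase 2: commit to the empirical best arm for the remaining rounds."""
    rng = random.Random(seed)
    k = len(means)
    pulls, wins = [0] * k, [0] * k
    for t in range(budget):                  # exploration strictly precedes exploitation
        i = t % k
        pulls[i] += 1
        wins[i] += rng.random() < means[i]
    best = max(range(k), key=lambda i: wins[i] / pulls[i])
    reward = sum(rng.random() < means[best] for _ in range(horizon - budget))
    return best, reward

print(explore_then_exploit([0.2, 0.4, 0.6], budget=300, horizon=10_000))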
Using Multi-armed Bandit to Solve Cold-start Problems in Recommender Systems at Telco
"... Abstract. Recommending best-fit rate-plans for new users is a chal-lenge for the Telco industry. Rate-plans differ from most traditional products in the way that a user normally only have one product at any given time. This, combined with no background knowledge on new users hinders traditional reco ..."
Abstract
- Add to MetaCart
recommender systems. Many Telcos today use either trivial approaches, such as picking a random plan or the most common plan in use. The work presented here shows that these methods perform poorly. We propose a new approach based on the multi-armed bandit algorithms to automatically recommend rate
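As one concrete way a bandit can drive cold-start recommendation, here is a hypothetical epsilon-greedy sketch. The plan names, acceptance probabilities, and the acceptance-as-reward signal are invented for illustration; the paper's actual algorithm and evaluation differ.

```python
import random

rng = random.Random(42)
plans = {"basic": 0.10, "family": 0.25, "unlimited": 0.15}  # hidden accept rates (toy)
stats = {p: [0, 0] for p in plans}                          # plan -> [offers, acceptances]

def recommend(eps=0.1):
    """Epsilon-greedy over rate plans: explore a random plan with prob eps,
    otherwise offer the plan with the best observed acceptance rate."""
    if rng.random() < eps or all(s[0] == 0 for s in stats.values()):
        return rng.choice(list(plans))
    return max(stats, key=lambda p: stats[p][1] / max(stats[p][0], 1))

for _ in range(5000):                  # each new user is one bandit round
    plan = recommend()
    accepted = rng.random() < plans[plan]
    stats[plan][0] += 1
    stats[plan][1] += accepted

print({p: s[1] / max(s[0], 1) for p, s in stats.items()})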
Decentralized Multi-Armed Bandit with Multiple Distributed Players
"... Abstract—We formulate and study a decentralized multiarmed bandit (MAB) problem, where M distributed players compete for N independent arms with unknown reward statistics. At each time, each player chooses one arm to play without exchanging information with other players. Players choosing the same a ..."
Abstract
-
Cited by 18 (2 self)
- Add to MetaCart
Abstract—We formulate and study a decentralized multiarmed bandit (MAB) problem, where M distributed players compete for N independent arms with unknown reward statistics. At each time, each player chooses one arm to play without exchanging information with other players. Players choosing the same
Tighter Bounds for Multi-Armed Bandits with Expert Advice
"... Bandit problems are a classic way of formulating exploration versus exploitation tradeoffs. Auer et al. [ACBFS02] introduced the EXP4 algorithm, which explicitly decouples the set of A actions which can be taken in the world from the set of M experts (general strategies for selecting actions) with w ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
. In this paper we introduce a new algorithm, similar in spirit to EXP4, which has a bound of O ( √ T S log M). The S parameter measures the extent to which expert recommendations agree; we always have S ≤ min {A, M}. We discuss practical applications that arise in the contextual bandits
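For readers unfamiliar with EXP4, the sketch below implements the classic algorithm this entry builds on: exponential weights over experts, each recommending a probability distribution over actions, with importance-weighted reward estimates. It illustrates EXP4 itself, not the paper's new variant with the O(√(T S log M)) bound; the learning rate gamma and the toy experts are illustrative assumptions.

```python
import math, random

def exp4(expert_advice, reward_fn, T, gamma=0.1, seed=0):
    """EXP4: exponential weights over M experts, each of which recommends
    a probability distribution over the A actions at every round."""
    rng = random.Random(seed)
    M = len(expert_advice)
    w = [1.0] * M
    total = 0.0
    for t in range(T):
        advice = [expert_advice[m](t) for m in range(M)]  # M distributions over A
        A = len(advice[0])
        W = sum(w)
        # Mix expert advice by weight, then blend in uniform exploration.
        p = [(1 - gamma) * sum(w[m] * advice[m][a] for m in range(M)) / W
             + gamma / A for a in range(A)]
        a = rng.choices(range(A), weights=p)[0]
        r = reward_fn(a, t)                               # reward assumed in [0, 1]
        xhat = r / p[a]                                   # importance-weighted estimate
        for m in range(M):
            w[m] *= math.exp(gamma * advice[m][a] * xhat / A)
        total += r
    return total

# Two toy experts over 3 actions: one favors action 2, one is uniform.
experts = [lambda t: [0.1, 0.1, 0.8], lambda t: [1/3, 1/3, 1/3]]
payoff = lambda a, t: 1.0 if a == 2 and random.random() < 0.7 else 0.0
print(exp4(experts, payoff, T=2000))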
Decentralized Multi-Armed Bandit with Imperfect Observations
"... Abstract — We consider decentralized multi-armed bandit problems with multiple distributed players. At each time, each player chooses one of the N independent arms with unknown reward statistics to play. Players do not exchange information regarding their observations or actions. A collision occurs ..."
Abstract
-
Cited by 5 (2 self)
- Add to MetaCart
Abstract — We consider decentralized multi-armed bandit problems with multiple distributed players. At each time, each player chooses one of the N independent arms with unknown reward statistics to play. Players do not exchange information regarding their observations or actions. A collision occurs
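A toy simulation of the collision mechanics shared by this and the earlier decentralized-MAB entry: players independently run UCB-style selection with no communication, and players that collide on an arm receive nothing. The zero-reward collision model and the randomized tie-breaking are illustrative assumptions; the papers' policies are considerably more refined.

```python
import math, random

def decentralized_round(players, means, rng):
    """One round: each player picks an arm from its own statistics only;
    players that collide on an arm get zero reward."""
    choices = []
    for st in players:  # st: list of [pulls, total_reward] per arm
        t = sum(p for p, _ in st) + 1
        ucb = [(s / p + math.sqrt(2 * math.log(t) / p)) if p else float("inf")
               for p, s in st]
        best = max(ucb)
        choices.append(rng.choice([i for i, u in enumerate(ucb) if u == best]))
    for player, arm in zip(players, choices):
        collided = choices.count(arm) > 1
        r = 0.0 if collided else (1.0 if rng.random() < means[arm] else 0.0)
        player[arm][0] += 1
        player[arm][1] += r
    return choices

rng = random.Random(3)
means = [0.9, 0.8, 0.5, 0.2]
players = [[[0, 0.0] for _ in means] for _ in range(2)]  # 2 players, 4 arms
for _ in range(5000):
    decentralized_round(players, means, rng)
print([max(range(len(means)), key=lambda a: p[a][0]) for p in players])
```

Running this shows why the problem is hard: without coordination, naive index policies tend to pile onto the same good arms and collide, which is exactly the loss the cited papers design around.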
Optimizing Adaptive Marketing Experiments with the Multi-Armed Bandit
"... Sequential decision making is central to a range of marketing problems. Both firms and consumers aim to maximize their objectives over time, yet they remain uncertain about the best course of action. So they allocate resources to both explore to reduce uncertainty (learning) and exploit their curren ..."
Abstract
- Add to MetaCart
their current information for immediate reward (earning). This explore/exploit tradeoff is best captured by the multi-armed bandit, the conceptual and methodological backbone of this dissertation. We focus on this class of marketing problems and aim to make the following substantive and methodological
Online Optimization of Teaching Sequences with Multi-Armed Bandits
"... We present an approach to Intelligent Tutoring Systems which adaptively personalizes sequences of learning activi-ties to maximize skills acquired by each student, taking into account limited time and motivational resources. At a given point in time, the system tries to propose to the student the ac ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
on the combination of three approaches. First, it leverages recent models of intrinsically motivated learning by transposing them to active teaching, relying on empirical estimation of learning progress provided by spe-cific activities to particular students. Second, it uses state-of-the-art Multi-Arm Bandit (MAB
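The reward signal described here — empirical learning progress per activity — maps naturally onto a bandit loop. In this hypothetical sketch, each teaching activity's reward is the recent change in the student's measured success rate on it, and an epsilon-greedy rule picks activities. The toy student model, window size, and selection rule are all invented stand-ins, not the paper's system.

```python
import random

rng = random.Random(5)
skills = [0.1, 0.1, 0.1]          # toy student: mastery level per activity
history = [[] for _ in skills]    # recent success/failure per activity

def learning_progress(a, window=10):
    """Reward = improvement of recent success rate over the older rate."""
    h = history[a]
    if len(h) < 2 * window:
        return 1.0                # optimistic until enough data is gathered
    old, new = h[-2 * window:-window], h[-window:]
    return (sum(new) - sum(old)) / window

for step in range(500):
    # Epsilon-greedy over activities, scored by estimated learning progress.
    if rng.random() < 0.2:
        a = rng.randrange(len(skills))
    else:
        a = max(range(len(skills)), key=learning_progress)
    success = rng.random() < skills[a]
    history[a].append(success)
    # Practice improves mastery; failures (challenge) teach slightly more here.
    skills[a] = min(1.0, skills[a] + (0.01 if success else 0.02) * (1 - skills[a]))

print([round(s, 2) for s in skills])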
On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards
2011
"... We consider a combinatorial generalization of the classical multi-armed bandit problem that is defined as follows. There is a given bipartite graph of M users and N ≥ M resources. For each user-resource pair (i, j), there is an associated state that evolves as an aperiodic irreducible finite ..."
Cited by 4 (3 self)
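The combinatorial structure in this last entry — choosing a matching of M users to N resources rather than a single arm — can be made concrete with a small sketch. Below, an exhaustive search over permutations picks the matching that maximizes the sum of per-pair UCB indices; i.i.d. Bernoulli rewards stand in for the paper's Markovian ones, and brute-force matching search is only viable for tiny M and N.

```python
import math, random
from itertools import permutations

M, N = 2, 3
rng = random.Random(7)
true_means = [[rng.random() for _ in range(N)] for _ in range(M)]
pulls = [[0] * N for _ in range(M)]
sums = [[0.0] * N for _ in range(M)]

def best_matching(t):
    """Pick the user->resource matching maximizing the summed UCB indices."""
    def ucb(i, j):
        if pulls[i][j] == 0:
            return float("inf")
        return sums[i][j] / pulls[i][j] + math.sqrt(2 * math.log(t) / pulls[i][j])
    return max(permutations(range(N), M),
               key=lambda m: sum(ucb(i, j) for i, j in enumerate(m)))

for t in range(1, 3001):
    match = best_matching(t)
    for i, j in enumerate(match):        # play every user-resource pair chosen
        r = 1.0 if rng.random() < true_means[i][j] else 0.0
        pulls[i][j] += 1
        sums[i][j] += r
print(best_matching(3001))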