Results 11–20 of 56
Unimodal bandits: Regret lower bounds and optimal algorithms.
, 2014
Cited by 7 (2 self)
We consider stochastic multi-armed bandits where the expected reward is a unimodal function over partially ordered arms. This important class of problems has recently been investigated in …
Counterfactual Reasoning and Learning Systems
, 2013
Cited by 4 (0 self)
This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and to predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select the changes that would have improved the system's performance. This work is illustrated by experiments carried out on the ad placement system associated with the Bing search engine.
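A common building block for counterfactual predictions of this kind is importance weighting of logged data. A minimal sketch of that idea follows; the log format and the `new_policy` interface are illustrative assumptions for exposition, not the paper's actual system:

```python
def ips_estimate(logs, new_policy):
    """Inverse-propensity-score estimate of the reward a new policy would
    have earned on logged data.

    logs: iterable of (context, action, reward, prob), where prob is the
          probability the *logging* policy assigned to the logged action.
    new_policy(context, action): probability the new policy would take
          that action in that context.
    """
    total = 0.0
    for context, action, reward, prob in logs:
        # Reweight each logged reward by how much more (or less) likely
        # the new policy is to take the logged action than the logger was.
        total += reward * (new_policy(context, action) / prob)
    return total / len(logs)
```

With a uniform logging policy over two actions and a new policy that always takes action 1, the estimate recovers the mean reward of action 1 from the logs alone.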
Bandits, Query Learning, and the Haystack Dimension
Cited by 4 (1 self)
Motivated by multi-armed bandit (MAB) problems with a very large or even infinite number of arms, we consider the problem of finding a maximum of an unknown target function by querying the function at chosen inputs (or arms). We give an analysis of the query complexity of this problem, under the assumption that the payoff of each arm is given by a function belonging to a known, finite, but otherwise arbitrary function class. Our analysis centers on a new notion of function class complexity that we call the haystack dimension, which is used to prove the approximate optimality of a simple greedy algorithm. This algorithm is then used as a subroutine in a functional MAB algorithm, yielding provably near-optimal regret. We provide a generalization to the infinite cardinality setting, and comment on how our analysis is connected to, and improves upon, existing results for query learning and generalized binary search.
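The greedy algorithm in question queries informative arms to narrow down the surviving function class. A toy, noiseless sketch in the spirit of generalized binary search follows; the exact greedy criterion here (query where the candidates disagree most) is an assumption for illustration, not the paper's definition, and it presumes the candidate functions are distinct on the input set:

```python
def greedy_maximize(function_class, inputs, oracle):
    """Find the maximizer of an unknown target function, known to lie in a
    finite class, by greedily querying informative inputs (noiseless sketch)."""
    candidates = list(function_class)
    while len(candidates) > 1:
        # Query the input on which the surviving candidates disagree most.
        x = max(inputs, key=lambda i: len({f(i) for f in candidates}))
        y = oracle(x)
        # Eliminate every candidate inconsistent with the observation.
        candidates = [f for f in candidates if f(x) == y]
    target = candidates[0]
    return max(inputs, key=target)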
Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems
, 2014
Cited by 4 (1 self)
Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of an effort level that is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and …
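The adaptive-discretization idea can be sketched for a one-dimensional contract space in [0, 1]: treat intervals as arms, pull the one with the highest upper confidence bound, and split an interval once it has been sampled enough relative to its width. This is a simplified zooming-style sketch with arbitrary constants, not the AgnosticZooming algorithm itself:

```python
import math

class AdaptiveDiscretization:
    """Zooming-style bandit over a continuous arm space [0, 1]: each
    interval is one arm, and well-sampled intervals are split so that
    promising regions end up more finely discretized."""

    def __init__(self):
        self.cells = {(0.0, 1.0): [0, 0.0]}  # interval -> [pulls, total reward]

    def select(self, t):
        def ucb(cell):
            n, total = self.cells[cell]
            if n == 0:
                return float("inf")  # optimism for unsampled cells
            return total / n + math.sqrt(2 * math.log(t + 1) / n)
        return max(self.cells, key=ucb)

    def update(self, cell, reward):
        stats = self.cells[cell]
        stats[0] += 1
        stats[1] += reward
        lo, hi = cell
        # Refine: once the confidence width drops below the cell's width,
        # replace the cell with its two halves.
        if stats[0] >= 4 and math.sqrt(1 / stats[0]) < hi - lo:
            mid = (lo + hi) / 2
            del self.cells[cell]
            self.cells[(lo, mid)] = [0, 0.0]
            self.cells[(mid, hi)] = [0, 0.0]
```

Running it against a reward that increases with the contract value shows the discretization refining toward the high end of the space.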
The best of both worlds: Stochastic and adversarial bandits.
In COLT, 2012
Cited by 4 (0 self)
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal), whose regret is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically, SAO combines the O(√n) worst-case regret of Exp3 …
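Exp3, the adversarial baseline referred to above, maintains exponential weights over arms with a uniform exploration floor and updates with importance-weighted reward estimates. A compact sketch (deterministic per-arm rewards stand in for an adversarial reward sequence; constants are illustrative):

```python
import math
import random

def exp3(num_arms, rewards, gamma=0.1, rounds=1000, seed=0):
    """Exp3 sketch: exponential weights with a gamma exploration floor.
    rewards[a] is arm a's (here fixed) reward in [0, 1]."""
    rng = random.Random(seed)
    weights = [1.0] * num_arms
    pulls = [0] * num_arms
    for _ in range(rounds):
        total = sum(weights)
        probs = [(1 - gamma) * w / total + gamma / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        pulls[arm] += 1
        # Importance-weighted estimate keeps the update unbiased.
        x_hat = rewards[arm] / probs[arm]
        weights[arm] *= math.exp(gamma * x_hat / num_arms)
        # Renormalize so the weights stay in floating-point range.
        top = max(weights)
        weights = [w / top for w in weights]
    return pulls
```

On a two-arm instance with a clearly better arm, the pull counts concentrate on that arm while the gamma floor keeps exploring the other.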
Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections
, 2013
Cited by 3 (0 self)
Most learning-to-rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable algorithms that explicitly take document similarity and ranking context into account. Our formulation is a non-trivial common generalization of two multi-armed bandit models from the literature: ranked bandits (Radlinski et al., 2008) and Lipschitz bandits (Kleinberg et al., 2008b). We present theoretical justifications for this approach, as well as a near-optimal algorithm. Our evaluation adds optimizations that improve empirical performance, and shows that our algorithms learn orders of magnitude more quickly than previous approaches.
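The ranked-bandits model being generalized runs one bandit instance per rank position and rewards a slot only when its document receives the first click, which pushes lower slots toward documents that satisfy users the slots above missed. An epsilon-greedy sketch of that base model (this is the Radlinski et al. setup in caricature, not the paper's metric-space algorithm; the click model and constants are illustrative):

```python
import random

def ranked_bandits(docs, user_clicks, slots, rounds, seed=0):
    """One epsilon-greedy bandit per rank position; slot i is credited only
    when its document is the first one clicked in the ranking."""
    rng = random.Random(seed)
    stats = [{d: [0, 0.0] for d in docs} for _ in range(slots)]  # pulls, reward
    mean = lambda s, d: stats[s][d][1] / max(stats[s][d][0], 1)
    for _ in range(rounds):
        ranking = []
        for slot in range(slots):
            if rng.random() < 0.1:          # explore
                choice = rng.choice(docs)
            else:                           # exploit this slot's estimates
                choice = max(docs, key=lambda d: mean(slot, d))
            ranking.append(choice)
        clicked = user_clicks(ranking)       # index of first clicked slot, or None
        for slot, doc in enumerate(ranking):
            stats[slot][doc][0] += 1
            stats[slot][doc][1] += 1.0 if clicked == slot else 0.0
    return [max(docs, key=lambda d: mean(s, d)) for s in range(slots)]
```

With a deterministic user who clicks the first occurrence of a preferred document, the top slot converges to that document.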
Online Decision Making in Crowdsourcing Markets: Theoretical Challenges
, 2013
Cited by 3 (0 self)
Over the past decade, crowdsourcing has emerged as a cheap and efficient method of obtaining solutions to simple tasks that are difficult for computers to solve but possible for humans. The popularity and promise of crowdsourcing markets has led to both empirical and theoretical research on the design of algorithms to optimize various aspects of these markets, such as the pricing and assignment of tasks. Much of the existing theoretical work on crowdsourcing markets has focused on problems that fall into the broad category of online decision making; task requesters or the crowdsourcing platform itself make repeated decisions about prices to set, workers to filter out, problems to assign to specific workers, or other things. Often these decisions are complex, requiring algorithms that learn about the distribution of available tasks or workers over time and take into account the strategic (or sometimes irrational) behavior of workers. As human computation grows into its own field, the time is ripe to address these challenges in a principled way. However, it appears very difficult to capture all pertinent aspects of crowdsourcing markets in a single coherent model. In this paper, we reflect on the modeling issues that inhibit theoretical research on online decision making for crowdsourcing, and identify some steps forward. This paper grew out of the authors’ own frustration with these issues, and we hope it will encourage the community to attempt to understand, debate, and ultimately address them.
Large-Scale Bandit Problems and KWIK Learning
Cited by 2 (0 self)
We show that parametric multi-armed bandit (MAB) problems with large state and action spaces can be algorithmically reduced to the supervised learning model known as “Knows What It Knows” or KWIK learning. We give matching impossibility results showing that the KWIK-learnability requirement cannot be replaced by weaker supervised learning assumptions. We provide such results in both the standard parametric MAB setting, as well as for a new model in which the action space is finite but growing with time.
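A KWIK learner must either decline to predict ("I don't know") or emit a prediction that is accurate with high probability; a reduction like the one described can plug learners with this interface into a bandit algorithm. A toy illustration of the interface for estimating a single arm's mean (the class name and thresholds are illustrative, not from the paper):

```python
class KwikMeanLearner:
    """Toy KWIK-style learner: answers None ("don't know") until it has
    enough samples to estimate the mean within epsilon, after which its
    answers are meant to be epsilon-accurate with high probability."""

    def __init__(self, epsilon=0.1, min_samples=50):
        self.epsilon = epsilon
        self.min_samples = min_samples
        self.n = 0
        self.total = 0.0

    def predict(self):
        # KWIK contract: either abstain, or commit to an accurate answer.
        if self.n < self.min_samples:
            return None
        return self.total / self.n

    def observe(self, reward):
        self.n += 1
        self.total += reward
```

A bandit wrapper would pull arms whose learners still abstain, and trust the predictions of the rest.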
Hierarchical exploration for accelerating contextual bandits. In: ICML, 2012.
Cited by 2 (0 self)
Contextual bandit learning is an increasingly popular approach to optimizing recommender systems via user feedback, but can be slow to converge in practice due to the need for exploring a large feature space. In this paper, we propose a coarse-to-fine hierarchical approach for encoding prior knowledge that drastically reduces the amount of exploration required. Intuitively, user preferences can be reasonably embedded in a coarse low-dimensional feature space that can be explored efficiently, requiring exploration in the high-dimensional space only as necessary. We introduce a bandit algorithm that explores within this coarse-to-fine spectrum, and prove performance guarantees that depend on how well the coarse space captures the user's preferences. We demonstrate substantial improvement over conventional bandit algorithms through extensive simulation as well as a live user study in the setting of personalized news recommendation.
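The coarse-to-fine intuition can be caricatured with two nested epsilon-greedy bandits: one over coarse groups of arms (e.g. news topics), one over individual arms inside the chosen group, so fine-grained exploration is spent mostly on groups that already look good. The grouping and constants below are illustrative assumptions; the paper works with projections of a feature space, not discrete groups:

```python
import random

def coarse_to_fine(arms, coarse_of, reward, rounds=500, seed=0):
    """Two-level epsilon-greedy: explore coarse groups first, then arms
    within the selected group."""
    rng = random.Random(seed)
    groups = {}
    for a in arms:
        groups.setdefault(coarse_of(a), []).append(a)
    g_stats = {g: [0, 0.0] for g in groups}   # pulls, total reward per group
    a_stats = {a: [0, 0.0] for a in arms}     # pulls, total reward per arm
    mean = lambda s, k: s[k][1] / max(s[k][0], 1)
    for _ in range(rounds):
        # Level 1: pick a coarse group.
        if rng.random() < 0.1:
            g = rng.choice(list(groups))
        else:
            g = max(g_stats, key=lambda k: mean(g_stats, k))
        # Level 2: pick an arm inside the chosen group.
        if rng.random() < 0.1:
            a = rng.choice(groups[g])
        else:
            a = max(groups[g], key=lambda k: mean(a_stats, k))
        r = reward(a)
        for s in (g_stats[g], a_stats[a]):
            s[0] += 1
            s[1] += r
    return max(a_stats, key=lambda k: mean(a_stats, k))
```

With ten arms in two groups and the best arm hidden inside the better group, the nested search finds it while spending little time in the bad group.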
Mining the Situation: Spatiotemporal Traffic Prediction With Big Data
Cited by 2 (2 self)
With the vast availability of traffic sensors from which traffic information can be derived, a lot of research effort has been devoted to developing traffic prediction techniques, which in turn improve route navigation, traffic regulation, urban area planning, etc. One key challenge in traffic prediction is how much to rely on prediction models constructed using historical data in real-time traffic situations, which may differ from those of the historical data and change over time. In this paper, we propose a novel online framework that learns from the current traffic situation (or context) in real time and predicts future traffic by matching the current situation to the most effective prediction model trained using historical data. As real-time traffic arrives, the traffic context space is adaptively partitioned in order to efficiently estimate the effectiveness of each base predictor in different situations. We obtain and prove both short-term and long-term performance guarantees (bounds) for our online algorithm. The proposed algorithm also works effectively in scenarios where the true labels (i.e., realized traffic) are missing or become available with delay. Using the proposed framework, the context dimension that is most relevant to traffic prediction can also be revealed, which can further reduce implementation complexity and inform traffic policy making. Our experiments with real-world data in real-life conditions show that the proposed approach significantly outperforms existing solutions. Index Terms: traffic prediction, big data, spatiotemporal, context-aware, online learning.
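The context-partitioning idea can be sketched with a fixed dyadic partition of a one-dimensional context space, tracking each base predictor's accuracy per cell so the best predictor is chosen per traffic situation. The paper's partition is adaptive and comes with regret bounds; this sketch (class and parameter names are illustrative) only shows per-cell predictor selection:

```python
class ContextualSelector:
    """Per-cell predictor selection over a dyadic partition of a 1-D
    context in [0, 1): track each base predictor's average accuracy in
    each cell and pick the best one for the current context."""

    def __init__(self, predictors, depth=3):
        self.predictors = predictors
        self.depth = depth
        self.stats = {}  # (cell, predictor index) -> (count, total accuracy)

    def _cell(self, context):
        # Index of the dyadic cell containing the context; finer with depth.
        return int(context * (2 ** self.depth))

    def select(self, context):
        cell = self._cell(context)
        def score(i):
            n, tot = self.stats.get((cell, i), (0, 0.0))
            # Unseen predictors get an optimistic score so they get tried.
            return tot / n if n else 1.0
        return max(range(len(self.predictors)), key=score)

    def update(self, context, i, accuracy):
        cell = self._cell(context)
        n, tot = self.stats.get((cell, i), (0, 0.0))
        self.stats[(cell, i)] = (n + 1, tot + accuracy)
```

After a few observations, different regions of the context space can settle on different base predictors, mirroring the situational matching described above.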