Results 1 - 5 of 5
Cascading bandits: Learning to rank in the cascade model
In Proceedings of the 32nd International Conference on Machine Learning
Abstract

Cited by 3 (1 self)
A search engine usually outputs a list of K web pages. The user examines this list, from the first web page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify the K most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits. The lower bound matches the upper bound of CascadeKL-UCB up to a logarithmic factor. We experiment with our algorithms on several problems. The algorithms perform surprisingly well even when our modeling assumptions are violated.
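The cascade model and a CascadeUCB1-style learner can be sketched in a few lines. The simulation below is an illustrative reimplementation, not the authors' code: the simplified initialization (observing each item once), the exploration constant 1.5, and the toy attraction probabilities are my own assumptions.

```python
import math
import random

def cascade_ucb1(p, K, T, seed=0):
    """Sketch of a CascadeUCB1-style learner.

    p[e] is the true attraction probability of item e (unknown to the
    learner). Each round: rank items by UCB, show the top K, observe
    the first click, and update every item up to the clicked position.
    """
    rng = random.Random(seed)
    L = len(p)
    pulls = [0] * L    # observations of each item
    means = [0.0] * L  # empirical attraction probabilities
    # Simplified initialization: observe every item once.
    for e in range(L):
        means[e] = 1.0 if rng.random() < p[e] else 0.0
        pulls[e] = 1
    for t in range(L + 1, T + 1):
        ucb = [means[e] + math.sqrt(1.5 * math.log(t) / pulls[e])
               for e in range(L)]
        ranked = sorted(range(L), key=lambda e: -ucb[e])[:K]
        # Cascading user: scan the list, click the first attractive item.
        click = None
        for pos, e in enumerate(ranked):
            if rng.random() < p[e]:
                click = pos
                break
        # Feedback: items before the click are unattractive, the clicked
        # item is attractive; with no click, all K items are unattractive.
        last = click if click is not None else K - 1
        for pos in range(last + 1):
            e = ranked[pos]
            reward = 1.0 if pos == click else 0.0
            means[e] = (means[e] * pulls[e] + reward) / (pulls[e] + 1)
            pulls[e] += 1
    return means, pulls
```

With a clear gap between the K most attractive items and the rest, the observations concentrate on the attractive items as T grows.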
Robust Influence Maximization
Abstract
In this paper, we address the important issue of uncertainty in the edge influence probability estimates for the well-studied influence maximization problem: the task of finding k seed nodes in a social network to maximize the influence spread. We propose the problem of robust influence maximization, which maximizes the worst-case ratio between the influence spread of the chosen seed set and that of the optimal seed set, given the uncertainty of the parameter input. We design an algorithm that solves this problem with a solution-dependent bound. We further study uniform sampling and adaptive sampling methods to effectively reduce the uncertainty on parameters and improve the robustness of the influence maximization task. Our empirical results show that parameter uncertainty may greatly affect influence maximization performance; that prior studies which learned influence probabilities could lead to poor performance in robust influence maximization, due to relatively large uncertainty in parameter estimates; and that an information-cascade-based adaptive sampling method may be an effective way to improve the robustness of influence maximization.
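This abstract builds on the standard independent cascade (IC) model and the classic greedy seed-selection baseline. The sketch below illustrates only those two building blocks, not the paper's robust algorithm or its sampling methods; the graph encoding, trial count, and toy edge probabilities are assumptions for illustration.

```python
import random

def ic_spread(graph, seeds, trials=200, seed=0):
    """Monte-Carlo estimate of influence spread under the independent
    cascade model. graph maps u -> [(v, p_uv), ...]."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    # Each newly active node gets one chance per out-edge.
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_seeds(graph, nodes, k):
    """Classic greedy (1 - 1/e)-approximation baseline: repeatedly add
    the node with the largest marginal gain in estimated spread."""
    seeds = []
    for _ in range(k):
        base = ic_spread(graph, seeds) if seeds else 0.0
        best, best_gain = None, -1.0
        for v in nodes:
            if v in seeds:
                continue
            gain = ic_spread(graph, seeds + [v]) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
    return seeds
```

The robust variant in the paper evaluates a chosen seed set against the worst case over an uncertainty set of edge probabilities, rather than a single estimate as above.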
Contextual Combinatorial Cascading Bandits
Abstract
We propose contextual combinatorial cascading bandits, a combinatorial online learning game where, at each time step, a learning agent is given a set of contextual information, then selects a list of items, and observes stochastic outcomes of a prefix of the selected items determined by some stopping criterion. In online recommendation, the stopping criterion might be the first item a user selects; in network routing, the stopping criterion might be the first edge blocked in a path. We consider position discounts in the list order, so that the agent's reward is discounted depending on the position where the stopping criterion is met. We design a UCB-type algorithm, C³-UCB, for this problem, prove an n-step regret bound of O(√n) in the general setting, and give finer analysis for two special cases. Our work generalizes existing studies in several directions, including contextual information, position discounts, and a more general cascading bandit model. Experiments on synthetic and real datasets demonstrate the advantage of involving contextual information and position discounts.
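A C³-UCB-style round scores each item by a linear upper confidence bound on its context features and discounts the reward by position. The sketch below is a hypothetical two-dimensional rendering, not the paper's implementation: the parameter names alpha and gamma, the closed-form 2x2 inverse, and the ridge-style update are simplifications I chose.

```python
import math

def inv2(m):
    # Closed-form inverse of a 2x2 matrix.
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def c3ucb_step(M, bvec, contexts, K, alpha=1.0, gamma=0.9):
    """One selection round (sketch): score every item by a linear UCB
    x.theta + alpha * sqrt(x' M^-1 x) and return the top-K list, along
    with the position discounts gamma**position."""
    Minv = inv2(M)
    theta = matvec(Minv, bvec)
    def ucb(x):
        mean = x[0] * theta[0] + x[1] * theta[1]
        mx = matvec(Minv, x)
        var = x[0] * mx[0] + x[1] * mx[1]
        return mean + alpha * math.sqrt(max(var, 0.0))
    ranked = sorted(range(len(contexts)), key=lambda e: -ucb(contexts[e]))[:K]
    discounts = [gamma ** pos for pos in range(K)]
    return ranked, discounts

def c3ucb_update(M, bvec, x, reward):
    """Ridge-style update for one observed item in the prefix feedback."""
    for i in range(2):
        for j in range(2):
            M[i][j] += x[i] * x[j]
        bvec[i] += reward * x[i]
```

Only the observed prefix (up to where the stopping criterion fires) is fed to `c3ucb_update`, mirroring the cascading feedback described above.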
Combinatorial Multi-Armed Bandit with General Reward Functions
Abstract
In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework that allows a general nonlinear reward function, whose expected value may not depend only on the means of the input random variables but possibly on the entire distributions of these variables. Our framework enables a much larger class of reward functions, such as the max() function and nonlinear utility functions. Existing techniques relying on accurate estimations of the means of random variables, such as the upper confidence bound (UCB) technique, do not work directly on these functions. We propose a new algorithm called stochastically dominant confidence bound (SDCB), which estimates the distributions of underlying random variables and their stochastically dominant confidence bounds. We prove that SDCB can achieve O(log T) distribution-dependent regret and Õ(√T) distribution-independent regret, where T is the time horizon. We apply our results to the K-MAX problem and expected utility maximization problems. In particular, for K-MAX, we provide the first polynomial-time approximation scheme (PTAS) for its offline problem, and give the first Õ(√T) bound on the (1−ε)-approximation regret of its online problem, for any ε > 0.
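The core idea of SDCB, as this abstract describes it, is to replace a scalar UCB with an optimistic distribution: lower the empirical CDF by a confidence radius and move the removed probability mass to the top of the support, so the result stochastically dominates the empirical distribution. A minimal sketch for one arm with support in [0, 1], using a confidence radius of my own choosing:

```python
import math

def sd_confidence_cdf(samples, t):
    """Sketch of an optimistic (stochastically dominant) distribution
    for one arm. Returns the CDF as a list of (value, F(value)) pairs:
    the empirical CDF lowered by a confidence radius (clipped at 0),
    with the removed mass placed at the maximum of the support, 1."""
    n = len(samples)
    radius = math.sqrt(3.0 * math.log(t) / (2.0 * n))  # my choice of radius
    cdf = []
    count = 0
    for x in sorted(set(samples)):
        count += sum(1 for s in samples if s == x)
        cdf.append((x, max(count / n - radius, 0.0)))
    # Force total mass 1 at the top of the support.
    if not cdf or cdf[-1][0] < 1.0:
        cdf.append((1.0, 1.0))
    else:
        cdf[-1] = (1.0, 1.0)
    return cdf
```

For a reward like max(), the algorithm would then act greedily with respect to these optimistic per-arm distributions instead of per-arm mean estimates.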
Influence Maximization with Bandits
Abstract
We consider the problem of influence maximization in networks: maximizing the number of people that become aware of a product by finding the 'best' set of 'seed' users to expose the product to. Most prior work on this topic assumes that we know the probability of each user influencing each other user, or that we have data that lets us estimate these influences. However, this information is typically not available or is difficult to obtain. To avoid this assumption, we adopt a combinatorial multi-armed bandit paradigm that estimates the influence probabilities as we sequentially try different seed sets. We establish bounds on the performance of this procedure under the existing edge-level feedback mechanism as well as a novel and more realistic node-level feedback mechanism. Beyond our theoretical results, we describe a practical implementation and experimentally demonstrate its efficiency and effectiveness on four real datasets.
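Under the edge-level feedback this abstract mentions, each round reveals, for every activated node, which outgoing activation attempts succeeded; a bandit algorithm can then keep per-edge success counts and act on optimistic edge-probability estimates. A minimal sketch, assuming a UCB1-style bonus (the constant 1.5 and the helper names are my own, not the paper's):

```python
import math

def edge_ucb(successes, trials, t):
    """Optimistic estimate of one edge's influence probability:
    empirical mean plus an exploration bonus, clipped to [0, 1]."""
    if trials == 0:
        return 1.0  # optimistic initialization: untried edges explored first
    mean = successes / trials
    bonus = math.sqrt(1.5 * math.log(t) / trials)
    return min(mean + bonus, 1.0)

def update_edges(stats, observed):
    """Edge-level feedback update: for every edge whose source node was
    activated this round, record whether the activation attempt along
    that edge succeeded. stats maps edge -> (successes, trials)."""
    for edge, fired in observed.items():
        s, n = stats.get(edge, (0, 0))
        stats[edge] = (s + (1 if fired else 0), n + 1)
```

Seed selection would then run a greedy influence-maximization oracle on the `edge_ucb` probabilities; node-level feedback, as the abstract notes, is coarser, since it reveals only which nodes activated, not which edges fired.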