Results 11-18 of 18

### Statistical and Optimal Learning with Applications in Business Analytics (dissertation)

Statistical learning is widely used in business analytics to discover structure or exploit patterns in historical data, and to build models that capture relationships between an outcome of interest and a set of variables. Optimal learning, on the other hand, addresses the operational side of the problem by iterating between decision making and data acquisition/learning. The two problems often go hand in hand, exhibiting a feedback loop between statistics and optimization. We apply this statistical/optimal learning concept to a fundraising marketing campaign problem arising in many non-profit organizations. Many such organizations use direct-mail marketing to cultivate one-time donors and convert them into recurring contributors. Cultivated donors generate much more revenue than new donors, but also lapse with time, making it important to steadily draw in new cultivations. The direct-mail budget is limited, but better-designed mailings can improve success rates without increasing costs. We first apply statistical learning to analyze the effectiveness of several design approaches used in practice, based on a massive dataset covering 8.6 million direct-

### Cascading Bandits for Large-Scale Recommendation Problems

Most recommender systems recommend a list of items. The user examines the list, from the first item to the last, and often chooses the first attractive item and does not examine the rest. This type of user behavior can be modeled by the cascade model. In this work, we study cascading bandits, an online learning variant of the cascade model where the goal is to recommend the K most attractive items from a large set of L candidate items. We propose two algorithms for solving this problem, which are based on the idea of linear generalization. The key idea in our solutions is that we learn a predictor of the attraction probabilities of items from their features, as opposed to learning the attraction probability of each item independently as in existing work. This results in practical learning algorithms whose regret does not depend on the number of items L. We bound the regret of one algorithm and comprehensively evaluate the other on a range of recommendation problems. The latter performs well and outperforms all baselines.
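As a rough illustration of the linear-generalization idea, the sketch below learns one shared weight vector from item features via ridge regression under simulated cascade feedback, rather than one attraction probability per item. The feature matrix, logistic attraction model, and all parameters are illustrative assumptions, not the paper's algorithm or experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
L, d, K, T = 50, 5, 4, 2000                 # items, feature dim, list size, rounds
X = rng.normal(size=(L, d)) / np.sqrt(d)    # item features (assumed)
theta_true = rng.normal(size=d) / np.sqrt(d)
p_true = 1 / (1 + np.exp(-X @ theta_true))  # true attraction probabilities

A = np.eye(d)      # ridge Gram matrix
b = np.zeros(d)    # feature-weighted click counts

for t in range(T):
    theta_hat = np.linalg.solve(A, b)       # current weight estimate
    scores = X @ theta_hat
    ranked = np.argsort(-scores)[:K]        # recommend the K highest-scoring items
    for pos, item in enumerate(ranked):     # cascade: user scans until first click
        clicked = rng.random() < p_true[item]
        x = X[item]
        A += np.outer(x, x)                 # update on every examined item
        b += x * float(clicked)
        if clicked:
            break                           # items after the click are unobserved

theta_hat = np.linalg.solve(A, b)
```

Because only the d-dimensional weight vector is learned, the amount of statistics kept does not grow with the number of items L, which is the point of the linear-generalization approach described above.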

### On the convergence rates of expected improvement methods

, 2014

We consider a ranking and selection problem with independent normal observations, and analyze the asymptotic sampling rates of expected improvement (EI) methods in this setting. Such methods often perform well in practice, but a tractable analysis of their convergence rates is difficult due to the nonlinearity and nonconvexity of the functions used in the EI calculations. We present new results indicating that variants of EI produce simulation allocations that are essentially identical to those chosen by the optimal computing budget allocation (OCBA) methodology, which is known to achieve the optimal asymptotic rate of convergence. This is the first general equivalence result between EI and OCBA, and provides insight into the structure and behaviour of EI.
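To make the EI allocation rule concrete, here is a minimal sketch for independent normal observations: each alternative's EI is computed from its posterior mean and standard deviation, and the next simulation sample goes to the alternative maximizing EI. The means and standard deviations below are made-up numbers for illustration.

```python
import math

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ei(mean, sd, best):
    """Expected improvement of an alternative over the current best mean."""
    if sd == 0:
        return 0.0
    z = (mean - best) / sd
    return sd * (z * normal_cdf(z) + normal_pdf(z))

# hypothetical posterior means / standard deviations for three alternatives
means = [1.0, 1.4, 0.8]
sds   = [0.5, 0.2, 0.9]
best = max(means)
scores = [ei(m, s, best) for m, s in zip(means, sds)]
next_sample = scores.index(max(scores))   # allocate the next sample where EI is largest
```

Note the nonlinearity the abstract refers to: EI mixes the normal pdf and cdf of the standardized gap, which is what makes a direct rate analysis awkward even though evaluating the rule is trivial.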

### SequeL team

Thompson Sampling (TS) has generated a lot of interest due to its good empirical performance, in particular in computational advertising. Though successful, the tools for its performance analysis appeared only recently. In this paper, we describe and analyze the SpectralTS algorithm for a bandit problem where the payoffs of the choices are smooth given an underlying graph. In this setting, each choice is a node of a graph, and the expected payoffs of neighboring nodes are assumed to be similar. Although the setting has applications both in recommender systems and advertising, traditional algorithms would scale poorly with the number of choices. For that purpose we consider an effective dimension d, which is small in real-world graphs. We deliver the analysis showing that the regret of SpectralTS scales as d√(T ln N) with high probability, where T is the time horizon and N is the number of choices. Since a d√(T ln N) regret is comparable to the known results, SpectralTS offers a computationally more efficient alternative. We also show that our algorithm is competitive on both synthetic and real-world data.
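A hedged sketch of the spectral idea: run linear Thompson Sampling in the eigenbasis of the graph Laplacian, penalizing high-frequency coefficients so that a payoff function that is smooth over the graph is captured by a few low-frequency coefficients. The path graph, payoff function, noise level, and prior scaling below are assumptions for the demo, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 30, 500
# path-graph Laplacian: each node's neighbors are the adjacent integers
Lap = np.diag([1.0] + [2.0] * (N - 2) + [1.0])
for i in range(N - 1):
    Lap[i, i + 1] = Lap[i + 1, i] = -1.0
eigvals, Q = np.linalg.eigh(Lap)            # columns of Q: graph Fourier basis

mu_true = np.sin(np.linspace(0, np.pi, N))  # smooth payoff over the graph (assumed)
B = np.diag(eigvals + 1.0)                  # prior precision penalizes high frequencies
f = np.zeros(N)

for t in range(T):
    cov = np.linalg.inv(B)
    alpha = rng.multivariate_normal(cov @ f, cov)  # sample spectral coefficients
    node = int(np.argmax(Q @ alpha))               # best node under the sample
    reward = mu_true[node] + 0.1 * rng.normal()
    x = Q[node]                                    # node's spectral features
    B += np.outer(x, x)                            # posterior precision update
    f += x * reward

mu_hat = Q @ np.linalg.solve(B, f)          # posterior-mean payoff estimate
```

The Laplacian-weighted prior is what makes the effective dimension small: for smooth payoffs, most posterior mass sits on the first few eigenvectors rather than on all N node payoffs.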

### (More) Efficient Reinforcement Learning via Posterior Sampling

Most provably-efficient reinforcement learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration: posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Markov decision processes and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an Õ(τS√(AT)) bound on expected regret, where T is time, τ is the episode length and S and A are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.
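The episode loop described above can be sketched on a tiny two-state, two-action MDP: sample one MDP from the posterior, solve it by finite-horizon value iteration, act greedily for the episode, then update the posterior. The chain dynamics, Dirichlet transition prior, and simple mean tracker for rewards are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(2)
S, A, tau, episodes = 2, 2, 5, 300
P_true = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P_true[s, a] = dist over next states
                   [[0.7, 0.3], [0.1, 0.9]]])
R_true = np.array([[0.0, 0.0], [0.0, 1.0]])    # reward for (s, a); state 1 pays

dir_counts = np.ones((S, A, S))                # Dirichlet prior on transitions
r_sum = np.zeros((S, A)); r_n = np.ones((S, A))  # running reward averages

def solve(P, R, tau):
    """Finite-horizon value iteration; returns a greedy policy for each step."""
    V = np.zeros(S); policy = np.zeros((tau, S), dtype=int)
    for h in reversed(range(tau)):
        Qv = R + P @ V                         # Q[s, a] for horizon step h
        policy[h] = np.argmax(Qv, axis=1)
        V = Qv.max(axis=1)
    return policy

for ep in range(episodes):
    P_samp = np.array([[rng.dirichlet(dir_counts[s, a]) for a in range(A)]
                       for s in range(S)])     # one MDP sampled from the posterior
    R_samp = r_sum / r_n
    policy = solve(P_samp, R_samp, tau)        # optimal policy for the sample
    s = 0
    for h in range(tau):                       # follow it for the whole episode
        a = policy[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        r = R_true[s, a]
        dir_counts[s, a, s_next] += 1          # posterior update from observed data
        r_sum[s, a] += r; r_n[s, a] += 1
        s = s_next
```

Note that exploration here comes entirely from posterior randomness: a single sampled MDP per episode replaces the optimism bonuses used by most provably-efficient algorithms.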