## Active Learning for Personalizing Treatment

Citations: 1 (0 self)

### BibTeX

```bibtex
@MISC{Deng_activelearning,
  author = {Kun Deng and Joelle Pineau and Susan Murphy},
  title  = {Active Learning for Personalizing Treatment},
  year   = {}
}
```

### Abstract

The personalization of treatment via genetic biomarkers and other risk categories has drawn increasing interest among clinical researchers and scientists. A major challenge is to construct individualized treatment rules (ITRs), which recommend the best treatment for each category of individuals. In general, ITRs can be constructed using data from clinical trials; however, such trials are very costly to run. To reduce the cost of learning an ITR, we explore active learning techniques designed to carefully decide whom to recruit, and which treatment to assign, throughout the online conduct of the clinical trial. As an initial investigation, we focus on simple ITRs that use a small number of subpopulation categories to personalize treatment. To minimize the maximal uncertainty regarding the treatment effects for each subpopulation, we propose a minimax bandit model and provide an active learning policy for solving it. We evaluate our active learning policy using simulated data and data modeled after a clinical trial involving treatments for depressed individuals, and contrast it with other plausible active learning policies. The techniques presented in the paper may be generalized to tackle problems of efficient exploration in other domains.
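The paper does not give its policy in code, but the core idea in the abstract — allocate trial subjects so as to minimize the maximal uncertainty over the (subpopulation, treatment) arms — can be sketched as a greedy rule: at each step, assign the next subject to the arm whose estimated mean response currently has the largest variance. The `sample_response` interface and arm names below are illustrative assumptions, not taken from the paper.

```python
import random

def minimax_allocation(arms, budget, sample_response):
    """Greedy sketch of a minimax-style active learning policy.

    `arms` is a list of (subpopulation, treatment) identifiers;
    `sample_response(arm)` draws one subject's response for that arm
    (a hypothetical interface standing in for recruiting a subject).
    Returns the observations collected per arm.
    """
    data = {arm: [] for arm in arms}
    # Seed each arm with two observations so a sample variance exists.
    for arm in arms:
        data[arm].append(sample_response(arm))
        data[arm].append(sample_response(arm))

    def var_of_mean(arm):
        xs = data[arm]
        n = len(xs)
        m = sum(xs) / n
        s2 = sum((x - m) ** 2 for x in xs) / (n - 1)
        return s2 / n  # variance of the sample mean

    # Spend the remaining budget on whichever arm is most uncertain.
    for _ in range(budget - 2 * len(arms)):
        target = max(arms, key=var_of_mean)
        data[target].append(sample_response(target))
    return data
```

Each extra subject goes to the currently worst-estimated arm, so the maximal variance across arms shrinks first; the paper's actual minimax bandit policy may differ in how it scores and schedules arms.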

### Citations

416 | Improving generalization with active learning
- Cohn, Atlas, et al.
- 1994
Citation Context: ...acquisition, by parsimoniously querying some unknown aspects of existing data, or collecting new data based on the examples that the learning algorithm has seen so far. Traditionally, active learning [10,12]–[14,16,20,22] has been studied in the context of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally ref... |

334 | Selective sampling using the query by committee algorithm
- Freund, Seung, et al.
- 1997
Citation Context: ...acquisition, by parsimoniously querying some unknown aspects of existing data, or collecting new data based on the examples that the learning algorithm has seen so far. Traditionally, active learning [10,12]–[14,16,20,22] has been studied in the context of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally ref... |

334 | Support vector machine active learning for image retrieval - Tong, Chang - 2001 |

236 | Near-Optimal Reinforcement Learning in Polynomial Time - Kearns, Singh - 1998 |

236 | R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning - Brafman, Tennenholtz - 2003 |

172 | Bayesian experimental design: a review
- Chaloner, Verdinelli
- 1995
Citation Context: ...xt of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally referred to as optimal experimental design; see [11,23]–[25] for some recent work and review. The main objective of this line of research is to reduce the variance of prediction over parameter estimates, while controlling the bias of the prediction at the... |

132 | Active learning literature survey
- Settles
- 2009
Citation Context: ...ion, by parsimoniously querying some unknown aspects of existing data, or collecting new data based on the examples that the learning algorithm has seen so far. Traditionally, active learning [10,12]–[14,16,20,22] has been studied in the context of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally referred to as op... |

121 | Efficient exploration in reinforcement learning
- Thrun
- 1992
Citation Context: ...response as variables in a Bayesian network, our problem could be framed as such a parameter estimation problem. Finally, there are a number of RL approaches that also use active learning, such as [1]–[5]; the main difference is that these algorithms deal with infinite horizon and cumulative reward functions, and mostly focus on exploring either state or action, whereas we are interested in balanc... |

88 | Online choice of active learning algorithms
- Baram, El-Yaniv, et al.
Citation Context: ...ion, by parsimoniously querying some unknown aspects of existing data, or collecting new data based on the examples that the learning algorithm has seen so far. Traditionally, active learning [10,12]–[14,16,20,22] has been studied in the context of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally referred to as op... |

74 | A bayesian framework for reinforcement learning - Strens - 2000 |

55 | Active Learning for Parameter Estimation in Bayesian Networks
- Tong, Koller
- 2000
Citation Context: ...the maximal variance of the estimated treatment effects. The possibility of active learning also arises in other domains, such as in the unsupervised learning task of density estimation. For example, [15] presented a framework for actively learning parameters in Bayesian networks. It is assumed in this framework that some subset of the variables are controllable, so a query has the form of these varia... |

34 | Active learning via transductive experimental design
- Yu, Bi, et al.
- 2006
Citation Context: ...pervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally referred to as optimal experimental design; see [11,23]–[25] for some recent work and review. The main objective of this line of research is to reduce the variance of prediction over parameter estimates, while controlling the bias of the prediction at the same... |

32 | Pure exploration in multi-armed bandits problems
- Bubeck, Munos, et al.
- 2009
Citation Context: ...f the resulting ITRs, which is a highly nonlinear reward. Problems that, like ours, are interested only in the “end results” have been studied under the name of “budgeted” multi-armed bandit problems [9,17,21], where one tries to optimize a goal function, say picking an arm of a slot machine with maximal payoff, designing a classifier with minimal prediction loss, estimating quantities with minimal variance... |

32 | Optimal dynamic treatment regimes - Murphy - 2003 |

30 | R-MAX—a general polynomial time algorithm for near-optimal reinforcement learning - Brafman, Tennenholtz - 2002 |

21 | All of Statistics. A Concise Course - Wasserman - 2004 |

18 | Active learning with feedback on features and instances - Raghavan, Madani, et al. - 2006 |

17 | Adaptive treatment of epilepsy via batch-mode reinforcement learning. AAAI (pp. 1671–1678) - Guez, Vincent, et al. - 2008 |

9 | Software testing by active learning for commercial games
- Xiao, Southey, et al.
- 2005
Citation Context: ...ion, by parsimoniously querying some unknown aspects of existing data, or collecting new data based on the examples that the learning algorithm has seen so far. Traditionally, active learning [10,12]–[14,16,20,22] has been studied in the context of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally referred to as op... |

9 | Constrained optimization of experimental design
- Cook, Fedorov
- 1995
Citation Context: ...xt of supervised machine learning and classification problems. Active learning also has a long history in the statistics literature, which is generally referred to as optimal experimental design; see [11,23]–[25] for some recent work and review. The main objective of this line of research is to reduce the variance of prediction over parameter estimates, while controlling the bias of the prediction at the... |

8 | The budgeted multi-armed bandit problem
- Madani, Lizotte, et al.
- 2004
Citation Context: ...f the resulting ITRs, which is a highly nonlinear reward. Problems that, like ours, are interested only in the “end results” have been studied under the name of “budgeted” multi-armed bandit problems [9,17,21], where one tries to optimize a goal function, say picking an arm of a slot machine with maximal payoff, designing a classifier with minimal prediction loss, estimating quantities with minimal variance... |

8 | Active learning for logistic regression - Schein - 2005 |

6 | Bayesian adaptive design for targeted therapy development in lung cancer - a step toward personalized medicine. Clinical Trials, 5:181–193
- Zhou, Liu, et al.
- 2008
Citation Context: ...be viewed as a form of exploration in an online RL setting, with the horizon being N steps corresponding to N subjects. This formulation bears formal similarity with some response adaptive trials [6]–[8], popular in cancer research, which also divide patients into groups. However, our formal criterion (defined below) is different from these response adaptive trials. Response adaptive trials usuall... |

5 | Bayesian clinical trials at the - Biswas, Liu, et al. - 2009 |

5 | Reinforcement learning for active model selection - Kapoor, Greiner - 2005 |

5 | Bandit-based algorithms for budgeted learning - Deng, Bourke, et al. - 2007 |

5 | Developing adaptive treatment strategies in substance abuse research. Drug and alcohol dependence - Murphy, Lynch, et al. - 2007 |

4 | Active reinforcement learning
- Epshteyn, Vogel, et al.
- 2008
Citation Context: ...response as variables in a Bayesian network, our problem could be framed as such a parameter estimation problem. Finally, there are a number of RL approaches that also use active learning, such as [1]–[5]; the main difference is that these algorithms deal with infinite horizon and cumulative reward functions, and mostly focus on exploring either state or action, whereas we are interested in balancing... |

4 | Bayesian adaptive randomization designs for targeted agent development
- Lee, Gu, et al.
Citation Context: ...ssign each subpopulation to one of the two competing treatment options, ignoring other characteristics/covariates of the subjects. This setting, though simple, is not uncommon in many clinical trials [8]. For instance, a subpopulation type may correspond to patients with a particular gene biomarker, and the two treatment options are the standard and alternative treatment options, or the top two choice... |

3 | Active learning in multi-armed bandits
- Antos, Grover, et al.
- 2008
Citation Context: ...f the resulting ITRs, which is a highly nonlinear reward. Problems that, like ours, are interested only in the “end results” have been studied under the name of “budgeted” multi-armed bandit problems [9,17,21], where one tries to optimize a goal function, say picking an arm of a slot machine with maximal payoff, designing a classifier with minimal prediction loss, estimating quantities with minimal variance... |

3 | A comparison of nefazodone, the cognitive behavioral-analysis system of psychotherapy, and their combination for the treatment of chronic depression
- Keller, et al.
- 2000
Citation Context: ...here is a large difference in variance between the estimated mean responses to each treatment. For dataset DS-CBASP, the means and variances are taken from a real clinical trial for chronic depression [32]. In this case, the treatment variances of the subpopulations are very similar. For each dataset, we first plotted the loss of a policy as the number of recruited subjects increases (Figure 2). In th... |

2 | Placebo-free designs for evaluating new mental health treatments: the use of adaptive strategies
- Dawson, Lavori
- 2004
Citation Context: ...be natural. 5. Finally, the problem that we discussed is a one-stage RL problem, which doesn’t involve state changes. In the literature, “dynamic treatment regimes” or “adaptive treatment strategies” [28]–[31] naturally generalize the idea of ITRs to multiple stages by constructing a sequence of decision rules, one for each disease stage. How to extend the ideas of this paper to such time-varying sett... |

1 | Accounting for patient heterogeneity
- Wathen, Thall, et al.
- 2008
Citation Context: ...tural. 5. Finally, the problem that we discussed is a one-stage RL problem, which doesn’t involve state changes. In the literature, “dynamic treatment regimes” or “adaptive treatment strategies” [28]–[31] naturally generalize the idea of ITRs to multiple stages by constructing a sequence of decision rules, one for each disease stage. How to extend the ideas of this paper to such time-varying settings... |