Results 1–10 of 88
Optimal Structural Nested Models for Optimal Sequential Decisions
In Proceedings of the Second Seattle Symposium on Biostatistics, 2004
Abstract

Cited by 60 (5 self)
I describe two new methods for estimating the optimal treatment regime (equivalently, protocol, plan or strategy) from very high dimensional observational and experimental data: (i) g-estimation of an optimal double-regime structural nested mean model (drSNMM) and (ii) g-estimation of a standard single-regime SNMM combined with sequential dynamic-programming (DP) regression. These methods are compared to certain regression methods found in the sequential decision and reinforcement learning literatures and to the regret modelling methods of Murphy (2003). I consider both Bayesian and frequentist inference. In particular, I propose a novel “Bayes-frequentist compromise” that combines honest subjective non- or semiparametric Bayesian inference with good frequentist behavior, even in cases where the model is so large and the likelihood function so complex that standard (uncompromised) Bayes procedures have poor frequentist performance.
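The sequential dynamic-programming regression step in method (ii) can be illustrated with a minimal backward-induction (Q-learning-style) sketch. Everything below is assumed for illustration — the two-stage synthetic data, the linear Q-models, and the variable names — and is not Robins' drSNMM estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two-stage synthetic data: states S1, S2, binary actions A1, A2, final reward R.
S1 = rng.normal(size=n)
A1 = rng.choice([-1, 1], size=n)
S2 = 0.5 * S1 + rng.normal(size=n)
A2 = rng.choice([-1, 1], size=n)
R = S2 * A2 + 0.3 * S1 * A1 + rng.normal(size=n)

def fit_ols(X, y):
    # Least-squares coefficients for design matrix X.
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 2: regress R on (1, S2, A2, S2*A2).
X2 = np.column_stack([np.ones(n), S2, A2, S2 * A2])
b2 = fit_ols(X2, R)

def q2(s2, a2):
    # Fitted stage-2 Q-function.
    return b2[0] + b2[1] * s2 + b2[2] * a2 + b2[3] * s2 * a2

# Pseudo-outcome: fitted value under the best stage-2 action for each subject.
V2 = np.maximum(q2(S2, 1), q2(S2, -1))

# Stage 1: regress the pseudo-outcome on (1, S1, A1, S1*A1).
X1 = np.column_stack([np.ones(n), S1, A1, S1 * A1])
b1 = fit_ols(X1, V2)

# Estimated rules: at each stage, choose the action with the larger fitted Q-value.
rule2 = np.sign(b2[2] + b2[3] * S2)
rule1 = np.sign(b1[2] + b1[3] * S1)
```

The point of the sketch is the order of operations: the last stage is fit first, and its fitted optimal value is propagated backwards as the outcome for the earlier stage.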
Performance guarantee for individualized treatment rules, 2009
Abstract

Cited by 23 (1 self)
S.1 The overfitting problem. In this section, we discuss the problem of overfitting due to the potentially large number of pretreatment variables (and/or complex approximation space for Q0) mentioned in Section 4. Consider the setting in which we know that Q0 is linear in the {X, A} variables and suppose that most coefficients are nonzero (some may be quite small). Then the least squares estimator using the best correct linear model (i.e., the model that contains, and only contains, variables with truly nonzero coefficients) may result in ITRs with poor Value compared to the estimator from a sparser model. Intuitively this occurs when the dimension of {X, A} is too large for the size of the data set. This is similar to the case of stepwise model selection; a solution is to select the model that balances the approximation error with the estimation error instead of keeping all of the correct terms (Massart [3]). Indeed, the l1-PLS method aims to estimate a parameter possessing small approximation error (i.e., the excess prediction error) and controlled sparsity (which is directly related to the estimation error). As a result, the ITR produced by l1-PLS will more reliably have higher Value than the rule produced by the OLS (ordinary least squares) estimator constructed when the correct model is known but is too non-sparse relative to the size of the data set. In the following we use a simple simulation to support this argument. First we generate X = (X1,..., X12), where X1,..., X12 are mutually independent and each Xj is uniformly distributed on [−1, 1]. The treatment A is then generated independently of X from {−1, 1} with probability 1/2 each. The response R is generated from a normal distribution with mean Q0(X, A) = (1, X−12, A, X−12A)ϑ and variance 1, where X−12 = (X1,..., X11) and ϑ ∈ R24 is a vector parameter. We consider ...
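The simulation's data-generating step can be sketched directly from the description above. Since the excerpt truncates before specifying ϑ, the coefficient values below are placeholders for illustration, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Covariates X1..X12, i.i.d. uniform on [-1, 1]; treatment A in {-1, 1} w.p. 1/2 each.
X = rng.uniform(-1, 1, size=(n, 12))
A = rng.choice([-1, 1], size=n)

# Design (1, X_{-12}, A, X_{-12} A) as in the text, where X_{-12} = (X1,..., X11).
Xm = X[:, :11]
design = np.column_stack([np.ones(n), Xm, A, Xm * A[:, None]])  # shape (n, 24)

# theta (the vector called ϑ) is NOT given in the excerpt; these values are a
# placeholder: moderate main effects plus many small interaction coefficients.
theta = np.concatenate([[1.0], np.full(11, 0.5), [0.5], np.full(11, 0.1)])

# Response R ~ N(Q0(X, A), 1), with Q0 linear in the design.
R = design @ theta + rng.normal(size=n)
```

With many small but nonzero interaction coefficients, exactly the regime described above arises: the "correct" 24-term model is dense relative to the sample size, which is where a sparser fit can yield a better ITR.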
Demystifying Optimal Dynamic Treatment Regimes. Biometrics 63(2):447–455, 2007
Abstract

Cited by 20 (5 self)
A dynamic regime is a function that takes treatment and covariate history and baseline covariates as inputs and returns a decision to be made. Murphy (2003) and Robins (2004) have proposed models and developed semiparametric methods for making inference about the optimal regime in a multi-interval trial that provide clear advantages over traditional parametric approaches. We show that Murphy’s model is a special case of Robins’ and that the methods are closely related but not equivalent. Interesting features of the methods are highlighted using the Multicenter AIDS Cohort Study (MACS) and through simulation.
Impossibility Results for Nondifferentiable Functionals, 2010
Abstract

Cited by 17 (0 self)
We examine challenges to estimation and inference when the objects of interest are nondifferentiable functionals of the underlying data distribution. This situation arises in a number of applications of bounds analysis and moment inequality models, and in recent work on estimating optimal dynamic treatment regimes. Drawing on earlier work relating differentiability to the existence of unbiased and regular estimators, we show that if the target object is not continuously differentiable in the parameters of the data distribution, there exist no locally asymptotically unbiased estimators and no regular estimators. This places strong limits on estimators, bias correction methods, and inference procedures.
Informing sequential clinical decision-making through reinforcement learning: an empirical study. Machine Learning, 2011
Abstract

Cited by 14 (5 self)
This paper highlights the role that reinforcement learning can play in the optimization of treatment policies for chronic illnesses. Before applying any off-the-shelf reinforcement learning methods in this setting, we must first tackle a number of challenges. We outline some of these challenges and present methods for overcoming them. First, we describe a multiple imputation approach to overcome the problem of missing data. Second, we discuss the use of function approximation in the context of a highly variable observation set. Finally, we discuss approaches to summarizing the evidence in the data for recommending a particular action and quantifying the uncertainty around the Q-function of the recommended policy. We present the results of applying these methods to real clinical trial data of patients with schizophrenia.
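The multiple-imputation idea mentioned for missing data can be sketched in a deliberately simple form: an unconditional normal imputation model and Rubin-style pooling across completed data sets. The paper's actual imputation model is richer; the data and the target estimate here are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy outcome with missing entries (NaN) injected at random.
y = rng.normal(loc=5.0, scale=2.0, size=200)
y[rng.choice(200, size=40, replace=False)] = np.nan

def impute_once(y, rng):
    # Fill each missing value with a draw from N(mean, sd) of the observed data.
    # (A very simple imputation model; real analyses condition on covariates.)
    obs = y[~np.isnan(y)]
    out = y.copy()
    miss = np.isnan(out)
    out[miss] = rng.normal(obs.mean(), obs.std(ddof=1), size=miss.sum())
    return out

# M completed data sets; analyze each, then pool the results.
M = 10
estimates = [impute_once(y, rng).mean() for _ in range(M)]
pooled = np.mean(estimates)              # pooled point estimate
between_var = np.var(estimates, ddof=1)  # between-imputation variance component
```

Drawing (rather than plugging in the mean) preserves variability, and the between-imputation variance feeds the extra uncertainty from the missing data into the final standard error.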
A Cautious Approach to Generalization in Reinforcement Learning
Abstract

Cited by 8 (4 self)
In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which exploits weak prior knowledge about its environment for computing, from a given sample of trajectories and for a given initial state, a sequence of actions. The proposed Viterbi-like algorithm maximizes a recently proposed lower bound on the return depending on the initial state, and uses to this end prior knowledge about the environment provided in the form of upper bounds on its Lipschitz constants. It thereby avoids, in a way depending on the initial state and on the prior knowledge, those regions of the state space where the sample is too sparse to make safe generalizations. Our experiments show that it can lead to more cautious policies than algorithms combining dynamic programming with function approximators. We also give a condition on the sample sparsity ensuring that, for a given initial state, the proposed algorithm produces an optimal sequence of actions in open loop.
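A one-step version of the Lipschitz lower-bounding idea can be sketched as follows. The reward function, the sample, and the bound L_r are illustrative assumptions, and the paper's algorithm handles full multi-step trajectories rather than this single-step case:

```python
import numpy as np

rng = np.random.default_rng(4)

# Sample of one-step observations (x_i, u_i, r_i) from a Lipschitz reward
# r(x, u) = -x^2 on [-1, 1] (continuous state, finite action set {0, 1}).
xs = rng.uniform(-1, 1, size=50)
us = rng.integers(0, 2, size=50)
rs = -xs ** 2

L_r = 2.0  # assumed upper bound on the Lipschitz constant of r in x on [-1, 1]

def reward_lower_bound(x, u):
    # Lipschitz continuity guarantees r(x, u) >= r_i - L_r * |x - x_i|
    # for every sampled (x_i, u_i) with u_i == u; take the tightest such bound.
    mask = us == u
    return np.max(rs[mask] - L_r * np.abs(x - xs[mask]))

def cautious_action(x):
    # A cautious one-step policy: pick the action with the best GUARANTEED reward.
    return max([0, 1], key=lambda u: reward_lower_bound(x, u))
```

The bound degrades linearly with distance to the nearest matching sample, which is exactly the mechanism that steers the policy away from sparsely sampled regions.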
Screening Experiments for Developing Dynamic Treatment Regimes
Abstract

Cited by 7 (2 self)
Many disorders require multi-component dynamic treatment regimes given sequentially in time to achieve and maintain good outcomes. The construction of dynamic treatment regimes is challenging due to the large number of potentially useful components and the sequential nature of the components. There is a rich literature aimed at providing experimental designs that can be used to screen out inactive components of a multi-component treatment, thereby focusing efforts in future trials on the most promising components. We borrow and generalize ideas from the experimental design field to propose screening experiments which scientists can use to screen out inactive or less active components of a dynamic treatment regime. Unfortunately, the classical design and analysis of screening experiments cannot be directly imported as is. This is due to the sequential nature of dynamic treatment regimes, in which some components are considered only if patients respond (or do not respond) to prior treatment components. Here, we define causal effects that can be used to screen components. However, the aliasing of these causal effects, incurred when a fraction of a factorial experiment is conducted, does not immediately follow from the associated defining words. Somewhat surprisingly, simple modifications can be used to infer aliasing from a combination of the defining words and the form of the statistical analysis of the data.
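The classical aliasing-from-defining-words machinery that the abstract says cannot be imported "as is" can be sketched for a standard (non-sequential) half fraction; this shows only the classical case the paper starts from, not the modifications it proposes:

```python
import itertools

# Full 2^3 design in factors A, B, C (levels ±1); generate D = A*B*C,
# giving a 2^(4-1) fraction with defining word I = ABCD.
runs = []
for a, b, c in itertools.product([-1, 1], repeat=3):
    runs.append((a, b, c, a * b * c))

# Under I = ABCD, the main effect of D is aliased with the ABC interaction:
# their contrast columns are identical on every run of the fraction, so the
# data cannot separate the two effects.
col_D = [r[3] for r in runs]
col_ABC = [r[0] * r[1] * r[2] for r in runs]
assert col_D == col_ABC
```

In a dynamic treatment regime, some factors exist only for responders (or non-responders) to earlier components, which is why this simple column-identity argument no longer reads the aliasing off the defining word directly.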
Statistical Inference in Dynamic Treatment Regimes, 2010
Abstract

Cited by 7 (5 self)
Dynamic treatment regimes, also known as treatment policies, are increasingly being used to operationalize clinical decision making associated with long-term patient care. Common approaches to constructing a dynamic treatment regime from data, such as Q-learning, employ non-smooth functionals of the data. Therefore, simple inferential tasks such as constructing a confidence interval for the parameters in the Q-function are complicated by non-regular asymptotics under certain commonly encountered generative models. Methods that ignore this non-regularity can suffer from poor performance in small samples. We construct confidence intervals for the parameters in the Q-function by constructing smooth, data-dependent, upper and lower bounds on these parameters and then applying the bootstrap. We prove that the proposed method provides asymptotically exact coverage regardless of the generative model. In addition, we show that in certain scenarios the bounds used in this method are tight even in finite samples. The small sample performance of the method is evaluated on a series of examples and compares favorably to previously published competitors. Finally, we illustrate the method on real data from the Adaptive Interventions for Children with ADHD study (Pelham and Fabiano 2008).
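For context, the baseline procedure can be sketched as a plain percentile bootstrap for the coefficient on the action-by-state interaction in a one-stage linear Q-function. This is the naive bootstrap whose non-regular behavior motivates the paper's smooth bounds, not the proposed method itself; the data-generating model and names are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# One-stage data: state S, action A in {-1, 1}, reward with a linear Q-function.
S = rng.normal(size=n)
A = rng.choice([-1, 1], size=n)
R = 1.0 + 0.5 * S + 0.8 * A * S + rng.normal(size=n)

def interaction_coef(S, A, R):
    # OLS fit of R on (1, S, A, A*S); return the coefficient on A*S,
    # the term that drives the estimated treatment rule.
    X = np.column_stack([np.ones_like(S), S, A, A * S])
    return np.linalg.lstsq(X, R, rcond=None)[0][3]

# Naive percentile bootstrap for that coefficient.
B = 500
boot = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot[b] = interaction_coef(S[idx], A[idx], R[idx])
lo, hi = np.percentile(boot, [2.5, 97.5])
```

In multi-stage Q-learning, the first-stage outcome involves a max over fitted second-stage Q-values, and it is that max which breaks the regularity this naive interval relies on.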
The Brown Legacy and the O’Connor Challenge: Transforming Schools in the Images of Children’s Potential
Abstract

Cited by 6 (0 self)
The gap between Blacks and Whites in educational outcomes has narrowed dramatically over the past 60 years, but progress stopped around 1990. The author reviews research suggesting that increasing the quantity and quality of schooling can play a powerful role in overcoming racial inequality. To achieve that goal, he reasons, our knowledge of best instructional practice should drive our conceptions of teachers’ work, teachers’ expertise, school leadership, and parent involvement. The research agenda supporting this paradigm connects developmental science to instructional practice and school organization and requires close collaboration between practitioners and researchers in a relentless commitment to provide superb educational opportunities to children whose future success depends most strongly on schooling.