Polynomial Splines and Their Tensor Products in Extended Linear Modeling
Ann. Statist., 1997
Abstract

Cited by 158 (16 self)
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, cross-validation or an independent test set. Publicly available software, written in C and interfaced to S/S-PLUS, is used to apply this methodology to...
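As a rough illustration of the construction described in this abstract (splines for main effects, tensor products for interactions), the sketch below builds a design matrix from linear truncated-power splines. The function names, the knot locations and the choice of degree 1 are assumptions made for the example; they are not taken from the paper or its software.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Degree-1 truncated power basis: columns x and (x - k)_+ for each knot k."""
    x = np.asarray(x, dtype=float)
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def tensor_product_basis(B1, B2):
    """All pairwise products of the columns of B1 and B2, row by row."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

# Hypothetical predictors and knots, for illustration only.
rng = np.random.default_rng(0)
x1 = rng.uniform(size=50)
x2 = rng.uniform(size=50)
B1 = linear_spline_basis(x1, knots=[0.5])        # main effect of x1
B2 = linear_spline_basis(x2, knots=[0.3, 0.7])   # main effect of x2
B12 = tensor_product_basis(B1, B2)               # x1-by-x2 interaction

# Intercept + main effects + interaction component.
design = np.column_stack([np.ones(50), B1, B2, B12])
```

In a stepwise procedure of the kind the abstract mentions, columns of such a matrix would be added or deleted one at a time; that selection step is not shown here.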
Hazard Regression
Journal of the American Statistical Association, 1995
Abstract

Cited by 94 (20 self)
An automatic procedure that uses linear splines and their tensor products is proposed for fitting a regression model to data involving a polychotomous response variable and one or more predictors. The fitted model can be used for multiple classification. The automatic fitting procedure involves maximum likelihood estimation, stepwise addition, stepwise deletion, and model selection by AIC, cross-validation or an independent test set. A modified version of the algorithm has been constructed that is applicable to large data sets, and it is illustrated using a phoneme recognition data set with 250,000 cases, 45 classes and 63 predictors.
Bayesian model averaging
Statistical Science, 1999
Abstract

Cited by 49 (1 self)
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to overconfident inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA have recently emerged. We discuss these methods and present a number of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
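To make the averaging idea in this abstract concrete, here is a minimal sketch that turns per-model BIC values into approximate posterior model probabilities and averages the models' predictions. The BIC-based weights exp(-BIC/2) are one common approximation and are an assumption of this example, not necessarily the method the paper recommends; the function name and all numbers are hypothetical.

```python
import numpy as np

def bma_weights(bics, prior=None):
    """Approximate posterior model probabilities: weight_k proportional to prior_k * exp(-BIC_k / 2)."""
    bics = np.asarray(bics, dtype=float)
    if prior is None:
        # Assumption: equal prior probability for every candidate model.
        prior = np.full(bics.shape, 1.0 / bics.size)
    log_w = -0.5 * (bics - bics.min()) + np.log(prior)
    w = np.exp(log_w - log_w.max())  # shift by the maximum for numerical stability
    return w / w.sum()

# Hypothetical BIC values and per-model predictions of the same quantity.
bics = [210.4, 212.1, 218.9]
preds = np.array([0.30, 0.42, 0.55])

weights = bma_weights(bics)
bma_pred = float(weights @ preds)  # model-averaged prediction
```

Selecting only the lowest-BIC model would report 0.30 here; the averaged prediction also reflects the models that remain plausible given the data.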
Accounting for Model Uncertainty in Survival Analysis Improves Predictive Performance
In Bayesian Statistics 5, 1995
Abstract

Cited by 42 (12 self)
Survival analysis is concerned with finding models to predict the survival of patients or to assess the efficacy of a clinical treatment. A key part of the model-building process is the selection of the predictor variables. It is standard to use a stepwise procedure guided by a series of significance tests to select a single model, and then to make inference conditionally on the selected model. However, this ignores model uncertainty, which can be substantial. We review the standard Bayesian model averaging solution to this problem and extend it to survival analysis, introducing partial Bayes factors to do so for the Cox proportional hazards model. In two examples, taking account of model uncertainty enhances predictive performance, to an extent that could be clinically useful. 1 Introduction From 1974 to 1984 the Mayo Clinic conducted a double-blinded randomized clinical trial involving 312 patients to compare the drug DPCA with a placebo in the treatment of primary biliary cirrhosis...
Supervised Harvesting of Expression Trees
2000
Abstract

Cited by 41 (5 self)
Background We propose a new method for supervised learning from gene expression data. We call it "Tree Harvesting". This technique starts with a hierarchical clustering of genes, and models the outcome variable as a sum of the average expression profiles of chosen clusters, and their products. It can be applied to many different kinds of outcome measures, such as censored survival times, or a response falling in two or more classes (e.g. cancer classes). The method can discover genes that have strong effects on their own, and genes that interact with other genes. Results We illustrate the method on data from a lymphoma study, and on a dataset containing samples from 8 different cancers. It identified some interesting gene clusters and interactions between genes. Conclusions Tree Harvesting is a potentially useful tool for exploration of gene expression data and identification of interesting clusters of genes worthy of further investigation. Depts. of Statistics, and Hea...
Bayesian Model Averaging in proportional hazard models: Assessing the risk of a stroke
Applied Statistics, 1997
Abstract

Cited by 35 (5 self)
Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for stroke. We introduce a technique based on the leaps and bounds algorithm which efficiently locates and fits the best models in the very large model space and thereby extends all-subsets regression to Cox models. For each independent variable considered, the method provides the posterior probability that it belongs in the model. This is more directly interpretable than the corresponding P-values, and also more valid in that it takes account of model uncertainty. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable. In our data Bayesian model averaging predictively outperforms standard model selection methods for assessing
Joint modeling of longitudinal and time-to-event data: an overview
Statistica Sinica, 2004
Abstract

Cited by 26 (0 self)
A common objective in longitudinal studies is to characterize the relationship between a longitudinal response process and a time-to-event. Considerable recent interest has focused on so-called joint models, where models for the event time distribution and longitudinal data are taken to depend on a common set of latent random effects. In the literature, precise statement of the underlying assumptions typically made for these models has been rare. We review the rationale for and development of joint models, offer insight into the structure of the likelihood for model parameters that clarifies the nature of common assumptions, and describe and contrast some of our recent proposals for implementation and inference.
The lasso method for variable selection in the Cox model
Statistics in Medicine, 1997
Abstract

Cited by 22 (0 self)
I propose a new method for variable selection and shrinkage in Cox's proportional hazards model. My proposal minimizes the log partial likelihood subject to the sum of the absolute values of the parameters being bounded by a constant. Because of the nature of this constraint, it shrinks coefficients and produces some coefficients that are exactly zero. As a result it reduces the estimation variance while providing an interpretable final model. The method is a variation of the 'lasso' proposal of Tibshirani, designed for the linear regression context. Simulations indicate that the lasso can be more accurate than stepwise selection in this setting.
Efficient quadratic regularization for expression arrays
Biostatistics, 2004
Abstract

Cited by 21 (3 self)
have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks. For all of these models, we show that dramatic computational savings are possible over naive implementations, using standard transformations in numerical linear algebra. Keywords: Eigengenes; Euclidean methods; Quadratic regularization; SVD.
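The kind of saving this abstract refers to can be illustrated for ridge regression when there are far more predictors than samples. The sketch below is an assumption about how such a transformation can look, not the paper's code: it replaces the n x p design matrix X by the reduced n x k matrix R = U D from the thin SVD X = U D V^T, solves the ridge problem in that small space, and maps the solution back with V. Function names and data are hypothetical.

```python
import numpy as np

def ridge_direct(X, y, lam):
    """Naive ridge solve: a p x p linear system, costly when p is large."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def ridge_via_svd(X, y, lam):
    """Same ridge solution, computed in the reduced space of the thin SVD."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(d) @ Vt
    R = U * d                                         # n x k, so X = R @ Vt
    k = R.shape[1]
    theta = np.linalg.solve(R.T @ R + lam * np.eye(k), R.T @ y)  # k x k system
    return Vt.T @ theta                               # back to p coefficients

# Hypothetical "wide" data: 10 samples, 200 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 200))
y = rng.normal(size=10)

beta_direct = ridge_direct(X, y, lam=1.0)
beta_svd = ridge_via_svd(X, y, lam=1.0)
```

With n = 10 and p = 200 the reduced system is 10 x 10 instead of 200 x 200, and both routes give the same coefficients up to floating-point error.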
Bayesian information criterion for censored survival models
Biometrics
Abstract

Cited by 21 (2 self)
We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is defined in terms of the number of uncensored events instead of the number of observations. For the simplest censored data model, that of exponential distributions of survival times (i.e. a constant hazard rate), this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose defining BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space, and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.
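The revised penalty described in this abstract can be shown with a few lines of arithmetic. The sketch below uses the usual form BIC = -2 log L + p log(n) and simply swaps the number of subjects for the number of uncensored events in the log term; the function name and all numbers are hypothetical and chosen only for illustration.

```python
import math

def bic(log_lik, n_params, n_effective):
    """BIC = -2 * log-likelihood + n_params * log(effective sample size)."""
    return -2.0 * log_lik + n_params * math.log(n_effective)

# Hypothetical fitted Cox model: values are illustrative, not from the paper.
log_partial_lik = -512.3   # maximized log partial likelihood
n_params = 4               # number of regression coefficients
n_subjects = 1000          # all individuals, censored or not
n_events = 120             # uncensored events (e.g. deaths)

bic_standard = bic(log_partial_lik, n_params, n_subjects)
bic_events = bic(log_partial_lik, n_params, n_events)
```

Because the number of events is smaller than the number of subjects, the event-based penalty is lighter by n_params * log(n_subjects / n_events), so under heavy censoring it penalizes additional covariates less than the standard form does.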