Results 1 - 10
of
200
Polynomial Splines and Their Tensor Products in Extended Linear Modeling
- Ann. Statist
, 1997
"... ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to m ..."
Abstract
-
Cited by 121 (14 self)
- Add to MetaCart
ANOVA type models are considered for a regression function or for the logarithm of a probability function, conditional probability function, density function, conditional density function, hazard function, conditional hazard function, or spectral density function. Polynomial splines are used to model the main effects, and their tensor products are used to model any interaction components that are included. In the special context of survival analysis, the baseline hazard function is modeled and nonproportionality is allowed. In general, the theory involves the L 2 rate of convergence for the fitted model and its components. The methodology involves least squares and maximum likelihood estimation, stepwise addition of basis functions using Rao statistics, stepwise deletion using Wald statistics, and model selection using BIC, cross-validation or an independent test set. Publically available software, written in C and interfaced to S/S-PLUS, is used to apply this methodology to...
Hazard Regression
- Journal of the American Statistical Association
, 1995
"... An automatic procedure that uses linear splines and their tensor products is proposed for tting a regression model to data involving a polychotomous response variable and one or more predictors. The tted model can be used for multiple classi cation. The automatic tting procedure involves maximum lik ..."
Abstract
-
Cited by 70 (15 self)
- Add to MetaCart
An automatic procedure that uses linear splines and their tensor products is proposed for tting a regression model to data involving a polychotomous response variable and one or more predictors. The tted model can be used for multiple classi cation. The automatic tting procedure involves maximum likelihood estimation, stepwise addition, stepwise deletion, and model selection by AIC, cross-validation or an independent test set. A modi ed version of the algorithm has been constructed that is applicable to large data sets, and it is illustrated using a phoneme recognition data set with 250,000 cases, 45 classes and 63 predictors.
Accounting for Model Uncertainty in Survival Analysis Improves Predictive Performance
- In Bayesian Statistics 5
, 1995
"... Survival analysis is concerned with finding models to predict the survival of patients or to assess the efficacy of a clinical treatment. A key part of the model-building process is the selection of the predictor variables. It is standard to use a stepwise procedure guided by a series of significanc ..."
Abstract
-
Cited by 37 (12 self)
- Add to MetaCart
Survival analysis is concerned with finding models to predict the survival of patients or to assess the efficacy of a clinical treatment. A key part of the model-building process is the selection of the predictor variables. It is standard to use a stepwise procedure guided by a series of significance tests to select a single model, and then to make inference conditionally on the selected model. However, this ignores model uncertainty, which can be substantial. We review the standard Bayesian model averaging solution to this problem and extend it to survival analysis, introducing partial Bayes factors to do so for the Cox proportional hazards model. In two examples, taking account of model uncertainty enhances predictive performance, to an extent that could be clinically useful. 1 Introduction From 1974 to 1984 the Mayo Clinic conducted a double-blinded randomized clinical trial involving 312 patients to compare the drug DPCA with a placebo in the treatment of primary biliary cirrhosis...
Bayesian model averaging
- STAT.SCI
, 1999
"... Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-con dent inferences and decisions tha ..."
Abstract
-
Cited by 29 (0 self)
- Add to MetaCart
Standard statistical practice ignores model uncertainty. Data analysts typically select a model from some class of models and then proceed as if the selected model had generated the data. This approach ignores the uncertainty in model selection, leading to over-con dent inferences and decisions that are more risky than one thinks they are. Bayesian model averaging (BMA) provides a coherent mechanism for accounting for this model uncertainty. Several methods for implementing BMA haverecently emerged. We discuss these methods and present anumber of examples. In these examples, BMA provides improved out-of-sample predictive performance. We also provide a catalogue of
Supervised Harvesting of Expression Trees
, 2000
"... Background We propose a new method for supervising learning from gene expression data. We call it \Tree Harvesting". This technique starts with a hierarchical clustering of genes, and models the outcome variable as a sum of the average expression proles of chosen clusters, and their products. I ..."
Abstract
-
Cited by 28 (4 self)
- Add to MetaCart
Background We propose a new method for supervising learning from gene expression data. We call it \Tree Harvesting". This technique starts with a hierarchical clustering of genes, and models the outcome variable as a sum of the average expression proles of chosen clusters, and their products. It can be applied to many dierent kinds of outcome measures, such as censored survival times, or a response falling in two or more classes (e.g. cancer classes). The method can discover genes that have strong eects on their own, and genes that interact with other genes. Results We illustrate the method on data from a lymphoma study, and on a dataset containing samples from 8 dierent cancers. It identied some interesting gene clusters and interactions between genes. Conclusions Tree Harvesting is a potentially useful tool for exploration of gene expression data and identication of interesting clusters of genes worthy of further investigation. Depts. of Statistics, and Hea...
Bayesian Model Averaging in proportional hazard models: Assessing the risk of a stroke
- Applied Statistics
, 1997
"... Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
Evaluating the risk of stroke is important in reducing the incidence of this devastating disease. Here, we apply Bayesian model averaging to variable selection in Cox proportional hazard models in the context of the Cardiovascular Health Study, a comprehensive investigation into the risk factors for stroke. We introduce a technique based on the leaps and bounds algorithm which e ciently locates and ts the best models in the very large model space and thereby extends all subsets regression to Cox models. For each independent variable considered, the method provides the posterior probability that it belongs in the model. This is more directly interpretable than the corresponding P-values, and also more valid in that it takes account of model uncertainty. P-values from models preferred by stepwise methods tend to overstate the evidence for the predictive value of a variable. In our data Bayesian model averaging predictively outperforms standard model selection methods for assessing
Coronary Risk Prediction by Logical Analysis of Data
, 2002
"... The objective of this study was to distinguish within a population of patients with known or suspected coronary artery disease groups at high and at low mortality rates. The study was based on Cleveland Clinic Foundation's dataset of 9454 patients, of whom 312 died during an observation period of 9 ..."
Abstract
-
Cited by 16 (9 self)
- Add to MetaCart
The objective of this study was to distinguish within a population of patients with known or suspected coronary artery disease groups at high and at low mortality rates. The study was based on Cleveland Clinic Foundation's dataset of 9454 patients, of whom 312 died during an observation period of 9 years. The Logical Analysis of Data method was adapted to handle the disproportioned size of the two groups of patients, and the inseparable character of this dataset -- characteristic to many medical problems. As a result of the study, we have identified a high-risk group of patients representing 1/5 of the population, with a mortality rate 4 times higher than the average, and including 3/4 of the patients who died. The low-risk group identified in the study, representing approximately 4/5 of the population, had a mortality rate 3 times lower than the average.
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies. Environ Health Persp 58: 385−392
, 1984
"... Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies are discussed. In the area of experimental design, issues that must be considered include randomization of animals, sample size considerations, dose selection and allocation of animals to experimental gro ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Statistical issues in the design, analysis and interpretation of animal carcinogenicity studies are discussed. In the area of experimental design, issues that must be considered include randomization of animals, sample size considerations, dose selection and allocation of animals to experimental groups, and control of potentially confounding factors. In the analysis of tumor incidence data, survival differences among groups should be taken into account. It is important to try to distinguish between tumors that contribute to the death of the animal and "incidental " tumors discovered at autopsy in an animal dying of an unrelated cause. Life table analyses (appropriate for lethal tumors) and incidental tumor tests (appropriate for nonfatal tumors) are described, and the utilization of these procedures by the National Tbxicology Program is discussed. Despite the fact that past interpretations of carcinogenicity data have tended to focus on pairwise comparisons in general and high-dose effects in particular, the importance of trend tests should not be overlooked, since these procedures are more sensitive than pairwise comparisons to the detection of carcinogenic effects. No rigid statistical "decision rule " should be employed in the interpretation of carcinogenicity data. Although the statistical significance of an observed tumor increase is perhaps the single most important piece of evidence used in the evaluation process, a number of biological factors must also be taken into account. The use of historical control data, the false-positive issue and the interpretation of negative trends are also discussed.
Efficient quadratic regularization for expression arrays
- Biostatistics
, 2004
"... have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized ..."
Abstract
-
Cited by 14 (3 self)
- Add to MetaCart
have been many attempts to adapt statistical models for regression and classification to these data, and in many cases these attempts have challenged the computational resources. In this article we expose a class of techniques based on quadratic regularization of linear models, including regularized (ridge) regression, logistic and multinomial regression, linear and mixture discriminant analysis, the Cox model and neural networks. For all of these models, we show that dramatic computational savings are possible over naive implementations, using standard transformations in numerical linear algebra. Keywords: Eigengenes; Euclidean methods; Quadratic regularization; SVD. 1.
Bayesian information criterion for censored survival models
- Biometrics
"... We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
We investigate the Bayesian Information Criterion (BIC) for variable selection in models for censored survival data. Kass and Wasserman (1995) showed that BIC provides a close approximation to the Bayes factor when a unit-information prior on the parameter space is used. We propose a revision of the penalty term in BIC so that it is de ned in terms of the number of uncensored events instead of the number of observations. For the simplest censored data model, that of exponential distributions of survival times (i.e. a constant hazard rate), this revision results in a better approximation to the exact Bayes factor based on a conjugate unit-information prior. In the Cox proportional hazards regression model, we propose de ning BIC in terms of the maximized partial likelihood. Using the number of deaths rather than the number of individuals in the BIC penalty term corresponds to a more realistic prior on the parameter space, and is shown to improve predictive performance for assessing stroke risk in the Cardiovascular Health Study.

