## An integrated machine learning approach to stroke prediction (2010)


Venue: KDD

Citations: 3 (0 self)

### BibTeX

@INPROCEEDINGS{Khosla10anintegrated,

author = {Aditya Khosla and Hsu-kuang Chiu and Yu Cao and Junling Hu and Cliff Chiung-yu Lin and Honglak Lee},

title = {An integrated machine learning approach to stroke prediction},

booktitle = {KDD},

year = {2010},

pages = {183--192},

publisher = {ACM}

}


### Abstract

Stroke is the third leading cause of death and the principal cause of serious long-term disability in the United States. Accurate prediction of stroke is highly valuable for early intervention and treatment. In this study, we compare the Cox proportional hazards model with a machine learning approach for stroke prediction on the Cardiovascular Health Study (CHS) dataset. Specifically, we consider the common problems of data imputation, feature selection, and prediction in medical datasets. We propose a novel automatic feature selection algorithm that selects robust features based on our proposed heuristic: conservative mean. Combined with Support Vector Machines (SVMs), our proposed feature selection algorithm achieves a greater area under the ROC curve (AUC) as compared to the Cox proportional hazards model.

### Citations

1826 | Regression Shrinkage and Selection via the Lasso
- Tibshirani
- 1996
Citation Context ...gorithm for feature selection. L1 regularization has the beneficial effect of regularizing model coefficients (as in L2 regularization), but yields sparse models that are more easily interpretable [27, 32, 35]. This model has a regularization parameter that controls the “sparseness” of the weights. Consequently, the features with nonzero weights are selected for prediction. 3.3.3 Conservative mean feature ...

686 | An introduction to variable and feature selection
- Guyon, Elisseeff
- 2003
Citation Context ...AD) (c) Bias (mean of imputed values − mean of ground-truth data) 2. Overall stroke prediction performance (measured by the area under the ROC curve). 3.3 Feature Selection Selecting relevant features [13] is crucial for building an accurate model of clinical data. For example, the CHS dataset has a large number of attributes, ranging from demographic information and clinical history to biomedical and ...

539 | The meaning and use of the area under a receiver operating characteristic (ROC) curve
- Hanley, McNeil
- 1982
Citation Context ...mall in number) as it considers both sensitivity and specificity, providing a balanced measure for classifier performance. Specifically, the AUC (associated with the function f) is defined as follows [5, 14]: AUC = (1 / (|Mp| · |Mn|)) Σ_{i∈Mp} Σ_{j∈Mn} 1{f(x^(i)) > f(x^(j))}, (4) where 1(·) is an indicator function. The AUC is used to evaluate the performance of the binary stroke classification task. Essentially, ...
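The pairwise definition of the AUC in Eq. (4) above can be computed directly from two sets of scores; a minimal NumPy sketch (the function name `pairwise_auc` is ours, not from the paper):

```python
import numpy as np

def pairwise_auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    following the pairwise indicator definition in Eq. (4)."""
    s_p = np.asarray(scores_pos)[:, None]   # shape (|Mp|, 1)
    s_n = np.asarray(scores_neg)[None, :]   # shape (1, |Mn|)
    return np.mean(s_p > s_n)               # mean of the indicator over all pairs

# Perfect ranking: every positive example scores above every negative one.
print(pairwise_auc([0.9, 0.8], [0.2, 0.1]))  # → 1.0
```

This O(|Mp|·|Mn|) form is exactly the definition; practical implementations usually use a sort-based O(n log n) equivalent.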

339 | CVX: MATLAB software for disciplined convex programming, 2005, http://www.stanford.edu/~boyd/cvx
- Grant, Boyd
Citation Context ...i, where C and γ are the hyperparameters for the misclassification loss penalty and for regularization respectively, and φ : R → R is the regression loss penalty, which we fixed as the Huber function [11] in our study. To solve problem (14), we used CVX, a package for specifying and solving convex programs [11, 10]. Note that we can easily apply the kernel trick to this model, as in SVM. Furthermore...

191 | A support vector method for multivariate performance measures
- Joachims
- 2005
Citation Context ... SVM. Furthermore, SVM solvers can be used to optimize the area under the ROC curve directly, so they are well suited for the task of stroke prediction. We used linear SVMs implemented using SVM-perf [17] in this study. 3.4.2 Margin-based Censored Regression Since the SVM is in principle developed for classification, we use it to predict whether or not a stroke would occur within a given time frame wh...

151 | Survival analysis: techniques for censored and truncated data
- Klein, Moeschberger
- 2005
Citation Context ...aseline hazard function is treated non-parametrically. Thus, we can see that the parameters have a multiplicative effect on the hazard value, which makes it different from the linear regression models [20]. The Cox model is part of the Generalized Linear Model (GLM) family. Another member of this family is the logistic regression model, where the output takes the following form: h(x) = (1 + exp(−β^T x)...

136 | Feature selection, l1 vs. l2 regularization, and rotational invariance
- Ng
- 2002
Citation Context ...gorithm for feature selection. L1 regularization has the beneficial effect of regularizing model coefficients (as in L2 regularization), but yields sparse models that are more easily interpretable [27, 32, 35]. This model has a regularization parameter that controls the “sparseness” of the weights. Consequently, the features with nonzero weights are selected for prediction. 3.3.3 Conservative mean feature ...

122 | Feature selection for high-dimensional genomic microarray data
- Xing, Jordan, et al.
- 2001
Citation Context ...gorithm for feature selection. L1 regularization has the beneficial effect of regularizing model coefficients (as in L2 regularization), but yields sparse models that are more easily interpretable [27, 32, 35]. This model has a regularization parameter that controls the “sparseness” of the weights. Consequently, the features with nonzero weights are selected for prediction. 3.3.3 Conservative mean feature ...

115 | Regression modeling strategies: With applications to linear models, logistic regression and survival analysis
- Harrell
- 2001
Citation Context ... to measure how accurately the predictions reflect relative risk of stroke of two randomly selected individuals. A commonly used metric in survival models for this evaluation is the concordance index [15, 29]. The concordance index is a generalization of the concept of AUC designed to handle (i) continuous values for prediction and (ii) censored data. Similar to the AUC, it takes values from 0.5 (complete...
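The concordance index described in this snippet can be sketched as an O(n²) pairwise count. This is an illustrative implementation under the usual convention that a pair is comparable only when the subject with the earlier observed time actually had an event (the function and variable names are ours, not from the paper):

```python
def concordance_index(times, events, risk_scores):
    """Concordance index for censored data: among comparable pairs
    (i failed, and strictly before j's observed time), count the fraction
    where the earlier-failing subject got the higher predicted risk.
    Ties in predicted risk count as 0.5, as is conventional."""
    n = len(times)
    num, den = 0.0, 0
    for i in range(n):
        for j in range(n):
            # Pair is comparable only if i had an observed event before j's time.
            if events[i] and times[i] < times[j]:
                den += 1
                if risk_scores[i] > risk_scores[j]:
                    num += 1.0            # concordant pair
                elif risk_scores[i] == risk_scores[j]:
                    num += 0.5            # tied prediction
    return num / den
```

With uninformative (all-equal) predictions this returns 0.5, and with perfect risk ordering it returns 1.0, matching the range described above.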

102 | AUC optimization vs. error rate minimization
- Cortes, Mohri
- 2003
Citation Context ...mall in number) as it considers both sensitivity and specificity, providing a balanced measure for classifier performance. Specifically, the AUC (associated with the function f) is defined as follows [5, 14]: AUC = (1 / (|Mp| · |Mn|)) Σ_{i∈Mp} Σ_{j∈Mn} 1{f(x^(i)) > f(x^(j))}, (4) where 1(·) is an indicator function. The AUC is used to evaluate the performance of the binary stroke classification task. Essentially, ...

65 | Graph implementations for nonsmooth convex programs. Recent Advances in Learning and Control
- Grant, Boyd
- 2008
Citation Context ...ively, and φ : R → R is the regression loss penalty, which we fixed as the Huber function [11] in our study. To solve problem (14), we used CVX, a package for specifying and solving convex programs [11, 10]. Note that we can easily apply the kernel trick to this model, as in SVM. Furthermore, the objective function can be modified to optimize the AUC directly in the same way as SVM-perf [17]. 4. EXPERIM...

54 | Feature Extraction: Foundations and Applications
- Guyon, Gunn, et al.
- 2006
Citation Context ...r selecting features automatically: forward feature selection, L1 regularized logistic regression, and “conservative mean” feature selection. 3.3.1 Forward feature selection Forward feature selection [12] greedily adds one feature at a time. The best subset of features was selected based on cross-validation. Note that adding more features does not necessarily improve the test performance since overfit...
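The greedy procedure this snippet describes, adding one feature at a time and stopping when no candidate improves the score, can be sketched as follows. Here `score_fn` stands in for whatever cross-validated metric (e.g. AUC) the caller supplies; all names are ours, not the paper's:

```python
def forward_select(features, score_fn, max_features=None):
    """Greedy forward feature selection: repeatedly add the single feature
    that most improves score_fn(subset); stop when no candidate helps,
    which guards against the overfitting noted in the text."""
    selected, best_score = [], float("-inf")
    candidates = list(features)
    while candidates and (max_features is None or len(selected) < max_features):
        scored = [(score_fn(selected + [f]), f) for f in candidates]
        top_score, top_f = max(scored)
        if top_score <= best_score:   # adding any remaining feature hurts: stop
            break
        selected.append(top_f)
        candidates.remove(top_f)
        best_score = top_score
    return selected
```

In practice `score_fn` would run k-fold cross-validation of the downstream classifier on the candidate subset, so each greedy step is as expensive as one full model evaluation per remaining feature.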

54 | Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values
- Schneider
Citation Context ...’s observed values • Column median: replace each missing value with the median of the feature’s observed values • Imputation through linear regression [19] • Regularized Expectation Maximization (EM) [31] As a post-processing step to impute discrete-valued features, we rounded the imputed values to the nearest discrete value. The imputation algorithms were evaluated using the following metrics: 1. Imp...
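The three imputation-accuracy metrics named in the surrounding snippets (RMSD, MAD, and bias) are straightforward to compute once ground truth is available; a minimal sketch (the function name is ours):

```python
import numpy as np

def imputation_metrics(imputed, truth):
    """Imputation accuracy metrics as listed in the text:
    RMSD, MAD, and bias (mean of imputed minus mean of ground truth)."""
    imputed = np.asarray(imputed, dtype=float)
    truth = np.asarray(truth, dtype=float)
    diff = imputed - truth
    return {
        "RMSD": float(np.sqrt(np.mean(diff ** 2))),  # root-mean-square deviation
        "MAD":  float(np.mean(np.abs(diff))),        # mean absolute deviation
        "Bias": float(np.mean(imputed) - np.mean(truth)),
    }
```

Note that bias can be near zero even when RMSD is large, since positive and negative imputation errors cancel; this is why the three metrics are reported together.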

48 | Fast optimization methods for L1 regularization: a comparative study and two new approaches
- Schmidt, Fung, et al.
- 2007
Citation Context ...tes to optimize the value of the threshold, t. CM(T, K) refers to the ConservativeMean(D, K) function defined in Algorithm 1. 3.3.2 L1 regularized logistic regression L1 regularized logistic regression [30] is a popular algorithm for feature selection. L1 regularization has the beneficial effect of regularizing model coefficients (as in L2 regularization), but yields sparse models that are more easil...
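The idea in this snippet, fit an L1-penalized logistic regression and keep the features with nonzero weights, can be sketched with a from-scratch proximal-gradient (ISTA) solver. The paper itself used the SLEP package; everything below (names, hyperparameters, toy data) is illustrative:

```python
import numpy as np

def l1_logistic_select(X, y, lam=0.1, lr=0.1, iters=2000):
    """L1-regularized logistic regression via proximal gradient (ISTA):
    a gradient step on the logistic loss followed by soft-thresholding,
    which drives uninformative weights exactly to zero."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))    # predicted probabilities
        grad = X.T @ (p - y) / n            # gradient of mean logistic loss
        w = w - lr * grad
        # Soft-thresholding: the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
    return np.flatnonzero(w)                # indices of features kept

rng = np.random.RandomState(0)
X = rng.randn(300, 5)
y = (X[:, 0] > 0).astype(float)             # only feature 0 is informative
print(l1_logistic_select(X, y))             # typically selects just feature 0
```

The regularization weight `lam` plays the role of the sparseness-controlling parameter mentioned above: larger values zero out more coefficients, leaving fewer selected features.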

47 | SLEP: Sparse Learning with Efficient Projections
- Liu, Ji, et al.
- 2009
Citation Context ...egularized logistic regression The L1 regularized logistic regression (L1LR) was used for feature selection, followed by SVM for prediction. The implementation of L1LR was done using the SLEP package [22]. The optimal regularization parameter λ∗ was assigned to be the value that maximized the area under the ROC curve for 10-fold cross-validation. The value of λ∗ was then used to run L1LR on the enti...

34 | An ℓ1 regularization-path algorithm for generalized linear models
- Park, Hastie
Citation Context ...e [16, 24, 21]. However, the performance of the original Cox model depends heavily on the quality of the pre-selected features. To address this problem, several approaches have been proposed recently [9, 28]. Thus far, there have been very few studies on comparing the Cox regression with machine learning methods in making predictions on censored data. Kattan [18] compared Cox proportional hazards regress...

13 | Generating survival times to simulate Cox proportional hazards models
- Bender, Augustin, et al.
Citation Context ...d (ii) discover new risk factors. Lumley et al.’s [24] 5-year stroke prediction model adopted the Cox proportional hazards model, one of the most commonly used statistical methods in medical research [3]. It has been extensively studied [1, 3] and applied to the prediction of various diseases including stroke [16, 24, 21]. However, the performance of the original Cox model depends heavily on the qual...

13 | Epidemiological approaches to heart disease: The Framingham Study
- Dawber, Meadors, et al.
- 1951
Citation Context ...ion can contribute significantly to its prevention and early treatment. Numerous medical studies and data analyses have been conducted to identify effective predictors of stroke. The Framingham Study [6, 34] reported a list of stroke risk factors including age, systolic blood pressure, the use of anti-hypertensive therapy, diabetes mellitus, cigarette smoking, prior cardiovascular disease, atrial fibrill...

10 | The cardiovascular health study: Design and rationale
- Fried, Borhani, et al.
- 1991
Citation Context ... factors) that are verified by clinical trials or selected manually by medical experts. For example, Lumley et al. [24] built a 5-year stroke prediction model based on the Cardiovascular Health Study [8] dataset using a set of 16 manually selected features (given in [25]) from a total of roughly one thousand features. With a large number of features in current medical datasets, it is a cumbersome tas...

9 | Heart Disease and Stroke Statistics – 2009 Update (At-a-Glance Version)
- American Heart Association
- 2009
Citation Context ...1. INTRODUCTION Stroke is the third leading cause of death and the principal cause of serious long-term disability in the United States [2]. Stroke risk prediction can contribute significantly to its prevention and early treatment. Numerous medical studies and data analyses have been conducted to identify effective predictors of stroke. ...

9 | Imputation of Missing Longitudinal Data: A Comparison of Methods
- Engels, Diehr
- 2003
Citation Context ...mpute discrete-valued features, we rounded the imputed values to the nearest discrete value. The imputation algorithms were evaluated using the following metrics: 1. Imputation accuracy (adopted from [7]): (a) Root-Mean-Square Deviation (RMSD) (b) Mean Absolute Deviation (MAD) (c) Bias (mean of imputed values − mean of ground-truth data) 2. Overall stroke prediction performance (measured by the area u...

7 | L1 penalized estimation in the Cox proportional hazards model
- Goeman
- 2010
Citation Context ...e [16, 24, 21]. However, the performance of the original Cox model depends heavily on the quality of the pre-selected features. To address this problem, several approaches have been proposed recently [9, 28]. Thus far, there have been very few studies on comparing the Cox regression with machine learning methods in making predictions on censored data. Kattan [18] compared Cox proportional hazards regress...

6 |
Comparison of Cox regression with other methods for determining prediction models and nomograms
- Kattan
Citation Context ...approaches have been proposed recently [9, 28]. Thus far, there have been very few studies on comparing the Cox regression with machine learning methods in making predictions on censored data. Kattan [18] compared Cox proportional hazards regression with several machine learning methods (neural networks and tree-based methods) based on three urological datasets. However, Kattan’s study focused on data...

3 | A stroke prediction score in the elderly: validation and web-based application
- Lumley
- 2002
Citation Context ...betes mellitus, cigarette smoking, prior cardiovascular disease, atrial fibrillation, and left ventricular hypertrophy by electrocardiogram. Furthermore, in the past decade, a number of other studies [25, 23, 24, 26] have led to the discovery of more risk factors such as creatinine level, time to walk 15 feet, and others. Most previous prediction models have adopted features (risk factors) that are verified by cl...

2 | Simulation program for estimating statistical power of Cox's proportional hazards model assuming no specific distribution for the survival time
- Akazawa, Nakamura, et al.
- 1991
Citation Context ...ley et al.’s [24] 5-year stroke prediction model adopted the Cox proportional hazards model, one of the most commonly used statistical methods in medical research [3]. It has been extensively studied [1, 3] and applied to the prediction of various diseases including stroke [16, 24, 21]. However, the performance of the original Cox model depends heavily on the quality of the pre-selected features. To add...

2 | Imputation of missing values in DNA microarray gene expression data
- Kim, Golub, et al.
- 2004
Citation Context ...e each missing value with the mean of the feature’s observed values • Column median: replace each missing value with the median of the feature’s observed values • Imputation through linear regression [19] • Regularized Expectation Maximization (EM) [31] As a post-processing step to impute discrete-valued features, we rounded the imputed values to the nearest discrete value. The imputation algorithms w...

2 | The Cox proportional hazards model with change point: An epidemiologic application
- Liang, Self, et al.
- 1990
Citation Context ...nal hazards model, one of the most commonly used statistical methods in medical research [3]. It has been extensively studied [1, 3] and applied to the prediction of various diseases including stroke [16, 24, 21]. However, the performance of the original Cox model depends heavily on the quality of the pre-selected features. To address this problem, several approaches have been proposed recently [9, 28]. Thus ...

2 | Short-term predictors of incident stroke in older adults: The Cardiovascular Health Study
- Manolio, Kronmal, et al.
- 1996

2 | On ranking in survival analysis: Bounds on the concordance index
- Raykar, Steck, et al.
- 2008
Citation Context ... to measure how accurately the predictions reflect relative risk of stroke of two randomly selected individuals. A commonly used metric in survival models for this evaluation is the concordance index [15, 29]. The concordance index is a generalization of the concept of AUC designed to handle (i) continuous values for prediction and (ii) censored data. Similar to the AUC, it takes values from 0.5 (complete...

2 | Probability of stroke: a risk profile from the Framingham study
- Wolf, D’Agostino, et al.
- 1991
Citation Context ...ion can contribute significantly to its prevention and early treatment. Numerous medical studies and data analyses have been conducted to identify effective predictors of stroke. The Framingham Study [6, 34] reported a list of stroke risk factors including age, systolic blood pressure, the use of anti-hypertensive therapy, diabetes mellitus, cigarette smoking, prior cardiovascular disease, atrial fibrill...

1 | Prediction of ischemic stroke risk in the atherosclerosis risk in communities study
- Chambless, Heiss, et al.
Citation Context ...ibutes is highly relevant to stroke prediction. The traditional approach to stroke prediction has been to use manually selected features based on risk factors analyzed by medical and clinical studies [4, 24, 33, 36]. Instead of manually selecting features, we evaluate three machine learning-based algorithms for selecting features automatically: forward feature selection, L1 regularized logistic regression, and “...

1 | Effect of repeated transcatheter arterial embolization on the survival time in patients with hepatocellular carcinoma
- Ikeda, Kumada, et al.
Citation Context ...nal hazards model, one of the most commonly used statistical methods in medical research [3]. It has been extensively studied [1, 3] and applied to the prediction of various diseases including stroke [16, 24, 21]. However, the performance of the original Cox model depends heavily on the quality of the pre-selected features. To address this problem, several approaches have been proposed recently [9, 28]. Thus ...

1 | Frequency and predictors of stroke death
- Longstreth, Bernick, et al.
- 2001
Citation Context ...betes mellitus, cigarette smoking, prior cardiovascular disease, atrial fibrillation, and left ventricular hypertrophy by electrocardiogram. Furthermore, in the past decade, a number of other studies [25, 23, 24, 26] have led to the discovery of more risk factors such as creatinine level, time to walk 15 feet, and others. Most previous prediction models have adopted features (risk factors) that are verified by cl...

1 | Walking speed and risk of incident ischemic stroke among postmenopausal women
- McGinn, Kaplan, et al.
- 2008
Citation Context ...betes mellitus, cigarette smoking, prior cardiovascular disease, atrial fibrillation, and left ventricular hypertrophy by electrocardiogram. Furthermore, in the past decade, a number of other studies [25, 23, 24, 26] have led to the discovery of more risk factors such as creatinine level, time to walk 15 feet, and others. Most previous prediction models have adopted features (risk factors) that are verified by cl...

1 | How do American stroke risk functions perform in a western European population?
- Vokó, Hollander, et al.
- 2004
Citation Context ...ibutes is highly relevant to stroke prediction. The traditional approach to stroke prediction has been to use manually selected features based on risk factors analyzed by medical and clinical studies [4, 24, 33, 36]. Instead of manually selecting features, we evaluate three machine learning-based algorithms for selecting features automatically: forward feature selection, L1 regularized logistic regression, and “...

1 | A risk score predicted coronary heart disease and stroke in a Chinese cohort
- Zhang, Attia, et al.
Citation Context ...ibutes is highly relevant to stroke prediction. The traditional approach to stroke prediction has been to use manually selected features based on risk factors analyzed by medical and clinical studies [4, 24, 33, 36]. Instead of manually selecting features, we evaluate three machine learning-based algorithms for selecting features automatically: forward feature selection, L1 regularized logistic regression, and “...