The Entire Regularization Path for the Support Vector Machine
, 2004
Cited by 107 (8 self)
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
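For orientation, the cost parameter in question is the C of the standard soft-margin SVM. The formulation below is the usual textbook one and is not quoted from the paper; the symbols (β, β_0, ξ_i, C) are assumptions of this note.

```latex
\min_{\beta_0,\,\beta,\,\xi}\;
  \frac{1}{2}\lVert \beta \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
  y_i\,(\beta_0 + x_i^{\top}\beta) \ge 1 - \xi_i,
  \;\; \xi_i \ge 0,
  \;\; i = 1, \dots, n .
```

A small C tolerates many margin violations, while a large C penalizes them heavily. A path algorithm of the kind the abstract describes would trace the solution as C varies rather than refitting for each value.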
Prediction by supervised principal components
- Journal of the American Statistical Association
, 2006
Cited by 36 (5 self)
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer.
Key words: Gene expression; Microarray; Regression; Survival analysis.
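The idea in the abstract above (screen predictors by their association with the outcome, then run principal components on the survivors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of absolute Pearson correlation as the association score, the fixed `threshold`, and the power-iteration solver are all assumptions made here.

```python
import math


def association_scores(X, y):
    """Absolute Pearson correlation of each column of X with y (assumed score)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    y_c = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(v * v for v in y_c))
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        c = [v - m for v in col]
        norm = math.sqrt(sum(v * v for v in c))
        if norm == 0.0 or y_norm == 0.0:
            scores.append(0.0)  # constant column or outcome: no association
        else:
            scores.append(abs(sum(a * b for a, b in zip(c, y_c)) / (norm * y_norm)))
    return scores


def first_pc_direction(Xc, iters=200):
    """Leading principal direction of centered Xc by power iteration on Xc^T Xc."""
    n, p = len(Xc), len(Xc[0])
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(p)) for row in Xc]       # Xc v
        w = [sum(Xc[i][j] * u[i] for i in range(n)) for j in range(p)]  # Xc^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def supervised_pc_fit(X, y, threshold):
    """Screen columns by association with y, then regress y on the first PC."""
    scores = association_scores(X, y)
    keep = [j for j, s in enumerate(scores) if s >= threshold]
    if not keep:
        raise ValueError("no predictor passes the threshold")
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in keep]
    Xc = [[row[j] - m for j, m in zip(keep, means)] for row in X]
    v = first_pc_direction(Xc)
    z = [sum(a * b for a, b in zip(row, v)) for row in Xc]  # component scores
    y_mean = sum(y) / n
    # Simple least squares of y on z; z has mean zero by construction.
    slope = sum(zi * (yi - y_mean) for zi, yi in zip(z, y)) / sum(zi * zi for zi in z)
    fitted = [y_mean + slope * zi for zi in z]
    return keep, fitted
```

In practice the threshold would presumably be chosen by cross-validation rather than fixed by hand, and a survival outcome would need a different association score (for example a Cox score) than the correlation assumed here.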
Margin trees for high-dimensional classification
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2006
Cited by 7 (0 self)
We propose a method for the classification of more than two classes, from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin classifier at each split. We implement an exact greedy algorithm for this task, and compare its performance to less greedy procedures based on clustering of the matrix of pairwise margins. We compare the performance of the "margin tree" to the closely related "all-pairs" (one versus one) support vector machine, and nearest centroids on a number of cancer microarray datasets. We also develop a simple method for feature selection. We find that the margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes.
Survival prediction using gene expression data: a review and comparison. submitted
, 2007
Cited by 1 (1 self)
Background: Knowledge of the transcription of the human genome might greatly enhance our understanding of cancer. In particular, gene expression may be used to predict the survival of cancer patients. A microarray measures the expression of thousands of genes simultaneously. The high dimensionality of the data poses the following problem: the number of covariates (∼10000) greatly exceeds the number of samples (∼200). Results: Here we give an inventory of methods that have been used to model survival using gene expression. These methods are critically reviewed and compared in a qualitative way. Finally, the methods are applied to artificial and real-life datasets for a quantitative comparison. Conclusions: The choice of the evaluation measure of predictive performance is crucial for the selection of the best method. Depending on the evaluation measure, either the L2-penalized Cox regression or the random forest ensemble method yields the best survival time prediction using gene expression for the data sets used. Consensus on which evaluation measure of predictive performance is best used is much needed.
A study on three Linear Discriminant Analysis based methods in Small Sample Size problem
Cited by 1 (0 self)
In this paper, we make a study on three Linear Discriminant Analysis (LDA) based
Survival ensembles
- Advance Access publication on December 12, 2005
We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic and diagnostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods, applied to the recurrence-free survival time of node-positive breast cancer patients, with previously published findings.
Dimensional Spaces with Independent Component Analysis
, 2004
The application of Independent Component Analysis (ICA) to genomic data is considered here. In recent years, microarrays have delivered to researchers huge series of measurements of gene expression levels under different experimental conditions. This work emphasizes exploratory data analysis following experimental work on the popular Escherichia coli; the context is a typical one in which changes in gene expression values are observed after perturbing genes at an initial time and measuring the responses at regular time intervals until the steady state is achieved. The gene temporal patterns are, as usual, very short, and the application described here is no exception, as only six time points are available. This aspect combines with a very large feature space (i.e., the gene dimensionality). Thus, several kinds of fluctuations have to be monitored, and many are discarded because they are not significantly different from noise. ICA represents a very flexible signal processing tool which attempts to deal with noise as well, although the expected impact involves its most inherent property of delivering a decomposition of the gene profiles according to statistically independent
Statistical Learning for Analyzing Functional Genomic Data
, 2006
signatures single biomarkers Prognostic Factor Studies response to treatment toxicity survival Custom Drug Selection predictive factors for response/ resistance to certain therapy indicators of adverse events
An integrative pathway-based clinical–genomic model for cancer . . .
- STATISTICS AND PROBABILITY LETTERS
, 2010
Structured Variable Selection in Support Vector Machines
, 710
When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure in the SVM to eliminate the influences of the irrelevant predictors. The lasso and other variable selection techniques have been successfully used in the SVM to perform automatic variable selection. In some problems, there is a natural hierarchical structure among the variables. Thus, in order to have an interpretable SVM classifier, it is important to respect the heredity principle when enforcing the sparsity in the SVM. Many variable selection methods, however, do not respect the heredity principle. In this paper we enforce both sparsity and the heredity principle in the SVM by using the so-called structured variable selection (SVS) framework originally proposed in Yuan, Joseph and Zou (2007). We minimize the empirical hinge loss under a set of linear inequality constraints and a lasso-type penalty. The solution always obeys the desired heredity principle and enjoys sparsity. The new SVM classifier can be efficiently fitted, because the optimization problem is a linear program. Another contribution of this work is to present a nonparametric extension of the SVS framework, and we propose nonparametric heredity SVMs. Simulated and real data are used to illustrate the merits of the proposed method.

