Results 1–10 of 21
The Entire Regularization Path for the Support Vector Machine
, 2004
"... In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. ..."
Abstract

Cited by 159 (9 self)
 Add to MetaCart
(Show Context)
In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model.
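The cost-parameter sensitivity this abstract warns about is easy to see numerically. The sketch below is not the authors' path algorithm: it is a brute-force stand-in that refits a linear SVM (hinge loss plus ridge penalty, plain subgradient descent) at a few values of C on made-up data, which is exactly the repeated work the paper's path solver avoids.

```python
import numpy as np

def fit_linear_svm(X, y, C, steps=3000, lr=0.005):
    """Minimize 0.5*||w||^2 + C * sum_i max(0, 1 - y_i * (x_i @ w))
    by subgradient descent; y must be in {-1, +1}.  A plain sketch,
    not the exact piecewise-linear path solver of the paper."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        viol = margins < 1                       # margin violators
        grad = w - C * (X[viol] * y[viol, None]).sum(axis=0)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 3))
y = np.sign(X[:, 0] + 0.3 * rng.normal(size=80))

# Brute force over a grid of C values; the paper traces ALL solutions
# in C for essentially the cost of a single fit.
ws = {C: fit_linear_svm(X, y, C) for C in (0.01, 0.1, 1.0)}
```

The fitted coefficient vectors differ noticeably across the grid, which is the paper's point: the choice of C matters, so having the whole path cheaply is valuable.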
An Empirical Bayes Approach to Inferring Large-Scale Gene Association Networks
 Bioinformatics
, 2004
"... Motivation: Genetic networks are often described statistically by graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standar ..."
Abstract

Cited by 151 (6 self)
 Add to MetaCart
Motivation: Genetic networks are often described statistically by graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis, where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and makes inferring genetic networks an “ill-posed” inverse problem. Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (i) improved (regularized) small-sample point estimates of partial correlation, (ii) an exact test of edge inclusion with adaptive estimation of the degree of freedom, and (iii) a heuristic network search based on false discovery rate multiple testing. Steps (ii) and (iii) correspond to an empirical Bayes estimate of the network topology. Results: Using computer simulations we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample data sets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3,883 genes. Availability: The authors have implemented the approach in the R package “GeneTS”, which is freely available from
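Step (i), a regularized small-sample estimate of partial correlation, can be sketched in a few lines. This is not the GeneTS implementation: it shrinks the sample correlation matrix toward the identity with a fixed hand-picked intensity (GeneTS estimates the intensity analytically), inverts it, and rescales the precision matrix to partial correlations.

```python
import numpy as np

def shrinkage_partial_corr(X, lam=0.2):
    """Regularized partial correlations for small-sample settings.
    lam is a hand-picked shrinkage intensity toward the identity,
    which guarantees the shrunken matrix is invertible for lam > 0."""
    R = np.corrcoef(X, rowvar=False)
    R_shrunk = (1 - lam) * R + lam * np.eye(R.shape[0])
    P = np.linalg.inv(R_shrunk)              # precision matrix
    d = np.sqrt(np.diag(P))
    pcor = -P / np.outer(d, d)               # standardize off-diagonals
    np.fill_diagonal(pcor, 1.0)
    return pcor

# Toy data: gene 0 and gene 1 are directly associated; the rest are noise.
rng = np.random.default_rng(1)
g0 = rng.normal(size=50)
X = np.column_stack([g0, g0 + 0.3 * rng.normal(size=50),
                     rng.normal(size=(50, 3))])
pc = shrinkage_partial_corr(X)
```

A large entry such as `pc[0, 1]` suggests a direct association (an edge in the GGM), while entries for the noise genes stay near zero; steps (ii) and (iii) of the paper then turn such estimates into formal edge decisions.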
Prediction by supervised principal components
 Journal of the American Statistical Association
, 2006
"... In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal co ..."
Abstract

Cited by 63 (7 self)
 Add to MetaCart
(Show Context)
In regression problems where the number of predictors greatly exceeds the number of observations, conventional regression techniques may produce unsatisfactory results. We describe a technique called supervised principal components that can be applied to this type of problem. Supervised principal components is similar to conventional principal components analysis except that it uses a subset of the predictors selected based on their association with the outcome. Supervised principal components can be applied to regression and generalized regression problems, such as survival analysis. It compares favorably to other techniques for this type of problem, and can also account for the effects of other covariates and help identify which predictor variables are most important. We also provide asymptotic consistency results to help support our empirical findings. These methods could become important tools for DNA microarray data, where they may be used to more accurately diagnose and treat cancer. KEY WORDS: Gene expression; Microarray; Regression; Survival analysis.
Margin trees for high-dimensional classification
 Journal of Machine Learning Research
"... We propose a method for the classification of more than two classes, from highdimensional features. Our approach is to build a binary decision tree in a topdown manner, using the optimal margin classifier at each split. We implement an exact greedy algorithm for this task, and compare its performa ..."
Abstract

Cited by 18 (0 self)
 Add to MetaCart
We propose a method for the classification of more than two classes from high-dimensional features. Our approach is to build a binary decision tree in a top-down manner, using the optimal margin classifier at each split. We implement an exact greedy algorithm for this task, and compare its performance to less greedy procedures based on clustering of the matrix of pairwise margins. We compare the performance of the “margin tree” to the closely related “all-pairs” (one versus one) support vector machine, and to nearest centroids, on a number of cancer microarray datasets. We also develop a simple method for feature selection. We find that the margin tree has accuracy that is competitive with other methods and offers additional interpretability in its putative grouping of the classes.
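The top-down construction can be sketched with a stand-in splitter. The code below partitions the classes at each node by projecting class centroids onto their leading principal direction, rather than solving the optimal-margin problem the paper uses, and routes test points by a nearest-group-mean rule; the function names and toy data are illustrative only.

```python
import numpy as np

def build_tree(X, y, classes):
    """Grow a binary class-partition tree top-down.  Each node splits
    its set of classes in two via the top PC of the class centroids
    (a crude stand-in for the paper's optimal-margin split)."""
    if len(classes) == 1:
        return classes[0]                      # leaf: a single class label
    cents = np.array([X[y == c].mean(axis=0) for c in classes])
    d = cents - cents.mean(axis=0)
    _, _, vt = np.linalg.svd(d, full_matrices=False)
    side = d @ vt[0] >= 0                      # split classes by projection sign
    left = [c for c, s in zip(classes, side) if s]
    right = [c for c, s in zip(classes, side) if not s]
    if not left or not right:                  # degenerate split: peel one class off
        left, right = classes[:1], classes[1:]
    mL = np.mean([X[y == c].mean(axis=0) for c in left], axis=0)
    mR = np.mean([X[y == c].mean(axis=0) for c in right], axis=0)
    return (mL, build_tree(X, y, left)), (mR, build_tree(X, y, right))

def predict_one(tree, x):
    """Route a point down the tree by nearest group mean."""
    while isinstance(tree, tuple):
        (mL, subL), (mR, subR) = tree
        tree = subL if np.linalg.norm(x - mL) <= np.linalg.norm(x - mR) else subR
    return tree

# Three well-separated Gaussian classes in the plane.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.normal(size=(30, 2)) for c in centers])
y = np.repeat([0, 1, 2], 30)
tree = build_tree(X, y, [0, 1, 2])
pred = np.array([predict_one(tree, x) for x in X])
```

As in the paper, the tree's interpretability comes from the nested grouping of classes it produces on the way down, not just from the final predictions.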
A novel approach to phylogenetic tree construction using stochastic optimization and clustering
 BioMed Central
, 2006
Cited by 6 (3 self)
Survival prediction using gene expression data: a review and comparison (submitted)
, 2007
"... Background: Knowledge of the transcription of the humane genome might greatly enhance our understanding of cancer. In particular, gene expression may be used to predict the survival of cancer patients. A microarray measures the expression of thousands of genes simultaneously. The highdimensionality ..."
Abstract

Cited by 3 (1 self)
 Add to MetaCart
(Show Context)
Background: Knowledge of the transcription of the human genome might greatly enhance our understanding of cancer. In particular, gene expression may be used to predict the survival of cancer patients. A microarray measures the expression of thousands of genes simultaneously. The high-dimensionality of the data poses the following problem: the number of covariates (∼10000) greatly exceeds the number of samples (∼200). Results: Here we give an inventory of methods that have been used to model survival using gene expression. These methods are critically reviewed and compared in a qualitative way. Finally, the methods are applied to artificial and real-life datasets for a quantitative comparison. Conclusions: The choice of the evaluation measure of predictive performance is crucial for the selection of the best method. Depending on the evaluation measure, either L2-penalized Cox regression or the random forest ensemble method yields the best survival time prediction using gene expression for the data sets used. Consensus on which evaluation measure of predictive performance should be used is much needed.
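Of the methods compared, L2-penalized Cox regression is compact enough to sketch. The code below maximizes a Breslow partial log-likelihood with a ridge penalty by plain gradient ascent on simulated survival data; it ignores ties and censoring subtleties and is a toy stand-in for the penalized solvers the review evaluates, not any of their implementations.

```python
import numpy as np

def ridge_cox(X, time, event, lam=1.0, steps=300, lr=0.005):
    """Gradient ascent on the Breslow partial log-likelihood minus
    (lam/2)*||beta||^2.  No ties handling, no step-size tuning --
    a sketch of the ridge-Cox idea, not a production solver."""
    order = np.argsort(time)                 # sort so risk sets are suffixes
    X, event = X[order], np.asarray(event)[order]
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(steps):
        eta = X @ beta
        w = np.exp(eta - eta.max())          # stabilized relative risks
        grad = -lam * beta
        for i in range(n):
            if event[i]:
                ws = w[i:]                   # risk set: subjects still at risk
                grad += X[i] - (ws[:, None] * X[i:]).sum(axis=0) / ws.sum()
        beta += lr * grad
    return beta

# Simulated survival data: only the first covariate affects the hazard.
rng = np.random.default_rng(4)
n, p = 60, 4
X = rng.normal(size=(n, p))
time = rng.exponential(scale=np.exp(-X[:, 0]))   # higher x0 -> earlier event
event = np.ones(n, dtype=bool)                   # no censoring, for simplicity
beta = ridge_cox(X, time, event, lam=1.0)
```

The ridge penalty is what makes the fit stable in the microarray regime the abstract describes, where p is far larger than n and the unpenalized partial likelihood has no usable maximum.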
Computational Statistics and Data Analysis
"... This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal noncommercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or sel ..."
Abstract

Cited by 3 (0 self)
 Add to MetaCart
(Show Context)
This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal noncommercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier’s archiving and manuscript policies are encouraged to visit:
Structured Variable Selection in Support Vector Machines
, 710
"... Abstract: When applying the support vector machine (SVM) to highdimensional classification problems, we often impose a sparse structure in the SVM to eliminate the influences of the irrelevant predictors. The lasso and other variable selection techniques have been successfully used in the SVM to per ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
Abstract: When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure in the SVM to eliminate the influence of irrelevant predictors. The lasso and other variable selection techniques have been successfully used in the SVM to perform automatic variable selection. In some problems there is a natural hierarchical structure among the variables. Thus, in order to have an interpretable SVM classifier, it is important to respect the heredity principle when enforcing sparsity in the SVM. Many variable selection methods, however, do not respect the heredity principle. In this paper we enforce both sparsity and the heredity principle in the SVM by using the so-called structured variable selection (SVS) framework originally proposed in Yuan, Joseph and Zou (2007). We minimize the empirical hinge loss under a set of linear inequality constraints and a lasso-type penalty. The solution always obeys the desired heredity principle and enjoys sparsity. The new SVM classifier can be efficiently fitted, because the optimization problem is a linear program. Another contribution of this work is a nonparametric extension of the SVS framework, and we propose nonparametric heredity SVMs. Simulated and real data are used to illustrate the merits of the proposed method.
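That the optimization problem is a linear program is easy to demonstrate for the plain lasso-penalized SVM; the heredity constraints of the SVS framework would simply be appended as extra linear inequalities. A sketch with `scipy.optimize.linprog`, splitting the coefficients into positive and negative parts so the L1 penalty becomes linear; the toy data and penalty weight are made up.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm_lp(X, y, lam=0.05):
    """Lasso-penalized linear SVM posed as an LP.  Variables (all >= 0,
    linprog's default bounds) are [beta+, beta-, b+, b-, xi]; the SVS
    heredity constraints would just add more rows to A_ub."""
    n, p = X.shape
    # objective: lam * sum(beta+ + beta-) + sum(xi)
    c = np.concatenate([lam * np.ones(2 * p), [0.0, 0.0], np.ones(n)])
    # hinge constraints y_i*(x_i @ beta + b) >= 1 - xi_i, rewritten as <=
    yX = y[:, None] * X
    A_ub = np.hstack([-yX, yX, -y[:, None], y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub)
    beta = res.x[:p] - res.x[p:2 * p]
    b = res.x[2 * p] - res.x[2 * p + 1]
    return beta, b, res

# Toy separable data: only feature 0 carries signal, with margin >= 1.
rng = np.random.default_rng(5)
n, p = 40, 4
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[:, 0] = y * rng.uniform(1.0, 2.0, size=n)
beta, b, res = l1_svm_lp(X, y)
```

Because every piece (hinge slack, split L1 penalty, margin constraints) is linear, off-the-shelf LP solvers fit the classifier efficiently, which is the computational point the abstract makes.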
A study on three Linear Discriminant Analysis based methods in the Small Sample Size problem
"... In this paper, we make a study on three Linear Discriminant Analysis (LDA) based ..."
Abstract

Cited by 1 (0 self)
 Add to MetaCart
In this paper, we make a study on three Linear Discriminant Analysis (LDA) based