Results 1–10 of 84
A review of feature selection techniques in bioinformatics (Y. Saeys)
 Bioinformatics (Proceedings of LBM’07)
, 2007
Cited by 337 (9 self)
Abstract:
Feature selection techniques have become an apparent need in many bioinformatics applications. In addition to the large pool of techniques that have already been developed in the machine learning and data mining fields, specific applications in bioinformatics have led to a wealth of newly proposed techniques. In this paper, we make the interested reader aware of the possibilities of feature selection, providing a basic taxonomy of feature selection techniques, and discussing their use, variety and potential in a number of common as well as upcoming bioinformatics applications. Companion website:
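As a minimal sketch of the filter-style methods in the taxonomy this survey covers, genes can be ranked by a univariate two-sample t-statistic and the top few kept. This is an illustration only, not the survey's code; `filter_select` and the simulated data are hypothetical:

```python
import numpy as np

def filter_select(X, y, k):
    """Filter-style feature selection: rank features by the absolute
    two-sample t statistic between classes 0 and 1, return the top k."""
    a, b = X[y == 0], X[y == 1]
    num = a.mean(axis=0) - b.mean(axis=0)
    den = np.sqrt(a.var(axis=0, ddof=1) / len(a) +
                  b.var(axis=0, ddof=1) / len(b))
    t = np.abs(num / (den + 1e-12))
    return np.argsort(t)[::-1][:k]

# toy expression matrix: 20 samples x 100 genes, 3 informative genes
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 100))
y = np.array([0] * 10 + [1] * 10)
X[y == 1, :3] += 4.0            # plant a strong class difference in genes 0-2
idx = filter_select(X, y, 3)
```

Filters like this ignore the downstream classifier, which is exactly the trade-off (speed versus model awareness) the survey's taxonomy contrasts with wrapper and embedded methods.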
Gene Selection Using Support Vector Machines With Nonconvex Penalty
 Bioinformatics
, 2006
Cited by 51 (2 self)
Abstract:
Motivation: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in a single experiment. One current difficulty in interpreting microarray data comes from their innate nature of “high dimensional, low sample size.” Therefore, robust and accurate gene selection methods are required to identify differentially expressed groups of genes across different samples, e.g., between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers, and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide
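The coupling of selection and classification that this abstract argues for can be sketched with a convex stand-in: L1-penalised logistic regression fitted by proximal gradient (ISTA), where the penalty zeroes out uninformative genes while the classifier is trained. This is a hypothetical illustration, not the paper's nonconvex-penalty SVM:

```python
import numpy as np

def l1_logistic(X, y, lam=0.1, lr=0.1, steps=2000):
    """ISTA fit of L1-penalised logistic regression: gradient step on
    the logistic loss, then soft-thresholding for the L1 penalty."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(steps):
        g = X.T @ (1.0 / (1.0 + np.exp(-(X @ w))) - y) / n   # loss gradient
        w = w - lr * g
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # shrink to 0
    return w

# toy data: class label depends only on genes 0 and 1
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
y = (X[:, 0] - X[:, 1] > 0).astype(float)
w = l1_logistic(X, y)
selected = np.nonzero(w)[0].tolist()   # embedded gene selection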
Bayesian model averaging: development of an improved multiclass, gene selection and classification tool for microarray data
, 2005
Bayesian robust inference for differential gene expression in microarrays with multiple samples
 Biometrics
Cited by 42 (11 self)
Abstract:
We consider the problem of identifying differentially expressed genes under different conditions using gene expression microarrays. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Errors are modeled explicitly using a t-distribution, which accounts for outliers. The model includes an exchangeable prior for the variances, which allows different variances for the genes while still shrinking extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and it can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available gene expression data sets. We compare our method to six other baseline and commonly used techniques, namely the t-test, the Bonferroni-adjusted t-test, Significance Analysis of Microarrays (SAM), Efron’s empirical Bayes, and EBarrays in both its Lognormal-Normal and Gamma-Gamma forms. In an experiment with HIV data, our method performed better than these alternatives on the basis of between-replicate agreement and disagreement.
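The robustness the t-distribution buys can be illustrated through its scale-mixture-of-normals representation: an EM-style location estimate that down-weights outlying observations. The sketch below (one location parameter, MAD scale fixed in advance) is far simpler than the paper's hierarchical model and is purely illustrative:

```python
import numpy as np

def t_location(x, nu=3.0, iters=50):
    """EM estimate of a location parameter under t(nu) errors.
    Each observation gets a latent weight w_i = (nu+1)/(nu + r_i^2),
    so gross outliers contribute almost nothing to the update."""
    mu = np.median(x)
    s = 1.4826 * np.median(np.abs(x - mu)) + 1e-12   # robust scale (MAD)
    for _ in range(iters):
        w = (nu + 1) / (nu + ((x - mu) / s) ** 2)    # E-step weights
        mu = np.sum(w * x) / np.sum(w)               # weighted M-step
    return mu

x = np.array([0.1, -0.2, 0.05, 0.0, -0.1, 8.0])     # one gross outlier
robust, naive = t_location(x), x.mean()
```

A Gaussian error model gives the outlier full weight and drags the mean toward it; the t model leaves the location estimate near the bulk of the data, which is the effect the paper exploits at the level of per-gene expression errors.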
Sparse Statistical Modelling in Gene Expression Genomics
, 2006
Cited by 31 (10 self)
Abstract:
The concept of sparsity is more and more central to practical data analysis and inference with increasingly high-dimensional data. Gene expression genomics is a key example context. As part of a series of projects that has developed Bayesian methodology for large-scale regression, ANOVA and latent factor models, we have extended traditional Bayesian “variable selection” priors and modelling ideas to new hierarchical sparsity priors that are providing substantial practical gains in addressing false discovery and in isolating significant gene-specific parameters/effects in highly multivariate studies involving thousands of genes. We discuss and review these developments in the contexts of multivariate regression, ANOVA and latent factor models for multivariate gene expression data arising in either observational or designed experimental studies. The development includes the use of sparse regression components to provide gene-sample-specific normalisation/correction based on control and housekeeping factors, an important general issue and one that can be critically misleading if ignored in many gene expression studies. Two rich data sets are used to provide context and illustration. The first data set arises from a gene expression experiment designed to investigate the transcriptional response, in terms of responsive gene subsets and their expression signatures, to interventions that up-regulate a series of key oncogenes. The second data set is observational: breast cancer tumour-derived data evaluated using a sparse latent factor model to define and isolate factors underlying the hugely complex patterns of association in gene expression patterns. We also mention software that implements these and other models and methods in one comprehensive framework.
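A toy version of a sparsity prior of this kind is the normal spike-and-slab on a single regression effect, whose posterior inclusion probability has a closed form when the noise variance is known. The function below is a hypothetical illustration, not the authors' hierarchical machinery:

```python
import numpy as np

def inclusion_prob(x, y, sigma2=1.0, tau2=1.0, pi=0.1):
    """Posterior P(effect included) for y = x*beta + N(0, sigma2) noise
    with spike-and-slab prior beta ~ pi*N(0, tau2) + (1-pi)*delta_0.
    Uses the closed-form marginal-likelihood ratio (slab vs spike)."""
    xx, xy = x @ x, x @ y
    log_bf = (-0.5 * np.log1p(tau2 * xx / sigma2)
              + 0.5 * tau2 * xy ** 2 / (sigma2 * (sigma2 + tau2 * xx)))
    odds = (pi / (1 - pi)) * np.exp(log_bf)
    return odds / (1 + odds)

rng = np.random.default_rng(2)
x = rng.normal(size=50)
p_sig = inclusion_prob(x, 2 * x + rng.normal(size=50))   # real effect
p_null = inclusion_prob(x, rng.normal(size=50))          # no effect
```

Hierarchical versions of this idea, with the mixing weight and variances given their own priors and thousands of genes sharing them, are what drive the false-discovery control the review describes.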
On the consistency of Bayesian variable selection for high dimensional binary regression and classification
 Neural Comput
, 2006
Cited by 25 (1 self)
Abstract:
Bayesian variable selection has recently gained much empirical success in a variety of applications where the number K of explanatory variables (x1,...,xK) is possibly much larger than the sample size n. For generalized linear models, if most of the xj’s have very small effects on the response y, we show that it is possible to use Bayesian variable selection to reduce overfitting caused by the curse of dimensionality K ≫ n. In this approach a suitable prior can be used to choose a few out of the many xj’s to model y, so that the posterior will propose probability densities p that are “often close” to the true density p* in some sense. The closeness can be described by a Hellinger distance between p and p* that scales at a power very close to n^(-1/2), which is the “finite-dimensional rate” corresponding to a low-dimensional situation. These findings extend some recent work of Jiang [Technical Report 0502 (2005) Dept. Statistics, Northwestern Univ.] on consistency of Bayesian variable selection for binary classification.
MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data
 Bioinformatics
, 2007
Missing-value estimation using linear and nonlinear regression with Bayesian gene selection
, 2003
Cited by 16 (1 self)
Abstract:
Motivation: Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. For various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analyses such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule.
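The two-part structure described here (select predictor genes, then design an estimation rule) can be sketched with correlation-based selection followed by least-squares regression. Plain correlation stands in for the Bayesian gene selection the paper actually proposes, and `regress_impute` is a hypothetical name:

```python
import numpy as np

def regress_impute(M, gene, sample, k=3):
    """Impute the missing entry M[gene, sample]: (1) pick the k genes
    most correlated with the target on samples where it is observed;
    (2) fit a linear regression on those samples and predict."""
    obs = ~np.isnan(M[gene])                     # observed samples for target
    others = [g for g in range(M.shape[0]) if g != gene]
    corr = [abs(np.corrcoef(M[g, obs], M[gene, obs])[0, 1]) for g in others]
    top = [others[i] for i in np.argsort(corr)[::-1][:k]]
    A = np.c_[np.ones(obs.sum()), M[top][:, obs].T]   # design with intercept
    beta, *_ = np.linalg.lstsq(A, M[gene, obs], rcond=None)
    return beta[0] + M[top, sample] @ beta[1:]

# toy matrix: gene 0 tracks gene 1; knock out one entry and recover it
rng = np.random.default_rng(4)
M = rng.normal(size=(10, 20))
M[0] = 0.8 * M[1] + 0.1 * rng.normal(size=20)
true = M[0, 5]
M[0, 5] = np.nan
est = regress_impute(M, 0, 5)
```

Nonlinear estimation rules, as in the paper, replace the least-squares step while keeping the same select-then-estimate outline.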
A Bayesian approach to nonlinear probit gene selection and classification
, 2004
Cited by 11 (2 self)
Abstract:
We consider the problem of gene selection and classification based on the expression data. Specifically, we propose a bootstrap Bayesian gene selection method for nonlinear probit regression. A binomial probit regression model with data augmentation is used to transform the binomial problem into a sequence of smooth classification problems. The probit regressor is approximated as a nonlinear combination of the genes. A Gibbs sampler is employed to find the strongest genes. Some numerical techniques to speed up the computation are discussed. We then develop a nonlinear probit Bayesian classifier consisting of a linear term plus a nonlinear term, the parameters of which are estimated using the sequential Monte Carlo technique. The new methods are applied to analyze several data sets, including the hereditary breast cancer data, the small round blue-cell tumor data, and the acute leukemia tumor data. The experimental results show that the proposed methods can effectively find important genes which are consistent with the existing biological belief, and the classification accuracies are very high. Some robustness and sensitivity properties of the proposed methods are also discussed for dealing with noisy microarray data.
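The data-augmentation device mentioned here is, in its simplest form, the Albert and Chib construction: given truncated-normal latent variables z, the probit update for beta becomes a normal linear-model draw. A bare sketch with a flat prior, omitting the paper's bootstrap gene selection and nonlinear terms (hypothetical code):

```python
import numpy as np

def trunc_norm(mean, positive, rng):
    """Rejection-sample z ~ N(mean, 1) truncated to z > 0 (or z < 0)."""
    while True:
        z = rng.normal(mean, 1.0)
        if (z > 0) == positive:
            return z

def probit_gibbs(X, y, sweeps=200, rng=None):
    """Gibbs sampler for Bayesian probit regression via data
    augmentation: alternate z | beta, y (truncated normals) and
    beta | z (normal linear-model posterior under a flat prior)."""
    rng = rng or np.random.default_rng(0)
    beta = np.zeros(X.shape[1])
    XtX_inv = np.linalg.inv(X.T @ X)
    draws = []
    for _ in range(sweeps):
        mu = X @ beta
        z = np.array([trunc_norm(m, yi == 1, rng) for m, yi in zip(mu, y)])
        mean = XtX_inv @ X.T @ z                      # GLS-style posterior mean
        beta = rng.multivariate_normal(mean, XtX_inv)
        draws.append(beta)
    return np.mean(draws[sweeps // 2:], axis=0)       # posterior mean, burn-in half

# toy probit data: only the first coefficient matters
rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = (X @ np.array([1.5, 0.0]) + rng.normal(size=60) > 0).astype(int)
est = probit_gibbs(X, y)
```

Each sweep is a smooth normal-theory update, which is the sense in which augmentation turns the binomial problem into a tractable sequence; the paper builds its gene selection and nonlinear regressor on top of this core.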