Stability selection
Cited by 18 (2 self)
Bayesian robust inference for differential gene expression in microarrays with multiple samples
- Biometrics
, 2006
Cited by 17 (3 self)
We consider the problem of identifying differentially expressed genes under different conditions using cDNA microarrays. Standard statistical methods cannot be used because typically there are thousands of genes and few replicates. Because of the many steps involved in the experimental process, from hybridization to image analysis, cDNA microarray data often contain outliers. For example, an outlying data value could occur because of scratches or dust on the surface, imperfections in the glass, or imperfections in the array production. We develop a robust Bayesian hierarchical model for testing for differential expression. Outliers are modeled explicitly using a t-distribution. The model includes an exchangeable prior for the variances, which allows different variances for the genes but still shrinks extreme empirical variances. Our model can be used for testing for differentially expressed genes among multiple samples, and can distinguish between the different possible patterns of differential expression when there are three or more samples. Parameter estimation is carried out using a novel version of Markov Chain Monte Carlo that is appropriate when the model puts mass on subspaces of the full parameter space. The method is illustrated using two publicly available
Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data
, 2005
Gene Selection Using Support Vector Machines With Nonconvex Penalty
- Bioinformatics
, 2006
Cited by 15 (1 self)
Motivation: With the development of DNA microarray technology, scientists can now measure the expression levels of thousands of genes simultaneously in one single experiment. One current difficulty in interpreting microarray data comes from their innate nature of “high dimensional low sample size.” Therefore, robust and accurate gene selection methods are required to identify differentially expressed groups of genes across different samples, e.g., between cancerous and normal cells. Successful gene selection will help to classify different cancer types, lead to a better understanding of genetic signatures in cancers, and improve treatment strategies. Although gene selection and cancer classification are two closely related problems, most existing approaches handle them separately by selecting genes prior to classification. We provide
Sparse Statistical Modelling in Gene Expression Genomics
, 2006
Cited by 12 (5 self)
The concept of sparsity is more and more central to practical data analysis and inference with increasingly high-dimensional data. Gene expression genomics is a key example context. As part of a series of projects that has developed Bayesian methodology for large-scale regression, ANOVA and latent factor models, we have extended traditional Bayesian “variable selection” priors and modelling ideas to new hierarchical sparsity priors that are providing substantial practical gains in addressing false discovery and isolating significant gene-specific parameters/effects in highly multivariate studies involving thousands of genes. We discuss and review these developments, in the contexts of multivariate regression, ANOVA and latent factor models for multivariate gene expression data arising in either observational or designed experimental studies. The development includes the use of sparse regression components to provide gene-sample specific normalisation/correction based on control and housekeeping factors, an important general issue and one that can be critical (and critically misleading if ignored) in many gene expression studies. Two rich data sets are used to provide context and illustration. The first data set arises from a gene expression experiment designed to investigate the transcriptional response, in terms of responsive gene subsets and their expression signatures, to interventions that up-regulate a series of key oncogenes. The second data set is observational, breast cancer tumour-derived data evaluated utilising a sparse latent factor model to define and isolate factors underlying the hugely complex patterns of association in gene expression patterns. We also mention software that implements these and other models and methods in one comprehensive framework.
High-dimensional sparse factor models and latent factor regression
- Duke University
, 2005
Cited by 7 (2 self)
In studies of molecular profiling and biological pathway analysis using DNA microarray gene expression data we are utilising a broad class of sparse latent factor and regression models for large-scale multivariate analysis and regression prediction. We present examples of these applications with discussion of key aspects of the modelling and computational methodology. Our case studies are drawn from breast cancer genomics, where we are concerned with the investigation and characterisation of heterogeneity of structure related to specific oncogenic pathways, as well as predictive/prognostic uses of aggregate patterns in gene expression profiles in clinical contexts. Based on the metaphor of statistically derived “factors” as representing biological “subpathway” structure, we explore the decomposition of fitted sparse factor models into pathway subcomponents, and how these components overlay multiple aspects of known biological structure in this network. We discuss the discovery and predictive uses of this approach, and the ability to use such models to generate enrichment of existing biological descriptions through identification of interactions between factors and subsequent experimental validation. We further illustrate the coupled use of predictive factor regression models with the high-dimensional sparse factor analysis of
Missing-value estimation using linear and non-linear regression with Bayesian gene selection
, 2003
Cited by 7 (1 self)
Motivation: Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. Owing to various reasons, there are frequently missing values. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule.
A Bayesian approach to nonlinear probit gene selection and classification
, 2004
Cited by 6 (2 self)
We consider the problem of gene selection and classification based on the expression data. Specifically, we propose a bootstrap Bayesian gene selection method for nonlinear probit regression. A binomial probit regression model with data augmentation is used to transform the binomial problem into a sequence of smoothing problems. The probit regressor is approximated as a nonlinear combination of the genes. A Gibbs sampler is employed to find the strongest genes. Some numerical techniques to speed up the computation are discussed. We then develop a nonlinear probit Bayesian classifier consisting of a linear term plus a nonlinear term, the parameters of which are estimated using the sequential Monte Carlo technique. These new methods are applied to analyze several data sets, including the hereditary breast cancer data, the small round blue-cell tumor data, and the acute leukemia tumor data. The experimental results show the proposed methods can effectively find important genes which are consistent with the existing biological belief, and the classification accuracies are very high. Some robustness and sensitivity properties of the proposed methods are also discussed to deal with noisy microarray data.
Empirical Bayes screening (EBS) of many p-values with applications to microarray studies
- Bioinformatics
, 2005
Cited by 2 (0 self)
Motivation: Statistical tests for the detection of differentially expressed genes lead to a large collection of p-values, one for each gene comparison. Without any further adjustment, these p-values may lead to a large number of false positives, simply because the number of genes to be tested is huge, which might mean wastage of laboratory resources. To account for multiple hypotheses, these p-values are typically adjusted using a single-step method or a step-down method in order to achieve an overall control of the error rate (the so-called familywise error rate). In many applications, this may lead to an overly conservative strategy leading to too few genes being flagged. Results: In this paper we introduce a novel empirical Bayes screening (EBS) technique to inspect a large number of p-values in an effort to detect additional positive cases. In effect, each case borrows strength from an overall picture of the alternative hypotheses computed from all the p-values, while the entire procedure is calibrated by a step-down method so that the familywise error rate at the complete null hypothesis is still controlled. It is shown that the empirical Bayes screening has substantially higher sensitivity than the standard step-down approach for multiple comparison at the cost of a modest increase in the FDR. The EBS
Simultaneous cancer classification and gene selection with Bayesian nearest . . .
- Computational Statistics and Data Analysis
, 2009

