Results 11–20 of 8,361
Identifying differentially expressed genes using false discovery rate controlling procedures
 Bioinformatics 19: 368–375, 2003
"... Motivation: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when th ..."
Abstract

Cited by 223 (2 self)
Motivation: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene co-regulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resampling-based FDR controlling procedures that account for the test statistics distribution, and compare their performance to that of the naïve application of the linear step-up procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. Results: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power than the familywise error rate controlling procedures. In terms of power, using resampling of the marginal distribution of each test statistic substantially improves the performance over the naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resampling-based procedures that resample the joint distribution of the test statistics and estimate the level of FDR control.
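The Benjamini and Hochberg (1995) linear step-up procedure used here as the baseline can be sketched in a few lines (a minimal illustration; the function name and example p-values are ours, not the paper's):

```python
import numpy as np

def bh_step_up(p_values, q=0.05):
    """Benjamini-Hochberg linear step-up: find the largest k with
    p_(k) <= (k/m) * q and reject the k smallest p-values."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1  # largest passing rank
        rejected[order[:k]] = True
    return rejected

# Hypothetical p-values for ten genes; only the two smallest survive at q = 0.05.
print(bh_step_up([0.001, 0.008, 0.039, 0.041, 0.042,
                  0.06, 0.074, 0.205, 0.212, 0.216]))
```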
Microarrays, Empirical Bayes Methods, and False Discovery Rates
 Genet. Epidemiol., 2001
"... In a classic twosample problem one might use Wilcoxon's statistic to test for a dierence between Treatment and Control subjects. The analogous microarray experiment yields thousands of Wilcoxon statistics, one for each gene on the array, and confronts the statistician with a dicult simultan ..."
Abstract

Cited by 221 (16 self)
In a classic two-sample problem one might use Wilcoxon's statistic to test for a difference between Treatment and Control subjects. The analogous microarray experiment yields thousands of Wilcoxon statistics, one for each gene on the array, and confronts the statistician with a difficult simultaneous inference situation. We will discuss two inferential approaches to this problem: an empirical Bayes method that requires very little a priori Bayesian modeling, and the frequentist method of "False Discovery Rates" proposed by Benjamini and Hochberg in 1995. It turns out that the two methods are closely related and can be used together to produce sensible simultaneous inferences.
Calibration and Empirical Bayes Variable Selection
 Biometrika, 1997
"... this paper, is that with F =2logp. This choice was proposed by Foster &G eorge (1994) where it was called the Risk Inflation Criterion (RIC) because it asymptotically minimises the maximum predictive risk inflation due to selection when X is orthogonal. This choice and its minimax property were ..."
Abstract

Cited by 191 (21 self)
this paper, is that with F = 2 log p. This choice was proposed by Foster & George (1994), where it was called the Risk Inflation Criterion (RIC) because it asymptotically minimises the maximum predictive risk inflation due to selection when X is orthogonal. This choice and its minimax property were also discovered independently by Donoho & Johnstone (1994) in the wavelet regression context, where they refer to it as the universal hard thresholding rule.
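The universal hard thresholding rule can be illustrated with a short sketch, assuming its standard form with threshold sigma * sqrt(2 log n) (the function name is ours):

```python
import numpy as np

def universal_hard_threshold(y, sigma=1.0):
    """Universal hard thresholding (Donoho & Johnstone, 1994):
    zero every coefficient not exceeding sigma * sqrt(2 log n)."""
    y = np.asarray(y, dtype=float)
    t = sigma * np.sqrt(2.0 * np.log(len(y)))
    return np.where(np.abs(y) > t, y, 0.0), t

# With n = 8 the threshold is sqrt(2 log 8) ~ 2.04, so only the
# two large coefficients survive.
est, t = universal_hard_threshold([3.0, 0.5, -2.5, 0.1, 0.2, 0.3, 0.0, 1.0])
print(est, t)
```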
Adapting to unknown sparsity by controlling the false discovery rate
2000
"... We attempt to recover a highdimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing powerlaw decay bounds on the order ..."
Abstract

Cited by 182 (23 self)
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = q_n → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping fixed q > 1/2 prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
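A minimal sketch of FDR-driven hard thresholding in this setting, assuming Gaussian noise and a BH step-up applied to two-sided p-values (details and names are ours; the paper analyzes the procedure far more carefully):

```python
import numpy as np
from math import erf, sqrt

def fdr_hard_threshold(y, sigma=1.0, q=0.1):
    """Data-adaptive FDR thresholding of a sparse vector in white
    noise: convert each coordinate to a two-sided p-value, run the
    BH step-up, and keep only the surviving coordinates."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Standard normal CDF via the error function (no SciPy needed).
    std_cdf = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    p = np.array([2.0 * (1.0 - std_cdf(abs(v) / sigma)) for v in y])
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, n + 1) / n) * q
    est = np.zeros(n)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1
        est[order[:k]] = y[order[:k]]  # hard-threshold: keep survivors
    return est

# Two clearly nonzero coordinates among noise-scale entries survive.
print(fdr_hard_threshold([5.0, 4.5, 0.1, -0.2, 0.05, 0.3]))
```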
STRING v9.1: protein–protein interaction networks, with increased coverage and integration
 Nucleic Acids Res., 2013
"... Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made—particularly for ce ..."
Abstract

Cited by 169 (9 self)
Complete knowledge of all direct and indirect interactions between proteins in a given cell would represent an important milestone towards a comprehensive description of cellular mechanisms and functions. Although this goal is still elusive, considerable progress has been made—particularly for certain model organisms and functional systems. Currently, protein interactions and associations are annotated at various levels of detail in online resources, ranging from raw data repositories to highly formalized pathway databases. For many applications, a global view of all the available interaction data is desirable, including lowerquality data and/or computational predictions. The STRING database
Controlling the familywise error rate in functional neuroimaging: a comparative review
 Statistical Methods in Medical Research, 2003
"... Functional neuroimaging data embodies a massive multiple testing problem, where 100 000 correlated test statistics must be assessed. The familywise error rate, the chance of any false positives is the standard measure of Type I errors in multiple testing. In this paper we review and evaluate three a ..."
Abstract

Cited by 167 (7 self)
Functional neuroimaging data embodies a massive multiple testing problem, where 100 000 correlated test statistics must be assessed. The familywise error rate, the chance of any false positives, is the standard measure of Type I errors in multiple testing. In this paper we review and evaluate three approaches to thresholding images of test statistics: Bonferroni, random field, and the permutation test. Owing to recent developments, improved Bonferroni procedures, such as Hochberg's methods, are now applicable to dependent data. Continuous random field methods use the smoothness of the image to adapt to the severity of the multiple testing problem. Also, increased computing power has made both permutation and bootstrap methods applicable to functional neuroimaging. We evaluate these approaches on t images using simulations and a collection of real datasets. We find that Bonferroni-related tests offer little improvement over Bonferroni, while the permutation method offers substantial improvement over the random field method for low smoothness and low degrees of freedom. We also show the limitations of trying to find an equivalent number of independent tests for an image of correlated test statistics.
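Hochberg's step-up procedure, mentioned in the abstract as an improved Bonferroni method, admits a compact sketch (illustrative only, not the image-thresholding code the review evaluates; the function name is ours):

```python
import numpy as np

def hochberg_step_up(p_values, alpha=0.05):
    """Hochberg's step-up FWER control: reject H_(1), ..., H_(k)
    where k is the largest i with p_(i) <= alpha / (n - i + 1)."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    below = p[order] <= alpha / (n - np.arange(1, n + 1) + 1)
    rejected = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1
        rejected[order[:k]] = True
    return rejected

# Of four hypothetical tests, only the smallest p-value clears its
# step-up threshold alpha / (n - i + 1).
print(hochberg_step_up([0.01, 0.02, 0.03, 0.9]))
```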
Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix
 Heredity (Edinb). 2005; 95: 221–227
"... Correlated multiple testing is widely performed in genetic research, particularly in multilocus analyses of complex diseases. Failure to control appropriately for the effect of multiple testing will either result in a flood of falsepositive claims or in true hits being overlooked. Cheverud proposed ..."
Abstract

Cited by 165 (0 self)
Correlated multiple testing is widely performed in genetic research, particularly in multilocus analyses of complex diseases. Failure to control appropriately for the effect of multiple testing will either result in a flood of false-positive claims or in true hits being overlooked. Cheverud proposed the idea of adjusting correlated tests as if they were independent, according to an 'effective number' (Meff) of independent tests. However, our experience has indicated that Cheverud's estimate of the Meff is overly large and will lead to excessively conservative results. We propose a more accurate estimate of the Meff, and design Meff-based procedures to control the experiment-wise significance level and the false discovery rate. In an evaluation, based on both real and simulated data, the Meff-based procedures were able to control the error rate accurately and consequently resulted in a power increase, especially in multilocus analyses. The results confirm that the Meff is a useful concept in the error-rate control of correlated tests. With its efficiency and accuracy, the Meff method provides an alternative to computationally intensive methods such as the permutation test. Heredity advance online publication, 3 August 2005; doi:10.1038/sj.hdy.6800717
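The eigenvalue-based idea can be illustrated as follows. Both functions are sketches: the first is a commonly cited form of Cheverud's estimator, the second a fractional-eigenvalue variant in the spirit of the paper's improved Meff; consult the paper for the exact formulas.

```python
import numpy as np

def meff_cheverud(corr):
    """Cheverud-style effective number of tests from the variance of
    the correlation matrix's eigenvalues (one common published form)."""
    lam = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    m = len(lam)
    return 1.0 + (m - 1.0) * (1.0 - np.var(lam, ddof=1) / m)

def meff_fractional(corr):
    """Fractional-eigenvalue variant: each eigenvalue contributes 1 if
    it is >= 1, plus its fractional part (a sketch, not the paper's
    exact estimator)."""
    lam = np.abs(np.linalg.eigvalsh(np.asarray(corr, dtype=float)))
    return float(np.sum((lam >= 1.0) + (lam - np.floor(lam))))
```

A Šidák-style per-test level would then be 1 − (1 − α)^(1/Meff); for independent tests both estimates recover the nominal number of tests.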
On testing the significance of sets of genes
 Annals of Applied Statistics
"... This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis ..."
Abstract

Cited by 164 (3 self)
This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. (2005). We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package, GSA, that implements our ideas.
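The maxmean summary can be sketched in a few lines (our reading of the statistic; the authors' GSA package is the authoritative implementation):

```python
import numpy as np

def maxmean(z_scores):
    """Maxmean gene-set summary (sketch): average the positive and the
    negative parts of the set's per-gene z-scores separately and
    return the larger in magnitude, signed."""
    z = np.asarray(z_scores, dtype=float)
    s_pos = np.maximum(z, 0.0).mean()   # mean of positive parts
    s_neg = np.maximum(-z, 0.0).mean()  # mean of |negative parts|
    return s_pos if s_pos >= s_neg else -s_neg

# An up-regulated set scores positive, its mirror image negative.
print(maxmean([2.0, 1.0, -0.5, 0.0]))
print(maxmean([-2.0, -1.0, 0.5, 0.0]))
```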
Higher criticism for detecting sparse heterogeneous mixtures
 Ann. Statist., 2004
"... Higher Criticism, or secondlevel significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested to compare the ..."
Abstract

Cited by 160 (23 self)
Higher Criticism, or second-level significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given α-level to the expected fraction under the joint null; in fact, he suggested standardizing the difference of the two quantities to form a z-score; the resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels 0 < α ≤ α0. We are able to show that the resulting Higher Criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this 'sparse normal means' testing problem can be seen from work of Ingster (1999) and Jin (2002), who studied such problems in great detail. In their studies, they identified an interesting range of cases where the small fraction of nonzero means is so
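A minimal sketch of the maximized z-score described above (the function name and the clipping guard are ours):

```python
import numpy as np

def higher_criticism(p_values, alpha0=0.5):
    """Higher Criticism statistic (sketch): standardize the gap
    between the empirical and expected fraction of significances at
    each sorted p-value, then maximize the z-score over levels up
    to alpha0."""
    p = np.sort(np.asarray(p_values, dtype=float))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # guard the denominator
    n = len(p)
    i = np.arange(1, n + 1)
    z = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    k = max(1, int(alpha0 * n))  # restrict to the first alpha0 fraction
    return float(np.max(z[:k]))
```

On a near-uniform grid of p-values the statistic stays small; a single very small p-value among them drives it far above the null range.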
Resampling-Based Multiple Testing for Microarray Data Analysis
2003
"... The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, We ..."
Abstract

Cited by 141 (3 self)
The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, Westfall & Young propose resampling-based p-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the familywise error rate of Westfall & Young (1993) and (b) the false discovery rate developed by Benjamini & Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control the familywise error rate. Adjusted p-values for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.
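A single-step maxT adjustment, a simpler cousin of the step-down minP algorithm the article discusses, can be sketched as follows (all names and the add-one p-value convention here are ours, not the article's):

```python
import numpy as np

def maxt_adjusted_p(X, labels, n_perm=200, seed=0):
    """Single-step maxT permutation adjustment in the Westfall & Young
    spirit: an adjusted p-value is the fraction of label permutations
    whose maximal |t| over all genes exceeds the gene's observed |t|.
    X is a genes-by-samples matrix; labels marks the two groups."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels, dtype=bool)

    def abs_t(lab):
        # Welch-style two-sample t statistics, one per gene.
        a, b = X[:, lab], X[:, ~lab]
        num = a.mean(axis=1) - b.mean(axis=1)
        den = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                      + b.var(axis=1, ddof=1) / b.shape[1])
        return np.abs(num / den)

    t_obs = abs_t(labels)
    exceed = np.zeros_like(t_obs)
    for _ in range(n_perm):
        exceed += np.max(abs_t(rng.permutation(labels))) >= t_obs
    return (exceed + 1) / (n_perm + 1)  # add-one avoids zero p-values
```

On simulated data with one strongly differential gene among nulls, only that gene's adjusted p-value falls below conventional thresholds.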