Results 1  10
of
217
Linear models and empirical Bayes methods for assessing differential expression in microarray experiments
 STAT. APPL. GENET. MOL. BIOL
, 2004
"... ..."
A direct approach to false discovery rates
, 2002
"... Summary. Multiplehypothesis testing involves guarding against much more complicated errors than singlehypothesis testing. Whereas we typically control the type I error rate for a singlehypothesis test, a compound error rate is controlled for multiplehypothesis tests. For example, controlling the ..."
Abstract

Cited by 347 (6 self)
 Add to MetaCart
Summary. Multiplehypothesis testing involves guarding against much more complicated errors than singlehypothesis testing. Whereas we typically control the type I error rate for a singlehypothesis test, a compound error rate is controlled for multiplehypothesis tests. For example, controlling the false discovery rate FDR traditionally involves intricate sequential pvalue rejection methods based on the observed data. Whereas a sequential pvalue method fixes the error rate and estimates its corresponding rejection region, we propose the opposite approach—we fix the rejection region and then estimate its corresponding error rate. This new approach offers increased applicability, accuracy and power. We apply the methodology to both the positive false discovery rate pFDR and FDR, and provide evidence for its benefits. It is shown that pFDR is probably the quantity of interest over FDR. Also discussed is the calculation of the qvalue, the pFDR analogue of the pvalue, which eliminates the need to set the error rate beforehand as is traditionally done. Some simple numerical examples are presented that show that this new approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.
Largescale simultaneous hypothesis testing: the choice of a null hypothesis
 JASA
, 2004
"... Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular it allows empirical estimatio ..."
Abstract

Cited by 162 (14 self)
 Add to MetaCart
Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular it allows empirical estimation of an appropriate null hypothesis. The empirical null may be considerably more dispersed than the usual theoretical null distribution that would be used for any one case considered separately. An empirical Bayes analysis plan for this situation is developed, using a local version of the false discovery rate to examine the inference issues. Two genomics problems are used as examples to show the importance of correctly choosing the null hypothesis. Key Words: local false discovery rate, empirical Bayes, microarray analysis, empirical null hypothesis, unobserved covariates
The positive false discovery rate: A Bayesian interpretation and the qvalue
 Annals of Statistics
, 2003
"... Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all s ..."
Abstract

Cited by 158 (3 self)
 Add to MetaCart
Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding several significant results among many tests. In this work, we introduce a modified version of the FDR called the “positive false discovery rate ” (pFDR). We discuss the advantages and disadvantages of the pFDR and investigate its statistical properties. When assuming the test statistics follow a mixture distribution, we show that the pFDR can be written as a Bayesian posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. Also, a new quantity called the “qvalue ” is introduced and investigated, which is a natural “Bayesian posterior pvalue, ” or rather the pFDR analogue of the pvalue.
Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 19: 368–375
, 2003
"... Motivation: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when th ..."
Abstract

Cited by 122 (2 self)
 Add to MetaCart
Motivation: DNA microarrays have recently been used for the purpose of monitoring expression levels of thousands of genes simultaneously and identifying those genes that are differentially expressed. The probability that a false identification (type I error) is committed can increase sharply when the number of tested genes gets large. Correlation between the test statistics attributed to gene coregulation and dependency in the measurement errors of the gene expression levels further complicates the problem. In this paper we address this very large multiplicity problem by adopting the false discovery rate (FDR) controlling approach. In order to address the dependency problem, we present three resamplingbased FDR controlling procedures, that account for the test statistics distribution, and compare their performance to that of the naïve application of the linear stepup procedure in Benjamini and Hochberg (1995). The procedures are studied using simulated microarray data, and their performance is examined relative to their ease of implementation. Results: Comparative simulation analysis shows that all four FDR controlling procedures control the FDR at the desired level, and retain substantially more power then the familywise error rate controlling procedures. In terms of power, using resampling of the marginal distribution of each test statistics substantially improves the performance over the naïve one. The highest power is achieved, at the expense of a more sophisticated algorithm, by the resamplingbased procedures that resample the joint distribution of the test statistics and estimate the level of FDR control.
Microarrays, Empirical Bayes Methods, and False Discovery Rates
 Genet. Epidemiol
, 2001
"... In a classic twosample problem one might use Wilcoxon's statistic to test for a dierence between Treatment and Control subjects. The analogous microarray experiment yields thousands of Wilcoxon statistics, one for each gene on the array, and confronts the statistician with a dicult simultaneous ..."
Abstract

Cited by 118 (16 self)
 Add to MetaCart
In a classic twosample problem one might use Wilcoxon's statistic to test for a dierence between Treatment and Control subjects. The analogous microarray experiment yields thousands of Wilcoxon statistics, one for each gene on the array, and confronts the statistician with a dicult simultaneous inference situation. We will discuss two inferential approaches to this problem: an empirical Bayes method that requires very little a priori Bayesian modeling, and the frequentist method of \False Discovery Rates" proposed by Benjamini and Hochberg in 1995. It turns out that the two methods are closely related and can be used together to produce sensible simultaneous inferences.
Adapting to unknown sparsity by controlling the false discovery rate
, 2000
"... We attempt to recover a highdimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing powerlaw decay bounds on the order ..."
Abstract

Cited by 108 (15 self)
 Add to MetaCart
We attempt to recover a highdimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing powerlaw decay bounds on the ordered entries; and controlling the ℓp norm for p small. We obtain a procedure which is asymptotically minimax for ℓr loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a dataadaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = qn → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping fixed q>1/2prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log ( potential model size / actual model size). We exhibit a close connection with FDRcontrolling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
Use of withinarray replicate spots for assessing differential expression in microarray experiments
 Bioinformatics
, 2005
"... Motivation. Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing ..."
Abstract

Cited by 86 (3 self)
 Add to MetaCart
Motivation. Spotted arrays are often printed with probes in duplicate or triplicate, but current methods for assessing differential expression are not able to make full use of the resulting information. Usual practice is to average the duplicate or triplicate results for each probe before assessing differential expression. This loses valuable information about genewise variability. Results. A method is proposed for extracting more information from withinarray replicate spots in microarray experiments by estimating the strength of the correlation between them. The method involves fitting separate linear models to the expression data for each gene but with a common value for the betweenreplicate correlation. The method greatly improves the precision with which the genewise variances are estimated and thereby improves inference methods designed to identify differentially expressed genes. The method may be combined with empirical Bayes methods for moderating the genewise variances between genes. The method is validated using data from a microarray experiment involving calibration and ratio control spots in conjunction with spikedin RNA. Comparing results for calibration and ratio control spots shows that the common correlation method results in substantially better discrimination of differentially expressed genes from those which are not. The spikein experiment also confirms that the results may be further improved by empirical Bayes smoothing of the variances when the sample size is small. Availability. The methodology is implemented in the limma software package for R, available from the CRAN repository
A stochastic process approach to False discovery rates
, 2001
"... This paper extends the theory of false discovery rates (FDR) pioneered by Benjamini and Hochberg (1995). We develop a framework in which the False Discovery Proportion (FDP) – the number of false rejections divided by the number of rejections – is treated as a stochastic process. After obtaining th ..."
Abstract

Cited by 81 (6 self)
 Add to MetaCart
This paper extends the theory of false discovery rates (FDR) pioneered by Benjamini and Hochberg (1995). We develop a framework in which the False Discovery Proportion (FDP) – the number of false rejections divided by the number of rejections – is treated as a stochastic process. After obtaining the limiting distribution of the process, we demonstrate the validitiy of a class of procedures for controlling the False Discovery Rate (the expected FDP). We construct a confidence envelope for the whole FDP process. From these envelopes we derive confidence thresholds, for controlling the quantiles of the distribution of the FDP as well as controlling the number of false discoveries. We also
ResamplingBased Multiple Testing for Microarray Data Analysis
, 2003
"... The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, We ..."
Abstract

Cited by 66 (1 self)
 Add to MetaCart
The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, Westfall & Young propose resamplingbased pvalue adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resamplingbased multiple testing, including (a) the family wise error rate of Westfall & Young (1993) and (b) the false discovery rate developed by Benjamini & Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control familywise error rate. Adjusted pvalues for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.