Results 1 - 10 of 6,880
The control of the false discovery rate in multiple testing under dependency
Annals of Statistics, 2001
Cited by 931 (17 self)
Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate t. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which
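The abstract above names the Benjamini-Hochberg procedure and a conservative modification without spelling either out. As a minimal sketch, assuming the usual step-up formulation (reject the k smallest p-values, where k is the largest rank with p_(k) <= k*q/m) and assuming the modification is the commonly cited division of q by the harmonic sum 1 + 1/2 + ... + 1/m, the two rules might look like this in Python. The function name and parameters are illustrative and are not taken from the paper.

```python
def benjamini_hochberg(pvalues, q=0.05, arbitrary_dependence=False):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative sketch only: `arbitrary_dependence=True` applies the
    harmonic-sum correction (an assumption about the "conservative
    modification" mentioned in the abstract).
    """
    m = len(pvalues)
    if m == 0:
        return []
    level = q
    if arbitrary_dependence:
        # Shrink the target level by 1 + 1/2 + ... + 1/m.
        level = q / sum(1.0 / i for i in range(1, m + 1))
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: largest rank k with p_(k) <= (k / m) * level.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * level:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.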
A direct approach to false discovery rates
2002
Cited by 641 (13 self)
Summary. Multiple-hypothesis testing involves guarding against much more complicated errors than single-hypothesis testing. Whereas we typically control the type I error rate for a single-hypothesis test, a compound error rate is controlled for multiple-hypothesis tests. For example, controlling the false discovery rate (FDR) traditionally involves intricate sequential p-value rejection methods based on the observed data. Whereas a sequential p-value method fixes the error rate and estimates its corresponding rejection region, we propose the opposite approach: we fix the rejection region and then estimate its corresponding error rate. This new approach offers increased applicability, accuracy and power. We apply the methodology to both the positive false discovery rate (pFDR) and FDR, and provide evidence for its benefits. It is shown that pFDR is probably the quantity of interest over FDR. Also discussed is the calculation of the q-value, the pFDR analogue of the p-value, which eliminates the need to set the error rate beforehand as is traditionally done. Some simple numerical examples are presented that show that this new approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.
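The idea of fixing the rejection region and then estimating its error rate can be illustrated with a short sketch. It assumes the rejection region is "reject when p <= t" and uses a plug-in estimate of the null proportion from the p-values above a tuning value `lam`. The function names, the default `lam=0.5`, and the cap at 1 are assumptions made here for illustration; the abstract itself gives no formulas. The sketch estimates FDR only, not pFDR.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Plug-in estimate of the proportion of true nulls.

    Null p-values are roughly uniform, so the share of p-values above
    `lam`, rescaled by 1 / (1 - lam), approximates the null proportion.
    """
    m = len(pvalues)
    n_above = sum(1 for p in pvalues if p > lam)
    return min(1.0, n_above / (m * (1.0 - lam)))


def estimate_fdr(pvalues, t, lam=0.5):
    """Estimated FDR for the fixed rejection region {p <= t}."""
    m = len(pvalues)
    pi0 = estimate_pi0(pvalues, lam)
    # Guard against division by zero when nothing is rejected.
    n_rejected = max(sum(1 for p in pvalues if p <= t), 1)
    # Expected false rejections (pi0 * m * t) over observed rejections.
    return min(1.0, pi0 * m * t / n_rejected)
```

Calling `estimate_fdr(pvalues, t)` for several thresholds `t` shows how the estimated error rate changes with the size of the rejection region, which is the reverse of choosing an error rate first.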
Limma: linear models for microarray data
Bioinformatics and Computational Biology Solutions using R and Bioconductor, 2005
Cited by 636 (12 self)
This free open-source software implements academic research by the authors and co-workers. If you use it, please support the project by citing the appropriate journal articles listed in Section 2.1.
Thresholding of statistical maps in functional neuroimaging using the false discovery rate
Neuroimage, 2002
Cited by 443 (7 self)
Finding objective and effective thresholds for voxelwise statistics derived from neuroimaging data has been a longstanding problem. With at least one test performed for every voxel in an image, some correction of the thresholds is needed to control the error rates, but standard procedures for multiple hypothesis testing (e.g., Bonferroni) tend to not be sensitive enough to be useful in this context. This paper introduces to the neuroscience literature statistical procedures for controlling the false discovery rate (FDR). Recent theoretical work in statistics suggests that FDR-controlling procedures will be effective for the analysis of neuroimaging data. These procedures operate simultaneously on all voxelwise test statistics to determine which tests should be considered statistically significant. The innovation of the procedures is that they control the expected proportion of the rejected hypotheses that are falsely rejected. We demonstrate this approach using both simulations and functional magnetic resonance imaging data from two
Empirical Bayes Analysis of a Microarray Experiment
Journal of the American Statistical Association, 2001
Cited by 429 (20 self)
Microarrays are a novel technology that facilitates the simultaneous measurement of thousands of gene expression levels. A typical microarray experiment can produce millions of data points, raising serious problems of data reduction, and simultaneous inference. We consider one such experiment in which oligonucleotide arrays were employed to assess the genetic effects of ionizing radiation on seven thousand human genes. A simple nonparametric empirical Bayes model is introduced, which is used to guide the efficient reduction of the data to a single summary statistic per gene, and also to make simultaneous inferences concerning which genes were affected by the radiation. Although our focus is on one specific experiment, the proposed methods can be applied quite generally. The empirical Bayes inferences are closely related to the frequentist false discovery rate (FDR) criterion.
The positive false discovery rate: A Bayesian interpretation and the q-value
Annals of Statistics, 2003
Cited by 282 (8 self)
Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding several significant results among many tests. In this work, we introduce a modified version of the FDR called the “positive false discovery rate” (pFDR). We discuss the advantages and disadvantages of the pFDR and investigate its statistical properties. When assuming the test statistics follow a mixture distribution, we show that the pFDR can be written as a Bayesian posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. Also, a new quantity called the “q-value” is introduced and investigated, which is a natural “Bayesian posterior p-value,” or rather the pFDR analogue of the p-value.
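To make the q-value concrete, here is a rough sketch of how q-values are often computed from a list of p-values: each test receives the smallest estimated error rate over all rejection regions that would include it. This is a simplification in two respects, both assumptions made here rather than statements from the abstract: it uses an FDR-style plug-in estimate instead of a pFDR estimate, and it takes the null proportion `pi0` as an input with a conservative default of 1. The function name is hypothetical.

```python
def q_values(pvalues, pi0=1.0):
    """Sketch of q-value computation from p-values.

    For the p-value with ascending rank r, the estimated error rate of
    the region {p <= p_(r)} is pi0 * m * p_(r) / r. The q-value is the
    minimum of these estimates over all ranks at or above r.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, carrying the running minimum
    # so that q-values are monotone in the p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        estimate = pi0 * m * pvalues[idx] / rank
        running_min = min(running_min, estimate)
        q[idx] = running_min
    return q
```

Rejecting every test whose q-value is at most some level then needs no error rate fixed in advance, since the q-values can be inspected first.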
Large-scale simultaneous hypothesis testing: the choice of a null hypothesis
JASA, 2004
Cited by 262 (15 self)
Current scientific techniques in genomics and image processing routinely produce hypothesis testing problems with hundreds or thousands of cases to consider simultaneously. This poses new difficulties for the statistician, but also opens new opportunities. In particular it allows empirical estimation of an appropriate null hypothesis. The empirical null may be considerably more dispersed than the usual theoretical null distribution that would be used for any one case considered separately. An empirical Bayes analysis plan for this situation is developed, using a local version of the false discovery rate to examine the inference issues. Two genomics problems are used as examples to show the importance of correctly choosing the null hypothesis. Key Words: local false discovery rate, empirical Bayes, microarray analysis, empirical null hypothesis, unobserved covariates
Sparse solution of underdetermined linear equations by stagewise orthogonal matching pursuit
2006
Cited by 261 (23 self)
Finding the sparsest solution to underdetermined systems of linear equations y = Φx is NP-hard in general. We show here that for systems with ‘typical’/‘random’ Φ, a good approximation to the sparsest solution is obtained by applying a fixed number of standard operations from linear algebra. Our proposal, Stagewise Orthogonal Matching Pursuit (StOMP), successively transforms the signal into a negligible residual. Starting with initial residual r_0 = y, at the s-th stage it forms the ‘matched filter’ Φ^T r_{s−1}, identifies all coordinates with amplitudes exceeding a specially-chosen threshold, solves a least-squares problem using the selected coordinates, and subtracts the least-squares fit, producing a new residual. After a fixed number of stages (e.g. 10), it stops. In contrast to Orthogonal Matching Pursuit (OMP), many coefficients can enter the model at each stage in StOMP while only one enters per stage in OMP; and StOMP takes a fixed number of stages (e.g. 10), while OMP can take many (e.g. n). StOMP runs much faster than competing proposals for sparse solutions, such as ℓ1 minimization and OMP, and so is attractive for solving large-scale problems. We use phase diagrams to compare algorithm performance. The problem of recovering a k-sparse vector x_0 from (y, Φ) where Φ is random n × N and y = Φx_0 is represented by a point (n/N, k/n)
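The stages described in the abstract (matched filter, thresholding, least squares on the selected coordinates, residual update) can be sketched in a few lines of NumPy. The abstract does not say how the threshold is chosen, so the rule used here, `t` times the residual norm divided by sqrt(n) with `t = 2.5`, is an assumption, as are the function name `stomp` and the two early-exit checks. Treat this as an illustration of the stagewise structure rather than the authors' implementation.

```python
import numpy as np


def stomp(Phi, y, n_stages=10, t=2.5, tol=1e-10):
    """Sketch of stagewise orthogonal matching pursuit for y = Phi @ x.

    The threshold rule and the early exits are illustrative assumptions.
    """
    Phi = np.asarray(Phi, dtype=float)
    y = np.asarray(y, dtype=float)
    n, N = Phi.shape
    active = np.zeros(N, dtype=bool)  # coordinates selected so far
    x = np.zeros(N)
    r = y.copy()                      # initial residual r_0 = y
    y_norm = np.linalg.norm(y)
    for _ in range(n_stages):
        r_norm = np.linalg.norm(r)
        if r_norm <= tol * y_norm:
            break                     # residual already negligible
        c = Phi.T @ r                 # matched filter Phi^T r_{s-1}
        threshold = t * r_norm / np.sqrt(n)  # assumed threshold rule
        selected = np.abs(c) > threshold
        if not np.any(selected & ~active):
            break                     # no new coordinate entered the model
        active |= selected            # many coordinates may enter per stage
        # Least-squares fit of y on all coordinates selected so far.
        coef, *_ = np.linalg.lstsq(Phi[:, active], y, rcond=None)
        x = np.zeros(N)
        x[active] = coef
        r = y - Phi @ x               # subtract the fit to get the new residual
    return x
```

The contrast with OMP is visible in the line `active |= selected`: every coordinate above the threshold enters at once, whereas OMP would add only the single largest entry of `c` per stage.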
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
2005
An Empirical Bayes Approach to Inferring Large-Scale Gene Association Networks
Bioinformatics, 2004
Cited by 210 (6 self)
Motivation: Genetic networks are often described statistically by graphical models (e.g. Bayesian networks). However, inferring the network structure offers a serious challenge in microarray analysis where the sample size is small compared to the number of considered genes. This renders many standard algorithms for graphical models inapplicable, and inferring genetic networks an “ill-posed” inverse problem. Methods: We introduce a novel framework for small-sample inference of graphical models from gene expression data. Specifically, we focus on so-called graphical Gaussian models (GGMs) that are now frequently used to describe gene association networks and to detect conditionally dependent genes. Our new approach is based on (i) improved (regularized) small-sample point estimates of partial correlation, (ii) an exact test of edge inclusion with adaptive estimation of the degree of freedom, and (iii) a heuristic network search based on false discovery rate multiple testing. Steps (ii) and (iii) correspond to an empirical Bayes estimate of the network topology. Results: Using computer simulations we investigate the sensitivity (power) and specificity (true negative rate) of the proposed framework to estimate GGMs from microarray data. This shows that it is possible to recover the true network topology with high accuracy even for small-sample data sets. Subsequently, we analyze gene expression data from a breast cancer tumor study and illustrate our approach by inferring a corresponding large-scale gene association network for 3,883 genes. Availability: The authors have implemented the approach in the R package “GeneTS” that is freely available from