Results 1–10 of 235
The positive false discovery rate: A Bayesian interpretation and the q-value
 Annals of Statistics
, 2003
Abstract

Cited by 337 (8 self)
Multiple hypothesis testing is concerned with controlling the rate of false positives when testing several hypotheses simultaneously. One multiple hypothesis testing error measure is the false discovery rate (FDR), which is loosely defined to be the expected proportion of false positives among all significant hypotheses. The FDR is especially appropriate for exploratory analyses in which one is interested in finding several significant results among many tests. In this work, we introduce a modified version of the FDR called the “positive false discovery rate” (pFDR). We discuss the advantages and disadvantages of the pFDR and investigate its statistical properties. When assuming the test statistics follow a mixture distribution, we show that the pFDR can be written as a Bayesian posterior probability and can be connected to classification theory. These properties remain asymptotically true under fairly general conditions, even under certain forms of dependence. Also, a new quantity called the “q-value” is introduced and investigated, which is a natural “Bayesian posterior p-value,” or rather the pFDR analogue of the p-value.
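The abstract contrasts the pFDR and q-value with the standard FDR machinery. As a concrete reference point, here is a minimal sketch of the Benjamini–Hochberg step-up rule and a crude q-value-style adjustment; it takes π0 = 1 (the paper's own estimator refines this), so it illustrates the idea rather than the paper's method.

```python
def bh_rejections(pvals, q=0.05):
    """Benjamini-Hochberg step-up: find the largest k with p_(k) <= k*q/m
    and reject the k hypotheses with the smallest p-values.
    Controls the FDR at level q for independent tests."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject

def q_values(pvals):
    """Crude q-value estimate (taking pi0 = 1): q_i is the smallest FDR
    level at which hypothesis i would enter the rejected set."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    raw = [pvals[i] * m / rank for rank, i in enumerate(order, start=1)]
    # enforce monotonicity from the largest p-value downward
    for j in range(m - 2, -1, -1):
        raw[j] = min(raw[j], raw[j + 1])
    qv = [0.0] * m
    for rank0, i in enumerate(order):
        qv[i] = raw[rank0]
    return qv
```

Rejecting every hypothesis whose q-value is at most q reproduces the BH rejection set, which is the operational link between the two functions.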
Adapting to unknown sparsity by controlling the false discovery rate
, 2000
Abstract

Cited by 183 (23 self)
We attempt to recover a high-dimensional vector observed in white noise, where the vector is known to be sparse, but the degree of sparsity is unknown. We consider three different ways of defining sparsity of a vector: using the fraction of nonzero terms; imposing power-law decay bounds on the ordered entries; and controlling the ℓ_p norm for p small. We obtain a procedure which is asymptotically minimax for ℓ_r loss, simultaneously throughout a range of such sparsity classes. The optimal procedure is a data-adaptive thresholding scheme, driven by control of the False Discovery Rate (FDR). FDR control is a recent innovation in simultaneous testing, in which one seeks to ensure that at most a certain fraction of the rejected null hypotheses will correspond to false rejections. In our treatment, the FDR control parameter q also plays a controlling role in asymptotic minimaxity. Our results say that letting q = q_n → 0 with problem size n is sufficient for asymptotic minimaxity, while keeping q > 1/2 fixed prevents asymptotic minimaxity. To our knowledge, this relation between ideas in simultaneous inference and asymptotic decision theory is new. Our work provides a new perspective on a class of model selection rules which has been introduced recently by several authors. These new rules impose complexity penalization of the form 2·log(potential model size / actual model size). We exhibit a close connection with FDR-controlling procedures having q tending to 0; this connection strongly supports a conjecture of simultaneous asymptotic minimaxity for such model selection rules.
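The data-adaptive thresholding scheme the abstract describes can be sketched in its simplest form: convert each noisy coordinate to a two-sided normal p-value, apply the BH rule at level q, and hard-threshold to zero every coordinate that is not rejected. This is an illustration with known noise level sigma and a fixed q, not the paper's asymptotic analysis.

```python
import math

def fdr_threshold_estimate(y, q=0.1, sigma=1.0):
    """Estimate a sparse vector from y = theta + sigma*N(0,1) noise by
    keeping exactly the coordinates whose two-sided normal p-values are
    rejected by the Benjamini-Hochberg rule at level q, and hard-
    thresholding the rest to zero (a sketch of FDR-driven thresholding)."""
    m = len(y)
    # two-sided p-value under the N(0, sigma^2) null, via erfc
    pvals = [math.erfc(abs(v) / (sigma * math.sqrt(2.0))) for v in y]
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    keep = set(order[:k])
    return [y[i] if i in keep else 0.0 for i in range(m)]
```

Because the cutoff rank k grows with the number of small p-values, the effective threshold adapts to the (unknown) sparsity of the signal, which is the mechanism behind the minimaxity result.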
Resampling-Based Multiple Testing for Microarray Data Analysis
, 2003
Abstract

Cited by 145 (3 self)
The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, Westfall & Young propose resampling-based p-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the familywise error rate of Westfall & Young (1993) and (b) the false discovery rate developed by Benjamini & Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control the familywise error rate. Adjusted p-values for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.
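The article centers on the minP adjustment; its close relative, the single-step Westfall–Young maxT adjustment, is easier to show compactly and illustrates the same key point, that joint resampling of the labels preserves the correlation among genes. The sketch below uses a plain difference-of-means statistic (an assumption for brevity; the usual choice is a t-statistic).

```python
import random

def westfall_young_maxT(data, labels, n_perm=2000, seed=0):
    """Single-step Westfall-Young maxT adjustment (a sketch): for each
    gene, the adjusted p-value is the fraction of label permutations in
    which the maximum absolute statistic over ALL genes reaches that
    gene's observed statistic. Controls the FWER and accounts for the
    correlation among genes through joint resampling."""
    rng = random.Random(seed)

    def stat(values, labs):
        # absolute difference of group means (t-statistic in the original)
        g1 = [v for v, l in zip(values, labs) if l == 1]
        g0 = [v for v, l in zip(values, labs) if l == 0]
        return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

    observed = [stat(gene, labels) for gene in data]
    exceed = [0] * len(data)
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        max_stat = max(stat(gene, perm) for gene in data)
        for j, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[j] += 1
    return [c / n_perm for c in exceed]
```

Permuting whole label vectors (rather than each gene separately) is what makes the null distribution respect the dependence structure the abstract emphasizes.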
Correlation and Large-Scale Simultaneous Significance Testing
 Journal of the American Statistical Association
Abstract

Cited by 97 (8 self)
Large-scale hypothesis testing problems, with hundreds or thousands of test statistics “z_i” to consider at once, have become familiar in current practice. Applications of popular analysis methods such as false discovery rate techniques do not require independence of the z_i’s, but their accuracy can be compromised in high-correlation situations. This paper presents computational and theoretical methods for assessing the size and effect of correlation in large-scale testing. A simple theory leads to the identification of a single omnibus measure of correlation. The theory relates to the correct choice of a null distribution for simultaneous significance testing, and its effect on inference.

1. Introduction. Modern computing machinery and improved scientific equipment have combined to revolutionize experimentation in fields such as biology, medicine, genetics, and neuroscience. One effect on statistics has been to vastly magnify the scope of multiple hypothesis testing, now often involving thousands of cases considered simultaneously. The cases themselves are typically of familiar form, each perhaps a simple two-sample comparison, …
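A toy simulation makes the abstract's warning concrete (this is a single-factor illustration, not the paper's omnibus correlation measure): even modest common correlation among null z-values makes their empirical center fluctuate far more than the independent theory predicts, so an N(0,1) null can be badly miscalibrated for any given data set.

```python
import math
import random
import statistics

def simulate_null_z(m, rho, rng):
    """m standard-normal z-values with pairwise correlation rho, built
    from one shared factor: z_i = sqrt(rho)*g + sqrt(1-rho)*e_i."""
    g = rng.gauss(0.0, 1.0)
    s, t = math.sqrt(rho), math.sqrt(1.0 - rho)
    return [s * g + t * rng.gauss(0.0, 1.0) for _ in range(m)]

def sd_of_mean_z(m=500, rho=0.0, reps=300, seed=1):
    """Standard deviation, across repeated experiments, of the mean of
    the m null z-values. Independence predicts about 1/sqrt(m); a shared
    factor inflates it to roughly sqrt(rho), shifting the apparent null
    from one realization to the next."""
    rng = random.Random(seed)
    means = [statistics.fmean(simulate_null_z(m, rho, rng))
             for _ in range(reps)]
    return statistics.pstdev(means)
```

With m = 500 and rho = 0.2 the mean of the "null" z-values swings by roughly ±0.45 between experiments, an order of magnitude more than the ≈0.045 the independence calculation would suggest.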
Microarrays, empirical Bayes and the two-groups model
 Statistical Science
, 2006
Abstract

Cited by 75 (10 self)
The classic frequentist theory of hypothesis testing developed by Neyman, Pearson, and Fisher has a claim to being the Twentieth Century’s most influential piece of applied mathematics. Something new is happening in the Twenty-First Century: high-throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focused on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals, and Bayesian competitors to the two-groups model.
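The two-groups model the abstract describes mixes a null and a non-null population, f(z) = π0·f0(z) + (1−π0)·f1(z), and the local false discovery rate is the posterior probability of the null given z. A minimal sketch with fully known components (π0 = 0.9 and an N(2.5, 1) alternative are illustrative assumptions; in practice these are estimated from the data):

```python
import math

def normal_pdf(z, mu=0.0, sd=1.0):
    """Density of N(mu, sd^2) at z."""
    return math.exp(-((z - mu) / sd) ** 2 / 2.0) / (sd * math.sqrt(2.0 * math.pi))

def local_fdr(z, pi0=0.9, mu1=2.5, sd1=1.0):
    """Local fdr under an assumed two-groups model: null N(0,1) with
    prior mass pi0, alternative N(mu1, sd1^2) with mass 1 - pi0.
        fdr(z) = pi0*f0(z) / (pi0*f0(z) + (1-pi0)*f1(z))
    i.e., the posterior probability that case z is null."""
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, mu1, sd1)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)
```

Reporting fdr(z) case by case is the Bayesian counterpart of the tail-area FDR: small z's near the null center get fdr near 1, while extreme z's get fdr near 0.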
Statistical challenges with high dimensionality: feature selection in knowledge discovery
, 2006
False discovery rate adjusted multiple confidence intervals for selected parameters [with comments and rejoinder]
 Journal of the American Statistical Association
, 2005
False Discoveries in Mutual Fund Performance: Measuring Luck in Estimated Alphas
 Journal of Finance
, 2010
Cited by 65 (6 self)
An evaluation of thresholding techniques in fMRI analysis
, 2004
Abstract

Cited by 53 (21 self)
This paper reviews and compares individual voxel-wise thresholding methods for identifying active voxels in single-subject fMRI datasets. Different error rates are described which may be used to calibrate activation thresholds. We discuss methods which control each of the error rates at a prespecified level α, including simple procedures which ignore spatial correlation among the test statistics as well as more elaborate ones which incorporate this correlation information. The operating characteristics of the methods are shown through a simulation study, indicating that the error rate used has an important impact on the sensitivity of the thresholding method, but that accounting for correlation has little impact. Therefore, the simple procedures described work well for thresholding most single-subject fMRI experiments and are recommended. The methods are illustrated with a real bilateral finger-tapping experiment.
Size, power and false discovery rates
, 2007
Abstract

Cited by 53 (4 self)
Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the fdr analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr’s; theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why non-null cases might easily fail to appear on a list of “significant” discoveries.