A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
, 2005
Microarrays, empirical Bayes and the two-groups model
- STATIST. SCI
, 2006
Cited by 25 (9 self)
The classic frequentist theory of hypothesis testing developed by Neyman, Pearson, and Fisher has a claim to being the Twentieth Century’s most influential piece of applied mathematics. Something new is happening in the Twenty-First Century: high throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focussed on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals, and Bayesian competitors to the two-groups model.
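As a rough illustration of the Benjamini and Hochberg False Discovery Rate method mentioned in this abstract, the step-up rule can be sketched in a few lines of Python. The function name, the level `q`, and the p-values below are made up for illustration and do not come from the article.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Return the indices rejected by the BH step-up rule at level q.

    Illustrative sketch only: the function name and defaults are assumptions.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value
        # falls at or below the line q * rank / m.
        if p_values[i] <= q * rank / m:
            k = rank
    # Reject the k hypotheses with the smallest p-values.
    return sorted(order[:k])


# Hypothetical p-values for ten simultaneous tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, q=0.05)
print(rejected)
```

Because the rule is step-up, a p-value above its own threshold is still rejected when a larger p-value further down the ordering meets its threshold.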
Size, power and false discovery rates
, 2007
Cited by 14 (4 self)
Modern scientific technology has provided a new class of large-scale simultaneous inference problems, with thousands of hypothesis tests to consider at the same time. Microarrays epitomize this type of technology, but similar situations arise in proteomics, spectroscopy, imaging, and social science surveys. This paper uses false discovery rate methods to carry out both size and power calculations on large-scale problems. A simple empirical Bayes approach allows the fdr analysis to proceed with a minimum of frequentist or Bayesian modeling assumptions. Closed-form accuracy formulas are derived for estimated false discovery rates, and used to compare different methodologies: local or tail-area fdr’s, theoretical, permutation, or empirical null hypothesis estimates. Two microarray data sets as well as simulations are used to evaluate the methodology, the power diagnostics showing why non-null cases might easily fail to appear on a list of “significant” discoveries. Short title: “Size, Power, and Fdr’s”
locfdr Vignette: Complete Help Documentation Including Usage Tips and Simulation Example, The Comprehensive R Archive Network, November 1, 2007. As of November 3, 2008: http://cran.r-project.org/web/packages/locfdr/vignettes/locfdr-example.pdf
, 1973
Cited by 2 (0 self)
This vignette includes locfdr’s complete help documentation, including usage tips, which could not fit in the R help file. It also demonstrates usage of locfdr through an example using the simulated data included in the package.
1 Description and Usage
locfdr computes local false discovery rates, following the definitions and description in the references listed below.
locfdr(zz, bre=120, df=7, pct=0, pct0=1/4, nulltype=1, type=0, plot=1, mult, mlests, main=" ", sw=0)
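To make the quantity concrete, here is a minimal Python sketch of a local false discovery rate under a two-groups mixture, fdr(z) = p0 f0(z) / f(z). This is not the locfdr implementation: locfdr estimates the mixture density and the null from the data, whereas this sketch assumes both are known normal densities. The values `p0=0.9`, `alt_mean=3.0`, and `alt_sd=1.0` are hypothetical.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution with the given mean and sd."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, p0=0.9, alt_mean=3.0, alt_sd=1.0):
    """Local fdr at z when the two-groups mixture is assumed known.

    Assumptions (not from the vignette): null cases are N(0, 1) with prior
    probability p0; non-null cases are N(alt_mean, alt_sd^2).
    """
    f0 = normal_pdf(z)                    # null density
    f1 = normal_pdf(z, alt_mean, alt_sd)  # non-null density
    f = p0 * f0 + (1.0 - p0) * f1         # mixture density
    return p0 * f0 / f


for z in (0.0, 2.0, 4.0):
    print(z, round(local_fdr(z), 4))
```

Under these assumptions the local fdr is close to 1 near z = 0 and falls toward 0 as z moves into the region where the non-null density dominates.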
Are a set of microarrays independent of each other?
, 2009
Cited by 2 (1 self)
Having observed an m × n matrix X whose rows are possibly correlated, we wish to test the hypothesis that the columns are independent of each other. Our motivation comes from microarray studies, where the rows of X record expression levels for m different genes, often highly correlated, while the columns represent n individual microarrays, presumably obtained independently. The presumption of independence underlies all the familiar permutation, cross-validation, and bootstrap methods for microarray analysis, so it is important to know when independence fails. We develop nonparametric and normal-theory testing methods. The row and column correlations of X interact with each other in a way that complicates test procedures, essentially by reducing the accuracy of the relevant estimators.
CHOOSING THE LESSER EVIL: TRADE-OFF BETWEEN FALSE DISCOVERY RATE AND NON-DISCOVERY RATE
Cited by 1 (1 self)
Abstract: The problem of multiple comparisons has become increasingly important in light of the significant surge in volume of data available to statisticians. The seminal work of Benjamini and Hochberg (1995) on the control of the false discovery rate (FDR) has brought forth an alternative way of measuring type I error rate that is often more relevant than the one based on the family-wise error rate. In this paper, we emphasize the importance of considering type II error rates in the context of multiple hypothesis testing. We propose a suitable quantity, the expected proportion of false negatives among the true alternative hypotheses, which we call non-discovery rate (NDR). We argue that NDR is a natural extension of the type II error rate of a single hypothesis to multiple comparisons. The utility of NDR is emphasized through the trade-off between FDR and NDR, which is demonstrated using a few real and simulated examples. We also show analytically the equivalence between the FDR-adjusted p-value approach of Yekutieli and Benjamini (1999) and the q-value method of Storey (2002). This equivalence dissolves the dilemma encountered by many practitioners of choosing the “right” FDR controlling procedure. Key words and phrases: False discovery rate, genome-scans, microarray data, multiple comparisons, multiple hypothesis testing, non-discovery rate, power, type I error, type II error.
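The two error measures in this abstract can be illustrated on a single data set where the truth is known, as in a simulation. The Python sketch below computes the realized proportions whose expectations are the FDR and the NDR as defined above. The function name and the toy labels are hypothetical, and returning 0.0 when a denominator is empty is a convention chosen for this sketch.

```python
def error_proportions(is_alternative, rejected):
    """Return (false discovery proportion, non-discovery proportion).

    is_alternative[i] is True when hypothesis i is truly non-null;
    rejected[i] is True when the procedure rejected hypothesis i.
    """
    pairs = list(zip(is_alternative, rejected))
    n_rejected = sum(1 for _, r in pairs if r)
    n_alternative = sum(1 for a, _ in pairs if a)
    # False positives: rejected although the null is true.
    false_pos = sum(1 for a, r in pairs if r and not a)
    # False negatives: true alternatives that were not rejected.
    false_neg = sum(1 for a, r in pairs if a and not r)
    fdp = false_pos / n_rejected if n_rejected else 0.0
    ndp = false_neg / n_alternative if n_alternative else 0.0
    return fdp, ndp


# Toy example: 3 true alternatives among 8 hypotheses, 3 rejections.
truth = [True, True, True, False, False, False, False, False]
calls = [True, True, False, True, False, False, False, False]
fdp, ndp = error_proportions(truth, calls)
print(fdp, ndp)
```

Averaging these proportions over many simulated data sets would approximate the FDR and NDR. Rejecting more liberally tends to lower the second proportion at the cost of raising the first, which is the trade-off the abstract describes.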
BMC Bioinformatics BioMed Central Methodology article
, 2008
Background: False discovery rate (FDR) methods play an important role in analyzing high-dimensional data. There are two types of FDR, tail area-based FDR and local FDR, as well as numerous statistical algorithms for estimating or controlling FDR. These differ in terms of underlying test statistics and procedures employed for statistical learning. Results: A unifying algorithm for simultaneous estimation of both local FDR and tail area-based FDR is presented that can be applied to a diverse range of test statistics, including p-values, correlations, z- and t-scores. This approach is semiparametric and is based on a modified Grenander density estimator. For test statistics other than p-values it allows for empirical null modeling, so that dependencies among tests can be taken into account. The inference of the underlying model employs truncated maximum-likelihood estimation, with the cut-off point chosen according to the false non-discovery rate. Conclusion: The proposed procedure generalizes a number of more specialized algorithms and thus offers a common framework for FDR estimation consistent across test statistics and types of FDR. In a comparative study the unified approach performs on par with the best competing yet more specialized alternatives. The algorithm is implemented in R in the "fdrtool" package, available under the GNU GPL from
Chen and Zheng, 2009, Volume 10, Issue 1, Article R3, Open Access
, 2009
Studying alternative splicing regulatory networks through partial correlation analysis

