## Microarrays, empirical Bayes and the two-groups model (2006)

Venue: | STATIST. SCI |

Citations: | 30 - 10 self |

### BibTeX

@ARTICLE{Efron06microarrays,

author = {Bradley Efron},

title = {Microarrays, empirical Bayes and the two-groups model},

journal = {STATIST. SCI},

year = {2006},

pages = {1--22}

}

### Abstract

The classic frequentist theory of hypothesis testing developed by Neyman, Pearson, and Fisher has a claim to being the Twentieth Century’s most influential piece of applied mathematics. Something new is happening in the Twenty-First Century: high throughput devices, such as microarrays, routinely require simultaneous hypothesis tests for thousands of individual cases, not at all what the classical theory had in mind. In these situations empirical Bayes information begins to force itself upon frequentists and Bayesians alike. The two-groups model is a simple Bayesian construction that facilitates empirical Bayes analysis. This article concerns the interplay of Bayesian and frequentist ideas in the two-groups setting, with particular attention focussed on Benjamini and Hochberg’s False Discovery Rate method. Topics include the choice and meaning of the null hypothesis in large-scale testing situations, power considerations, the limitations of permutation methods, significance testing for groups of cases (such as pathways in microarray studies), correlation effects, multiple confidence intervals, and Bayesian competitors to the two-groups model.
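As a rough illustration of the two-groups construction named in the abstract, the sketch below uses the mixture density f(z) = p0 f0(z) + p1 f1(z), the local false discovery rate fdr(z) = p0 f0(z)/f(z), and the tail-area rate Fdr(z) = E_f{fdr(Z) | Z ≤ z} quoted in the paper's citation contexts. The specific choices here are assumptions made only for the example and do not come from the paper: a N(0, 1) null density, a N(-3, 1) non-null density, p0 = 0.9, and all function names.

```python
import math


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at z."""
    u = (z - mu) / sigma
    return math.exp(-0.5 * u * u) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(z, mu=0.0, sigma=1.0):
    """Left-tail probability of N(mu, sigma^2) at z."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))


# Illustrative assumptions (not taken from the paper).
P0 = 0.9          # prior probability that a case is null
P1 = 1.0 - P0     # prior probability that a case is non-null
ALT_MU = -3.0     # mean of the assumed non-null density f1


def mixture_density(z):
    """f(z) = p0 * f0(z) + p1 * f1(z)."""
    return P0 * normal_pdf(z) + P1 * normal_pdf(z, mu=ALT_MU)


def local_fdr(z):
    """fdr(z) = Pr{null | Z = z} = p0 * f0(z) / f(z)."""
    return P0 * normal_pdf(z) / mixture_density(z)


def tail_fdr(z):
    """Fdr(z) = Pr{null | Z <= z} = p0 * F0(z) / F(z), using left-tail cdfs."""
    mixture_cdf = P0 * normal_cdf(z) + P1 * normal_cdf(z, mu=ALT_MU)
    return P0 * normal_cdf(z) / mixture_cdf


def averaged_local_fdr(z, lower=-12.0, steps=20000):
    """Approximate E_f{fdr(Z) | Z <= z} with the trapezoid rule on [lower, z]."""
    h = (z - lower) / steps
    numerator = 0.0    # integral of fdr(x) * f(x)
    denominator = 0.0  # integral of f(x)
    for k in range(steps + 1):
        x = lower + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        fx = mixture_density(x)
        numerator += weight * local_fdr(x) * fx
        denominator += weight * fx
    return numerator / denominator
```

Under these assumptions, `tail_fdr(z)` and `averaged_local_fdr(z)` should agree up to numerical integration error, which is one way to check the stated relationship between Fdr(z) and fdr(z). In an empirical Bayes analysis p0, f0 and f would be estimated from the data rather than fixed in advance as they are here.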

### Citations

3448 | Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing - Benjamini, Hochberg - 1995 |
Citation Context ...s out of N = 15,455, coded “−” for $z_i < 0$, “+” for $z_i \ge 0$ and solid circle for $z_i > 2$. ... with particular attention paid to False Discovery Rates (Benjamini and Hochberg, 1995). Figure 1 concerns four examples of large-scale simultaneous hypothesis testing. Each example consists of N individual cases, with each case represented by its own z-value “$z_i$,” for $i = 1, 2, \ldots, N$. T... |

1451 | Significance analysis of microarrays applied to the ionizing radiation response - Tusher - 2001 |

602 | Linear models and empirical Bayes methods for assessing differential expression in microarray experiments - Smyth - 2004 |
Citation Context ... Subramanian et al. (2005). For a given gene-set “S” with m members, let $\bar{z}_S$ denote the mean of the m z-values within S; $\bar{z}_S$ is the enrichment statistic suggested in the Bioconductor R package limma (Smyth, 2004), (8.1) $\bar{z}_S = 0.842$ for the CTL pathway. How significant is this result? I will consider assigning an individual p-value to (8.1), not taking into account multiple inference for a catalogue of possib... |

440 | A direct approach to false discovery rates - Storey - 2002 |

392 | Gene set enrichment analysis: a knowledge-based approach for interpreting genomewide expression profiles - Subramanian, Tamayo, et al. - 2005 |

314 | Empirical Bayes analysis of a microarray experiment - Efron, et al. - 2001 |
Citation Context ...rule gives (2.7) $f(z) = p_0 f_0(z) + p_1 f_1(z)$, $\mathrm{fdr}(z) \equiv \Pr\{\text{null} \mid Z = z\} = p_0 f_0(z)/f(z)$ for the probability of a gene being in the null group given z-score z. Here fdr(z) is the local false discovery rate (Efron et al., 2001; Efron, 2005). There is a simple relationship between Fdr(z) and fdr(z), (2.8) $\mathrm{Fdr}(z) = E_f\{\mathrm{fdr}(Z) \mid Z \le z\}$, “$E_f$” indicating expectation with respect to the mixture density f(z). That is, Fdr(z) is the ... |

294 | Gene expression correlates of clinical prostate cancer behavior - Singh, Febbo, et al. - 2002 |

232 | Gene expression profiles in hereditary breast - Hedenfalk, Duggan, et al. - 2001 |
Citation Context ...theory provides the only information available for null behavior. But things change in ... Fig. 4. z-values from two microarray studies. BRCA data (Hedenfalk et al., 2001), comparing seven breast cancer patients having BRCA1 mutation to eight with BRCA2 mutation, N = 3226 genes. HIV data (van’t Wout et al., 2003) comparing four HIV+ males with four HIV− males, N = 7680... |

195 | On differential variability of expression ratios: Improving statistical inference about gene expression changes from microarray data - Newton, Kendziorski, et al. |

190 | Multiple hypothesis testing in microarray experiments - Dudoit - 2003 |

179 | Large-scale simultaneous hypothesis testing: the choice of a null hypothesis - Efron - 2004 |

147 | Strong control, conservative point estimation, and simultaneous conservative consistency of false discovery rates: A unified approach - Storey, Taylor, et al. - 2004 |

145 | Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations - Lee, Kuo, et al. - 2000 |

136 | Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23 - Efron, Tibshirani - 2002 |

93 | Data Analysis using Stein’s Estimator and its Generalizations - Efron, Morris - 1975 |

86 | Detecting differential gene expression with a semiparametric hierarchical mixture model - Newton, Noueiry, et al. - 2004 |

79 | Estimating the occurrence of false positives and false negatives in microarray studies by approximating and partitioning the empirical distribution of P-values. Bioinformatics 19:1236–42 - Pounds, Morris - 2003 |

55 | A mixture model approach for the analysis of microarray gene expression data - Allison - 2002 |

54 | Correlation and Large-Scale Simultaneous Significance Testing - Efron - 2006 |
Citation Context ...ne simulation we may be able to see, from $\sigma_0$, that it is probably misleading. Using the empirical null counteracts this fallacy which, again, is not apparent from the permutation null. (Section 4 of Efron, 2007, discusses more elaborate permutation methods that do bear on Figure 6. See Qiu et al., 2005, for a gloomier assessment of correlation effects in microarray analyses.) What is causing the overdispers... |

47 | Cellular gene expression upon human immunodeficiency virus type 1 infection of CD4+ T-cell lines - van’t Wout, Lehrman, et al. - 2003 |

46 | Generalizations of the familywise error rate - Lehmann, Romano - 2005 |

44 | A mixture model approach to detecting differentially expressed genes with microarray data, Funct Integr Genomics - Pan, Lin, et al. - 2003 |

44 | Testing Statistical Hypotheses, 3rd ed - Lehmann, Romano - 2005 |

39 | False discovery rate-adjusted multiple confidence intervals for selected parameters - Benjamini, Yekutieli - 2005 |

32 | A statistical framework for expression-based molecular classification in cancer - Parmigiani, Garrett, et al. - 2002 |

29 | Using Specially Designed Exponential Families for Density Estimation. The Annals of Statistics - Efron, Tibshirani - 1996 |

29 | A Bayesian mixture model for differential gene expression - Do, Müller, et al. - 2005 |

29 | Cross-subject comparison of principal diffusion direction maps. Magn Reson Med 53:1423–31 - Schwartzman, Dougherty, et al. - 2005 |

22 | The control of false discovery rate under dependency - Benjamini, Yekutieli - 2001 |

21 | A comparative review of estimates of the proportion unchanged genes and the false discovery rate - Broberg - 2005 |

19 | A mixture model for estimating the local false discovery rate in DNA microarray analysis - Liao, Lin, et al. - 2004 |

16 | Size, Power and False Discovery Rates - 2004 |

15 | Bias in the estimation of false discovery rate in microarray studies - Pawitan, Murthy, et al. - 2005 |

14 | Bayesian modelling of differential gene expression, Biometrics 62 - Lewin, Richardson, et al. - 2005 |

13 | The effects of normalization on the correlation structure of microarray data - Qiu, Brooks, et al. - 2005 |
Citation Context ... empirical null counteracts this fallacy which, again, is not apparent from the permutation null. (Section 4 of Efron, 2007, discusses more elaborate permutation methods that do bear on Figure 6. See Qiu et al., 2005, for a gloomier assessment of correlation effects in microarray analyses.) What is causing the overdispersion in the Education data of panel B, (4.7)? Correlation across schools, Reason 44, seems rul... |

12 | empirical Bayes and microarrays - Efron |

11 | Determination of the differentially expressed genes in microarray experiments using local FDR - Aubert, Bar-Hen, et al. - 2004 |

10 | Simultaneous inference: When should hypothesis testing problems be combined - Efron - 2008 |

6 | Scales of evidence for model selection: Fisher versus Jeffreys. Model Selection, IMS Monograph 38, 208–256. MR2000754 - Efron, Gous - 2001 |

5 | Local false discovery rates. http://www-stat.stanford.edu/~brad/papers/False.pdf - Efron - 2005 |

5 | Analysis of variance in microarray data - Kerr, Martin, et al. - 2000 |

4 | A nonparametric Bayesian mixture model for gene expression. mbi.osu.edu/2004/ws1materials/do.pdf - Do, Mueller, et al. - 2003 |

4 | Statistical methods for detecting stellar occultations by Kuiper belt objects: The Taiwanese-American occultation survey - Liang, Rice, et al. - 2004 |

3 | A mixture model approach for finding informative genes in microarray studies. Unpublished manuscript - Heller, J - 2003 |

3 | Estimating the proportion of true null hypotheses, with application to DNA microarray data - Langaas, Lindqvist, et al. - 2005 |

3 | Accuracy of API index and school base report elements: 2003 Academic Performance Index, California Department of Education. Available at http://www.cde.ca.gov/ta/ac/ap/researchreports.asp - Rogosa - 2003 |

3 | Local false discovery rates. Available at http://www-stat.stanford.edu/~brad/papers/False.pdf - Efron - 2005 |

2 | BEST proteomics data. Available at www.stanford.edu/people/brit.turnbull/BESTproteomics.pdf - Turnbull - 2006 |

1 | On testing the significance of sets of genes. http://www-stat.stanford.edu/~brad/papers/genesetpaper.pdf (To appear, Annals of Applied Statistics) - Efron, Tibshirani - 2006 |