A stochastic process approach to False discovery rates
, 2001
Cited by 54 (6 self)
This paper extends the theory of false discovery rates (FDR) pioneered by Benjamini and Hochberg (1995). We develop a framework in which the False Discovery Proportion (FDP) – the number of false rejections divided by the number of rejections – is treated as a stochastic process. After obtaining the limiting distribution of the process, we demonstrate the validity of a class of procedures for controlling the False Discovery Rate (the expected FDP). We construct a confidence envelope for the whole FDP process. From these envelopes we derive confidence thresholds, for controlling the quantiles of the distribution of the FDP as well as controlling the number of false discoveries. We also
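The FDP defined in this abstract, and the Benjamini and Hochberg (1995) step-up procedure it builds on, can be sketched in a few lines. This is a minimal illustration, not the class of procedures the paper itself develops; the function names and the example p-values are hypothetical, and the set of true nulls is assumed known only so that the FDP can be computed.

```python
def benjamini_hochberg(pvalues, q):
    """Return the indices rejected by the BH step-up procedure at level q."""
    m = len(pvalues)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value is under the line.
        if pvalues[idx] <= q * rank / m:
            k = rank
    return sorted(order[:k])


def false_discovery_proportion(rejected, true_nulls):
    """False rejections divided by rejections (taken as 0 when nothing is rejected)."""
    if not rejected:
        return 0.0
    false_rejections = sum(1 for i in rejected if i in true_nulls)
    return false_rejections / len(rejected)


# Hypothetical data: hypotheses 2 and 3 are assumed to be true nulls.
pvals = [0.02, 0.03, 0.035, 0.9]
rejected = benjamini_hochberg(pvals, q=0.05)
fdp = false_discovery_proportion(rejected, true_nulls={2, 3})
```

Because the rule is step-up, the first two p-values are rejected even though they miss their own thresholds: the third one passes, and everything ranked at or below it is rejected with it.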
Empirical Bayes Selection of Wavelet Thresholds
- Ann. Statist
, 2005
Cited by 53 (3 self)
This paper explores a class of empirical Bayes methods for level-dependent threshold selection in wavelet shrinkage. The prior considered for each wavelet coefficient is a mixture of an atom of probability at zero and a heavy-tailed density. The mixing weight, or sparsity parameter, for each level of the transform is chosen by marginal maximum likelihood. If estimation
Higher criticism for detecting sparse heterogeneous mixtures
- Ann. Statist
, 2004
Cited by 51 (10 self)
Higher Criticism, or second-level significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given α-level to the expected fraction under the joint null; in fact, he suggested standardizing the difference of the two quantities to form a z-score. The resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels 0 < α ≤ α0. We are able to show that the resulting Higher Criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this ‘sparse normal means’ testing problem can be seen from work of Ingster (1999) and Jin (2002), who studied such problems in great detail. In their studies, they identified an interesting range of cases where the small fraction of nonzero means is so
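The z-score described in this abstract can be computed directly from a list of p-values. The sketch below is one plausible reading, not the authors' code: it assumes the maximum over α is taken on the grid of observed p-values not exceeding α0, and the function name `higher_criticism` and the default `alpha0=0.5` are hypothetical.

```python
import math
from bisect import bisect_right


def higher_criticism(pvalues, alpha0=0.5):
    """Maximize the standardized excess of significances over levels alpha <= alpha0.

    Returns -inf if no p-value lies in (0, alpha0].
    """
    n = len(pvalues)
    ps = sorted(pvalues)
    best = -math.inf
    for alpha in ps:
        if alpha > alpha0:
            break
        if alpha <= 0.0 or alpha >= 1.0:
            continue  # the z-score is undefined at the endpoints
        # Fraction of tests significant at level alpha (ties counted together).
        observed = bisect_right(ps, alpha) / n
        # Under the joint null the expected fraction is alpha, with
        # binomial standard deviation sqrt(alpha * (1 - alpha) / n).
        z = math.sqrt(n) * (observed - alpha) / math.sqrt(alpha * (1.0 - alpha))
        best = max(best, z)
    return best


# Hypothetical example: one very small p-value among four tests.
hc = higher_criticism([0.01, 0.5, 0.9, 0.7])
```

A large value indicates more small p-values than the joint null would produce at some level in the scanned range.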
Resampling-Based Multiple Testing for Microarray Data Analysis
, 2003
Cited by 40 (0 self)
The burgeoning field of genomics has revived interest in multiple testing procedures by raising new methodological and computational challenges. For example, microarray experiments generate large multiplicity problems in which thousands of hypotheses are tested simultaneously. In their 1993 book, Westfall & Young propose resampling-based p-value adjustment procedures which are highly relevant to microarray experiments. This article discusses different criteria for error control in resampling-based multiple testing, including (a) the familywise error rate of Westfall & Young (1993) and (b) the false discovery rate developed by Benjamini & Hochberg (1995), both from a frequentist viewpoint; and (c) the positive false discovery rate of Storey (2002), which has a Bayesian motivation. We also introduce our recently developed fast algorithm for implementing the minP adjustment to control the familywise error rate. Adjusted p-values for different approaches are applied to gene expression data from two recently published microarray studies. The properties of these procedures for multiple testing are compared.
Statistical Issues in cDNA Microarray Data Analysis
, 2003
Cited by 39 (2 self)
This article summarizes some of the issues involved and provides a brief review of the analysis tools which are available to researchers to deal with them. Any microarray experiment involves a number of distinct stages. Firstly there is the design of the experiment. The researchers must decide which genes are to be printed on the arrays, which sources of RNA are to be hybridized to the arrays and on how many arrays the hybridizations will be replicated. Secondly, after hybridization, there follows a number of data-cleaning steps or 'low-level analysis' of the microarray data. The microarray images must be processed to acquire red and green foreground and background intensities for each spot. The acquired red/green ratios must be normalized to adjust for dye-bias and for any systematic variation other than that due to the differences between the RNA samples being studied. Thirdly, the normalized ratios are analyzed by various graphical and numerical means to select differentially expressed genes or to find groups of genes whose expression profiles can reliably classify the different RNA sources into meaningful groups. The sections of this article correspond roughly to the various analysis steps. The following notation will be used throughout the article. The foreground red and green intensities will be written $R_f$ and $G_f$ for each spot. The background intensities will be $R_b$ and $G_b$. The background-corrected intensities will be $R$ and $G$ where usually $R = R_f - R_b$ and $G = G_f - G_b$. The log-differential expression ratio will be $M = \log_2(R/G)$ for each spot. Finally, the log-intensity of the spot will be $A = \frac{1}{2}\log_2(RG)$, a measure of the overall brightness of the spot. (The letter $M$ is a mnemonic for minus as $M = \log_2 R - \log_2 G$ while $A$ is a mnemonic for add as ...
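The notation at the end of this abstract amounts to a few lines of arithmetic per spot. A minimal sketch, assuming the conventional reading of the symbols (R and G for background-corrected red and green intensities, M for the log-ratio, A for the average log-intensity, logarithms to base 2); the function name `ma_values` and the intensities in the example are hypothetical.

```python
import math


def ma_values(red_fg, red_bg, green_fg, green_bg):
    """Return (M, A) for one spot from foreground and background intensities."""
    r = red_fg - red_bg        # background-corrected red intensity R
    g = green_fg - green_bg    # background-corrected green intensity G
    if r <= 0 or g <= 0:
        # Logs are undefined here; real pipelines need a policy for such spots.
        raise ValueError("background-corrected intensities must be positive")
    m = math.log2(r / g)          # M: "minus", log2(R) - log2(G)
    a = 0.5 * math.log2(r * g)    # A: "add", (log2(R) + log2(G)) / 2
    return m, a


# Hypothetical spot: R = 1024 and G = 256 after background correction.
m, a = ma_values(red_fg=1124, red_bg=100, green_fg=356, green_bg=100)
```

Here M is 2 (red is four times brighter than green) and A is 9, the mean of log2(1024) and log2(256).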
A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites
- Algorithms in Bioinformatics: Proc. First International Workshop, number 2149 in LNCS
, 2001
Cited by 36 (6 self)
A central issue in molecular biology is understanding the regulatory mechanisms that control gene expression. The recent flood of genomic and post-genomic data opens the way for computational methods elucidating the key components that play a role in these mechanisms.
Adaptive Thresholding Of Wavelet Coefficients
- Computational Statistics and Data Analysis
, 1996
Cited by 34 (7 self)
Wavelet techniques have become an attractive and efficient tool in function estimation. Given noisy data, its discrete wavelet transform is an estimator of the wavelet coefficients. It has been shown by Donoho and Johnstone (1994) that thresholding the estimated coefficients and then reconstructing an estimated function reduces the expected risk close to the possible minimum. They offered a global threshold $\lambda = \sigma\sqrt{2 \log n}$ for $j > j_0$, while the coefficients of the first coarse $j_0$ levels are always included. We demonstrate that the choice of $j_0$ may strongly affect the corresponding estimators. Then, we use the connection between thresholding and hypotheses testing to construct a thresholding procedure based on the False Discovery Rate (FDR) approach to multiple testing of Benjamini and Hochberg (1995). The suggested procedure controls the expected proportion of incorrectly included coefficients among those chosen for the wavelet reconstruction. The resulting procedure is inherent...
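The two thresholding rules contrasted in this abstract can be compared on a small vector of coefficients. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it assumes a known noise level `sigma`, two-sided normal p-values for each coefficient, and hard thresholding; the function names and example coefficients are hypothetical, and the handling of the coarse levels is omitted.

```python
import math


def universal_threshold(sigma, n):
    """Global threshold sigma * sqrt(2 log n) of Donoho and Johnstone (1994)."""
    return sigma * math.sqrt(2.0 * math.log(n))


def hard_threshold(coeffs, lam):
    """Keep coefficients larger than lam in absolute value, zero the rest."""
    return [d if abs(d) > lam else 0.0 for d in coeffs]


def fdr_threshold(coeffs, sigma, q):
    """Keep coefficients whose p-values pass the BH step-up rule at level q."""
    m = len(coeffs)
    # Two-sided p-value of H0: coefficient is pure N(0, sigma^2) noise.
    pvals = [math.erfc(abs(d) / (sigma * math.sqrt(2.0))) for d in coeffs]
    ranked = sorted(pvals)
    k = 0
    for rank, p in enumerate(ranked, start=1):
        if p <= q * rank / m:
            k = rank
    if k == 0:
        return [0.0] * m
    cutoff = ranked[k - 1]
    return [d if p <= cutoff else 0.0 for d, p in zip(coeffs, pvals)]


# Hypothetical coefficients: two large ones among small noise-like values.
coeffs = [0.1, -0.3, 5.0, 0.2, -4.5, 0.05, 0.4, -0.2]
kept_universal = hard_threshold(coeffs, universal_threshold(1.0, len(coeffs)))
kept_fdr = fdr_threshold(coeffs, sigma=1.0, q=0.05)
```

The FDR rule adapts its cutoff to the data: the more coefficients look significant, the more lenient the p-value cutoff becomes, whereas the universal threshold depends only on `sigma` and `n`.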
Needles and straw in haystacks: Empirical Bayes estimates of possibly sparse sequences
- Ann. Statist
, 2002
Cited by 33 (3 self)
An empirical Bayes approach to the estimation of possibly sparse sequences observed in Gaussian white noise is set out and investigated. The prior considered is a mixture of an atom of probability at zero and a heavy-tailed density, with the mixing weight chosen by marginal maximum likelihood, in the hope of adapting between sparse and dense sequences. If estimation is then carried out using the posterior median, this is a random thresholding procedure. Other thresholding rules using the same threshold can also be used. Probability bounds on the threshold chosen by the marginal maximum likelihood approach lead to overall bounds on the risk of the method over the class of signal sequences of length $n$ with normalized $\ell_p$ norm bounded by $\eta$, for $\eta > 0$ and $0 < p \le 2$. Estimation error is measured by mean q loss, for $0 < q \le 2$. For all $p$ and $q$ in $(0, 2]$, the method achieves the optimal estimation rate as $n \to \infty$ and $\eta \to 0$ at various rates, and in this sense adapts automatically to the sparseness or otherwise of the underlying signal. In addition the risk is uniformly bounded over all signals. If the posterior mean is used as the estimator, the results still hold for $q > 1$. Simulations show excellent performance. Computationally, the method is tractable and essentially of $O(n)$ complexity, and software is available. The extension to a modified thresholding method relevant to the wavelet estimation of derivatives of functions is also considered.
On testing the significance of sets of genes
- Annals of Applied Statistics
Cited by 33 (2 self)
This paper discusses the problem of identifying differentially expressed groups of genes from a microarray experiment. The groups of genes are externally defined, for example, sets of gene pathways derived from biological databases. Our starting point is the interesting Gene Set Enrichment Analysis (GSEA) procedure of Subramanian et al. (2005). We study the problem in some generality and propose two potential improvements to GSEA: the maxmean statistic for summarizing gene-sets, and restandardization for more accurate inferences. We discuss a variety of examples and extensions, including the use of gene-set scores for class predictions. We also describe a new R language package GSA that implements our ideas.
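The abstract names the maxmean statistic without defining it. The sketch below follows one common reading, stated here as an assumption rather than taken from the abstract: average the positive parts and the negative parts of the per-gene scores separately over the whole gene set, then take the larger of the two averages. The function name `maxmean` and the example scores are hypothetical, and this is not the GSA package's interface.

```python
def maxmean(scores):
    """Larger of the mean positive part and mean negative part of gene scores."""
    n = len(scores)
    if n == 0:
        raise ValueError("gene set must contain at least one score")
    # Both averages divide by the full set size, not by the count of
    # positive or negative scores.
    mean_positive = sum(max(z, 0.0) for z in scores) / n
    mean_negative = sum(max(-z, 0.0) for z in scores) / n
    return max(mean_positive, mean_negative)


# Hypothetical per-gene scores for one gene set.
score = maxmean([2.0, -1.0, 0.0, 3.0])
```

Dividing by the full set size means a set with a few large scores in one direction is not diluted by cancellation with scores of the opposite sign, which is the usual motivation given for this kind of summary.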

