Results 1 - 3 of 3
Higher criticism for detecting sparse heterogeneous mixtures
Ann. Statist., 2004
Abstract

Cited by 86 (15 self)
Higher Criticism, or second-level significance testing, is a multiple comparisons concept mentioned in passing by Tukey (1976). It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given α-level to the expected fraction under the joint null. In fact, he suggested standardizing the difference of the two quantities to form a z-score; the resulting z-score tests the significance of the body of significance tests. We consider a generalization, where we maximize this z-score over a range of significance levels 0 < α ≤ α0. We are able to show that the resulting Higher Criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero. The subtlety of this ‘sparse normal means’ testing problem can be seen from the work of Ingster (1999) and Jin (2002), who studied such problems in great detail. In their studies, they identified an interesting range of cases where the small fraction of nonzero means is so
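
The maximized z-score described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the input is a list of p-values from independent tests, and it evaluates the z-score only at the observed p-values (taking α equal to the i-th smallest p-value, so that the observed fraction of significances is i/n). The function name `higher_criticism` and the default `alpha0=0.5` are hypothetical choices made for this example.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Maximize Tukey's standardized z-score over levels 0 < alpha <= alpha0.

    Assumption for this sketch: alpha is scanned only at the sorted
    p-values, so the observed fraction of significances at alpha = p_(i)
    is i / n and the expected fraction under the joint null is p_(i).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")

    best = -math.inf
    for i, p in enumerate(sorted(p_values), start=1):
        if i > alpha0 * n:
            break  # only the smallest alpha0 * n p-values are scanned
        if not 0.0 < p < 1.0:
            continue  # the standardization is undefined at p = 0 or p = 1
        # (observed fraction - expected fraction) / standard deviation under the null
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best  # -inf if no level could be evaluated


# Usage: a few very small p-values among otherwise uniform-looking ones
# drive the statistic up sharply.
null_like = [(i + 0.5) / 100 for i in range(100)]
with_signal = [1e-6] * 5 + null_like[:95]
print(higher_criticism(null_like), higher_criticism(with_signal))
```

Scanning only at the order statistics keeps the sketch short; a large value of the statistic is evidence against the joint null.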
Asymptotic minimaxity of false discovery rate thresholding for sparse exponential data
Ann. Statist., 2006
Abstract

Cited by 14 (4 self)
Control of the False Discovery Rate (FDR) is an important development in multiple hypothesis testing, allowing the user to limit the fraction of rejected null hypotheses which correspond to false rejections (i.e. false discoveries). The FDR principle can also be used in multiparameter estimation problems to set thresholds for separating signal from noise when the signal is sparse. Success has been proven when the noise is Gaussian; see [3]. In this paper, we consider the application of FDR thresholding to a non-Gaussian setting, in hopes of learning whether the good asymptotic properties of FDR thresholding as an estimation tool hold more broadly than just at the standard Gaussian model. We consider a vector Xi, i = 1,..., n, whose coordinates are independent exponential with individual means µi. The vector µ is thought to be sparse, with most coordinates 1 and a small fraction significantly larger than 1. This models a situation where most coordinates are simply ‘noise’, but a small fraction of the coordinates contain ‘signal’. We develop an estimation theory working with log(µi) as the estimand, and use the per-coordinate mean-squared error in recovering log(µi) to measure risk. We consider minimax
Asymptotic Minimaxity of False Discovery Rate Thresholding for Sparse Exponential Data
, 2004
Abstract
Control of the False Discovery Rate (FDR) is a recent innovation in multiple hypothesis testing, allowing the user to limit the fraction of rejected null hypotheses which correspond to false rejections (i.e. false discoveries). The FDR principle can also be used in multiparameter estimation problems to set thresholds for separating signal from noise when the signal is sparse. Success has been proven when the noise is Gaussian; see [1]. In this paper, we consider the application of FDR thresholding to a non-Gaussian setting, in hopes of learning whether the good asymptotic properties of FDR thresholding as an estimation tool hold more broadly than just at the standard Gaussian model. We consider a vector Xi, i = 1,..., n, whose coordinates are independent exponential with individual means µi. The vector µ is thought to be sparse, with most coordinates 1 and a small fraction significantly larger than 1. This models a situation where most coordinates are simply ‘noise’, but a small fraction of the coordinates contain ‘signal’. We develop an estimation theory working with log(µi) as the estimand, and use the per-coordinate mean-squared error in recovering log(µi) to measure risk. We consider minimax estimation over parameter spaces defined by constraints on the per-coordinate ℓ^p norm of log(µi): Ave_i log^p(µi) ≤ η^p. Members of such spaces are vectors (µi) which are sparsely heterogeneous. We find that, for large n and small η, FDR thresholding is nearly minimax, increasingly so as η decreases. The FDR control parameter 0 < q < 1 plays an important role: when q ≤ 1/2, the FDR estimator is nearly minimax, while choosing a fixed q > 1/2 prevents near minimaxity. These conclusions mirror those found by Abramovich et al. in the Gaussian case. The techniques developed here seem applicable to a wide range of other distributional assumptions, other loss measures, and non-i.i.d. dependency structures. We will also compare our results with work in the Gaussian setting [1].
This is joint work with David Donoho.
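
To make the idea of FDR thresholding for sparse exponential data concrete, here is a rough Python sketch. It is not the estimator analyzed in the paper. It rests on several assumptions: the threshold is chosen by the Benjamini-Hochberg step-up rule, the p-value of an observation x under the null mean µ = 1 is exp(-x), and coordinates above the threshold are estimated by log(x) while the rest are set to 0 (hard thresholding, clamped at 0 because the model has µi ≥ 1). The function name `fdr_threshold_estimate` and the default `q=0.25` are hypothetical.

```python
import math


def fdr_threshold_estimate(x, q=0.25):
    """Estimate log(mu_i) from exponential observations by FDR thresholding.

    Assumptions for this sketch: Benjamini-Hochberg step-up selection with
    null p-values exp(-x_i), followed by hard thresholding of log(x_i).
    Returns (estimates, threshold); the threshold is inf if nothing is selected.
    """
    n = len(x)
    if n == 0:
        raise ValueError("need at least one observation")

    # Largest observations have the smallest p-values, so sort descending.
    x_desc = sorted(x, reverse=True)

    # Step-up rule: find the largest k with p_(k) <= q * k / n.
    k = 0
    for i, xi in enumerate(x_desc, start=1):
        if math.exp(-xi) <= q * i / n:
            k = i

    if k == 0:
        return [0.0] * n, math.inf  # everything is treated as noise

    threshold = x_desc[k - 1]
    estimates = [max(0.0, math.log(xi)) if xi >= threshold else 0.0 for xi in x]
    return estimates, threshold


# Usage: one large coordinate among noise-level observations.
estimates, threshold = fdr_threshold_estimate([0.5, 1.0, 0.2, 12.0, 0.9])
print(threshold, estimates)
```

The data-dependent threshold adapts to sparsity: the fewer large observations there are, the higher the cutoff the step-up rule selects.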