Results 1–10 of 157
The control of the false discovery rate in multiple testing under dependency
 Annals of Statistics
, 2001
Abstract

Cited by 469 (8 self)
Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate t. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which a procedure with proven FDR control can be offered is greatly increased.
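The step-up procedure the abstract refers to, together with its conservative modification for arbitrary dependence (divide α by the harmonic sum Σ 1/i), can be sketched as follows. This is a minimal illustration, not code from the paper:

```python
def bh_procedure(pvals, alpha=0.05, conservative=False):
    """Step-up FDR procedure: reject the hypotheses with the k smallest
    p-values, where k is the largest rank i with p_(i) <= i*alpha/m.
    With conservative=True, alpha is first divided by sum_{i=1}^m 1/i,
    the modification valid under arbitrary dependence."""
    m = len(pvals)
    if conservative:
        alpha /= sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k = rank  # step up: keep the largest rank that passes
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected
```

For example, with p-values (0.01, 0.02, 0.03, 0.9) at α = 0.05 the plain procedure rejects the first three hypotheses, while the conservative variant rejects none, reflecting the price paid for dropping the dependence assumption.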
Statistical Comparisons of Classifiers over Multiple Data Sets
, 2006
Abstract

Cited by 243 (0 self)
While methods for comparing two learning algorithms on a single data set have been scrutinized for quite some time already, the issue of statistical tests for comparisons of more algorithms on multiple data sets, which is even more essential to typical machine learning studies, has been all but ignored. This article reviews the current practice and then theoretically and empirically examines several suitable tests. Based on that, we recommend a set of simple, yet safe and robust nonparametric tests for statistical comparisons of classifiers: the Wilcoxon signed ranks test for comparison of two classifiers and the Friedman test with the corresponding post-hoc tests for comparison of more classifiers over multiple data sets. Results of the latter can also be neatly presented with the newly introduced CD (critical difference) diagrams.
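The Friedman machinery recommended above can be sketched briefly; this is an illustrative implementation, not code from the article, and the Studentized-range value `q_alpha` used for the critical difference must be looked up in a table:

```python
import math

def friedman_statistic(scores):
    """Friedman chi-square for comparing k classifiers on n data sets.
    `scores` has one row per data set and one column per classifier
    (higher score = better); ties receive averaged ranks."""
    n, k = len(scores), len(scores[0])
    avg_ranks = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            mid = (i + j) / 2 + 1  # average rank of the tied group
            for t in range(i, j + 1):
                ranks[order[t]] = mid
            i = j + 1
        for j in range(k):
            avg_ranks[j] += ranks[j] / n
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return avg_ranks, chi2

def critical_difference(k, n, q_alpha):
    """Two classifiers differ significantly under the Nemenyi post-hoc
    test if their average ranks differ by more than this quantity."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))
```

If one classifier dominates on every data set, its average rank is 1 and the chi-square statistic is at its maximum for the given k and n.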
Controlling the familywise error rate in functional neuroimaging: a comparative review
 Statistical Methods in Medical Research
, 2003
Abstract

Cited by 68 (3 self)
Functional neuroimaging data embodies a massive multiple testing problem, where 100 000 correlated test statistics must be assessed. The familywise error rate, the chance of any false positives, is the standard measure of Type I errors in multiple testing. In this paper we review and evaluate three approaches to thresholding images of test statistics: Bonferroni, random field and the permutation test. Owing to recent developments, improved Bonferroni procedures, such as Hochberg’s methods, are now applicable to dependent data. Continuous random field methods use the smoothness of the image to adapt to the severity of the multiple testing problem. Also, increased computing power has made both permutation and bootstrap methods applicable to functional neuroimaging. We evaluate these approaches on t images using simulations and a collection of real datasets. We find that Bonferroni-related tests offer little improvement over Bonferroni, while the permutation method offers substantial improvement over the random field method for low smoothness and low degrees of freedom. We also show the limitations of trying to find an equivalent number of independent tests for an image of correlated test statistics.
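The core of the permutation approach, thresholding at a quantile of the permutation distribution of the maximum statistic, can be sketched for a one-sample design. This is a toy sketch of the general idea (sign-flipping, mean statistic), not the paper's actual pipeline:

```python
import random

def perm_max_threshold(images, alpha=0.05, n_perm=1000, seed=1):
    """FWER-controlling threshold over many voxels via sign-flipping
    permutations of the maximum statistic. `images` is a list of subject
    images, each a list of voxel values (mean zero under the null)."""
    rng = random.Random(seed)
    n, v = len(images), len(images[0])
    max_stats = []
    for _ in range(n_perm):
        signs = [rng.choice((-1, 1)) for _ in range(n)]
        means = [sum(s * img[j] for s, img in zip(signs, images)) / n
                 for j in range(v)]
        max_stats.append(max(abs(m) for m in means))
    max_stats.sort()
    idx = min(n_perm - 1, int((1 - alpha) * n_perm))
    return max_stats[idx]  # (1 - alpha) quantile of the max statistic
```

Because the permutation distribution of the maximum adapts to the correlation between voxels, this threshold can sit well below the Bonferroni one when the statistics are strongly dependent.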
A linear non-Gaussian acyclic model for causal discovery
 J. Machine Learning Research
, 2006
Abstract

Cited by 54 (23 self)
In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data. Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-Gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis, and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data and real-world data.
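The identifiability idea behind the model, that under linearity and non-Gaussian noise the regression residual is independent of the regressor only in the true causal direction, can be illustrated with a toy two-variable example. This uses a crude correlation-of-squares proxy for independence rather than the paper's ICA-based estimator, so it is only a sketch of the principle:

```python
import random

def direction_score(a, b):
    """Regress b on a by OLS and return |corr(a^2, resid^2)|, a crude
    dependence proxy between regressor and residual. Under the linear
    non-Gaussian model a -> b the residual is independent of a, so the
    score should be near zero in the causal direction."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    var = sum((x - ma) ** 2 for x in a) / n
    beta = cov / var
    resid = [y - mb - beta * (x - ma) for x, y in zip(a, b)]
    a2 = [(x - ma) ** 2 for x in a]
    r2 = [r * r for r in resid]
    m1, m2 = sum(a2) / n, sum(r2) / n
    c = sum((u - m1) * (v - m2) for u, v in zip(a2, r2)) / n
    s1 = (sum((u - m1) ** 2 for u in a2) / n) ** 0.5
    s2 = (sum((v - m2) ** 2 for v in r2) / n) ** 0.5
    return abs(c / (s1 * s2))

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(4000)]  # non-Gaussian cause
e = [rng.uniform(-1, 1) for _ in range(4000)]  # non-Gaussian disturbance
y = [xi + ei for xi, ei in zip(x, e)]          # true model: x -> y
forward, backward = direction_score(x, y), direction_score(y, x)
```

Regressing in the true direction leaves the score near zero, while the reverse regression produces a clearly dependent residual; with Gaussian noise the two directions would be indistinguishable.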
An extension on “Statistical comparisons of classifiers over multiple data sets” for all pairwise comparisons
 Journal of Machine Learning Research
Abstract

Cited by 54 (13 self)
In a recently published paper in JMLR, Demšar (2006) recommends a set of nonparametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we realize that the paper correctly introduces the basic procedures and some of the most advanced ones for comparisons against a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful proposals of statistical procedures for comparing n×n classifiers. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures.
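Adjusted p-values of the kind discussed here can be illustrated with the familiar Holm step-down adjustment; the more powerful pairwise adjustments the paper proposes (e.g. Shaffer-style) follow the same pattern with different multipliers, and are not reproduced here:

```python
def holm_adjusted(pvals):
    """Holm step-down adjusted p-values: the i-th smallest raw p-value
    is multiplied by (m - i + 1), a running maximum enforces
    monotonicity, and results are capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running
    return adjusted
```

An adjusted p-value can be compared directly against the nominal α, which is what makes results of different multiple comparison procedures comparable at a glance.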
Testing Efficient Risk Sharing with Heterogeneous Risk Preferences
Abstract

Cited by 15 (0 self)
Previous papers have tested efficient risk sharing under the assumption of identical risk preferences. In this paper we show that, if in the data households have heterogeneous risk preferences, the tests proposed in the past reject efficiency even if households share risk efficiently. To address this issue we propose a method that enables one to test efficiency even when households have different preferences for risk. The method is composed of three tests. The first one can be used to determine whether in the data under investigation households have homogeneous risk preferences. The second and third tests can be used to evaluate efficient risk sharing when the hypothesis of homogeneous risk preferences is rejected. We use this method to test efficient risk sharing in rural India. Using the first test, we strongly reject the hypothesis of identical risk preferences. We then test efficiency with and without the assumption of preference homogeneity. In the first case we reject efficient risk sharing at the village and caste level. In the second case we still reject efficiency at the village level, but we cannot reject this hypothesis at the caste level. This finding suggests that the relevant risk-sharing unit in rural India is the caste and not the village.
Step-up procedures for control of generalizations of the familywise error rate
 Ann. Statist
, 2006
Abstract

Cited by 13 (5 self)
Consider the multiple testing problem of testing null hypotheses H1,...,Hs. A classical approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. But if s is large, control of the FWER is so stringent that the ability of a procedure that controls the FWER to detect false null hypotheses is limited. It is therefore desirable to consider other measures of error control. This article considers two generalizations of the FWER. The first is the k-FWER, the probability of k or more false rejections, for some fixed k ≥ 1. The second is based on the false discovery proportion (FDP), defined to be the number of false rejections divided by the total number of rejections (and defined to be 0 if there are no rejections). Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B 57 (1995) 289–300] proposed control of the false discovery rate (FDR), by which they meant that, for fixed α, E(FDP) ≤ α. Here, we consider control of the FDP in the sense that, for fixed γ and α, P{FDP > γ} ≤ α. Beginning with any nondecreasing sequence of constants and p-values for the individual tests, we derive step-up procedures that control each of these two measures of error control without imposing any assumptions on the dependence structure of the p-values. We use our results to point out a few interesting connections with some closely related step-down procedures. We then compare and contrast two FDP-controlling procedures obtained using our results with the step-up procedure for control of the FDR of Benjamini and Yekutieli [Ann. Statist. 29 (2001) 1165–1188].
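The generic step-up template the abstract starts from, a nondecreasing sequence of critical constants applied to the ordered p-values, can be sketched as below. The constants the paper derives for k-FWER and FDP control are specific to its theorems and are not reproduced here; as a familiar illustration we plug in the Benjamini-Hochberg constants i·α/m:

```python
def step_up(pvals, alphas):
    """Generic step-up procedure: given nondecreasing critical constants
    alphas[0] <= ... <= alphas[m-1], find the largest j with
    p_(j) <= alphas[j-1] and reject the j smallest p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    j = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alphas[rank - 1]:
            j = rank
    rejected = [False] * m
    for i in order[:j]:
        rejected[i] = True
    return rejected

# Illustration: the Benjamini-Hochberg constants recover the usual FDR
# procedure; other constant sequences yield k-FWER / FDP controllers.
m, alpha = 4, 0.05
bh_constants = [(i + 1) * alpha / m for i in range(m)]
```

Different error criteria thus correspond to different constant sequences fed into the same step-up machinery.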
On optimality of step-down and step-up multiple test procedures
 Ann. Statist
, 2005
Abstract

Cited by 12 (3 self)
Consider the multiple testing problem of testing k null hypotheses, where the unknown family of distributions is assumed to satisfy a certain monotonicity assumption. Attention is restricted to procedures that control the familywise error rate in the strong sense and which satisfy a monotonicity condition. Under these assumptions, we prove certain maximin optimality results for some well-known step-down and step-up procedures.