Results 1 - 10 of 92
The control of the false discovery rate in multiple testing under dependency
- Annals of Statistics, 2001
Cited by 267 (3 self)
Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate t. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which
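The step-up procedure this abstract refers to can be sketched as follows. This is a minimal illustration, not code from the paper: the function name `bh_reject` and the flag `arbitrary_dependence` are hypothetical, and the harmonic-sum correction used for the "conservative modification" is an assumption based on how that modification is usually stated, since the abstract does not spell it out.

```python
from typing import List, Sequence


def bh_reject(pvalues: Sequence[float], q: float,
              arbitrary_dependence: bool = False) -> List[bool]:
    """Return one rejection flag per hypothesis, in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    if arbitrary_dependence:
        # Assumed conservative modification: shrink q by 1 + 1/2 + ... + 1/m.
        q = q / sum(1.0 / i for i in range(1, m + 1))
    # Indices of the hypotheses, ordered by increasing p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * q / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            cutoff = rank
    # Reject every hypothesis whose rank is at most that cutoff.
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject
```

Because the rule looks for the largest qualifying rank, a hypothesis can be rejected even when its own p-value exceeds its rank's threshold, as long as a larger p-value further down the ordering meets its threshold.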
The sample average approximation method for stochastic discrete optimization
- SIAM Journal on Optimization, 2001
Cited by 97 (16 self)
In this paper we study a Monte Carlo simulation-based approach to stochastic discrete optimization problems. The basic idea of such methods is that a random sample is generated and consequently the expected value function is approximated by the corresponding sample average function. The obtained sample average optimization problem is solved, and the procedure is repeated several times until a stopping criterion is satisfied. We discuss convergence rates and stopping rules of this procedure and present a numerical example of the stochastic knapsack problem.
Key words: stochastic programming, discrete optimization, Monte Carlo sampling, Law of Large Numbers, Large Deviations theory, sample average approximation, stopping rules, stochastic knapsack problem
AMS subject classifications: 90C10, 90C15
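The sample-then-solve loop described in this abstract can be sketched on a toy knapsack. Everything specific here is an assumption made for illustration and is not taken from the paper: the objective (total reward minus a penalty on expected capacity overflow), the exponential item weights, the brute-force solver, and a fixed number of replications in place of a real stopping rule. All function names are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def sample_average(x: Sequence[int], rewards: Sequence[float],
                   weight_samples: Sequence[Sequence[float]],
                   capacity: float, penalty: float) -> float:
    """Reward of selection x minus penalty times the average capacity overflow."""
    reward = sum(r * xi for r, xi in zip(rewards, x))
    overflow = 0.0
    for w in weight_samples:
        load = sum(wi * xi for wi, xi in zip(w, x))
        overflow += max(load - capacity, 0.0)
    return reward - penalty * overflow / len(weight_samples)


def solve_saa(rewards: Sequence[float],
              weight_samples: Sequence[Sequence[float]],
              capacity: float, penalty: float) -> Tuple[int, ...]:
    """Solve one sample average problem by enumerating all 0/1 selections."""
    return max(itertools.product((0, 1), repeat=len(rewards)),
               key=lambda x: sample_average(x, rewards, weight_samples,
                                            capacity, penalty))


def draw_weights(rng: random.Random, means: Sequence[float],
                 sample_size: int) -> List[List[float]]:
    """Draw item weights; exponential with the given means (assumed)."""
    return [[rng.expovariate(1.0 / mu) for mu in means]
            for _ in range(sample_size)]


def saa(rewards: Sequence[float], means: Sequence[float], capacity: float,
        penalty: float, sample_size: int = 50, replications: int = 5,
        eval_size: int = 2000, seed: int = 0) -> Tuple[Tuple[int, ...], float]:
    """Repeat sample-and-solve; keep the candidate that scores best
    on a larger, independent evaluation sample."""
    rng = random.Random(seed)
    eval_sample = draw_weights(rng, means, eval_size)
    best_x, best_val = None, float("-inf")
    for _ in range(replications):  # fixed count stands in for a stopping rule
        sample = draw_weights(rng, means, sample_size)
        x = solve_saa(rewards, sample, capacity, penalty)
        val = sample_average(x, rewards, eval_sample, capacity, penalty)
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```

Enumeration is only workable for a handful of items. It is used here so that each sample average problem is solved exactly and the sketch stays self-contained.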
Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples
- Human Brain Mapping, 2001
Cited by 73 (6 self)
The statistical analysis of functional mapping experiments usually proceeds at the voxel level, involving the formation and assessment of a statistic image: at each voxel a statistic indicating evidence of the experimental effect of interest, at that voxel, is computed, giving an image of statistics, a statistic
Detecting group differences: Mining contrast sets
- Data Mining and Knowledge Discovery, 2001
Cited by 61 (3 self)
A fundamental task in data analysis is understanding the differences between several contrasting groups. These groups can represent different classes of objects, such as male or female students, or the same group over time, e.g. freshman students in 1993 through 1998. We present the problem of mining contrast sets: conjunctions of attributes and values that differ meaningfully in their distribution across groups. We provide a search algorithm for mining contrast sets with pruning rules that drastically reduce the computational complexity. Once the contrast sets are found, we post-process the results to present a subset that is surprising to the user given what we have already shown. We explicitly control the probability of Type I error (false positives) and guarantee a maximum error rate for the entire analysis by using Bonferroni corrections.
Simple Procedures for Selecting the Best Simulated System when the Number of Alternatives Is Large
- Operations Research, 1999
Cited by 34 (8 self)
In this paper we address the problem of finding the simulated system with the best (maximum or minimum) expected performance when the number of alternatives is finite, but large enough that ranking-and-selection (R&S) procedures may require too much computation to be practical. Our approach is to use the data provided by the first stage of sampling in an R&S procedure to screen out alternatives that are not competitive and thereby avoid the (typically much larger) second-stage sample for these systems. Our procedures represent a compromise between standard R&S procedures, which are easy to implement but can be computationally inefficient, and fully sequential procedures, which can be statistically efficient but are more difficult to implement and depend on more restrictive assumptions. We present a general theory for constructing combined screening and indifference-zone selection procedures, several specific procedures and a portion of an extensive empirical evaluation. ...
Discovering Predictive Association Rules
- In Proc. of the 4th Int'l Conference on Knowledge Discovery in Databases and Data Mining, 1998
Cited by 33 (0 self)
Association rule algorithms can produce a very large number of output patterns. This has raised questions of whether the set of discovered rules "overfit" the data because all the patterns that satisfy some constraints are generated (the Bonferroni effect). In other words, the question is whether some of the rules are "false discoveries" that are not statistically significant. We present a novel approach for estimating the number of "false discoveries" at any cutoff level. Empirical evaluation shows that on typical datasets the fraction of rules that may be false discoveries is very small. A bonus of this work is that the statistical significance measures we compute are a good basis for ordering the rules for presentation to users, since they correspond to the statistical "surprise" of the rule. We also show how to compute confidence intervals for the support and confidence of an association rule, enabling the rule to be used predictively on future data.
A linear non-gaussian acyclic model for causal discovery
- J. Machine Learning Research, 2006
Cited by 33 (16 self)
In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data. Such methods make various assumptions on the data generating process to facilitate its identification from purely observational data. Continuing this line of research, we show how to discover the complete causal structure of continuous-valued data, under the assumptions that (a) the data generating process is linear, (b) there are no unobserved confounders, and (c) disturbance variables have non-Gaussian distributions of non-zero variances. The solution relies on the use of the statistical method known as independent component analysis, and does not require any pre-specified time-ordering of the variables. We provide a complete Matlab package for performing this LiNGAM analysis (short for Linear Non-Gaussian Acyclic Model), and demonstrate the effectiveness of the method using artificially generated data and real-world data.
A Fully Sequential Procedure for Indifference-Zone Selection in Simulation
- ACM TOMACS, 1999
Cited by 29 (5 self)
We present procedures for selecting the best or near-best of a finite number of simulated systems when best is defined by maximum or minimum expected performance. The procedures are appropriate when it is possible to repeatedly obtain small, incremental samples from each simulated system. The goal of such a sequential procedure is to eliminate, at an early stage of experimentation, those simulated systems that are clearly inferior, and thereby reduce the overall computational effort required to find the best. The procedures we present accommodate unequal variances across systems and the use of common random numbers. However, they are based on the assumption of normally distributed data, so we analyze the impact of batching (to achieve approximate normality or independence) on the performance of the procedures. Comparisons with existing procedures are also provided.
Generalizations of the familywise error rate
- Ann. Statist, 2005
Cited by 19 (4 self)
Consider the problem of simultaneously testing null hypotheses H1,...,Hs. The usual approach to dealing with the multiplicity problem is to restrict attention to procedures that control the familywise error rate (FWER), the probability of even one false rejection. In many applications, particularly if s is large, one might be willing to tolerate more than one false rejection provided the number of such cases is controlled, thereby increasing the ability of the procedure to detect false null hypotheses. This suggests replacing control of the FWER by controlling the probability of k or more false rejections, which we call the k-FWER. We derive both single-step and stepdown procedures that control the k-FWER, without making any assumptions concerning the dependence structure of the p-values of the individual tests. In particular, we derive a stepdown procedure that is quite simple to apply, and prove that it cannot be improved without violation of control of the k-FWER. We also consider the false discovery proportion (FDP) defined by the number of false rejections divided by the total number of rejections (defined to be 0 if there are no rejections). The false discovery rate proposed by Benjamini
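The single-step and stepdown procedures mentioned in this abstract can be illustrated with a short sketch. The abstract gives no critical values, so the constants below are an assumption: the generalized Bonferroni threshold k*alpha/s for the single-step rule, and a Holm-type sequence for the stepdown rule that reduces to the usual Holm constants when k = 1. The function names are hypothetical.

```python
from typing import List, Sequence


def kfwer_single_step(pvalues: Sequence[float], k: int,
                      alpha: float) -> List[bool]:
    """Reject H_i when p_i <= k * alpha / s (assumed generalized Bonferroni)."""
    s = len(pvalues)
    return [p <= k * alpha / s for p in pvalues]


def kfwer_stepdown(pvalues: Sequence[float], k: int,
                   alpha: float) -> List[bool]:
    """Stepdown rule with assumed critical values:
    k*alpha/s for ranks up to k, then k*alpha/(s + k - rank)."""
    s = len(pvalues)
    order = sorted(range(s), key=lambda i: pvalues[i])
    reject = [False] * s
    for rank, idx in enumerate(order, start=1):
        if rank <= k:
            critical = k * alpha / s
        else:
            critical = k * alpha / (s + k - rank)
        if pvalues[idx] > critical:
            break  # stepdown: stop at the first p-value that fails
        reject[idx] = True
    return reject
```

With k = 1 the single-step rule is the ordinary Bonferroni correction. A larger k loosens every threshold in exchange for tolerating up to k - 1 false rejections.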
Confidence Statements for Efficiency Estimates from Stochastic Frontier Models
- Journal of Productivity Analysis, 1996
Cited by 17 (3 self)
This paper is an empirical study of the uncertainty associated with technical efficiency estimates from stochastic frontier models. We show how to construct confidence intervals for estimates of technical efficiency under different sets of assumptions ranging from the very strong to the relatively weak. We demonstrate empirically how the degree of uncertainty associated with these estimates relates to the strength of the assumptions made and to various features of the data.

