Results 1-10 of 68
Multicriteria gene screening for analysis of differential expression with DNA microarrays
 EURASIP Journal on Applied Signal Processing, 2004
Cited by 14 (6 self)
This paper introduces a statistical methodology for identification of differentially expressed genes in DNA microarray experiments based on multiple criteria. These criteria are: false discovery rate (FDR); variance-normalized differential expression levels (paired t statistics); and minimum acceptable difference (MAD). The methodology also provides a set of simultaneous FDR confidence intervals on the true expression differences. The analysis can be implemented as a two-stage algorithm in which there is an initial screen that controls only FDR, followed by a second screen that controls both FDR and MAD. It can also be implemented by computing and thresholding the set of FDR p-values for each gene that satisfies the MAD criterion. We illustrate the procedure by identifying differentially expressed genes from a wild-type vs. knockout comparison of microarray data.
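The two-stage idea (an FDR screen followed by a MAD screen) can be sketched in Python. This is a minimal illustration on synthetic data, using Benjamini-Hochberg as the FDR stage and an invented MAD threshold; the paper's exact procedure may differ.

```python
# Sketch of a two-stage differential-expression screen (illustrative only):
# Stage 1 controls FDR via Benjamini-Hochberg on paired t-test p-values;
# Stage 2 keeps only genes whose mean difference exceeds a MAD threshold.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_genes, n_pairs = 200, 10
wild = rng.normal(0.0, 1.0, size=(n_genes, n_pairs))
knock = wild + rng.normal(0.0, 1.0, size=(n_genes, n_pairs))
knock[:20] += 3.0  # 20 genes with a true expression shift

# Stage 1: paired t-statistics, then BH FDR control at level q.
t_stat, p = stats.ttest_rel(knock, wild, axis=1)
q = 0.05
order = np.argsort(p)
bh_line = q * np.arange(1, n_genes + 1) / n_genes
below = np.nonzero(p[order] <= bh_line)[0]
k = below.max() + 1 if below.size else 0  # step-up cutoff
stage1 = set(order[:k].tolist())

# Stage 2: keep only genes whose mean paired difference exceeds the MAD.
mad = 0.5
diffs = (knock - wild).mean(axis=1)
selected = sorted(g for g in stage1 if abs(diffs[g]) >= mad)
print(len(selected), "genes pass both the FDR and MAD screens")
```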
On optimality of step-down and step-up multiple test procedures
 Ann. Statist., 2005
Cited by 12 (3 self)
Consider the multiple testing problem of testing k null hypotheses, where the unknown family of distributions is assumed to satisfy a certain monotonicity assumption. Attention is restricted to procedures that control the familywise error rate in the strong sense and which satisfy a monotonicity condition. Under these assumptions, we prove certain maximin optimality results for some well-known step-down and step-up procedures.
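For concreteness, a classic step-down procedure of the kind analyzed here is Holm's FWER-controlling method; a minimal sketch with illustrative p-values (not the paper's examples):

```python
# Holm's step-down procedure: test p-values from smallest to largest,
# comparing the i-th smallest to alpha/(m - i + 1), and stop at the
# first failure. This controls the familywise error rate strongly.
def holm_step_down(pvals, alpha=0.05):
    """Return indices of rejected null hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = []
    for step, i in enumerate(order):
        if pvals[i] <= alpha / (m - step):
            rejected.append(i)
        else:
            break  # step-down: stop at the first non-rejection
    return rejected

print(holm_step_down([0.001, 0.04, 0.03, 0.005], alpha=0.05))  # → [0, 3]
```

A step-up procedure (such as Hochberg's) instead scans from the largest p-value downward and rejects everything below the first success.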
robust nonnegative matrix factorization analysis of microarray data
 Bioinformatics
Cited by 7 (0 self)
Motivation: Modern methods like microarrays, proteomics and metabolomics often produce data sets where there are many more predictor variables than observations. Research in these areas is often exploratory; even so, there is interest in statistical methods that accurately point to effects that are likely to replicate. Correlations among predictors are used to improve the statistical analysis. We exploit two ideas: nonnegative matrix factorization methods that create ordered sets of predictors; and statistical testing within ordered sets, which is done sequentially, removing the need for correction for multiple testing within the set. Results: Simulations and theory point to increased statistical power. Computational algorithms are described in detail. The analysis and biological interpretation of a real data set are given. In addition to the increased power, the benefit of our method is that the organized gene lists are likely to lead to better understanding of the biology. Availability: A SAS JMP executable script is available from
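Sequential testing within an ordered set can be sketched as fixed-sequence testing: hypotheses are tested in their given order, each at the full level, and testing stops at the first non-rejection, so no within-set multiplicity correction is needed. The ordering and p-values below are illustrative, not the paper's.

```python
# Fixed-sequence testing within one ordered set: because the order is
# fixed before looking at the p-values, each test may use level alpha,
# and the procedure still controls the familywise error rate in the set.
def fixed_sequence_test(ordered_pvals, alpha=0.05):
    """Reject hypotheses in order until the first p-value exceeds alpha."""
    rejected = []
    for i, p in enumerate(ordered_pvals):
        if p > alpha:
            break
        rejected.append(i)
    return rejected

# Genes pre-ordered by an NMF-derived score, most promising first.
print(fixed_sequence_test([0.001, 0.02, 0.04, 0.30, 0.01]))  # → [0, 1, 2]
```

Note that the hypothesis at position 4 (p = 0.01) is not rejected even though its p-value is small: the sequence stopped at position 3.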
Data snooping, dredging and fishing: The dark side of data mining. A SIGKDD-99 panel report
 SIGKDD Explorations
, 2000
Cited by 7 (0 self)
This article briefly describes a panel discussion at SIGKDD-99.
On the Statistical Comparison of Inductive Learning Methods
 In D. Fisher & H.-J. Lenz (Eds.), Learning from Data: Artificial Intelligence and Statistics V, 1996
Cited by 6 (0 self)
Experimental comparisons between statistical and machine learning methods appear with increasing frequency in the literature. However, there does not seem to be a consensus on how such a comparison is performed in a methodologically sound way. In particular, the effect of testing multiple hypotheses on the probability of producing a "false alarm" is often ignored. We transfer multiple comparison procedures from the statistical literature to the type of study discussed in this paper. These testing procedures take the number of tests performed into account, thereby controlling the probability of generating "false alarms". The selected multiple comparison procedures are illustrated on well-known regression and classification data sets.
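The simplest such correction is Bonferroni's: when several learning methods are compared against a baseline on the same folds, divide the per-comparison level by the number of comparisons. A hedged sketch with invented method names and accuracies:

```python
# Comparing two candidate methods against a baseline on the same
# cross-validation folds, with a Bonferroni correction for the number
# of comparisons. All accuracy figures are illustrative.
from scipy import stats

baseline = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.81, 0.80]
challengers = {
    "method_a": [0.85, 0.84, 0.86, 0.83, 0.87, 0.82, 0.85, 0.84],
    "method_b": [0.80, 0.81, 0.82, 0.79, 0.83, 0.77, 0.80, 0.81],
}

alpha = 0.05
adjusted_alpha = alpha / len(challengers)  # Bonferroni: alpha / #tests
for name, acc in challengers.items():
    # Paired (per-fold) t-test against the baseline.
    t_stat, p = stats.ttest_rel(acc, baseline)
    verdict = "differs" if p <= adjusted_alpha else "no evidence"
    print(f"{name}: p={p:.4f} -> {verdict}")
```

More powerful alternatives (Holm's step-down, or procedures tailored to all-pairs comparisons) follow the same pattern of adjusting the rejection threshold for the number of tests.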
SOME NONASYMPTOTIC RESULTS ON RESAMPLING IN HIGH DIMENSION, I: CONFIDENCE REGIONS
 SUBMITTED TO THE ANNALS OF STATISTICS, 2009
Cited by 5 (0 self)
We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality
NONASYMPTOTIC RESAMPLINGBASED CONFIDENCE REGIONS AND MULTIPLE TESTS IN HIGH DIMENSION
Cited by 3 (0 self)
We study generalized bootstrapped confidence regions for the mean of a random vector whose coordinates have an unknown dependence structure. The dimensionality of the vector can possibly be much larger than the number of observations, and we focus on a nonasymptotic control of the confidence level. The random vector is supposed to be either Gaussian or to have a symmetric bounded distribution. We consider two approaches, the first based on a concentration principle and the second on a direct bootstrapped quantile. The first allows us to deal with a very large class of resampling weights, while our results for the second are specific to Rademacher weights. We present an application of these results to the one-sided and two-sided multiple testing problem, in which we derive several resampling-based step-down procedures providing a nonasymptotic FWER control. We compare our different procedures in a simulation study, and we show that they can outperform Bonferroni's or Holm's procedures as soon as the observed vector has sufficiently correlated coordinates.
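The Rademacher-weight idea can be sketched as follows: randomly sign-flip the centered observations to estimate the quantile of the coordinate-wise sup-statistic, then use that quantile as a simultaneous threshold. This toy version (Gaussian data under the null, illustrative sizes) omits the paper's concentration refinements and step-down iteration.

```python
# Rademacher resampling estimate of the 95% quantile of max_k |mean_k|,
# used as a single simultaneous threshold over K coordinates.
import numpy as np

rng = np.random.default_rng(1)
n, K = 50, 100                         # observations, coordinates
X = rng.normal(size=(n, K))            # data, here centered under the null
mean = X.mean(axis=0)

B = 2000
sup_stats = np.empty(B)
for b in range(B):
    eps = rng.choice([-1.0, 1.0], size=n)  # Rademacher weights
    # Sup-statistic of the sign-flipped, centered sample.
    sup_stats[b] = np.abs((eps[:, None] * (X - mean)).mean(axis=0)).max()

threshold = np.quantile(sup_stats, 0.95)
rejected = np.nonzero(np.abs(mean) > threshold)[0]
print(f"threshold={threshold:.3f}, rejections={len(rejected)}")
```

Because the threshold is the quantile of a maximum, it accounts for the K-fold simultaneity directly, and it adapts to correlation between coordinates in a way a Bonferroni bound cannot.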
TEAM: Efficient Two-Locus Epistasis Tests in Human Genome-Wide Association Study
Cited by 3 (0 self)
As a promising tool for identifying genetic markers underlying phenotypic differences, the genome-wide association study (GWAS) has been extensively investigated in recent years. In GWAS, detecting epistasis (or gene-gene interaction) is preferable over single-locus study, since many diseases are known to be complex traits. A brute-force search is infeasible for epistasis detection at the genome-wide scale because of the intensive computational burden. Existing epistasis detection algorithms are designed for datasets consisting of homozygous markers and small sample sizes. In human studies, however, the genotype may be heterozygous, and the number of individuals can be up to thousands. Thus existing methods are not readily applicable to human datasets. In this paper, we propose an efficient algorithm, TEAM, that significantly speeds up epistasis detection for human GWAS. Our algorithm is exhaustive, i.e., it does not ignore any epistatic interaction. Utilizing a minimum spanning tree structure, the algorithm incrementally updates the contingency tables for epistatic tests without scanning all individuals. Our algorithm has broader applicability and is more efficient than existing methods for large-sample studies. It supports any statistical test that is based on contingency tables, and enables both familywise error rate (FWER) and false discovery rate (FDR) control. Extensive experiments show that our algorithm only needs to examine a small portion of the individuals to update the contingency tables, and it achieves at least an order of magnitude speedup over the brute-force approach.
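The contingency-table tests that TEAM accelerates can be illustrated with a plain two-locus example: tabulate the nine genotype combinations of two biallelic loci against case/control status and apply a chi-square test. The counts below are invented, and TEAM's minimum-spanning-tree incremental updates are not reproduced here.

```python
# Chi-square test of a two-locus epistasis contingency table:
# rows are the 9 genotype pairs (AA,Aa,aa) x (BB,Bb,bb),
# columns are [cases, controls]. Counts are illustrative.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([
    [30, 10], [25, 15], [10, 20],
    [20, 20], [15, 25], [10, 25],
    [12, 28], [ 8, 30], [ 5, 32],
])
chi2, p, dof, _expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.2e}")
```

In a genome-wide scan this test is repeated for every locus pair, which is why maintaining the tables incrementally (rather than rescanning all individuals per pair) dominates the running time.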
Calibration for Simultaneity: (Re)Sampling Methods for Simultaneous Inference with Applications to Function Estimation and Functional Data
Cited by 3 (1 self)
We survey and illustrate a Monte Carlo technique for carrying out simple simultaneous inference with arbitrarily many statistics. Special cases of the technique have appeared in the literature, but there exists widespread unawareness of the simplicity and broad applicability of this solution to simultaneous inference. The technique, here called "calibration for simultaneity" or CfS, consists of 1) limiting the search for coverage regions to a one-parameter family of nested regions, and 2) selecting from the family that region whose estimated coverage probability has the desired value. Natural one-parameter families are almost always available. CfS applies whenever inference is based on a single distribution, for example: 1) fixed distributions such as Gaussians when diagnosing distributional assumptions, 2) conditional null distributions in exact tests with Neyman structure, in particular permutation tests, 3) bootstrap distributions for bootstrap standard error bands, 4) Bayesian posterior distributions for high-dimensional posterior probability regions, or 5) predictive distributions for multiple prediction intervals. CfS is particularly useful for estimation of any type of function, such as empirical Q-Q curves, empirical CDFs, density estimates, smooths, generally any type of fit, and functions estimated from functional data. A special case of CfS is equivalent to p-value adjustment (Westfall and Young, 1993). Conversely, the notion of a p-value can be extended to any simultaneous coverage problem that is solved with a one-parameter family of coverage regions.
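The two CfS steps can be sketched in a Gaussian toy example: take the nested family of bands {|x_k| <= c : c > 0} and calibrate c so the Monte Carlo estimate of simultaneous coverage hits the target. The dimensions and target below are illustrative.

```python
# Calibration for simultaneity (CfS), toy version: within the
# one-parameter family of nested bands |x_k| <= c over K coordinates,
# pick the c whose estimated simultaneous coverage is 0.95.
import numpy as np

rng = np.random.default_rng(2)
K, B = 20, 5000
draws = rng.normal(size=(B, K))   # B Monte Carlo draws of a K-dim statistic

# The smallest c covering a draw is max_k |x_k|, so the calibrated c is
# the 0.95 quantile of the per-draw maxima.
c = np.quantile(np.abs(draws).max(axis=1), 0.95)

covered = (np.abs(draws) <= c).all(axis=1).mean()
print(f"calibrated c={c:.3f}, estimated simultaneous coverage={covered:.3f}")
```

Calibrating on the maximum is what makes the coverage simultaneous: a per-coordinate 0.95 band (c about 1.96 here) would cover all 20 coordinates at once far less than 95% of the time.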