Results 1–10 of 145
BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks
Bioinformatics, 2005
"... Summary: The Biological Networks Gene Ontology tool (BiNGO) is an opensource Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. BiNGO can be used either on a list of genes, pasted as text, or interactively on subgraphs of biological networks v ..."
Abstract

Cited by 535 (4 self)
Summary: The Biological Networks Gene Ontology tool (BiNGO) is an open-source Java tool to determine which Gene Ontology (GO) terms are significantly overrepresented in a set of genes. BiNGO can be used either on a list of genes, pasted as text, or interactively on subgraphs of biological networks visualized in Cytoscape. BiNGO maps the predominant functional themes of the tested gene set onto the GO hierarchy, and takes advantage of Cytoscape’s versatile visualization environment to produce an intuitive and customizable visual representation of the results.
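The overrepresentation statistic behind tools of this kind is typically a hypergeometric (or binomial) test; the abstract does not name the test, so the hypergeometric choice below is an assumption, not a description of BiNGO's internals. A minimal pure-Python sketch of the upper-tail p-value:

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k): probability that a random sample of n genes from a genome
    of N genes, of which K carry the GO term, contains at least k carriers."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Illustrative numbers: 40 of 5000 genes carry the term; a 100-gene
# cluster contains 8 of them.
p = hypergeom_sf(8, 5000, 40, 100)
```

A tool would compute this for every GO term touching the gene set and then apply a multiple-testing correction, which is exactly the theme of the entries below.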
Correlation and Large-Scale Simultaneous Significance Testing
 Journal of the American Statistical Association
"... Largescale hypothesis testing problems, with hundreds or thousands of test statistics “zi ” to consider at once, have become familiar in current practice. Applications of popular analysis methods such as false discovery rate techniques do not require independence of the zi’s, but their accuracy can ..."
Abstract

Cited by 97 (8 self)
Large-scale hypothesis testing problems, with hundreds or thousands of test statistics z_i to consider at once, have become familiar in current practice. Applications of popular analysis methods such as false discovery rate techniques do not require independence of the z_i's, but their accuracy can be compromised in high-correlation situations. This paper presents computational and theoretical methods for assessing the size and effect of correlation in large-scale testing. A simple theory leads to the identification of a single omnibus measure of correlation. The theory relates to the correct choice of a null distribution for simultaneous significance testing, and its effect on inference.

1. Introduction. Modern computing machinery and improved scientific equipment have combined to revolutionize experimentation in fields such as biology, medicine, genetics, and neuroscience. One effect on statistics has been to vastly magnify the scope of multiple hypothesis testing, now often involving thousands of cases considered simultaneously. The cases themselves are typically of familiar form, each perhaps a simple two-sample comparison,
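Efron's remedy for correlation is to replace the theoretical N(0,1) null with an empirical null fitted to the center of the observed z-values. His estimator fits the central peak of the z-histogram; the median/MAD version below is a much cruder stand-in, shown only to make the idea concrete:

```python
import statistics

def empirical_null(z):
    """Estimate (center, scale) of the null from the bulk of the z-values.
    1.4826 rescales the median absolute deviation to a standard deviation
    under Gaussian data; outlying (non-null) z's barely move either estimate."""
    center = statistics.median(z)
    mad = statistics.median([abs(x - center) for x in z])
    return center, 1.4826 * mad
```

Significance would then be judged against N(center, scale^2) instead of N(0, 1).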
Improving false discovery rate estimation
 Bioinformatics
"... Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR).However, rigorous control of the FDR at a preselected level is often impractical. Consequently, it has been suggested to use the qvalue as an est ..."
Abstract

Cited by 52 (6 self)
Motivation: Recent attempts to account for multiple testing in the analysis of microarray data have focused on controlling the false discovery rate (FDR). However, rigorous control of the FDR at a preselected level is often impractical. Consequently, it has been suggested to use the q-value as an estimate of the proportion of false discoveries among a set of significant findings. However, such an interpretation of the q-value may be unwarranted, considering that the q-value is based on an unstable estimator of the positive FDR (pFDR). Another method proposes estimating the FDR by modeling p-values as arising from a beta-uniform mixture (BUM) distribution. Unfortunately, the BUM approach is reliable only in settings where the assumed model accurately represents the actual distribution of p-values. Methods: A method called the spacings LOESS histogram (SPLOSH) is proposed for estimating the conditional FDR (cFDR), the expected proportion of false positives conditioned on having k ‘significant’ findings. SPLOSH is designed to be more stable than the q-value and applicable in a wider variety of settings than BUM. Results: In a simulation study and data analysis example, SPLOSH exhibits the desired characteristics relative to the q-value and BUM. Availability: The Web site www.stjuderesearch.org/statistics/splosh.html has links to freely available S-Plus code to implement the proposed procedure.
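For orientation, the simplest member of the q-value family is worth writing out. With the proportion of true nulls pi0 fixed at 1 (Storey's q-value estimates pi0 from the data, which is one source of the instability discussed above), the q-value reduces to the Benjamini–Hochberg adjusted p-value; a sketch:

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values: for the j-th smallest p-value,
    q = min over ranks r >= j of p_(r) * m / r. Equals the q-value when the
    null proportion pi0 is taken to be 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):   # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

Rejecting every test with q below a threshold t estimates the FDR of the rejected set at t.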
Young S: Sample size calculation for multiple testing in microarray data analysis
 Biostatistics
"... Microarray technology is rapidly emerging for genomewide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the twosample ttest or Wilcoxon test, are frequently used for evaluating st ..."
Abstract

Cited by 20 (1 self)
Microarray technology is rapidly emerging for genome-wide screening of differentially expressed genes between clinical subtypes or different conditions of human diseases. Traditional statistical testing approaches, such as the two-sample t-test or Wilcoxon test, are frequently used for evaluating the statistical significance of informative expressions but require adjustment for large-scale multiplicity. Due to its simplicity, the Bonferroni adjustment has been widely used to circumvent this problem. It is well known, however, that the standard Bonferroni test is often very conservative. In the present paper, we compare three multiple testing procedures in the microarray context: the original Bonferroni method, a Bonferroni-type improved single-step method, and a step-down method. The latter two methods are based on nonparametric resampling, by which the null distribution can be derived with the dependency structure among gene expressions preserved and the familywise error rate accurately controlled at the desired level. We also present a sample size calculation method for designing microarray studies. Through simulations and data analyses, we find that the proposed methods for testing and sample size calculation are computationally fast and control error rates and power precisely.
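The resampling-based step-down procedure in the paper needs the raw expression data to rebuild the null distribution, so it cannot be reproduced from p-values alone. Holm's step-down method (which also controls the familywise error rate, without resampling, and is used here only as an illustrative substitute) shows why step-down procedures reject at least as much as Bonferroni:

```python
def bonferroni(pvals, alpha=0.05):
    """Single-step: every p-value is compared against alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]

def holm(pvals, alpha=0.05):
    """Step-down: the k-th smallest p-value (k = 0, 1, ...) is compared
    against alpha / (m - k), stopping at the first failure. The first
    threshold equals Bonferroni's; later ones are strictly larger."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for k, i in enumerate(order):
        if pvals[i] > alpha / (m - k):
            break
        reject[i] = True
    return reject
```

On the p-values (0.01, 0.02, 0.3) at alpha = 0.05, Bonferroni rejects only the first hypothesis, while Holm rejects the first two.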
SOME NONASYMPTOTIC RESULTS ON RESAMPLING IN HIGH DIMENSION, I: CONFIDENCE REGIONS
SUBMITTED TO THE ANNALS OF STATISTICS, 2009
"... We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality ..."
Abstract

Cited by 17 (1 self)
We study generalized bootstrap confidence regions for the mean of a random vector whose coordinates have an unknown dependency structure. The random vector is supposed to be either Gaussian or to have a symmetric and bounded distribution. The dimensionality …
Multiple Testing Part II: Step-down procedures for control of the family-wise error rate
 Stat Appl Genet Mol Biol
"... Copyright c©2003 by the authors. ..."
Association rule interestingness: Measure and statistical validation
In Quality Measures in Data Mining, 2007
"... Summary. The search for interesting Boolean association rules is an important topic in knowledge discovery in databases. The set of admissible rules for the selected support and con dence thresholds can easily be extracted by algorithms based on support and con dence, such as Apriori. However, they ..."
Abstract

Cited by 15 (0 self)
Summary. The search for interesting Boolean association rules is an important topic in knowledge discovery in databases. The set of admissible rules for the selected support and confidence thresholds can easily be extracted by algorithms based on support and confidence, such as Apriori. However, these algorithms may produce a large number of rules, many of which are uninteresting. One has to resolve a two-tier problem: choosing the measures best suited to the problem at hand, then validating the interesting rules against the selected measures. First, the usual measures suggested in the literature are reviewed, and criteria to assess the qualities of these measures are proposed. Statistical validation of the most interesting rules requires performing a large number of tests; thus, controlling for false discoveries (type I errors) is of prime importance. An original bootstrap-based validation method is proposed which controls, for a given level, the number of false discoveries. The interest of this method for the selection of interesting association rules is illustrated by several examples.
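For reference, the two thresholds that Apriori-style mining applies before any statistical validation can be sketched in a few lines (transactions modeled as Python sets of items; the function names are illustrative, not from the paper):

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs jointly)
    divided by support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)
```

Each rule that clears both thresholds then becomes one hypothesis test, which is where the false-discovery control discussed above comes in.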
CATdb: a public access to Arabidopsis transcriptome data from the URGV-CATMA platform
, 2007
Enhancing peptide identification confidence by combining search methods
 J. Proteome Res
"... Confident peptide identification is one of the most important components in massspectrometrybased proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our a ..."
Abstract

Cited by 13 (4 self)
Confident peptide identification is one of the most important components in mass-spectrometry-based proteomics. We propose a method to properly combine the results from different database search methods to enhance the accuracy of peptide identifications. The database search methods included in our analysis are SEQUEST (v27 rev12), ProbID (v1.0), InsPecT (v20060505), Mascot (v2.1), X! Tandem (v2007.07.01.2), OMSSA (v2.0) and RAId_DbS. Using two data sets, one collected in profile mode and one collected in centroid mode, we tested the search performance of all 21 combinations of two search methods as well as all 35 possible combinations of three search methods. The results obtained from our study suggest that properly combining search methods does improve retrieval accuracy. In addition to performance results, we also describe the theoretical framework which in principle allows one to combine many independent scoring methods, including de novo sequencing and spectral library searches. The correlations among different methods are also investigated in terms of common true positives, common false positives, and a global analysis. We find that the average correlation strength, between any pairwise combination of the seven methods studied, is usually smaller than the associated standard error. This indicates that only weak correlation may be present among different methods and validates our approach of combining the search results. The usefulness of our approach is further confirmed by showing that the average cumulative number of false positive peptides agrees reasonably well with the combined E-value. The data related to this study are freely available upon request.
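The paper's combination framework is its own; a standard baseline for merging independent p-values is Fisher's method, sketched here purely for comparison (it is not the scheme the authors describe). For k independent p-values, -2 * sum(log p_i) follows a chi-square distribution with 2k degrees of freedom under the null, whose survival function has a closed form for even degrees of freedom:

```python
from math import exp, log

def fisher_combine(pvals):
    """Fisher's method for k independent p-values: returns the combined p."""
    k = len(pvals)
    x = -2.0 * sum(log(p) for p in pvals)
    # chi-square survival function with 2k df:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return exp(-x / 2.0) * total
```

Weak correlation between methods, as the abstract reports, is exactly the condition under which such independence-based combinations remain approximately valid.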
Comparative analysis of gene sets in the gene ontology space under the multiple hypothesis testing framework
 In Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
"... The Gene Ontology (GO) resource can be used as a powerful tool to uncover the properties shared among, and specific to, a list of genes produced by highthroughput functional genomics studies, such as microarray studies. In the comparative analysis of several gene lists, researchers maybe interested ..."
Abstract

Cited by 13 (0 self)
The Gene Ontology (GO) resource can be used as a powerful tool to uncover the properties shared among, and specific to, a list of genes produced by high-throughput functional genomics studies, such as microarray studies. In the comparative analysis of several gene lists, researchers may be interested in knowing which GO terms are enriched in one list of genes but relatively depleted in another. Statistical tests such as Fisher’s exact test or the chi-square test can be performed to search for such GO terms. However, because multiple GO terms are tested simultaneously, individual p-values from individual tests do not serve as good indicators for picking GO terms. Furthermore, these multiple tests are highly correlated, usual …
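The per-term comparison the authors mention reduces, for each GO term, to a 2x2 table: annotated vs. non-annotated genes in each of the two lists. A pure-Python Pearson chi-square statistic for such a table (a minimal sketch; Fisher's exact test would be preferred for small counts):

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]],
    e.g. rows = the two gene lists, columns = annotated / not annotated
    with the GO term under test."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
```

One such table is built per GO term, so the resulting family of p-values faces exactly the correlated multiple-testing problem the abstract raises.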