Results 1–10 of 24
A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics
, 2005
Oracle and adaptive compound decision rules for false discovery rate control
 J. Am. Statist. Assoc.
, 2007
Abstract

Cited by 48 (7 self)
We develop a compound decision theory framework for multiple-testing problems and derive an oracle rule based on the z values that minimizes the false nondiscovery rate (FNR) subject to a constraint on the false discovery rate (FDR). We show that many commonly used multiple-testing procedures, which are p value–based, are inefficient, and propose an adaptive procedure based on the z values. The z value–based adaptive procedure asymptotically attains the performance of the z value oracle procedure and is more efficient than the conventional p value–based methods. We investigate the numerical performance of the adaptive procedure using both simulated and real data. In particular, we demonstrate our method in an analysis of the microarray data from a human immunodeficiency virus study that involves testing a large number of hypotheses simultaneously.
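The adaptive rule described above ranks hypotheses by their estimated local false discovery rates (computed from z values) instead of by p-values, and rejects the largest initial segment whose running mean stays below the FDR level. A minimal sketch, assuming the local fdr values have already been estimated from a two-group model (the function name and inputs are illustrative, not the paper's code):

```python
import numpy as np

def adaptive_lfdr_rejections(lfdr, alpha=0.10):
    """Step-up rule on local fdr values: reject the k hypotheses with the
    smallest lfdr, where k is the largest integer such that the running
    mean of the sorted lfdr values does not exceed alpha."""
    lfdr = np.asarray(lfdr, dtype=float)
    order = np.argsort(lfdr)
    # running mean of sorted lfdr values is nondecreasing, so a simple
    # count gives the largest admissible k
    running_mean = np.cumsum(lfdr[order]) / np.arange(1, lfdr.size + 1)
    k = int(np.sum(running_mean <= alpha))
    rejected = np.zeros(lfdr.size, dtype=bool)
    rejected[order[:k]] = True
    return rejected
```

With lfdr values (0.01, 0.02, 0.05, 0.5, 0.9) and alpha = 0.10, the running means are 0.01, 0.015, 0.027, 0.145, ..., so the first three hypotheses are rejected.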
Inferring gene dependency networks from genomic longitudinal data: a functional data approach
, 2006
Abstract

Cited by 24 (2 self)
A key aim of systems biology is to unravel the regulatory interactions among genes and gene products in a cell. Here we investigate a graphical model that treats the observed gene expression over time as realizations of random curves. This approach is centered around an estimator of dynamical pairwise correlation that takes account of the functional nature of the observed data. This allows us to extend the graphical Gaussian modeling framework from i.i.d. data to longitudinal genomic data. The new method is illustrated by analyzing highly replicated data from a genomic experiment concerning the expression response of human T cells to PMA and ionomycin treatment.
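The core idea, correlating smooth curves fitted to each gene's time course rather than the raw noisy measurements, can be sketched briefly. Polynomial smoothing stands in here for the functional-data basis machinery of the paper; all names are illustrative:

```python
import numpy as np

def functional_correlation(times, expr, degree=3, grid_size=100):
    """Fit a smooth polynomial curve to each gene's time course and
    return the correlation matrix of the fitted curves evaluated on a
    fine time grid, rather than of the raw observations."""
    times = np.asarray(times, dtype=float)
    expr = np.asarray(expr, dtype=float)           # shape (n_genes, n_times)
    grid = np.linspace(times.min(), times.max(), grid_size)
    fitted = np.empty((expr.shape[0], grid.size))
    for g in range(expr.shape[0]):
        coef = np.polyfit(times, expr[g], degree)  # least-squares smoothing
        fitted[g] = np.polyval(coef, grid)
    return np.corrcoef(fitted)
```

The resulting matrix can then feed a graphical Gaussian model (e.g., via partial correlations) in place of the usual i.i.d. sample correlation.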
Determining the number of non-spurious arcs in a learned DAG model: Investigation of a Bayesian and a frequentist approach
 The 23rd Conference on Uncertainty in Artificial Intelligence, 2007
Abstract

Cited by 18 (3 self)
In many application areas where graphical models are used and where their structure is learned from data, the end goal is neither prediction nor density estimation. Rather, it is the uncovering of discrete relationships between entities. For example, in computational biology, one may be interested in discovering which proteins within a large set of proteins interact with one another. In these problems, relationships can be represented by arcs in a graphical model. Consequently, given a learned model, we are interested in knowing how many of the arcs are real or non-spurious. In our approach to this problem, we estimate and control the False Discovery Rate (FDR) [1] of a set of arc hypotheses. The FDR is defined as the (expected) proportion of all hypotheses (e.g., arc hypotheses) which we label as true, but which are actually false (i.e., the number of false positives divided by the number of total hypotheses called true). In our evaluations, we concentrate on directed acyclic graphs (DAGs) for discrete variables with known variable orderings, as our problem of interest (concerning a particular problem related to HIV vaccine design) has these properties. We use the term arc hypothesis to denote the event that an arc is present in the underlying distribution of the data. In a typical computation of FDR, we are given a set of hypotheses where each hypothesis, i, is assigned a score, s_i (traditionally, a test statistic, or the p-value resulting from such a test statistic). These scores are often assumed to be independent and identically distributed, although there has been much work to relax the assumption of independence [2]. The FDR is computed as a function of a threshold, t, on these scores, FDR = FDR(t). For threshold t, all hypotheses with s_i ≥ t are said to be significant (assuming, without loss of generality, that the higher a score, the more we believe a hypothesis). The FDR at threshold t is then given by FDR(t) = E
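The threshold-based FDR described above is commonly estimated with a plug-in form: the expected number of null scores exceeding t, divided by the number of hypotheses actually called significant at t. A minimal sketch, assuming a standard-normal null for the scores (that null model, the pi0 parameter, and the function name are assumptions of this illustration, not the paper's method):

```python
import numpy as np
from statistics import NormalDist

def fdr_at_threshold(scores, t, pi0=1.0):
    """Plug-in FDR(t) estimate: expected false positives at threshold t
    over the number of hypotheses with score >= t, assuming the null
    scores are standard normal (illustrative assumption)."""
    scores = np.asarray(scores, dtype=float)
    m = scores.size
    # expected count of null scores at or above t
    expected_false = m * pi0 * (1.0 - NormalDist().cdf(t))
    called = max(int(np.sum(scores >= t)), 1)  # avoid division by zero
    return min(expected_false / called, 1.0)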
A tail strength measure for assessing the overall univariate significance in a dataset
, 2005
A hierarchical Bayesian approach to multiple testing in disease mapping
, 2010
Abstract

Cited by 3 (0 self)
We propose a Bayesian approach to multiple testing in disease mapping. This study was motivated by a real example regarding the mortality rate for lung cancer among males in the Tuscan region (Italy). The data cover the period 1995–1999 for 287 municipalities. We develop a tri-level hierarchical Bayesian model to estimate, for each area, the posterior classification probability, that is, the posterior probability that the municipality belongs to the set of non-divergent areas. We also show the connections of our model with the false discovery rate approach. Posterior classification probabilities are used to explore areas at divergent risk from the reference while controlling for multiple testing. We consider both the Poisson-Gamma and the Besag, York and Mollié model to account for extra-Poisson variability in our Bayesian formulation. Posterior inference on classification probabilities is highly dependent on the choice of the prior. We perform a sensitivity analysis and suggest how to rely on subject-specific information to derive informative a priori distributions. Hierarchical Bayesian models provide a sensible way to model classification probabilities in the context of disease mapping.
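Once a hierarchical model yields, for each municipality, a posterior probability of belonging to the non-divergent set, the connection to FDR control is direct: flag areas whose posterior null probability is small, and report the average of those probabilities over the flagged set as a Bayesian FDR. A minimal sketch (the cutoff value and names are illustrative, not taken from the paper):

```python
import numpy as np

def flag_divergent_areas(post_null, cutoff=0.2):
    """post_null[i] = posterior probability that area i is non-divergent.
    Flag areas whose posterior null probability falls below `cutoff`,
    and report the Bayesian FDR of the flagged set: the mean posterior
    null probability among flagged areas."""
    post_null = np.asarray(post_null, dtype=float)
    flagged = post_null < cutoff
    bayes_fdr = float(post_null[flagged].mean()) if flagged.any() else 0.0
    return flagged, bayes_fdr
```

For posterior null probabilities (0.01, 0.05, 0.5, 0.9) and cutoff 0.2, the first two areas are flagged and the reported Bayesian FDR is 0.03.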
Empirical Bayes analysis of quantitative proteomics experiments
 PLoS ONE
, 2009
Abstract

Cited by 3 (1 self)
Background: Advances in mass spectrometry-based proteomics have enabled the incorporation of proteomic data into systems approaches to biology. However, development of analytical methods has lagged behind. Here we describe an empirical Bayes framework for quantitative proteomics data analysis. The method provides a statistical description of each experiment, including the number of proteins that differ in abundance between 2 samples, the experiment's statistical power to detect them, and the false-positive probability of each protein. Methodology/Principal Findings: We analyzed 2 types of mass spectrometric experiments. First, we showed that the method identified the protein targets of small molecules in affinity purification experiments with high precision. Second, we reanalyzed a mass spectrometric data set designed to identify proteins regulated by microRNAs. Our results were supported by sequence analysis of the 3′ UTR regions of predicted target genes, and we found that the previously reported conclusion that a large fraction of the proteome is regulated by microRNAs was not supported by our statistical analysis of the data. Conclusions/Significance: Our results highlight the importance of rigorous statistical analysis of proteomic data, and the
Modeling Genetic Networks: Comparison of Static and Dynamic Models
Abstract

Cited by 2 (2 self)
Abstract. Biomedical research has been revolutionized by high-throughput techniques and the enormous amount of biological data they are able to generate. Interest in network models and systems biology is rising rapidly. Inferring genetic networks is an essential task in mining these data, since such networks explain the function of genes in terms of how they influence other genes. Many modeling approaches have been proposed for building genetic networks. However, it is not clear what the advantages and disadvantages of each model are. There are several ways to discriminate among network-building models, one of the most important being whether the data being mined are static or dynamic. In this work we compare static and dynamic models on a problem related to inflammation and the host response to injury. We show how both models provide complementary information and cross-validate the obtained results.
locfdr Vignette: Complete Help Documentation Including Usage Tips and Simulation Example, The Comprehensive R Archive Network, November 1, 2007. As of November 3, 2008: http://cran.r-project.org/web/packages/locfdr/vignettes/locfdr-example.pdf
"... This vignette includes locfdr’s complete help documentation, including usage tips, which could not fit in the R help file. It also demonstrates usage of locfdr through an example using the simulated data included in the package. 1 Description and Usage locfdr computes local false discovery rates, fo ..."
Abstract

Cited by 2 (0 self)
This vignette includes locfdr's complete help documentation, including usage tips, which could not fit in the R help file. It also demonstrates usage of locfdr through an example using the simulated data included in the package.

1 Description and Usage

locfdr computes local false discovery rates, following the definitions and description in the references listed below.

locfdr(zz, bre=120, df=7, pct=0, pct0=1/4, nulltype=1, type=0, plot=1, mult, mlests, main=" ", sw=0)
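For readers without R, the central quantity locfdr computes, fdr(z) = pi0 * f0(z) / f(z), can be sketched in a few lines of Python. This is a simplification: a Gaussian kernel density estimate stands in for locfdr's Poisson-regression fit of the mixture density f, and the theoretical N(0,1) null replaces its estimated empirical null.

```python
import numpy as np

def local_fdr(z, pi0=1.0, bandwidth=0.3):
    """Simplified local fdr: pi0 * f0(z) / f(z), with f0 the theoretical
    N(0,1) null density and f a Gaussian kernel density estimate of the
    observed z-values (locfdr itself fits f more carefully)."""
    z = np.asarray(z, dtype=float)
    # kernel density estimate of the mixture density f at each z_i
    diffs = (z[:, None] - z[None, :]) / bandwidth
    f = np.exp(-0.5 * diffs**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    f0 = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)   # standard normal null
    return np.clip(pi0 * f0 / f, 0.0, 1.0)
```

Values far from zero, where the alternative dominates the estimated mixture density, receive small local fdr; values near zero receive larger ones.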