Assigning significance to peptides identified by tandem mass spectrometry using decoy databases
J. Proteome Res., 2008
Abstract

Cited by 65 (13 self)
Automated methods for assigning peptides to observed tandem mass spectra typically return a list of peptide-spectrum matches, ranked according to an arbitrary score. In this article, we describe methods for converting these arbitrary scores into more useful statistical significance measures. These methods employ a decoy sequence database as a model of the null hypothesis, and use false discovery rate (FDR) analysis to correct for multiple testing. We first describe a simple FDR inference method and then describe how estimating and taking into account the percentage of incorrectly identified spectra in the entire data set can lead to increased statistical power.
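The basic target-decoy idea described in this abstract can be sketched in a few lines. This is an illustrative simplification (function and variable names are mine, and the scores are made up); the paper's actual procedure additionally estimates the fraction of incorrect spectra to gain power:

```python
import numpy as np

def decoy_fdr(threshold, target_scores, decoy_scores):
    """Estimate the FDR at a score threshold as the ratio of decoy matches
    to target matches scoring at or above it (simple target-decoy estimate)."""
    n_target = int(np.sum(target_scores >= threshold))
    n_decoy = int(np.sum(decoy_scores >= threshold))
    return n_decoy / max(n_target, 1)

# Toy example: three strong target matches, no decoys above the cutoff.
target = np.array([9.1, 7.4, 6.8, 2.0, 1.5])
decoy = np.array([2.2, 1.9, 1.1, 0.7, 0.3])
print(decoy_fdr(5.0, target, decoy))   # no decoys score above 5.0 -> estimate 0.0
```

Sweeping the threshold over all observed scores and reporting, for each match, the smallest FDR at which it would be accepted yields the q-value-style significance measures the abstract refers to.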
High-dimensional semiparametric Gaussian copula graphical models
The Annals of Statistics, 2012
Abstract

Cited by 51 (19 self)
We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman’s rho and Kendall’s tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph recovery and parameter estimation. This result suggests that the nonparanormal graphical models can be a safe replacement for the popular Gaussian graphical models, even when the data are truly Gaussian. Besides theoretical analysis, we also conduct thorough numerical simulations to compare the graph-recovery performance of different estimators under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic data set to illustrate their empirical usefulness. The R package huge implementing the proposed methods is available on the Comprehensive R Archive Network (CRAN).
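The rank-based trick at the heart of this estimator can be illustrated directly: replace each pairwise Pearson correlation with a transform of Kendall's tau, which depends only on ranks and is therefore invariant to monotone marginal transformations. A self-contained sketch (with a naive O(n²) tau, not the paper's implementation):

```python
import numpy as np

def kendall_tau(x, y):
    """Naive O(n^2) Kendall's tau between two samples (no tie correction)."""
    n = len(x)
    s = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign((x[i] - x[j]) * (y[i] - y[j]))
    return 2.0 * s / (n * (n - 1))

def skeptic_correlation(X):
    """Estimate the latent correlation matrix of a nonparanormal model via
    the transform S_jk = sin(pi/2 * tau_jk) applied entrywise."""
    _, p = X.shape
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            S[j, k] = S[k, j] = np.sin(np.pi / 2.0 * kendall_tau(X[:, j], X[:, k]))
    return S
```

Because only ranks enter the estimate, applying any strictly increasing function (e.g. `exp`) to a column leaves the estimated matrix unchanged; the matrix is then plugged into a standard Gaussian graphical model estimator such as the graphical lasso.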
Count-based differential expression analysis of RNA sequencing data using R and Bioconductor
Abstract

Cited by 43 (4 self)
Correspondence and requests for materials should be addressed to M.D.R. or W.H. (email:
Using control genes to correct for unwanted variation in microarray data
Biostatistics, 2011
Abstract

Cited by 34 (5 self)
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method “Remove Unwanted Variation, 2-step” (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise, but substantial challenges remain.
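The two-step recipe in this abstract can be sketched compactly. This is a toy illustration with my own names and a plain SVD standing in for the factor analysis; it is not the authors' implementation:

```python
import numpy as np

def ruv2(Y, x, ctl, k):
    """Toy 'Remove Unwanted Variation, 2-step' sketch.
    Y: genes x samples expression matrix; x: biological factor per sample;
    ctl: boolean mask of negative control genes; k: number of unwanted factors."""
    # Step 1: factor analysis (here a plain SVD) on the control genes only,
    # which by assumption carry no signal from x.
    Yc = Y[ctl] - Y[ctl].mean(axis=1, keepdims=True)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    W = Vt[:k].T                                   # samples x k unwanted factors
    # Step 2: regress every gene on [1, x, W]; the x coefficient is the
    # adjusted estimate of differential expression.
    D = np.column_stack([np.ones(len(x)), x, W])
    coef, *_ = np.linalg.lstsq(D, Y.T, rcond=None)
    return coef[1]
```

Restricting step 1 to controls is what separates this from ordinary surrogate variable approaches: the factors cannot absorb the biological signal, because the controls contain none of it.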
Matrix eQTL: ultra fast eQTL analysis via large matrix operations
Bioinformatics, 2012
Abstract

Cited by 21 (3 self)
Motivation: Expression quantitative trait loci (eQTL) mapping aims to determine genomic regions that regulate gene transcription. eQTL analysis is used to study the regulatory structure of normal tissues and to search for genetic factors in complex diseases such as cancer, diabetes, and cystic fibrosis. A modern eQTL dataset contains millions of SNPs and thousands of transcripts measured for hundreds of samples. This makes the analysis computationally complex, as it involves independent testing for association for every transcript-SNP pair. The heavy computational burden makes eQTL analysis less popular and often forces analysts to restrict their attention to just a subset of transcripts and SNPs. As larger genotype and gene expression datasets become available, the demand for fast tools for eQTL analysis increases. Solution: We present a new method for fast eQTL analysis via linear models, called Matrix eQTL. Matrix eQTL can model and test for association using both linear regression and ANOVA models. The models can include covariates to account for such factors as population structure, gender, and clinical variables. It also supports testing of heteroscedastic models and models with correlated errors. In our experiments on large datasets, Matrix eQTL was thousands of times faster than existing popular software for QTL/eQTL analysis. Matrix eQTL is implemented as both Matlab and R packages and thus can easily be run on Windows, Mac OS, and Linux systems. The software is freely available at the following address.
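The core speed trick, scoring every transcript-SNP pair at once through a single matrix product of standardized data, can be sketched as follows. This is an illustration of the general technique (names are mine); Matrix eQTL itself additionally handles covariates, ANOVA models, and streaming over data slices:

```python
import numpy as np

def all_pairs_correlation(snps, expr):
    """Pearson correlation of every SNP with every transcript via one
    matrix multiplication. snps: (n_snps, n_samples); expr: (n_genes,
    n_samples). Returns an (n_snps, n_genes) correlation matrix."""
    def standardize(M):
        M = M - M.mean(axis=1, keepdims=True)
        return M / np.linalg.norm(M, axis=1, keepdims=True)
    return standardize(snps) @ standardize(expr).T

def correlation_to_t(r, n_samples):
    """Convert a correlation to the usual association t-statistic."""
    return r * np.sqrt((n_samples - 2) / (1.0 - r ** 2))
```

P-values then follow from the t distribution with n − 2 degrees of freedom, so the per-pair work reduces to one multiply-accumulate inside a highly optimized BLAS call rather than millions of separate regression fits.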
Xiaoyong, A Review on Hybrid Storage, Microcomputer Applications, Vol. 29, No. 2
Abstract

Cited by 18 (7 self)
Epidemiology and prevention of hepatitis B virus infection in China
The sva package for removing batch effects and other unwanted variation in high-throughput experiments
Bioinformatics, 2012
Abstract

Cited by 16 (1 self)
Applying the sva function to estimate batch and other artifacts
Sufficient dimension reduction and prediction in regression
 Philosophical Transactions of the Royal Society A
Abstract

Cited by 16 (2 self)
Dimension reduction for regression is a prominent issue today because technological advances now allow scientists to routinely formulate regressions in which the number of predictors is considerably larger than in the past. While several methods have been proposed to deal with such regressions, principal components still seem to be the most widely used across the applied sciences. We give a broad overview of ideas underlying a particular class of methods for dimension reduction that includes principal components, along with an introduction to the corresponding methodology. New methods are proposed for prediction in regressions with many predictors.
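As a concrete instance of the class of methods this abstract surveys, principal component regression reduces the predictors before fitting. This generic sketch (my own names, not the new methods the paper proposes) shows the basic pattern:

```python
import numpy as np

def pcr_fit(X, y, d):
    """Principal component regression: project centered predictors onto
    their top-d principal directions, then run ordinary least squares.
    Returns a prediction function for new data."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    V = Vt[:d].T                                   # p x d reduction matrix
    Z = (X - mu) @ V                               # n x d reduced predictors
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(y)), Z]), y, rcond=None)

    def predict(X_new):
        Z_new = (X_new - mu) @ V
        return np.column_stack([np.ones(len(X_new)), Z_new]) @ coef
    return predict
```

The sufficient-dimension-reduction view asks when such a projection loses no information about the regression; principal components choose the directions from the predictors alone, which is exactly the limitation that motivates the alternatives discussed in the paper.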
Efficient inference in matrix-variate Gaussian models with iid observation noise
Abstract

Cited by 15 (2 self)
Inference in matrix-variate Gaussian models has major applications for multi-output prediction and joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for efficient inference in such models that explicitly accounts for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product structure of the row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility in applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.
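The Kronecker identity that underlies this kind of tractability, (A ⊗ B) vec(X) = vec(B X Aᵀ), lets one work with the small row and column factors instead of materializing the full covariance. A generic illustration (not the paper's code):

```python
import numpy as np

def kron_matvec(A, B, x):
    """Compute (A kron B) @ x without forming the Kronecker product,
    using (A kron B) vec(X) = vec(B X A^T) with column-major vec."""
    m, n = A.shape[1], B.shape[1]
    X = x.reshape(m, n).T               # undo column-major vec: X is n x m
    Y = B @ X @ A.T
    return Y.T.reshape(-1)              # column-major vec of the result
```

For p × p and q × q covariance factors, this replaces an O((pq)²) dense multiply with O(pq(p + q)) work and never stores the pq × pq matrix, which is what makes covariances with over 100,000 effective dimensions feasible.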
Surrogate Variable Analysis Using Partial Least Squares (SVAPLS) in Gene Expression Studies
Bioinformatics, 2012
Abstract

Cited by 11 (1 self)
Motivation: In a typical gene expression profiling study, our prime objective is to identify the genes that are differentially expressed between the samples from two different tissue types. Commonly, standard ANOVA/regression is implemented to identify the relative effects of these genes over the two types of samples from their respective arrays of expression levels. However, this technique becomes fundamentally flawed when there are unaccounted sources of variability in these arrays (latent variables attributable to different biological, environmental, or other factors relevant in the context). These factors distort the true picture of differential gene expression between the two tissue types and introduce spurious signals of expression heterogeneity. As a result, many genes that are actually differentially expressed are not detected, whereas many others are falsely identified as positives. Moreover, these distortions can be
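The distortion described above is easy to reproduce: when a hidden factor is correlated with the tissue-type label, a naive regression misestimates the effect, while conditioning on the factor recovers it. A toy simulation (all numbers made up for illustration; real surrogate-variable methods must estimate the latent factor rather than observe it):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
group = np.repeat([0.0, 1.0], n // 2)              # tissue type of interest
latent = 0.8 * group + rng.normal(0.0, 0.5, n)     # hidden factor confounded with group
# One gene: true differential-expression effect is 1.0, plus the latent signal.
y = 1.0 * group + 2.0 * latent + rng.normal(0.0, 0.1, n)

def ols_coef(D, y, idx):
    """Ordinary least squares; return the coefficient at position idx."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef[idx]

ones = np.ones(n)
naive = ols_coef(np.column_stack([ones, group]), y, 1)             # inflated
adjusted = ols_coef(np.column_stack([ones, group, latent]), y, 1)  # near 1.0
```

Here the naive estimate absorbs roughly 1.0 + 2.0 × 0.8 of effect, so the gene looks far more differentially expressed than it is; this is the failure mode that SVA-style methods, including the partial-least-squares variant in this paper, are built to correct.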