Results 1–7 of 7
Transposable Regularized Covariance Models with an Application to Missing Data Imputation
, 2008
Abstract

Cited by 4 (0 self)
Missing data estimation is an important challenge with high-dimensional data arranged in the form of a matrix. Typically this data is transposable, meaning that either the rows, columns or both can be treated as features. To model transposable data, we present a modification of the matrix-variate normal, the mean-restricted matrix-variate normal, in which the rows and columns each have a separate mean vector and covariance matrix. We extend regularized covariance models, which place an additive penalty on the inverse covariance matrix, to this distribution by placing separate penalties on the covariances of the rows and columns. These so-called transposable regularized covariance models allow for maximum likelihood estimation of the mean and nonsingular covariance matrices. Using these models, we formulate EM-type algorithms for missing data imputation in both the multivariate and transposable frameworks. Exploiting the structure of our transposable models, we present techniques enabling use of our models with high-dimensional data and give a computationally feasible one-step approximation for imputation. Simulations and results on microarray data and the Netflix data show that these imputation techniques often outperform existing methods and offer a greater degree of flexibility.
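The "additive penalty on the inverse covariance matrix" mentioned above has a simple closed form in the basic (non-transposable) ridge case: penalizing tr(Σ⁻¹) in the Gaussian likelihood yields the estimate S + λI, which is nonsingular even when there are more features than samples. A minimal sketch of that baseline variant (the function name and parameterization are illustrative; the paper's transposable models penalize row and column covariances separately):

```python
import numpy as np

def regularized_covariance(X, lam=0.1):
    """Ridge-regularized covariance estimate: maximizing the Gaussian
    log-likelihood with an additive penalty lam * tr(Sigma^{-1}) gives
    the closed form S + lam * I, guaranteed nonsingular.
    (Illustrative sketch of one regularized-covariance variant only.)"""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # center each column
    S = Xc.T @ Xc / X.shape[0]           # sample covariance (possibly singular)
    return S + lam * np.eye(S.shape[1])  # nonsingular even when p > n
```

Because S is positive semidefinite, every eigenvalue of the estimate is at least λ, so the inverse needed for imputation always exists.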
False Discovery Rates and Copy Number Variation
Abstract

Cited by 2 (2 self)
Copy number changes, the gains and losses of chromosome segments, are a common type of genetic variation among healthy individuals as well as an important feature in tumor genomes. Microarray technology enables us to simultaneously measure, with moderate accuracy, copy number variation at more than a million chromosome locations and for hundreds of subjects. This leads to massive data sets and complicated inference problems concerning which locations for which subjects are genuinely variable. In this paper we consider a relatively simple false discovery rate approach to CNV analysis. More careful parametric change-point methods can then be focused on promising regions of the genome. Key words and phrases: copy number
Tweedie’s Formula and Selection Bias
Abstract

Cited by 2 (0 self)
We suppose that the statistician observes some large number of estimates zi, each with its own unobserved expectation parameter µi. The largest few of the zi’s are likely to substantially overestimate their corresponding µi’s, this being an example of selection bias, or regression to the mean. Tweedie’s formula, first reported by Robbins in 1956, offers a simple empirical Bayes approach for correcting selection bias. This paper investigates its merits and limitations. In addition to the methodology, Tweedie’s formula raises more general questions concerning empirical Bayes theory, discussed here as “relevance” and “empirical Bayes information.” There is a close connection between applications of the formula and James–Stein estimation. Keywords: Bayesian relevance, empirical Bayes information, James–Stein, false discovery rates, regret, winner’s curse
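Tweedie's formula itself is short: for z ~ N(µ, σ²), E[µ | z] = z + σ² · d/dz log f(z), where f is the marginal density of the observed z's. A minimal empirical Bayes sketch, estimating f with a Gaussian kernel (the bandwidth rule and function name are my choices, not the paper's implementation):

```python
import numpy as np

def tweedie_correct(z, sigma=1.0, bandwidth=None):
    """Selection-bias correction via Tweedie's formula:
        E[mu | z] = z + sigma^2 * f'(z) / f(z),
    with the marginal density f and its derivative f' estimated by a
    Gaussian kernel over all observations. Illustrative sketch only."""
    z = np.asarray(z, dtype=float)
    n = z.size
    if bandwidth is None:
        bandwidth = 1.06 * z.std() * n ** (-1 / 5)  # Silverman's rule of thumb
    d = (z[:, None] - z[None, :]) / bandwidth        # pairwise scaled gaps
    k = np.exp(-0.5 * d**2)
    norm = np.sqrt(2 * np.pi)
    f = k.sum(axis=1) / (n * bandwidth * norm)            # kernel density
    fprime = (-d * k).sum(axis=1) / (n * bandwidth**2 * norm)  # its derivative
    return z + sigma**2 * fprime / f
```

Since log f has negative slope in the right tail, the correction pulls the largest z's back toward the bulk, which is precisely the winner's-curse adjustment the abstract describes.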
[Figure-derived entry] Two-sample nonparametric bootstrap: resample columns of X (histogram; y-axis: Frequency)
"... pixels,...) each with its own summary statistic “zi”, ..."
The Beta-Binomial SGoF method for multiple dependent tests
, 2012
Abstract
In this paper a correction of the SGoF multi-testing method for dependent tests is introduced. The correction is based on the beta-binomial model, and therefore the new method is called Beta-Binomial SGoF (or BB-SGoF). Main properties of the new method are established, and its practical implementation is discussed. BB-SGoF is illustrated through the analysis of two different real data sets on gene/protein expression levels. The performance of the method is also investigated through simulations. One of the main conclusions of the paper is that the SGoF strategy may retain substantial power even in the presence of possible dependence among the tests.
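The baseline SGoF idea that BB-SGoF extends works roughly as follows: count how many p-values fall at or below a threshold γ, compare that count to its binomial expectation under the global null, and declare the statistically significant excess as discoveries. A rough sketch of that baseline under independence (function name, exact rejection rule, and defaults are my simplifications; BB-SGoF replaces the binomial with a beta-binomial to absorb dependence):

```python
import numpy as np
from math import comb

def sgof_baseline(pvals, gamma=0.05, alpha=0.05):
    """Baseline SGoF-style count test, independence case (sketch only).
    Counts p-values <= gamma, finds the one-sided binomial critical
    value at level alpha, and reports the excess as the number of
    declared effects (the smallest p-values would be the ones rejected)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    observed = int((p <= gamma).sum())
    # smallest s such that P(Bin(n, gamma) >= s) <= alpha
    s_crit = n + 1
    for s in range(n + 1):
        tail = sum(comb(n, k) * gamma**k * (1 - gamma)**(n - k)
                   for k in range(s, n + 1))
        if tail <= alpha:
            s_crit = s
            break
    return max(observed - s_crit + 1, 0)
```

Under dependence the binomial variance is too small, which is exactly the miscalibration the beta-binomial correction in the paper is designed to fix.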
Abstract
Proofs subject to correction. Not to be reproduced without permission. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor.
Statistical Analysis of Big Data on Pharmacogenomics
Abstract
This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively overview several prominent statistical methods for estimating a large covariance matrix for understanding correlation structure, an inverse covariance matrix for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins, and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules and understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed.
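The "spurious correlation" challenge listed above is easy to demonstrate: with many independent variables and few samples, the largest absolute sample correlation with a target can be sizable purely by chance. A small simulation (function name and parameters are illustrative):

```python
import numpy as np

def max_spurious_correlation(n=50, p=1000, seed=0):
    """Draw a target y and p independent predictors, all pure noise,
    and return the largest absolute sample correlation with y.
    With p >> n this maximum is far from zero despite zero true signal."""
    rng = np.random.default_rng(seed)
    y = rng.normal(size=n)
    X = rng.normal(size=(n, p))
    yc = (y - y.mean()) / y.std()                 # standardize target
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize predictors
    corrs = Xc.T @ yc / n                         # all p sample correlations
    return np.abs(corrs).max()
```

Each individual correlation has standard deviation about 1/√n ≈ 0.14 here, so the maximum over 1000 noise predictors typically lands near 0.5, which is why naive marginal screening over genomic-scale feature sets needs the corrections the paper surveys.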