Results 1-10 of 92
High-dimensional semiparametric Gaussian copula graphical models
The Annals of Statistics, 2012
Cited by 51 (19 self)
Abstract
We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman’s rho and Kendall’s tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph recovery and parameter estimation. This result suggests that the nonparanormal graphical models can be used as a safe replacement for the popular Gaussian graphical models, even when the data are truly Gaussian. Besides theoretical analysis, we also conduct thorough numerical simulations to compare the graph recovery performance of different estimators under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic data set to illustrate their empirical usefulness. The R package huge implementing the proposed methods is available on the Comprehensive R ...
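The rank-based idea behind the SKEPTIC can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the huge package's implementation: it assumes continuous data without ties and uses the standard Gaussian-copula identity Σ_jk = sin(π/2 · τ_jk), which maps Kendall's tau to the latent correlation. The function names `kendall_tau` and `skeptic_kendall` are hypothetical. Because only pairwise orderings enter the estimate, applying unknown monotone transformations to the data leaves the result unchanged.

```python
import numpy as np


def kendall_tau(x, y):
    """Kendall's tau-a between two 1-D samples (assumes no ties)."""
    n = len(x)
    dx = np.sign(x[:, None] - x[None, :])  # pairwise orderings in x
    dy = np.sign(y[:, None] - y[None, :])  # pairwise orderings in y
    # the sum over all ordered pairs i != j counts each unordered pair twice
    return (dx * dy).sum() / (n * (n - 1))


def skeptic_kendall(X):
    """Rank-based estimate of the latent correlation matrix: sin(pi/2 * tau)."""
    p = X.shape[1]
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            S[j, k] = S[k, j] = np.sin(np.pi / 2.0 * kendall_tau(X[:, j], X[:, k]))
    return S


# Latent Gaussian data with correlation 0.6, observed only through monotone maps.
rng = np.random.default_rng(0)
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
Z = rng.multivariate_normal(np.zeros(2), Sigma, size=500)
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3])
S = skeptic_kendall(X)  # off-diagonal entry should be close to 0.6
```

In the full method such a matrix would replace the sample correlation matrix inside a sparse precision estimator (for example the graphical lasso). Note that the plug-in matrix is not guaranteed to be positive semidefinite in finite samples.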
Stability approach to regularization selection (stars) for high dimensional graphical models
In Advances in Neural Information Processing Systems (NIPS), 2010
Cited by 38 (5 self)
Abstract
A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (KCV), the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high-dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high-dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation, i.e., with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with state-of-the-art model selection procedures, including KCV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
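The stability criterion can be sketched as follows, assuming the usual StARS recipe: draw subsamples of size b = ⌊10√n⌋, record how often each edge is selected at each λ, measure per-edge instability as 2θ(1 − θ), average over edges, monotonize, and pick the smallest λ whose monotonized instability stays below β = 0.05. The graph estimator `threshold_graph` is a hypothetical stand-in (thresholded absolute correlation) used only to keep the sketch self-contained; the paper applies the criterion to the graphical lasso.

```python
import numpy as np


def threshold_graph(X_sub, lam):
    """Hypothetical stand-in for the graphical lasso: keep edges with |corr| > lam."""
    A = np.abs(np.corrcoef(X_sub, rowvar=False)) > lam
    np.fill_diagonal(A, False)
    return A


def stars_select(X, lambdas, estimate_graph, n_subsamples=20, beta=0.05, seed=0):
    """StARS-style choice of lambda; lambdas must be sorted in decreasing order."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    b = min(n, int(10 * np.sqrt(n)))  # subsample size b(n) = floor(10 sqrt(n))
    iu = np.triu_indices(p, k=1)
    subsamples = [rng.choice(n, size=b, replace=False) for _ in range(n_subsamples)]
    D = []
    for lam in lambdas:
        # theta[e]: fraction of subsamples in which edge e is selected
        theta = np.mean([estimate_graph(X[idx], lam)[iu] for idx in subsamples], axis=0)
        xi = 2.0 * theta * (1.0 - theta)  # per-edge instability, at most 0.5
        D.append(xi.mean())  # total instability, averaged over p(p-1)/2 edges
    D = np.array(D)
    D_bar = np.maximum.accumulate(D)  # monotonized: sup of D over larger lambdas
    admissible = np.nonzero(D_bar <= beta)[0]  # a prefix of the lambda grid
    chosen = lambdas[admissible[-1]] if admissible.size else lambdas[0]
    return chosen, D, D_bar


rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
X[:, 1] += X[:, 0]  # one genuine edge between variables 0 and 1
lambdas = np.linspace(0.9, 0.05, 18)
lam, D, D_bar = stars_select(X, lambdas, threshold_graph)
```

Monotonizing D before thresholding is what makes the selected λ well defined: without it, a noisy dip in instability at a very small λ could be picked even though larger λ values were already unstable.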
A constrained ℓ1-minimization approach to sparse precision matrix estimation
J. Amer. Statist. Assoc., 2011
Supplement to “Regularized rank-based estimation of high-dimensional nonparanormal graphical models.”
2012
Cited by 32 (5 self)
Abstract
A sparse precision matrix can be directly translated into a sparse Gaussian graphical model under the assumption that the data follow a joint normal distribution. This neat property makes high-dimensional precision matrix estimation very appealing in many applications. However, in practice we often face non-normal data, and variable transformation is often used to achieve normality. In this paper we consider the nonparanormal model, which assumes that the variables follow a joint normal distribution after a set of unknown monotone transformations. The nonparanormal model is much more flexible than the normal model while retaining the good interpretability of the latter, in that each zero entry in the sparse precision matrix of the nonparanormal model corresponds to a pair of conditionally independent variables. In this paper we show that the nonparanormal graphical model can be efficiently estimated by using a rank-based estimation scheme that does not require estimating these unknown transformation functions. In particular, we study the rank-based graphical lasso, the rank-based neighborhood Dantzig selector and the rank-based CLIME. We establish their theoretical properties in the setting where the dimension is nearly exponentially large relative to the sample size. It is shown that the proposed rank-based estimators work as well as their oracle counterparts defined with the oracle data. Furthermore, the theory motivates us to consider the adaptive versions of the rank-based neighborhood Dantzig selector and the rank-based CLIME, which are shown to enjoy graphical model selection consistency without assuming the irrepresentable condition required by the oracle and rank-based graphical lasso. Simulated and real data are used to demonstrate the finite-sample performance of the rank-based estimators.
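A small numerical sketch of why the transformations never need to be estimated: Spearman's rho depends only on ranks, and the identity Σ_jk = 2 sin(π/6 · ρ_jk) maps it to the latent Gaussian correlation, whose inverse has the same zero pattern as the latent precision matrix. This is our own illustration under assumptions: p = 3, so a plain matrix inverse stands in for the penalized estimators (rank-based graphical lasso, Dantzig selector, CLIME) studied in the paper, and the helper names are hypothetical.

```python
import numpy as np


def spearman_matrix(X):
    """Spearman's rho for every pair of columns (assumes no ties)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)


def rank_based_correlation(X):
    """Plug-in estimate of the latent Gaussian correlation: 2 sin(pi/6 * rho)."""
    R = 2.0 * np.sin(np.pi / 6.0 * spearman_matrix(X))
    np.fill_diagonal(R, 1.0)
    return R


# Latent Gaussian chain: the precision matrix has a zero in position (0, 2).
Omega = np.array([[1.0, -0.4, 0.0], [-0.4, 1.0, -0.4], [0.0, -0.4, 1.0]])
rng = np.random.default_rng(42)
Z = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Omega), size=2000)
# Observed data: a different unknown monotone transformation of each coordinate.
X = np.column_stack([np.exp(Z[:, 0]), Z[:, 1] ** 3, 1.0 / (1.0 + np.exp(-Z[:, 2]))])
Omega_hat = np.linalg.inv(rank_based_correlation(X))  # (0, 2) entry should be near 0
```

The estimate targets the latent correlation matrix rather than Σ itself, so its inverse is a rescaled version of the true precision matrix; rescaling preserves zeros, which is all that graph recovery needs.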
Copula Gaussian graphical models and their application to modeling functional disability data
Cited by 31 (3 self)
Abstract
We propose a comprehensive Bayesian approach for graphical model determination in observational studies that can accommodate binary, ordinal or continuous variables simultaneously. Our new models are called copula Gaussian graphical models (CGGMs) and embed graphical model selection inside a semiparametric Gaussian copula. The domain of applicability of our methods is very broad and encompasses many studies from social science and economics. We illustrate the use of the copula Gaussian graphical models in the analysis of a 16-dimensional functional disability contingency table.
The huge Package for High-dimensional Undirected Graph Estimation in R
2012
Cited by 21 (10 self)
Abstract
We describe an R package named huge that provides easy-to-use functions for estimating high-dimensional undirected graphs from data. This package implements recent results in the literature, including Friedman et al. [2007b], Liu et al. [2009] and Liu et al. [2010]. Compared with the existing graph estimation package glasso, the huge package provides extra features: (1) instead of using Fortran, it is written in C, which makes the code more portable and easier to modify; (2) besides fitting Gaussian graphical models, it also provides functions for fitting high-dimensional semiparametric Gaussian copula models; (3) it offers more functions, such as data-dependent model selection, data generation and graph visualization; (4) a minor convergence problem of the graphical lasso algorithm is corrected; (5) the package allows the user to apply both lossless and lossy screening rules to scale up large-scale problems, making a trade-off between computational and statistical efficiency.
Forest density estimation
Journal of Machine Learning Research, 2011
Cited by 21 (7 self)
Abstract
We study graph estimation and density estimation in high dimensions, using a family of density estimators based on forest-structured undirected graphical models. For density estimation, we do not assume the true distribution corresponds to a forest; rather, we form kernel density estimates of the bivariate and univariate marginals, and apply Kruskal’s algorithm to estimate the optimal forest on held-out data. We prove an oracle inequality on the excess risk of the resulting estimator relative to the risk of the best forest. For graph estimation, we consider the problem of estimating forests with restricted tree sizes. We prove that finding a maximum-weight spanning forest with restricted tree size is NP-hard, and develop an approximation algorithm for this problem. Viewing the tree size as a complexity parameter, we then select a forest using data splitting, and prove bounds on excess risk and structure selection consistency of the procedure. Experiments with simulated data and microarray data indicate that the methods are a practical alternative to Gaussian graphical models.
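The Kruskal step can be sketched as a maximum-weight spanning forest over pairwise weights, stopping early after a fixed number of edges. As a simplifying assumption, this sketch swaps in the Gaussian mutual information −½ log(1 − ρ²) for the kernel-density mutual information estimates used in the paper, and it does not enforce a restricted tree size (the NP-hard variant). The function names are hypothetical.

```python
import numpy as np


def gaussian_mutual_information(X):
    """Pairwise MI under a Gaussian assumption: -0.5 * log(1 - rho^2)."""
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)  # self-pairs are never candidate edges
    return -0.5 * np.log(1.0 - R ** 2)


def max_spanning_forest(W, max_edges=None):
    """Kruskal's algorithm on a symmetric weight matrix W, heaviest edges first."""
    p = W.shape[0]
    parent = list(range(p))  # union-find forest over vertices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    candidates = sorted(((W[i, j], i, j) for i in range(p) for j in range(i + 1, p)),
                        reverse=True)
    limit = p - 1 if max_edges is None else min(max_edges, p - 1)
    edges = []
    for _, i, j in candidates:
        if len(edges) == limit:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # (i, j) joins two different trees, so no cycle is created
            parent[ri] = rj
            edges.append((i, j))
    return edges


# A Gaussian Markov chain x0 - x1 - x2 - x3.
rng = np.random.default_rng(7)
n = 1000
x0 = rng.standard_normal(n)
x1 = x0 + rng.standard_normal(n)
x2 = x1 + rng.standard_normal(n)
x3 = x2 + rng.standard_normal(n)
W = gaussian_mutual_information(np.column_stack([x0, x1, x2, x3]))
edges = max_spanning_forest(W)  # expected to recover the chain
```

Stopping after k edges yields the sequence of nested forests from which a held-out criterion can choose; with `max_edges=None` the result is the full Chow-Liu-style spanning tree.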
Graphical models via generalized linear models
In Advances in Neural Information Processing Systems (NIPS), 2012
Cited by 19 (8 self)
Abstract
Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models, such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, however, do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential family distribution, such as Poisson, negative binomial, and exponential, by fitting penalized GLMs to select the neighborhood for each node. A major contribution of this paper is the rigorous statistical analysis showing that, with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models.
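Neighborhood selection with penalized GLMs can be sketched for the Poisson case: regress each node on all the others with an ℓ1-penalized Poisson regression and read off the nonzero coefficients. This is a minimal sketch under assumptions, not the authors' solver: the fit uses generic proximal gradient with backtracking, the helper names are hypothetical, and the toy data come from a simple directed mechanism rather than from a Poisson graphical model.

```python
import numpy as np


def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def poisson_lasso(Z, y, lam, n_iter=500):
    """L1-penalized Poisson regression with an unpenalized intercept (proximal gradient)."""
    Z = Z - Z.mean(axis=0)  # centering decouples the intercept; slopes are unchanged
    n, d = Z.shape
    b0, beta = np.log(y.mean() + 1e-8), np.zeros(d)

    def loss(c, b):
        eta = c + Z @ b
        return np.mean(np.exp(eta) - y * eta)  # Poisson NLL / n, up to constants

    step = 1.0
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(n_iter):
            r = np.exp(b0 + Z @ beta) - y
            g0, g = r.mean(), Z.T @ r / n
            f = loss(b0, beta)
            while True:  # backtrack until the quadratic upper bound holds
                nb0 = b0 - step * g0
                nbeta = soft_threshold(beta - step * g, step * lam)
                d0, db = nb0 - b0, nbeta - beta
                if loss(nb0, nbeta) <= f + g0 * d0 + g @ db + (d0 ** 2 + db @ db) / (2 * step):
                    break
                step *= 0.5
            b0, beta = nb0, nbeta
    return b0, beta  # b0 is the intercept for the centered predictors


def poisson_neighborhoods(X, lam):
    """Regress each node on all others; nonzero coefficients define its neighborhood."""
    p = X.shape[1]
    nbrs = {}
    for s in range(p):
        others = [t for t in range(p) if t != s]
        _, beta = poisson_lasso(X[:, others].astype(float), X[:, s].astype(float), lam)
        nbrs[s] = {others[k] for k in np.nonzero(beta)[0]}
    return nbrs


rng = np.random.default_rng(3)
n = 500
x0 = rng.poisson(2.0, n)
x1 = rng.poisson(np.exp(0.3 * x0))  # x1 depends on x0
x2 = rng.poisson(2.0, n)  # independent node
X = np.column_stack([x0, x1, x2])
nb = poisson_neighborhoods(X, lam=0.05)
```

The per-node neighborhoods need not be symmetric; a final graph is usually formed by combining them with an AND or OR rule.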
Copula Bayesian Networks
2010
Cited by 18 (10 self)
Abstract
We present the Copula Bayesian Network model for representing multivariate continuous distributions, while taking advantage of the relative ease of estimating univariate distributions. Using a novel copula-based reparameterization of a conditional density, joined with a graph that encodes independencies, our model offers great flexibility in modeling high-dimensional densities, while maintaining control over the form of the univariate marginals. We demonstrate the advantage of our framework for generalization over standard Bayesian networks as well as tree-structured copula models for varied real-life domains that are of substantially higher dimension than those typically considered in the copula literature.
Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
2012
Cited by 17 (3 self)
Abstract
We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. The proof exploits a combination of ideas from the geometry of exponential families, junction tree theory, and convex analysis. These population-level results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation with missing or corrupted observations. We provide non-asymptotic guarantees for such methods, and illustrate the sharpness of these predictions via simulations.
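The population-level claim can be checked numerically on a tree. The sketch below, our own illustration under assumptions, uses a pairwise binary model on {0, 1}^4 whose edges form the chain 0 - 1 - 2 - 3, computes the exact covariance of the vertex variables by enumerating all 2^4 states, and inverts it. Entries for non-adjacent pairs should vanish up to rounding. The parameter values and the function name are hypothetical.

```python
import itertools

import numpy as np


def pairwise_binary_covariance(theta_node, edge_weights):
    """Exact covariance of X in {0,1}^p under p(x) proportional to
    exp(sum_s theta_s x_s + sum_(s,t) w_st x_s x_t)."""
    p = len(theta_node)
    states = np.array(list(itertools.product([0, 1], repeat=p)), dtype=float)
    logp = states @ np.asarray(theta_node, dtype=float)
    for (s, t), w in edge_weights.items():
        logp += w * states[:, s] * states[:, t]  # only listed edges interact
    prob = np.exp(logp)
    prob /= prob.sum()
    centered = states - prob @ states  # subtract the exact mean
    return centered.T @ (prob[:, None] * centered)


# Chain (a tree) on four binary vertices.
cov = pairwise_binary_covariance([0.2, -0.3, 0.5, 0.1],
                                 {(0, 1): 1.0, (1, 2): -0.8, (2, 3): 0.6})
Omega = np.linalg.inv(cov)  # zero pattern should match the chain
```

For graphs with cycles, the abstract's qualifier "for certain graph structures" matters: the inverse of the vertex-only covariance is not guaranteed to reflect the graph, which is where generalized covariance matrices over larger sets of indicators come in.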