The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
Cited by 91 (21 self)
Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula, or “nonparanormal,” for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method’s theoretical properties, and show that it works well in many examples.
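The idea of transforming each variable by a monotone function can be illustrated with a rank-based normal-score transform. This is only a minimal sketch: the function name `normal_scores` and the `rank / (n + 1)` scaling are assumptions made for illustration, not the estimator derived in the paper, and ties are not handled.

```python
from statistics import NormalDist


def normal_scores(column):
    """Map one variable to approximate Gaussian scores using its ranks.

    Assumption: no ties in `column`; rank / (n + 1) is used as a simple
    stand-in for an estimated marginal distribution function.
    """
    n = len(column)
    # Indices of the observations, sorted by value.
    order = sorted(range(n), key=lambda i: column[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Push the scaled ranks through the standard normal quantile function.
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


if __name__ == "__main__":
    values = [0.3, -1.2, 2.5, 0.9]
    print(normal_scores(values))
```

Because only ranks enter the computation, applying any strictly increasing function to the input leaves the scores unchanged.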
High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics
, 2012
Cited by 48 (20 self)
We propose a semiparametric approach called the nonparanormal SKEPTIC for efficiently and robustly estimating high-dimensional undirected graphical models. To achieve modeling flexibility, we consider the nonparanormal graphical models proposed by Liu, Lafferty and Wasserman [J. Mach. Learn. Res. 10 (2009) 2295–2328]. To achieve estimation robustness, we exploit nonparametric rank-based correlation coefficient estimators, including Spearman’s rho and Kendall’s tau. We prove that the nonparanormal SKEPTIC achieves the optimal parametric rates of convergence for both graph recovery and parameter estimation. This result suggests that the nonparanormal graphical models can be used as a safe replacement of the popular Gaussian graphical models, even when the data are truly Gaussian. Besides theoretical analysis, we also conduct thorough numerical simulations to compare the graph recovery performance of different estimators under both ideal and noisy settings. The proposed methods are then applied to a large-scale genomic data set to illustrate their empirical usefulness. The R package huge implementing the proposed methods is available on the Comprehensive R
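A rank-based correlation estimate of this kind can be sketched as follows. Under a Gaussian copula, the latent correlation equals sin(π/2 · τ) for Kendall's tau and 2 · sin(π/6 · ρ) for Spearman's rho. The function names are hypothetical, the tau computation is the plain O(n²) version, and ties are counted as neither concordant nor discordant.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau computed from concordant and discordant pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            if prod > 0:
                score += 1  # concordant pair
            elif prod < 0:
                score -= 1  # discordant pair
    return 2.0 * score / (n * (n - 1))


def correlation_from_tau(x, y):
    """Latent correlation estimate via the sine transform of Kendall's tau."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


def correlation_from_rho(rho):
    """Latent correlation estimate from a Spearman's rho value."""
    return 2.0 * math.sin(math.pi / 6.0 * rho)


if __name__ == "__main__":
    x = [0.1, -0.5, 2.0, 1.2, -1.1]
    y = [0.3, 0.2, 1.5, -0.4, -0.9]
    print(correlation_from_tau(x, y))
```

The resulting correlation matrix would then be passed to a sparse precision-matrix estimator in place of the sample correlation matrix; that step is not shown here.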
Supplement to “Regularized rank-based estimation of high-dimensional nonparanormal graphical models.” DOI:10.1214/12-AOS1041SUPP
, 2012
Cited by 31 (5 self)
A sparse precision matrix can be directly translated into a sparse Gaussian graphical model under the assumption that the data follow a joint normal distribution. This neat property makes high-dimensional precision matrix estimation very appealing in many applications. However, in practice we often face nonnormal data, and variable transformation is often used to achieve normality. In this paper we consider the nonparanormal model that assumes that the variables follow a joint normal distribution after a set of unknown monotone transformations. The nonparanormal model is much more flexible than the normal model while retaining the good interpretability of the latter in that each zero entry in the sparse precision matrix of the nonparanormal model corresponds to a pair of conditionally independent variables. In this paper we show that the nonparanormal graphical model can be efficiently estimated by using a rank-based estimation scheme which does not require estimating these unknown transformation functions. In particular, we study the rank-based graphical lasso, the rank-based neighborhood Dantzig selector and the rank-based CLIME. We establish their theoretical properties in the setting where the dimension is nearly exponentially large relative to the sample size. It is shown that the proposed rank-based estimators work as well as their oracle counterparts defined with the oracle data. Furthermore, the theory motivates us to consider the adaptive version of the rank-based neighborhood Dantzig selector and the rank-based CLIME that are shown to enjoy graphical model selection consistency without assuming the irrepresentable condition for the oracle and rank-based graphical lasso. Simulated and real data are used to demonstrate the finite-sample performance of the rank-based estimators.
Estimation of Gaussian graphs by model selection
 Electron. J. Stat
, 2008
Cited by 13 (6 self)
We investigate in this paper the estimation of Gaussian graphs by model selection from a nonasymptotic point of view. We start from an n-sample of a Gaussian law P_C in R^p and focus on the disadvantageous case where n is smaller than p. To estimate the graph of conditional dependences of P_C, we introduce a collection of candidate graphs and then select one of them by minimizing a penalized empirical risk. Our main result assesses the performance of the procedure in a nonasymptotic setting. We pay special attention to the maximal degree D of the graphs that we can handle, which turns out to be roughly n/(2 log p).
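To get a feel for the degree bound n/(2 log p), the short sketch below evaluates it for a few sample sizes. It assumes the natural logarithm and treats the bound as an order of magnitude only; the function name is hypothetical.

```python
import math


def approx_max_degree(n, p):
    """Rough maximal graph degree n / (2 log p), natural log assumed."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return n / (2.0 * math.log(p))


if __name__ == "__main__":
    # Illustrative values only: p = 1000 variables, growing sample size n.
    for n in (50, 100, 200):
        print(n, round(approx_max_degree(n, 1000), 2))
```

With p fixed, the bound grows linearly in n, while increasing p shrinks it only logarithmically.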
The hidden life of latent variables: Bayesian learning with mixed graph models
, 2008
Cited by 12 (4 self)
Directed acyclic graphs (DAGs) have been widely used as a representation of conditional independence in machine learning and statistics. Moreover, hidden or latent variables are often an important component of graphical models. However, DAG models suffer from an important limitation: the family of DAGs is not closed under marginalization of hidden variables. This means that in general we cannot use a DAG to represent the independencies over a subset of variables in a larger DAG. Directed mixed graphs (DMGs) are a representation that includes DAGs as a special case, and overcomes this limitation. This paper introduces algorithms for performing Bayesian inference in Gaussian and probit DMG models. An important requirement for inference is the characterization of the distribution over parameters of the models. We introduce a new distribution for covariance matrices of Gaussian DMGs. We discuss and illustrate how several Bayesian machine learning tasks can benefit from the principle presented here: the power to model dependencies that are generated from hidden variables, but without necessarily modelling such variables explicitly.
Weighted-lasso for structured network inference from time course data
 Statistical Applications in Genetics and Molecular Biology
Cited by 11 (1 self)
We present a weighted-Lasso method to infer the parameters of a first-order vector autoregressive model that describes time course expression data generated by directed gene-to-gene regulation networks. These networks are assumed to have prior internal connectivity structures that drive the inference method. This prior structure can be either derived from prior biological knowledge or inferred by the method itself. We illustrate the performance of this structure-based penalization both on synthetic data and on two canonical regulatory networks: first, the yeast cell cycle regulation network, by analyzing Spellman et al.’s dataset, and second, the E. coli S.O.S. DNA repair network, by analyzing data from U. Alon’s lab.
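One way to read a weighted-Lasso criterion for a first-order vector autoregressive model is as a residual sum of squares plus a penalty in which each coefficient carries its own weight. The sketch below only evaluates such an objective. The function name, the argument layout, and the absence of any normalization are assumptions made for illustration, and no optimizer is included.

```python
def weighted_lasso_var1_objective(series, coef, weights, lam):
    """Evaluate a weighted-Lasso objective for a VAR(1) model.

    series  : list of time points, each a list of p expression values
    coef    : p x p matrix; coef[i][j] is the effect of gene j at time t-1
              on gene i at time t
    weights : p x p matrix of non-negative penalty weights (hypothetical
              encoding of the prior structure: small weight = likely edge)
    lam     : overall penalty level
    """
    p = len(coef)
    rss = 0.0
    # Compare each time point with the prediction from the previous one.
    for prev, curr in zip(series, series[1:]):
        for i in range(p):
            pred = sum(coef[i][j] * prev[j] for j in range(p))
            rss += (curr[i] - pred) ** 2
    penalty = sum(
        weights[i][j] * abs(coef[i][j]) for i in range(p) for j in range(p)
    )
    return rss + lam * penalty


if __name__ == "__main__":
    series = [[1.0, 0.0], [0.5, 1.0], [0.25, 0.5]]
    coef = [[0.5, 0.0], [1.0, 0.0]]
    weights = [[1.0, 1.0], [2.0, 1.0]]
    print(weighted_lasso_var1_objective(series, coef, weights, lam=0.1))
```

Setting all weights to 1 reduces the penalty to the ordinary Lasso penalty, which is why the weights can be thought of as the place where prior structure enters.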
Transelliptical graphical models
 in: Advances in Neural Information Processing Systems
Cited by 11 (7 self)
We advocate the use of a new distribution family, the transelliptical, for robust inference of high dimensional graphical models. The transelliptical family is an extension of the nonparanormal family proposed by Liu et al. (2009). Just as the nonparanormal extends the normal by transforming the variables using univariate functions, the transelliptical extends the elliptical family in the same way. We propose a nonparametric rank-based regularization estimator which achieves the parametric rates of convergence for both graph recovery and parameter estimation. Such a result suggests that the extra robustness and flexibility obtained by the semiparametric transelliptical modeling incurs almost no efficiency loss. We also discuss the relationship between this work and the transelliptical component analysis proposed by Han and Liu (2012).
Robust graphical modeling of gene networks using classical and alternative t-distributions
 Annals of Applied Statistics
, 2011
Markov Properties for Linear Causal Models with Correlated Errors
Cited by 8 (1 self)
A linear causal model with correlated errors, represented by a DAG with bidirected edges, can be tested by the set of conditional independence relations implied by the model. A global Markov property specifies, by the d-separation criterion, the set of all conditional independence relations holding in any model associated with a graph. A local Markov property specifies a much smaller set of conditional independence relations which will imply all other conditional independence relations which hold under the global Markov property. For DAGs with bidirected edges associated with arbitrary probability distributions, a local Markov property is given in Richardson (2003) which may invoke an exponential number of conditional independencies. In this paper, we show that for a class of linear structural equation models with correlated errors the local Markov property will invoke only a linear number of conditional independence relations. For general linear models, we provide a local Markov property that often invokes far fewer conditional independencies than that in Richardson (2003). The results have applications in testing linear structural equation models with correlated errors.
Assessing the validity domains of graphical Gaussian models in order to infer relationships among components of complex biological systems
, 2008
Cited by 7 (0 self)
The study of the interactions of cellular components is an essential first step toward understanding the structure and dynamics of biological networks. Various methods were recently developed for this purpose. While most of them combine different types of data and a priori knowledge, methods based on Graphical Gaussian Models are capable of learning the network directly from raw data. They consider the full-order partial correlations, which are partial correlations between two variables given the remaining ones, for modelling direct links between variables. Statistical methods were developed for estimating these links when the number of observations is larger than the number of variables. However, the rapid advance of new technologies that allow simultaneous measurement of genome expression has led to large-scale datasets where the number of variables is far larger than the number of observations. To get around this dimensionality problem, different strategies and new statistical methods were proposed. In this study we focused on recently published statistical methods. All are based on the fact that the number of direct relationships between two variables is very small relative to the number of possible relationships, p(p-1)/2. In the biological context, this assumption is not always satisfied over the whole graph. So it is essential to know precisely how the methods behave with respect to the characteristics of the studied object before applying them. For this purpose, we evaluated the validity domain of each method from wide-ranging simulated datasets. We then illustrated our results using recently published biological data.
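Full-order partial correlations can be read off a precision (inverse covariance) matrix Ω through the standard identity ρ_ij = -ω_ij / sqrt(ω_ii · ω_jj). The sketch below assumes the precision matrix is already available, which is exactly the hard part when variables outnumber observations; the function names and the example matrix are hypothetical.

```python
import math


def partial_correlations(precision):
    """Full-order partial correlations from a precision matrix.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj) for i != j;
    diagonal entries are set to 1 by convention.
    """
    p = len(precision)
    result = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                scale = math.sqrt(precision[i][i] * precision[j][j])
                result[i][j] = -precision[i][j] / scale
    return result


def n_possible_edges(p):
    """Number of possible undirected links among p variables: p(p-1)/2."""
    return p * (p - 1) // 2


if __name__ == "__main__":
    # Illustrative chain-structured precision matrix: variables 0 and 2
    # are conditionally independent given variable 1.
    omega = [[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
    print(partial_correlations(omega))
    print(n_possible_edges(1000))
```

A zero off-diagonal entry in the precision matrix gives a zero partial correlation, which is the absence of a direct link in the graph.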