Results 1  10
of
32
Sparse Permutation Invariant Covariance Estimation
 Electronic Journal of Statistics
, 2008
"... The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in highdimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lassotype penalty. We establish a rate of convergence in the Fro ..."
Abstract

Cited by 161 (7 self)
 Add to MetaCart
The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in highdimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lassotype penalty. We establish a rate of convergence in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlationbased version of the method exhibits better rates in the operator norm. The estimator is required to be positive definite, but we avoid having to use semidefinite programming by reparameterizing the objective function
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
"... Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by ..."
Abstract

Cited by 86 (19 self)
 Add to MetaCart
Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of onedimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method’s theoretical properties, and show that it works well in many examples.
Temporal Causal Modeling with Graphical Granger Methods
 In Proceedings of the 13th Int. Conference on Knowledge Discovery and Data Mining, 66 – 75: Association for Computing Machinery
, 2007
"... The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently graphical mod ..."
Abstract

Cited by 37 (6 self)
 Add to MetaCart
(Show Context)
The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently graphical modeling with the concept of “Granger causality”, based on the intuition that a cause helps predict its effects in the future, has gained attention in many domains involving time series data analysis. With the surge of interest in model selection methodologies for regression, such as the Lasso, as practical alternatives to solving structural learning of graphical models, the question arises whether and how to combine these two notions into a practically viable approach for temporal causal modeling. In this paper, we examine a host of related
Discrete chain graph models
 Bernoulli
, 2009
"... The statistical literature discusses different types of Markov properties for chain graphs that lead to four possible classes of chain graph Markov models. The different models are rather well understood when the observations are continuous and multivariate normal, and it is also known that one mode ..."
Abstract

Cited by 36 (2 self)
 Add to MetaCart
(Show Context)
The statistical literature discusses different types of Markov properties for chain graphs that lead to four possible classes of chain graph Markov models. The different models are rather well understood when the observations are continuous and multivariate normal, and it is also known that one model class, referred to as models of LWF (Lauritzen–Wermuth–Frydenberg) or block concentration type, yields discrete models for categorical data that are smooth. This paper considers the structural properties of the discrete models based on the three alternative Markov properties. It is shown by example that two of the alternative Markov properties can lead to nonsmooth models. The remaining model class, which can be viewed as a discrete version of multivariate regressions, is proven to comprise only smooth models. The proof employs a simple change of coordinates that also reveals that the model’s likelihood function is unimodal if the chain components of the graph are complete sets.
Multiple testing and error control in Gaussian graphical model selection
 Statistical Science
"... Abstract. Graphical models provide a framework for exploration of multivariate dependence patterns. The connection between graph and statistical model is made by identifying the vertices of the graph with the observed variables and translating the pattern of edges in the graph into a pattern of cond ..."
Abstract

Cited by 24 (4 self)
 Add to MetaCart
(Show Context)
Abstract. Graphical models provide a framework for exploration of multivariate dependence patterns. The connection between graph and statistical model is made by identifying the vertices of the graph with the observed variables and translating the pattern of edges in the graph into a pattern of conditional independences that is imposed on the variables ’ joint distribution. Focusing on Gaussian models, we review classical graphical models. For these models the defining conditional independences are equivalent to vanishing of certain (partial) correlation coefficients associated with individual edges that are absent from the graph. Hence, Gaussian graphical model selection can be performed by multiple testing of hypotheses about vanishing (partial) correlation coefficients. We show and exemplify how this approach allows one to perform model selection while controlling error rates for incorrect edge inclusion. Key words and phrases: Acyclic directed graph, Bayesian network, bidirected graph, chain graph, concentration graph, covariance graph, DAG, graphical model, multiple testing, undirected graph. 1.
Chain graph models of multivariate regression
, 906
"... type for categorical data ..."
(Show Context)
Maximum Likelihood Estimation in Gaussian AMP Chain Graph Models and Gaussian Ancestral Graph Models
, 2004
"... The AMP Markov property is a recently proposed alternative Markov property for chain graphs. In the case of continuous variables with a joint multivariate Gaussian distribution, it is the AMP rather than the earlier introduced LWF Markov property that is coherent with datageneration by natural bloc ..."
Abstract

Cited by 16 (7 self)
 Add to MetaCart
The AMP Markov property is a recently proposed alternative Markov property for chain graphs. In the case of continuous variables with a joint multivariate Gaussian distribution, it is the AMP rather than the earlier introduced LWF Markov property that is coherent with datageneration by natural blockrecursive regressions. In this paper, we show that maximum likelihood estimates in Gaussian AMP chain graph models can be obtained by combining generalized least squares and iterative proportional fitting to an iterative algorithm. In an appendix, we give useful convergence results for iterative partial maximization algorithms that apply in particular to the described algorithm. Key words: AMP chain graph, graphical model, iterative partial maximization, multivariate normal distribution, maximum likelihood estimation 1
Estimation of gaussian graphs by model selection
 Electron. J. Stat
, 2008
"... Abstract. We investigate in this paper the estimation of Gaussian graphs by model selection from a nonasymptotic point of view. We start from a nsample of a Gaussian law PC in R p and focus on the disadvantageous case where n is smaller than p. To estimate the graph of conditional dependences of P ..."
Abstract

Cited by 13 (6 self)
 Add to MetaCart
(Show Context)
Abstract. We investigate in this paper the estimation of Gaussian graphs by model selection from a nonasymptotic point of view. We start from a nsample of a Gaussian law PC in R p and focus on the disadvantageous case where n is smaller than p. To estimate the graph of conditional dependences of PC, we introduce a collection of candidate graphs and then select one of them by minimizing a penalized empirical risk. Our main result assess the performance of the procedure in a nonasymptotic setting. We pay a special attention to the maximal degree D of the graphs that we can handle, which turns to be roughly n/(2log p). 1.
Structural Learning of Chain Graphs via Decomposition
"... Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the i ..."
Abstract

Cited by 12 (0 self)
 Add to MetaCart
Chain graphs present a broad class of graphical models for description of conditional independence structures, including both Markov networks and Bayesian networks as special cases. In this paper, we propose a computationally feasible method for the structural learning of chain graphs based on the idea of decomposing the learning problem into a set of smaller scale problems on its decomposed subgraphs. The decomposition requires conditional independencies but does not require the separators to be complete subgraphs. Algorithms for both skeleton recovery and complex arrow orientation are presented. Simulations under a variety of settings demonstrate the competitive performance of our method, especially when the underlying graph is sparse.
Sparse Causal Discovery in Multivariate Time Series
, 2008
"... Our goal is to estimate causal interactions in multivariate time series. Using vector autoregressive (VAR) models, these can be defined based on nonvanishing coefficients belonging to respective timelagged instances. As in most cases a parsimonious causality structure is assumed, a promising appro ..."
Abstract

Cited by 11 (2 self)
 Add to MetaCart
Our goal is to estimate causal interactions in multivariate time series. Using vector autoregressive (VAR) models, these can be defined based on nonvanishing coefficients belonging to respective timelagged instances. As in most cases a parsimonious causality structure is assumed, a promising approach to causal discovery consists in fitting VAR models with an additional sparsitypromoting regularization. Along this line we here propose that sparsity should be enforced for the subgroups of coefficients that belong to each pair of time series, as the absence of a causal relation requires the coefficients for all timelags to become jointly zero. Such behavior can be achieved by means of ℓ1,2norm regularized regression, for which an efficient active set solver has been proposed recently. Our method is shown to outperform standard methods in recovering simulated causality graphs. The results are on par with a second novel approach which uses multiple statistical testing.