Results 1 - 10 of 54
Sparse Permutation Invariant Covariance Estimation
 Electronic Journal of Statistics
, 2008
Cited by 80 (5 self)
The paper proposes a method for constructing a sparse estimator for the inverse covariance (concentration) matrix in high-dimensional settings. The estimator uses a penalized normal likelihood approach and forces sparsity by using a lasso-type penalty. We establish a rate of convergence in the Frobenius norm as both data dimension p and sample size n are allowed to grow, and show that the rate depends explicitly on how sparse the true concentration matrix is. We also show that a correlation-based version of the method exhibits better rates in the operator norm. The estimator is required to be positive definite, but we avoid having to use semidefinite programming by reparameterizing the objective function
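As a rough illustration of the penalized normal likelihood mentioned in this abstract, the sketch below evaluates an objective of the form tr(ΩS) - log det(Ω) + λ Σ_{i≠j} |ω_ij| for a candidate concentration matrix Ω and a sample covariance S. The abstract does not state the formula, so this form (in particular, penalizing only off-diagonal entries) is an assumption, and the names `spice_objective`, `cholesky_logdet`, `omega`, `s`, and `lam` are hypothetical.

```python
import math


def cholesky_logdet(a):
    """Return log det(a) for a symmetric positive definite matrix (nested lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    logdet = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(s)
                # det(a) is the squared product of the Cholesky diagonal.
                logdet += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return logdet


def spice_objective(omega, s, lam):
    """Assumed objective: tr(omega @ s) - log det(omega) + lam * sum of |off-diagonals|."""
    n = len(omega)
    trace_term = sum(omega[i][j] * s[j][i] for i in range(n) for j in range(n))
    penalty = sum(abs(omega[i][j]) for i in range(n) for j in range(n) if i != j)
    return trace_term - cholesky_logdet(omega) + lam * penalty
```

The Cholesky factorization doubles as the positive-definiteness check: a candidate matrix that is not positive definite raises an error instead of returning a value.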
HIGH-DIMENSIONAL ISING MODEL SELECTION USING ℓ1-REGULARIZED LOGISTIC REGRESSION
 SUBMITTED TO THE ANNALS OF STATISTICS
Cited by 40 (11 self)
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling, in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d^3 log p), with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d^2 log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on binary graphical models, we indicate how a generalization of the method would apply to general discrete Markov random fields.
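The neighborhood estimation step described in this abstract can be sketched as follows: regress one node's ±1 value on all other nodes with an ℓ1 penalty, and report the nodes whose coefficients are nonzero. This is a minimal sketch under stated assumptions, not the authors' procedure: it uses the penalized form rather than the constrained one, omits an intercept, and solves the problem with plain proximal gradient descent. The names `l1_logistic`, `estimate_neighborhood`, `samples`, and `lam` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Proximal operator of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def l1_logistic(features, labels, lam, n_iter=200):
    """Minimise mean log(1 + exp(-y * w.x)) + lam * ||w||_1 by proximal gradient.

    Assumes every feature and label is -1 or +1, so 4 / k is a safe step size.
    """
    n, k = len(labels), len(features[0])
    step = 4.0 / k
    w = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        for row, y in zip(features, labels):
            margin = y * sum(wj * xj for wj, xj in zip(w, row))
            coef = -y * sigmoid(-margin) / n
            for j in range(k):
                grad[j] += coef * row[j]
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


def estimate_neighborhood(samples, node, lam):
    """Return indices of the other nodes with a nonzero coefficient for `node`."""
    others = [j for j in range(len(samples[0])) if j != node]
    labels = [s[node] for s in samples]
    features = [[s[j] for j in others] for s in samples]
    weights = l1_logistic(features, labels, lam)
    return [j for j, wj in zip(others, weights) if wj != 0.0]
```

Running `estimate_neighborhood` once per node and combining the resulting sets gives a graph estimate; how to reconcile asymmetric neighborhoods is a separate design choice not covered here.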
Temporal Causal Modeling with Graphical Granger Methods
 In Proceedings of the 13th Int. Conference on Knowledge Discovery and Data Mining, 66 – 75: Association for Computing Machinery
, 2007
Cited by 23 (3 self)
The need for mining causality, beyond mere statistical correlations, for real world problems has been recognized widely. Many of these applications naturally involve temporal data, which raises the challenge of how best to leverage the temporal information for causal modeling. Recently graphical modeling with the concept of “Granger causality”, based on the intuition that a cause helps predict its effects in the future, has gained attention in many domains involving time series data analysis. With the surge of interest in model selection methodologies for regression, such as the Lasso, as practical alternatives to solving structural learning of graphical models, the question arises whether and how to combine these two notions into a practically viable approach for temporal causal modeling. In this paper, we examine a host of related
Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Cited by 16 (1 self)
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, “small n, large p” setting, mutual information, gene association network
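The shrinkage idea in this abstract can be sketched as follows: cell frequencies estimated from counts are pulled toward a fixed target before the entropy is computed. The abstract gives no formulas, so the uniform target and the closed-form shrinkage intensity used below are assumptions about one common James-Stein-type construction, and the names `shrinkage_entropy`, `counts`, and `lam` are hypothetical.

```python
import math


def shrinkage_entropy(counts):
    """Return (entropy in nats, shrinkage intensity) from a list of cell counts.

    Assumed construction: shrink maximum-likelihood frequencies toward the
    uniform distribution with intensity
        lam = (1 - sum(f_k^2)) / ((n - 1) * sum((1/p - f_k)^2)),
    clipped to the interval [0, 1].
    """
    n = sum(counts)
    p = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    ml = [c / n for c in counts]
    target = 1.0 / p
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0.0:
        lam = 1.0  # frequencies already equal the target
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom
        lam = min(1.0, max(0.0, lam))
    shrunk = [lam * target + (1.0 - lam) * f for f in ml]
    entropy = -sum(q * math.log(q) for q in shrunk if q > 0.0)
    return entropy, lam
```

For the counts `[3, 1, 0, 0]` the intensity works out to 1/3, so the two empty cells receive a small positive probability and the estimated entropy is larger than the plug-in value.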
Modelling Activity Global Temporal Dependencies using Time Delayed Probabilistic Graphical Model
Cited by 15 (2 self)
We present a novel approach for detecting global behaviour anomalies in multiple disjoint cameras by learning time-delayed dependencies between activities across camera views. Specifically, we propose to model multi-camera activities using a Time Delayed Probabilistic Graphical Model (TD-PGM) with different nodes representing activities in different semantically decomposed regions from different camera views, and the directed links between nodes encoding causal relationships between the activities. A novel two-stage structure learning algorithm is formulated to learn globally optimised time-delayed dependencies. A new cumulative abnormality score is also introduced to replace the conventional log-likelihood score for gaining significantly more robust and reliable real-time anomaly detection. The effectiveness of the proposed approach is validated using a camera network installed at a busy underground station.
Estimating high-dimensional intervention effects from observational data
 THE ANN OF STAT
, 2009
Cited by 9 (2 self)
We assume that we have observational data generated from an unknown underlying directed acyclic graph (DAG) model. A DAG is typically not identifiable from observational data, but it is possible to consistently estimate the equivalence class of a DAG. Moreover, for any given DAG, causal effects can be estimated using intervention calculus. In this paper, we combine these two parts. For each DAG in the estimated equivalence class, we use intervention calculus to estimate the causal effects of the covariates on the response. This yields a collection of estimated causal effects for each covariate. We show that the distinct values in this set can be consistently estimated by an algorithm that uses only local information of the graph. This local approach is computationally fast and feasible in high-dimensional problems. We propose to use summary measures of the set of possible causal effects to determine variable importance. In particular, we use the minimum absolute value of this set, since that is a lower bound on the size of the causal effect. We demonstrate the merits of our methods in a simulation study and on a data set about riboflavin production.
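The summary measure named in this abstract, the minimum absolute value over the set of possible causal effects, can be sketched in a few lines. The sketch assumes the per-covariate sets of possible effects have already been estimated; the function names and the example values are hypothetical and are not taken from the paper.

```python
def min_abs_effect(possible_effects):
    """Lower bound on the size of the causal effect: the smallest |effect| in the set."""
    if not possible_effects:
        raise ValueError("need at least one possible effect")
    return min(abs(e) for e in possible_effects)


def rank_by_importance(effects_by_covariate):
    """Sort covariate names by minimum absolute possible effect, largest first."""
    scores = {name: min_abs_effect(vals) for name, vals in effects_by_covariate.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical sets of possible effects, one value per DAG in the equivalence class.
example = {"x1": [0.8, 0.5, 0.9], "x2": [1.2, 0.0], "x3": [-0.3, -0.4]}
ranking = rank_by_importance(example)
```

In this made-up example `x2` ranks last even though it has the largest single value, because one DAG in its class assigns it an effect of zero, so its lower bound is zero.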
Treelets: An Adaptive Multi-Scale Basis for Sparse Unordered Data
Cited by 9 (2 self)
In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered, with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity: the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper, we present treelets, a novel construction of multi-scale bases that extends wavelets to non-smooth signals. The method is fully adaptive, as it returns a hierarchical tree and an orthonormal basis which both reflect the internal structure of the data. Treelets are especially well-suited as a dimensionality reduction and feature selection tool prior to regression and classification, in situations where sample sizes are small and the data are sparse with unknown groupings of correlated or collinear variables. The method is also simple to implement and analyze theoretically. Here we describe a variety of situations where treelets perform better than principal component analysis as well as some common variable selection and cluster averaging schemes. We illustrate treelets on a blocked covariance model and on several data sets (hyperspectral image data, DNA microarray data, and internet advertisements) with highly complex dependencies between variables.
Information-theoretic limits of selecting binary graphical models in high dimensions
 in Proc. of IEEE Intl. Symp. on Inf. Theory
, 2008
Consistent Feature Selection for Pattern Recognition in Polynomial Time
Cited by 7 (1 self)
We analyze two different feature selection problems: finding a minimal feature set optimal for classification (MINIMAL-OPTIMAL) vs. finding all features relevant to the target variable (ALL-RELEVANT). The latter problem is motivated by recent applications within bioinformatics, particularly gene expression analysis. For both problems, we identify classes of data distributions for which there exist consistent, polynomial-time algorithms. We also prove that ALL-RELEVANT is much harder than MINIMAL-OPTIMAL and propose two consistent, polynomial-time algorithms. We argue that the distribution classes considered are reasonable in many practical cases, so that our results simplify feature selection in a wide range of machine learning tasks.
Penalized Likelihood Methods for Estimation of sparse high dimensional directed acyclic graphs
, 2010
Cited by 6 (6 self)
Directed acyclic graphs are commonly used to represent causal relationships among random variables in graphical models. Applications of these models arise in the study of physical, as well as biological systems, where directed edges between nodes represent the influence of components of the system on each other. Estimation of directed graphs from observational data is computationally NP-hard. In addition, directed graphs with the same structure may be indistinguishable based on observations alone. When the nodes exhibit a natural ordering, the problem of estimating directed graphs reduces to the problem of estimating the structure of the network. In this paper, we propose an efficient penalized likelihood method for estimation of the adjacency matrix of directed acyclic graphs, when variables inherit a natural ordering. We study variable selection consistency of both the lasso, as well as the adaptive lasso penalties in high-dimensional sparse settings, and propose an error-based choice for selecting the tuning parameter. We show that although the lasso is only variable selection consistent under stringent conditions, the adaptive lasso can consistently estimate the true graph under the usual regularity assumptions. Simulation studies indicate that the correct ordering of the variables becomes less critical in estimation of high-dimensional sparse networks.
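One way to picture estimation under a known ordering is to regress each variable on the variables that precede it, using a lasso penalty, and to read the nonzero coefficients as directed edges. The sketch below does this with a small coordinate-descent lasso. It is an illustration under assumptions, not the paper's algorithm: the abstract does not spell out the estimator, the adaptive lasso variant is not shown, and the names `lasso_cd`, `estimate_dag`, `data_cols`, and `lam` are hypothetical.

```python
def soft_threshold(z, t):
    """Proximal operator of t * |.|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(x_cols, y, lam, n_iter=200):
    """Minimise (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    x_cols is a list of predictor columns, each a list of n values.
    """
    n = len(y)
    beta = [0.0] * len(x_cols)
    resid = list(y)
    for _ in range(n_iter):
        for j, col in enumerate(x_cols):
            norm = sum(v * v for v in col) / n
            if norm == 0.0:
                continue
            # Correlation of column j with the residual, adding back its own fit.
            rho = sum(c * r for c, r in zip(col, resid)) / n + norm * beta[j]
            new = soft_threshold(rho, lam) / norm
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


def estimate_dag(data_cols, lam):
    """Return adj where adj[i][j] is the estimated weight of edge i -> j.

    data_cols[j] holds the observations of variable j; the list order is
    taken to be the natural ordering, so only i < j can be nonzero.
    """
    p = len(data_cols)
    adj = [[0.0] * p for _ in range(p)]
    for j in range(1, p):
        beta = lasso_cd(data_cols[:j], data_cols[j], lam)
        for i, b in enumerate(beta):
            adj[i][j] = b
    return adj
```

Because each variable is regressed only on its predecessors, the resulting adjacency matrix is strictly upper triangular and therefore acyclic by construction.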