Results 1  10
of
78
Highdimensional graphical model selection using ℓ1regularized logistic regression
 Advances in Neural Information Processing Systems 19
, 2007
"... We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. Our fram ..."
Abstract

Cited by 78 (6 self)
 Add to MetaCart
We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. Our framework applies to the highdimensional setting, in which both the number of nodes p and maximum neighborhood sizes d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under certain assumptions on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d 3 log p), with the error decaying as O(exp(−Cn/d 3)) for some constant C. If these same assumptions are imposed directly on the sample matrices, we show that n = Ω(d 2 log p) samples are sufficient.
A unified framework for highdimensional analysis of Mestimators with decomposable regularizers
"... ..."
Covariance regularization by thresholding
, 2007
"... This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGa ..."
Abstract

Cited by 69 (9 self)
 Add to MetaCart
This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or subGaussian, and (log p)/n → 0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general crossvalidation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data. 1. Introduction. Estimation
Stability selection
"... Proofs subject to correction. Not to be reproduced without permission. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor. 1 2 ..."
Abstract

Cited by 60 (2 self)
 Add to MetaCart
Proofs subject to correction. Not to be reproduced without permission. Contributions to the discussion must not exceed 400 words. Contributions longer than 400 words will be cut by the editor. 1 2
Sparsistency and rates of convergence in large covariance matrices estimation
, 2007
"... This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probabi ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the case of applications, sparsity priori may occur on the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order (sn log pn/n) 1/2, where sn is the number of nonzero elements, pn is the size of the covariance matrix and n is the sample size. This explicitly spells out the contribution of highdimensionality is merely of a logarithmic factor. The conditions on the rate with which the tuning parameter λn goes to 0 have been made explicit and compared under different penalties. As a result, for the L1penalty, to guarantee the sparsistency and optimal rate of convergence, the number of nonzero elements should be small: s ′ n = O(pn) at most, among O(p2 n) parameters, for estimating sparse covariance or correlation matrix, sparse precision or inverse correlation matrix or sparse Cholesky factor, where s ′ n is the number of the nonzero elements on the offdiagonal entries. On the other hand, using the SCAD or hardthresholding penalty functions, there is no such a restriction. 1. Introduction. Covariance
The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
"... Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by ..."
Abstract

Cited by 41 (12 self)
 Add to MetaCart
Recent methods for estimating sparse undirected graphs for realvalued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula—or “nonparanormal”—for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of onedimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method’s theoretical properties, and show that it works well in many examples.
HIGHDIMENSIONAL ISING MODEL SELECTION USING ℓ1REGULARIZED LOGISTIC REGRESSION
 SUBMITTED TO THE ANNALS OF STATISTICS
"... We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. The method is ..."
Abstract

Cited by 40 (13 self)
 Add to MetaCart
We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1constraint. The method is analyzed under highdimensional scaling, in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d 3 log p), with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d 2 log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.
Estimation of (near) lowrank matrices with noise and highdimensional scaling
"... We study an instance of highdimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ ∗ ∈ R k×p that is assumed to be either exactly low rank, or “near ” lowrank, meaning that it can be wellapproximated by a matrix with low rank. We consider an Me ..."
Abstract

Cited by 38 (11 self)
 Add to MetaCart
We study an instance of highdimensional statistical inference in which the goal is to use N noisy observations to estimate a matrix Θ ∗ ∈ R k×p that is assumed to be either exactly low rank, or “near ” lowrank, meaning that it can be wellapproximated by a matrix with low rank. We consider an Mestimator based on regularization by the traceornuclearnormovermatrices, andanalyze its performance under highdimensional scaling. We provide nonasymptotic bounds on the Frobenius norm error that hold for a generalclassofnoisyobservationmodels,and apply to both exactly lowrank and approximately lowrank matrices. We then illustrate their consequences for a number of specific learning models, including lowrank multivariate or multitask regression, system identification in vector autoregressive processes, and recovery of lowrank matrices from random projections. Simulations show excellent agreement with the highdimensional scaling of the error predicted by our theory. 1.
Projected Subgradient Methods for Learning Sparse Gaussians
"... Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a highdimensional space. Our approach uses the ℓ1norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient meth ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a highdimensional space. Our approach uses the ℓ1norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in asymptotic complexity. We also extend the ℓ1regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains—biological network analysis and a 2Dshape modeling image task. 1
Optimal rates of convergence for covariance matrix estimation
 Ann. Statist
, 2010
"... Covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet been developed. In this paper we establish the optimal rates ..."
Abstract

Cited by 27 (5 self)
 Add to MetaCart
Covariance matrix plays a central role in multivariate statistical analysis. Significant advances have been made recently on developing both theory and methodology for estimating large covariance matrices. However, a minimax theory has yet been developed. In this paper we establish the optimal rates of convergence for estimating the covariance matrix under both the operator norm and Frobenius norm. It is shown that optimal procedures under the two norms are different and consequently matrix estimation under the operator norm is fundamentally different from vector estimation. The minimax upper bound is obtained by constructing a special class of tapering estimators and by studying their risk properties. A key step in obtaining the optimal rate of convergence is the derivation of the minimax lower bound. The technical analysis requires new ideas that are quite different from those used in the more conventional function/sequence estimation problems. 1. Introduction. Suppose