## On Learning Discrete Graphical Models using Group-Sparse Regularization

Citations: 6 (2 self)

### BibTeX

@MISC{Jalali_onlearning,

author = {Ali Jalali and Pradeep Ravikumar and Vishvas Vasuki and Sujay Sanghavi},

title = {On Learning Discrete Graphical Models using Group-Sparse Regularization},

year = {2011}

}

### Abstract

We study the problem of learning the graph structure associated with general discrete graphical models (each variable can take any of m > 1 values, and the clique factors have maximum size c ≥ 2) from samples, under high-dimensional scaling where the number of variables p could be larger than the number of samples n. We provide a quantitative consistency analysis of a procedure based on node-wise multi-class logistic regression with group-sparse regularization. We first consider general m-ary pairwise models, where each factor depends on at most two variables. We show that when ...
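
The node-wise procedure described in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. It assumes values are coded as integers 0, ..., m−1 with 0 as the reference class, uses indicator features $1[X_t = j]$ for $j = 1, \ldots, m-1$, treats all weights linking node $r$ to node $t$ as one group, and minimizes the averaged negative log-likelihood plus $\lambda \sum_t \|\theta_{rt}\|_2$ by plain proximal gradient descent (a solver chosen here only for brevity). The helper names `one_hot_features`, `softmax_ref`, and `fit_node` are hypothetical.

```python
import numpy as np


def one_hot_features(X, r, m):
    """Indicator features 1[X_t = j], j = 1..m-1, for every node t != r.

    Returns the n x ((p-1)(m-1)) design matrix and a list of (t, column slice)
    pairs, one group per candidate neighbour t.
    """
    n, p = X.shape
    cols, groups = [], []
    for g, t in enumerate(t for t in range(p) if t != r):
        for j in range(1, m):
            cols.append((X[:, t] == j).astype(float))
        groups.append((t, slice(g * (m - 1), (g + 1) * (m - 1))))
    return np.column_stack(cols), groups


def softmax_ref(Z):
    """Class probabilities when class 0 is the reference (its logit is fixed at 0)."""
    Z = np.hstack([np.zeros((Z.shape[0], 1)), Z])
    Z = Z - Z.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)


def fit_node(X, r, m, lam, n_iter=500):
    """Group-penalised multiclass logistic regression of X_r on all other nodes.

    Minimises (1/n) * negative log-likelihood + lam * sum_t ||W_t||_2 by
    proximal gradient descent; the intercept b is not penalised.
    X must hold integer codes 0..m-1.
    """
    F, groups = one_hot_features(X, r, m)
    n, D = F.shape
    Y = np.eye(m)[X[:, r]]                  # one-hot response, n x m
    W = np.zeros((m - 1, D))                # rows: non-reference classes
    b = np.zeros(m - 1)
    F1 = np.hstack([F, np.ones((n, 1))])
    step = n / np.linalg.norm(F1, 2) ** 2   # 1/L for a conservative Lipschitz bound
    for _ in range(n_iter):
        P = softmax_ref(F @ W.T + b)
        G = (P[:, 1:] - Y[:, 1:]) / n       # gradient w.r.t. the non-reference logits
        W = W - step * (G.T @ F)
        b = b - step * G.sum(axis=0)
        for t, sl in groups:                # prox step: block soft-thresholding
            norm = np.linalg.norm(W[:, sl])
            W[:, sl] *= max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
    return W, b, groups
```

For example, `W, b, groups = fit_node(X, r=1, m=3, lam=0.05)` fits node 1, and the estimated neighborhood is `{t for t, sl in groups if np.linalg.norm(W[:, sl]) > 0}`. Faster solvers for this kind of sparse multiclass logistic program are the subject of the work cited as [20] and [15] below.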

### Citations

1103 | Numerical Modelling of the
- S, ERRAUD, et al.
- 2000
Citation Context ...for all $t \in N(r)$. Accordingly, let $\bar{\Theta}^*_{P^c}$ represent all non-zero non-pairwise entries. Hierarchical Models. A common assumption imposed on such higher-order MRFs is that they be hierarchical models [16]. Specifically, any MRF of the form (2) is hierarchical if for any clique $A$, $\theta^*_A = 0$ implies that $\theta^*_B = 0$ for any clique $B \supseteq A$ containing $A$. This has an important consequence: the set of pairwise e...
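
The hierarchical condition in the context above says that a zero clique parameter forces every larger clique containing it to be zero; by contraposition, every sub-clique of a clique with a nonzero parameter must itself have a nonzero parameter. The hypothetical helper below checks that property on a set of supports. It assumes cliques are given as tuples of node indices and, by default, ignores single-node terms (`min_size=2`); that default is a choice made here, not taken from the paper.

```python
from itertools import combinations


def is_hierarchical(support, min_size=2):
    """Check the hierarchical property on the cliques with nonzero parameters.

    theta*_A = 0 must imply theta*_B = 0 for every clique B containing A;
    equivalently, every sub-clique (of size >= min_size) of a nonzero clique
    must itself be nonzero.
    """
    nonzero = {frozenset(c) for c in support}
    for clique in nonzero:
        for k in range(min_size, len(clique)):
            for sub in combinations(sorted(clique), k):
                if frozenset(sub) not in nonzero:
                    return False
    return True
```

For instance, a support containing the triangle clique (0, 1, 2) but only the pair (0, 1) is not hierarchical, because the pairs (0, 2) and (1, 2) are missing.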

637 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ...ing approaches. Methods for estimating such graph structure include those based on constraint and hypothesis testing [29], and those that estimate restricted classes of graph structures such as trees [5], polytrees [11], and hypertrees [30]. Another class of approaches estimates the local neighborhood of each node via exhaustive search for the special case of bounded degree graphs. Abbeel et al. [1] pr...

501 | Model selection and estimation in regression with grouped variables
- Yuan, Lin
- 2006
Citation Context ... an ℓ1/ℓ2-regularized multiclass logistic regression problem, and is thus the multiclass logistic analog of the group Lasso [36]. The solution to the program (6) yields an estimate $\widehat{N}(r)$ of the neighborhood of node $r$ by $\widehat{N}(r) = \{t \in V : t \neq r,\ \|\hat{\theta}_{rt}\|_2 \neq 0\}$. We are interested in the event that all the node neighborhoods ...
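
The neighborhood estimate $\widehat{N}(r)$ only asks which groups $\hat{\theta}_{rt}$ survive the group penalty. A short sketch, assuming the fitted parameters are stored in a dictionary `theta_hat` keyed by the other node $t$ (a hypothetical layout); `tol` is an added knob for numerical zeros, and the rule in the text corresponds to `tol=0`.

```python
import numpy as np


def estimated_neighborhood(theta_hat, r, tol=0.0):
    """N_hat(r) = {t != r : ||theta_hat_rt||_2 > tol}.

    theta_hat maps each node t to its estimated parameter block theta_rt
    (any shape); the block is flattened so the norm is the l2 norm of all
    its entries.
    """
    return {t for t, block in theta_hat.items()
            if t != r and np.linalg.norm(np.ravel(block)) > tol}
```

Using the flattened ℓ2 norm means a group counts as an edge as soon as any one of its entries is nonzero.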

496 | Causation, Prediction, and Search
- Spirtes, Glymour, et al.
- 1993
Citation Context ... further greedy procedures, though we defer further discussion to the sequel. Existing approaches. Methods for estimating such graph structure include those based on constraint and hypothesis testing [29], and those that estimate restricted classes of graph structures such as trees [5], polytrees [11], and hypertrees [30]. Another class of approaches estimates the local neighborhood of each node via ex...

424 | The Dantzig selector: Statistical estimation when p is much larger than n
- Candès, Tao
- 2007
Citation Context ...as shown it is still possible to obtain practical consistent procedures by leveraging low-dimensional structure. The most popular example is that of leveraging sparsity using ℓ1-regularization (e.g., [4, 12, 21, 23, 31, 34, 37]). For MRF structure learning, such ℓ1-regularization has been successfully used for Gaussian [21] and discrete binary pairwise (i.e., Ising) models [26, 17]. In these instances, there is effectively o...

382 | High-dimensional graphs and variable selection with
- Meinshausen, Bühlmann
- 2006
Citation Context ...as shown it is still possible to obtain practical consistent procedures by leveraging low-dimensional structure. The most popular example is that of leveraging sparsity using ℓ1-regularization (e.g., [4, 12, 21, 23, 31, 34, 37]). For MRF structure learning, such ℓ1-regularization has been successfully used for Gaussian [21] and discrete binary pairwise (i.e., Ising) models [26, 17]. In these instances, there is effectively o...

298 | Spatial Statistics
- Ripley
- 1981
Citation Context .... Copyright 2011 by the authors. dom fields, are used in a variety of domains, including statistical physics [14], natural language processing [19], image analysis [35, 13, 6], and spatial statistics [27], among others. A Markov random field (MRF) over a $p$-dimensional discrete random vector $X = (X_1, X_2, \ldots, X_p)$ is specified by an undirected graph $G = (V, E)$, with vertex set $V = \{1, 2, \ldots, p\}$ ...
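
The MRF definition quoted above is cut off before the potentials are introduced. Assuming the standard pairwise form $P(x) \propto \exp\big(\sum_r \theta_r(x_r) + \sum_{(r,t) \in E} \theta_{rt}(x_r, x_t)\big)$ (an assumption, not quoted from the paper), the sketch below checks numerically that $P(X_r = k \mid x_{\setminus r})$ is a softmax in $k$ that depends only on the edges touching $r$, i.e. a multiclass logistic model in the neighbors' values. The function names and the dictionary layout of `edge_pot` are hypothetical.

```python
import numpy as np


def log_potential(x, node_pot, edge_pot):
    """Unnormalised log-probability of a full configuration x (assumed pairwise form)."""
    s = sum(node_pot[r][x[r]] for r in range(len(x)))
    s += sum(th[x[r], x[t]] for (r, t), th in edge_pot.items())
    return s


def conditional_from_joint(r, x, node_pot, edge_pot, m):
    """P(X_r = k | x_{-r}) by renormalising the joint over the m values of X_r."""
    logs = []
    for k in range(m):
        y = list(x)
        y[r] = k
        logs.append(log_potential(y, node_pot, edge_pot))
    logs = np.array(logs)
    w = np.exp(logs - logs.max())
    return w / w.sum()


def conditional_local(r, x, node_pot, edge_pot):
    """Same conditional using only node r's potential and the edges touching r."""
    logits = np.array(node_pot[r], dtype=float)
    for (a, b), th in edge_pot.items():
        if a == r:
            logits = logits + th[:, x[b]]   # theta_rb(k, x_b) for every k
        elif b == r:
            logits = logits + th[x[a], :]   # theta_ar(x_a, k) for every k
    w = np.exp(logits - logits.max())
    return w / w.sum()
```

Because `conditional_local` never reads potentials on edges away from $r$, regressing $X_r$ on all other variables and checking which groups of coefficients are nonzero can reveal $N(r)$; this is the intuition behind the node-wise approach.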

296 | Just relax: convex programming methods for identifying sparse signals in noise
- Tropp
- 2006
Citation Context ...as shown it is still possible to obtain practical consistent procedures by leveraging low-dimensional structure. The most popular example is that of leveraging sparsity using ℓ1-regularization (e.g., [4, 12, 21, 23, 31, 34, 37]). For MRF structure learning, such ℓ1-regularization has been successfully used for Gaussian [21] and discrete binary pairwise (i.e., Ising) models [26, 17]. In these instances, there is effectively o...

276 | Markov Random Field Texture Models
- Cross, Jain
- 1983
Citation Context ... FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors. dom fields, are used in a variety of domains, including statistical physics [14], natural language processing [19], image analysis [35, 13, 6], and spatial statistics [27], among others. A Markov random field (MRF) over a $p$-dimensional discrete random vector $X = (X_1, X_2, \ldots, X_p)$ is specified by an undirected graph $G = (V, E)$, with verte...

222 | On Model Selection Consistency of Lasso
- Zhao, Yu
- 2006

161 | Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso)
- Wainwright
- 2009

156 | Consistency of the group Lasso and multiple kernel learning
- Bach
Citation Context ...late the ℓq norms of the groups, and compute their overall ℓ1 norm. Recent work on group and block-sparse linear regression [32, 8, 22, 18, 24, 25, 2] shows that under such group-sparse settings, group-sparse regularization outperforms the use of ℓ1 penalization. Our Results: Pairwise m-ary models. In this paper, we provide a quantitative consistenc...
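
The ℓ1/ℓq group norm described in the context above (take the ℓq norm of each group, then sum those norms) fits in a few lines. This is a minimal sketch; `group_norm` is a hypothetical name, and `groups` is assumed to be a list of index lists partitioning the parameter vector.

```python
import numpy as np


def group_norm(theta, groups, q=2):
    """l1/lq norm: the sum over groups of the lq norm of each group's entries."""
    theta = np.asarray(theta, dtype=float)
    return float(sum(np.linalg.norm(theta[np.asarray(g)], ord=q) for g in groups))
```

With every group a singleton, each group's ℓq norm is just an absolute value, so the penalty reduces to the ordinary ℓ1 norm; larger groups let whole blocks (for example, all parameters of one edge) enter or leave the model together.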

139 | The group lasso for logistic regression
- Meier, Geer, et al.
- 2008
Citation Context ...ets across vertices to form the graph estimate. There has been a strong line of work on developing fast algorithms to solve these sparse multiclass logistic regression programs, including Meier et al. [20] and Krishnapuram et al. [15]. Indeed, [9, 10] show good empirical performance using such ℓ1/ℓq regularization even with the joint likelihood over all variables. Our Results: General m-ary models. One (natural, but expensive) ext...

113 | Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Transactions on Pattern Analysis and Machine Intelligence
- Krishnapuram, Carin, et al.
- 2005
Citation Context ...m the graph estimate. There has been a strong line of work on developing fast algorithms to solve these sparse multiclass logistic regression programs, including Meier et al. [20] and Krishnapuram et al. [15]. Indeed, [9, 10] show good empirical performance using such ℓ1/ℓq regularization even with the joint likelihood over all variables. Our Results: General m-ary models. One (natural, but expensive) ext...

102 | Efficient structure learning of Markov networks using L1-regularization
- Lee, Ganapathi, et al.
- 2007
Citation Context ...ℓ1-regularization (e.g., [4, 12, 21, 23, 31, 34, 37]). For MRF structure learning, such ℓ1-regularization has been successfully used for Gaussian [21] and discrete binary pairwise (i.e., Ising) models [26, 17]. In these instances, there is effectively only one parameter per edge, so that a sparse graph corresponds to a sparse set of parameters. In this paper, we are interested in more general discrete grap...

76 | Sparse additive models
- Ravikumar, Lafferty, et al.
- 2009
Citation Context ...late the ℓq norms of the groups, and compute their overall ℓ1 norm. Recent work on group and block-sparse linear regression [32, 8, 22, 18, 24, 25, 2] shows that under such group-sparse settings, group-sparse regularization outperforms the use of ℓ1 penalization. Our Results: Pairwise m-ary models. In this paper, we provide a quantitative consistenc...

75 | Sparse permutation invariant covariance estimation
- Rothman, Bickel, et al.
- 2008

72 | Simultaneous variable selection
- Turlach, Venables, et al.
- 2005
Citation Context ...late the ℓq norms of the groups, and compute their overall ℓ1 norm. Recent work on group and block-sparse linear regression [32, 8, 22, 18, 24, 25, 2] shows that under such group-sparse settings, group-sparse regularization outperforms the use of ℓ1 penalization. Our Results: Pairwise m-ary models. In this paper, we provide a quantitative consistenc...

59 | Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming
- Wainwright
- 2006
Citation Context ...mum eigenvalues of $J^*$ and $J^n$. 3.1 Assumptions. We begin by stating the assumptions imposed on the true model. We note that similar sufficient conditions have been imposed in papers analyzing the Lasso [33] and block-regularization methods [22, 24]. (A1) Invertibility: $\Lambda_{\min}(Q^*_{S_r S_r}) \ge C_{\min} > 0$. (A2) Incoherence: $\vert\!\vert\!\vert Q^*_{S_r^c S_r} (Q^*_{S_r S_r})^{-1} \vert\!\vert\!\vert_{\infty,2} \le \frac{1-2\alpha}{\sqrt{d_r}}$ for some $\alpha \in (0, \tfrac{1}{2})$. (A3) Bou...
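
Assumption (A1) can be checked numerically for a given matrix. In this sketch `Q` stands for $Q^*$, which the excerpt does not define (presumably the Hessian of the expected node-wise loss; treat this as an assumption), and `S` lists the indices making up $S_r$. (A2) is not coded because the excerpt does not define its $\infty,2$ block norm. The function name `check_invertibility` is hypothetical.

```python
import numpy as np


def check_invertibility(Q, S, c_min):
    """Check (A1): the smallest eigenvalue of the S x S block of Q is at least c_min.

    Q is assumed symmetric; returns (condition holds, smallest eigenvalue).
    """
    Q_SS = Q[np.ix_(S, S)]                 # restrict rows and columns to S
    lam_min = np.linalg.eigvalsh(Q_SS)[0]  # eigvalsh returns eigenvalues in ascending order
    return bool(lam_min >= c_min), float(lam_min)
```

`eigvalsh` is used instead of a general eigensolver because the block of a symmetric matrix is symmetric, so its eigenvalues are real and come back sorted.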

48 | Taking advantage of sparsity in multi-task learning
- Lounici, Tsybakov, et al.
- 2009

47 | The use of Markov random fields as models of texture
- Hassner, Sklansky
- 1980
Citation Context ... FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors. dom fields, are used in a variety of domains, including statistical physics [14], natural language processing [19], image analysis [35, 13, 6], and spatial statistics [27], among others. A Markov random field (MRF) over a $p$-dimensional discrete random vector $X = (X_1, X_2, \ldots, X_p)$ is specified by an undirected graph $G = (V, E)$, with verte...

45 | Learning factor graphs in polynomial time and sample complexity
- Abbeel, Koller, et al.
Citation Context ...es [5], polytrees [11], and hypertrees [30]. Another class of approaches estimates the local neighborhood of each node via exhaustive search for the special case of bounded degree graphs. Abbeel et al. [1] propose a method for learning factor graphs based on local conditional entropies and thresholding, but the computational complexity grows at least as quickly as $O(p^{d+1})$, where $d$ is the maximum neig...

41 | Maximum Likelihood Bounded Tree-width Markov Networks
- Srebro
- 2001
Citation Context ...ng such graph structure include those based on constraint and hypothesis testing [29], and those that estimate restricted classes of graph structures such as trees [5], polytrees [11], and hypertrees [30]. Another class of approaches estimates the local neighborhood of each node via exhaustive search for the special case of bounded degree graphs. Abbeel et al. [1] propose a method for learning factor gr...

37 | High-dimensional Ising model selection using ℓ1-regularized logistic regression
- Ravikumar, Wainwright, et al.
- 2010
Citation Context ...ℓ1-regularization (e.g., [4, 12, 21, 23, 31, 34, 37]). For MRF structure learning, such ℓ1-regularization has been successfully used for Gaussian [21] and discrete binary pairwise (i.e., Ising) models [26, 17]. In these instances, there is effectively only one parameter per edge, so that a sparse graph corresponds to a sparse set of parameters. In this paper, we are interested in more general discrete grap...

33 | Learning polytrees
- Dasgupta
- 1999
Citation Context ... Methods for estimating such graph structure include those based on constraint and hypothesis testing [29], and those that estimate restricted classes of graph structures such as trees [5], polytrees [11], and hypertrees [30]. Another class of approaches estimates the local neighborhood of each node via exhaustive search for the special case of bounded degree graphs. Abbeel et al. [1] propose a method f...

31 | Reconstruction of Markov Random Fields from Samples: Some Observations and Algorithms
- Bresler, Mossel, et al.
- 2008
Citation Context ... local conditional entropies and thresholding, but the computational complexity grows at least as quickly as $O(p^{d+1})$, where $d$ is the maximum neighborhood size in the graphical model. Bresler et al. [3] describe a related local search-based method, and prove under relatively mild assumptions that it can recover the graph structure with $\Theta(\log p)$ samples. However, in the absence of additional restrict...

28 | Maximal sparsity representation via ℓ1 minimization
- Donoho, Elad

27 | Support union recovery in high-dimensional multivariate regression
- Obozinski, Wainwright, et al.
- 2011

19 | Feature selection, ℓ1 vs. ℓ2 regularization, and rotational invariance
- Ng
- 2004

17 | Joint support recovery under high-dimensional scaling: Benefits and perils of ℓ1-ℓ∞-regularization
- Negahban, Wainwright
- 2008

13 | Markov image modeling
- Woods
- 1978
Citation Context ... FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors. dom fields, are used in a variety of domains, including statistical physics [14], natural language processing [19], image analysis [35, 13, 6], and spatial statistics [27], among others. A Markov random field (MRF) over a $p$-dimensional discrete random vector $X = (X_1, X_2, \ldots, X_p)$ is specified by an undirected graph $G = (V, E)$, with verte...

9 | Penalized likelihood for sparse contingency tables with an application to full-length cDNA libraries
- Dahinden, Parmigiani, et al.
- 2007
Citation Context ...timate. There has been a strong line of work on developing fast algorithms to solve these sparse multiclass logistic regression programs, including Meier et al. [20] and Krishnapuram et al. [15]. Indeed, [9, 10] show good empirical performance using such ℓ1/ℓq regularization even with the joint likelihood over all variables. Our Results: General m-ary models. One (natural, but expensive) extension to graphic...

7 | Consistent estimation of the basic neighborhood structure of Markov random fields
- Csiszár, Talata
- 2006
Citation Context ...umptions that it can recover the graph structure with $\Theta(\log p)$ samples. However, in the absence of additional restrictions, the computational complexity of the method is $O(p^{d+1})$. Csiszár and Talata [7] show consistency of a method that uses pseudo-likelihood and a modification of the BIC criterion, but this also involves a prohibitively expensive search. 2 Problem Setup and Notation. MRFs and their ...

7 | Beitrag zur Theorie des Ferromagnetismus. Zeitschrift für Physik 31
- Ising
- 1925
Citation Context ...ligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA. Volume 15 of JMLR: W&CP 15. Copyright 2011 by the authors. dom fields, are used in a variety of domains, including statistical physics [14], natural language processing [19], image analysis [35, 13, 6], and spatial statistics [27], among others. A Markov random field (MRF) over a $p$-dimensional discrete random vector $X = (X_1, X_2, \ldots$ ...

3 | Decomposition and model selection for large contingency tables
- Dahinden, Kalisch, et al.
- 2010
Citation Context ...timate. There has been a strong line of work on developing fast algorithms to solve these sparse multiclass logistic regression programs, including Meier et al. [20] and Krishnapuram et al. [15]. Indeed, [9, 10] show good empirical performance using such ℓ1/ℓq regularization even with the joint likelihood over all variables. Our Results: General m-ary models. One (natural, but expensive) extension to graphic...