## Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity

Citations: 65 (9 self)

### BibTeX

    @MISC{Kim_tree-guidedgroup,
      author = {Seyoung Kim and Eric P. Xing},
      title  = {Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity},
      year   = {}
    }

### Abstract

We consider the problem of learning a sparse multi-task regression, where the structure in the outputs can be represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularities. Our goal is to recover the common set of relevant inputs for each output cluster. Assuming that the tree structure is available as prior knowledge, we formulate this problem as a new multi-task regularized regression called tree-guided group lasso. Our structured regularization is based on a group-lasso penalty, where groups are defined with respect to the tree structure. We describe a systematic weighting scheme for the groups in the penalty such that each output variable is penalized in a balanced manner even if the groups overlap. We present an efficient optimization method that can handle a large-scale problem. Using simulated and yeast datasets, we demonstrate that our method achieves superior performance in terms of both prediction errors and recovery of true sparsity patterns compared to other methods for multi-task learning.
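As a minimal sketch of the penalty the abstract describes (all names and numbers below are illustrative, not from the paper): each node v of the output tree defines a group G_v of outputs with a weight w_v, and for each input the penalty sums the weighted L2 norms of that input's coefficients restricted to each G_v.

```python
import math

# Hypothetical sketch of the tree-guided group-lasso penalty: each tree
# node v contributes w_v * ||beta_{G_v}||_2, where G_v collects the outputs
# under node v.  Data and names are illustrative toy values.
def tree_guided_penalty(beta, tree_groups):
    """beta: dict mapping output name -> coefficient (for one input);
    tree_groups: list of (weight, list_of_outputs) pairs, one per node."""
    return sum(w * math.sqrt(sum(beta[o] ** 2 for o in members))
               for w, members in tree_groups)

# Toy tree over three outputs: y1 and y2 clustered under one internal node.
beta = {"y1": 3.0, "y2": 4.0, "y3": 0.0}
groups = [(1.0, ["y1"]), (1.0, ["y2"]), (1.0, ["y3"]),  # leaf nodes
          (0.5, ["y1", "y2"]),                          # internal cluster
          (0.25, ["y1", "y2", "y3"])]                   # root
print(tree_guided_penalty(beta, groups))  # 3 + 4 + 0 + 0.5*5 + 0.25*5 = 10.75
```

Because every group a coefficient belongs to contributes a norm term, the weighting scheme the abstract mentions is what keeps overlapping groups from over-penalizing any single output.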

### Citations

2066 | Regression shrinkage and selection via the Lasso
- Tibshirani
- 1996
Citation Context: ...roblem from inputs to outputs. In the simplest case where the output is a univariate continuous or discrete response (e.g., a gene expression measurement for a single gene), techniques such as lasso (Tibshirani, 1996) or L1-regularized logistic regression (Ng, 2004; Wainwright et al., 2006) have been developed to identify a parsimonious subset of covariates that determine the outputs. However, in the problem of m...
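The lasso's ability to "identify a parsimonious subset of covariates" comes from its proximal operator, soft-thresholding, which sets small coefficients exactly to zero; a minimal illustration (the function name is mine):

```python
def soft_threshold(z, lam):
    """Solve argmin_b 0.5*(b - z)**2 + lam*|b| -- the lasso's
    one-dimensional proximal step, which yields exact zeros."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

print(soft_threshold(3.0, 1.0))  # 2.0: large coefficient shrunk toward zero
print(soft_threshold(0.5, 1.0))  # 0.0: small coefficient zeroed exactly
```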

558 | Model selection and estimation in regression with grouped variables
- Yuan, Lin
- 2006
Citation Context: ...ting, sparse regression methods that extend lasso have been proposed to allow the recovered relevant inputs to reflect the underlying structural information among the inputs. For example, group lasso [12] assumed that the groupings of the inputs are available as prior knowledge, and used groups of inputs instead of individual inputs as a unit of variable selection by applying an L1 norm of the lasso...
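The group-lasso penalty referenced here applies an L2 norm within each predefined input group and an L1 sum across groups, so whole groups of inputs enter or leave the model together; a small sketch with made-up groups:

```python
import math

# Sketch of the group-lasso penalty sum_g ||beta_g||_2 (illustrative
# variable names): L2 within a group, L1 across groups, so either an
# entire group is selected or the whole group is zeroed out.
def group_lasso_penalty(beta, groups):
    """beta: dict input -> coefficient; groups: disjoint lists of inputs."""
    return sum(math.sqrt(sum(beta[x] ** 2 for x in g)) for g in groups)

beta = {"x1": 3.0, "x2": 4.0, "x3": 0.0, "x4": 0.0}
print(group_lasso_penalty(beta, [["x1", "x2"], ["x3", "x4"]]))  # 5.0
```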

417 | Regularization and variable selection via the elastic net
- Zou, Hastie
Citation Context: ...2, the penalty term in Equation (4) can be written as

$$\sum_j \sum_{v \in V} w_v \|\beta^j_{G_v}\|_2 = \sum_j \left[ s_3 \left( |\beta^j_1| + |\beta^j_2| \right) + g_3 \sqrt{(\beta^j_1)^2 + (\beta^j_2)^2} \right]. \quad (5)$$

This is equivalent to an elastic-net penalty [15], where $\beta^j_1$ and $\beta^j_2$ can be selected either jointly or separately according to the weights $s_3$ and $g_3$. When $s_3 = 0$, the penalty in Equation (5) becomes equivalent to a ridge-regression penalty, wher...
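For the two-output tree in this excerpt, the per-input penalty mixes an L1 term (separate selection of the two outputs) with an L2 term (joint selection), weighted by $s_3$ and $g_3$; a numerical sketch with assumed toy values:

```python
import math

def two_output_penalty(b1, b2, s3, g3):
    """Per-input term of Equation (5): s3 weighs separate (L1) selection
    of the two outputs, g3 weighs joint (L2) selection."""
    return s3 * (abs(b1) + abs(b2)) + g3 * math.sqrt(b1 ** 2 + b2 ** 2)

print(two_output_penalty(3.0, 4.0, 0.0, 1.0))  # 5.0: pure joint (group) term
print(two_output_penalty(3.0, 4.0, 1.0, 0.0))  # 7.0: pure separate (lasso) term
```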

160 | Consistency of the group lasso and multiple kernel learning
- Bach
Citation Context: ... 4 Parameter Estimation. In order to estimate the parameters in tree-guided group lasso, we use the alternative formulation of the problem in Equation (4) that was previously introduced for group lasso [1], given as

$$\hat{B} = \operatorname{argmin}_{B} \sum_k (y_k - X\beta_k)^T (y_k - X\beta_k) + \lambda \left( \sum_j \sum_{v \in V} w_v \|\beta^j_{G_v}\|_2 \right)^2.$$

Since the L1/L2 norm in the above equation is a non-smooth function, it is not trivial to optimize it directly. Usin...

145 | Convex multi-task feature learning
- Argyriou, Evgeniou, et al.
- 2008
Citation Context: ...n is a non-smooth function, it is not trivial to optimize it directly. We make use of the fact that the variational formulation of a mixed-norm regularization is equal to a weighted L2 regularization (Argyriou et al., 2008) as follows:

$$\left( \sum_j \sum_{v \in V} w_v \|\beta^j_{G_v}\|_2 \right)^2 \le \sum_j \sum_{v \in V} \frac{w_v^2 \|\beta^j_{G_v}\|_2^2}{d_{j,v}},$$

where $\sum_{j,v} d_{j,v} = 1$, $d_{j,v} \ge 0$, $\forall j, v$, and the equality holds for

$$d_{j,v} = \frac{w_v \|\beta^j_{G_v}\|_2}{\sum_{j'} \sum_{v' \in V} w_{v'} \|\beta^{j'}_{G_{v'}}\|_2}. \quad (6)$$

Thus, we c...
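The variational bound in this excerpt can be checked numerically: for any d on the simplex, the weighted-L2 form upper-bounds the squared mixed norm, with equality at the stated optimal d; a quick sketch using arbitrary made-up weights and group norms:

```python
# Numerical check of the variational bound around Equation (6); the weights
# w and group norms n below are arbitrary illustrative values.
w = [0.5, 1.0, 2.0]                      # group weights w_v
n = [1.0, 3.0, 0.5]                      # group norms ||beta_{G_v}||_2
total = sum(wv * nv for wv, nv in zip(w, n))

lhs = total ** 2                         # (sum_v w_v * n_v)^2

# Optimal simplex point: d_v proportional to w_v * n_v.
d_opt = [wv * nv / total for wv, nv in zip(w, n)]
rhs = sum((wv * nv) ** 2 / dv for wv, nv, dv in zip(w, n, d_opt))
print(abs(lhs - rhs) < 1e-9)             # True: equality at the optimal d

# Any other feasible d gives an upper bound (here a strict one).
d_bad = [1 / 3, 1 / 3, 1 / 3]
rhs_bad = sum((wv * nv) ** 2 / dv for wv, nv, dv in zip(w, n, d_bad))
print(lhs <= rhs_bad)                    # True
```

This is why the reformulation helps: for fixed d the right-hand side is a smooth weighted ridge penalty, so the two sets of variables can be optimized alternately.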

142 | Feature selection, l1 vs. l2 regularization, and rotational invariance
- Ng
- 2004
Citation Context: ... the output is a univariate continuous or discrete response (e.g., a gene expression measurement for a single gene), techniques such as lasso (Tibshirani, 1996) or L1-regularized logistic regression (Ng, 2004; Wainwright et al., 2006) have been developed to identify a parsimonious subset of covariates that determine the outputs. However, in the problem of multi-task regression, where the output is a multi...

117 | Group Lasso with overlap and graph Lasso
- Jacob, Obozinski, et al.
Citation Context: ...n (a) in tree-guided group lasso, so that the child nodes enter the set of relevant inputs only if their parent node does. The situations with arbitrary overlapping groups have been considered as well (Jacob et al., 2009; Jenatton et al., 2009). Many of these ideas related to group lasso in a univariate-output regression may be directly applied to multi-task regression problems. The L1/L2 penalty of group lasso has b...

99 | Structured variable selection with sparsity-inducing norms
- Jenatton, Audibert, et al.
Citation Context: ...ree structure, and designed groups so that the child nodes enter the set of relevant inputs only if their parent node does. The situations with arbitrary overlapping groups have been considered as well [4, 5]. Many of these ideas related to group lasso in a univariate regression may be directly applied to the multi-task regression problems. The L1/L2 penalty of group lasso has been used to recover inputs ...

94 | Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics
- Zhao, Yu
- 2006
Citation Context: ...so has been extended to a more general setting to encode prior knowledge on various sparsity patterns, where the key idea is to allow the groups to have an overlap. The hierarchical selection method (Zhao et al., 2008) assumed that the input variables form a tree structure, and designed groups... [running header and Figure (a) residue omitted; the figure depicts inputs and outputs (tasks) with tree-node groups such as $G_{v_5} = \{\beta^j_1, \beta^j_2, \beta^j_3\}$ and $G_{v_4}$]

86 | Joint covariate selection and joint subspace selection for multiple classification problems
- Obozinski, Taskar, et al.
- 2010
Citation Context: ...y of group lasso has been used to recover inputs that are jointly relevant to all of the outputs, or tasks, where the L2 norm is applied to the outputs instead of groups of inputs as in group lasso [8, 7]. Although the L1/L2 penalty has been shown to be effective in a joint covariate selection in multi-task learning, it assumed that all of the tasks are equally related with each other and share the sa...
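The L1/L2 multi-task penalty described here takes, for each input, an L2 norm over that input's coefficients across all tasks, so an input is selected jointly for every task or for none; a minimal sketch on a toy coefficient matrix:

```python
import math

def l1_l2_multitask_penalty(B):
    """B: list of rows, one per input; row j holds input j's coefficients
    across the K tasks.  Returns sum_j ||B[j]||_2 (illustrative sketch)."""
    return sum(math.sqrt(sum(b ** 2 for b in row)) for row in B)

# Input 1 is relevant to both tasks; input 2 to neither.
print(l1_l2_multitask_penalty([[3.0, 4.0], [0.0, 0.0]]))  # 5.0
```

This also makes concrete the limitation the excerpt raises: the norm treats all tasks symmetrically, with no way to express that some tasks are more closely related than others, which is the gap the tree-guided penalty addresses.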

76 | High-dimensional graphical model selection using ℓ1-regularized logistic regression
- Wainwright, Ravikumar, et al.
- 2006
Citation Context: ...t is a univariate continuous or discrete response (e.g., a gene expression measurement for a single gene), techniques such as lasso (Tibshirani, 1996) or L1-regularized logistic regression (Ng, 2004; Wainwright et al., 2006) have been developed to identify a parsimonious subset of covariates that determine the outputs. However, in the problem of multi-task regression, where the output is a multivariate vector with an in...

41 | Integrating large-scale functional genomic data to dissect the complexity of yeast regulatory networks
- Zhu, Zhang, et al.
- 2008
Citation Context: ...hways, and are co-expressed as a module. Furthermore, evidence has been found that these genes within a module often share a common genetic basis that causes the variations in their expression levels [14, 2]. However, most of the previous approaches were based on a single-phenotype analysis that treats the multiple phenotypes as independent of each other, and there has been a lack of statistical tools th...

28 | High-dimensional union support recovery in multivariate regression
- Obozinski, Wainwright, et al.
- 2008
Citation Context: ...y of group lasso has been used to recover inputs that are jointly relevant to all of the outputs, or tasks, where the L2 norm is applied to the outputs instead of groups of inputs as in group lasso [8, 7]. Although the L1/L2 penalty has been shown to be effective in a joint covariate selection in multi-task learning, it assumed that all of the tasks are equally related with each other and share the sa...

9 | Variations in DNA elucidate molecular networks that cause disease. Nature
- Chen, Zhu, et al.
- 2008
Citation Context: ...hways, and are co-expressed as a module. Furthermore, evidence has been found that these genes within a module often share a common genetic basis that causes the variations in their expression levels [14, 2]. However, most of the previous approaches were based on a single-phenotype analysis that treats the multiple phenotypes as independent of each other, and there has been a lack of statistical tools th...

1 | Supervised harvesting of gene expression trees
- Hastie, Tibshirani, et al.
Citation Context: ...f genes in a regression framework, they computed averages over members of the cluster for each internal node in the tree, and used these averages as inputs, leading to a potential loss of information [3]. In our method, we use the original data with the clustering tree as a guide towards structured sparsity. In our experiments, we demonstrate that our proposed method can be successfully applied to ...