## Online Learning for Group Lasso

Citations: 10 (1 self)

### BibTeX

@MISC{Yang_onlinelearning,
  author = {Haiqin Yang and Zenglin Xu and Irwin King and Michael R. Lyu},
  title = {Online Learning for Group Lasso},
  year = {}
}


### Abstract

We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Unlike traditional batch-mode group lasso algorithms, which suffer from inefficiency and poor scalability, our proposed algorithm operates in an online mode and scales well: at each iteration the weight vector is updated by a closed-form solution based on the average of previous subgradients. The proposed online algorithm is therefore very efficient and scalable, with worst-case time complexity and memory cost both of order O(d), where d is the number of dimensions. Moreover, to achieve more sparsity at both the group level and the individual-feature level, we extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and the graph lasso. Finally, we demonstrate the merits of our algorithm through experiments on both synthetic and real-world datasets.
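The closed-form update described above amounts to a block soft-thresholding step applied to the running average of subgradients. The following is an illustrative sketch of that idea, not the authors' exact pseudocode; the function name, the tuning constant `gamma`, and the `sqrt(t)` scaling are assumptions modeled on standard dual-averaging updates.

```python
import numpy as np

def group_soft_threshold(gbar, groups, lam, t, gamma):
    """One closed-form group lasso update (illustrative sketch).

    gbar:   running average of past subgradients, shape (d,)
    groups: list of index lists, one per feature group
    lam:    group lasso regularization weight
    """
    w = np.zeros_like(gbar)
    coef = -np.sqrt(t) / gamma          # step scaling (assumed form)
    for g in groups:
        norm = np.linalg.norm(gbar[g])
        if norm > lam:                  # group survives thresholding
            w[g] = coef * (1.0 - lam / norm) * gbar[g]
        # otherwise the whole group stays exactly zero: group sparsity
    return w
```

Each group is either shrunk as a block or zeroed out entirely, and one update touches every coordinate once, matching the O(d) per-iteration cost.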

### Citations

510 | Model selection and estimation in regression with grouped variables - Yuan, Lin - 2006 |

329 | Regression shrinkage and selection via the lasso - Tibshirani - 1996

Citation Context: "...with both synthetic and real-world datasets. 1. Introduction Group lasso (Yuan & Lin, 2006), a technique of selecting key explanatory factors in a grouped manner, is an important extension of lasso (Tibshirani, 1996). It has been successfully employed in a number of applications, such as birthweight prediction and gene finding (Yuan & Lin, Appearing in Proceedings of the 27th International Conference on Machine..."

155 | Consistency of the group Lasso and multiple kernel learning - Bach - 2008 |

141 | The group Lasso for logistic regression - Meier, Geer, et al. - 2008 |

133 | The tradeoffs of large scale learning - Bottou, Bousquet - 2008

Citation Context: "...has been extensively studied in the machine learning area in recent years (Zinkevich, 2003; Bottou & LeCun, 2003; Shalev-Shwartz & Singer, 2006; Fink et al., 2006; Amit et al., 2006; Crammer et al., 2006; Bottou & Bousquet, 2007; Dredze et al., 2008; Hu et al., 2009; Zhao et al., 2009). These methods can be cast into different categories. One family of online learning algorithms is based on the criterion of maximum margin (S..."

123 | Logarithmic regret algorithms for online convex optimization - Hazan, Agarwal, et al. - 2007

Citation Context: "...ve the optimal convergence rate O(1/√T). It would be interesting to investigate whether, by introducing an additional assumption, the average regret bound can be improved to O(log(T)/T) as in (Hazan et al., 2007). The second result of Theorem 2 gives a bound for the difference between the learned weight and the optimal weight. If R̄_T > 0, then the bound can be tighter. However, the term R̄_T in the bound..."
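For reference, the O(1/√T) average regret discussed in this context has the standard dual-averaging form (notation assumed: ℓ_t is the per-round loss, Ω the group lasso regularizer, and C a constant depending on the subgradient bound and the strong-convexity parameter):

```latex
\bar{R}_T \;=\; \frac{1}{T}\sum_{t=1}^{T}\bigl(\ell_t(\mathbf{w}_t)+\Omega(\mathbf{w}_t)\bigr)
\;-\; \min_{\mathbf{w}}\frac{1}{T}\sum_{t=1}^{T}\bigl(\ell_t(\mathbf{w})+\Omega(\mathbf{w})\bigr)
\;\le\; \frac{C}{\sqrt{T}}
```

The O(log(T)/T) rate of Hazan et al. (2007) requires strong convexity of the loss itself, which is the "additional assumption" the context alludes to.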

109 | Group Lasso with Overlap and Graph Lasso - Jacob, Obozinski, et al. - 2009

Citation Context: "...r be dominated by k-th order polynomial expansions of some inputs or contain categorical features which are usually represented as groups of dummy variables (Meier et al., 2008; Roth & Fischer, 2008; Jacob et al., 2009). Due to its advantages, group lasso has been intensively studied in statistics and machine learning (Yuan & Lin, 2006; Bach, 2008). Extensions include the group lasso for logistic regression (Meier..."

74 | Primal-Dual Subgradient Methods for Convex Problems - Nesterov - 2009

Citation Context: "..., 2009). Interested readers can read the above papers and references therein. 4. Online Learning for Group Lasso Inspired by recently developed first-order methods for optimizing composite functions (Nesterov, 2009) and the ef... [Algorithm 1: Online learning algorithm for group lasso. Input: w_0 ∈ R^d, and a strongly convex function h(w) with modulus 1 such that w_0 = arg min_w h(w) ∈ arg min_w Ω(w). (4) Given co...]"
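The Algorithm 1 fragment quoted above maintains only a running average of subgradients, which is what yields the O(d) memory cost claimed in the abstract. A minimal sketch of such a dual-averaging loop, assuming a squared loss; the step scaling and thresholding form are assumptions modeled on standard dual averaging, not a transcription of the paper's Algorithm 1:

```python
import numpy as np

def online_group_lasso(stream, d, groups, lam, gamma=1.0):
    """Dual-averaging sketch for online group lasso (illustrative only).

    Keeps a single d-dimensional running average of subgradients, so
    memory stays O(d) no matter how many examples arrive.
    """
    gbar = np.zeros(d)                 # average of past subgradients
    w = np.zeros(d)
    for t, (x, y) in enumerate(stream, start=1):
        g = (w @ x - y) * x            # squared-loss subgradient (assumed loss)
        gbar += (g - gbar) / t         # incremental average, O(d) work
        w = np.zeros(d)
        scale = -np.sqrt(t) / gamma
        for idx in groups:             # closed-form block update per group
            norm = np.linalg.norm(gbar[idx])
            if norm > lam:
                w[idx] = scale * (1.0 - lam / norm) * gbar[idx]
    return w
```

Note that each round costs O(d) time and the stream is never stored, which is the scalability argument the abstract makes against batch-mode solvers.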

61 | Sparse online learning via truncated gradient - Langford, Li, et al. - 2009 |

61 | Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals - Yeo, Burge - 2004

Citation Context: "...on. False splice sites are sequences on the DNA which match the consensus sequence at positions 0 and 1. Removing the consensus “GT” results in a sequence length of 7 with 4 levels {A, C, G, T}; see (Yeo & Burge, 2004) for a detailed description. We follow the experimental setup in (Meier et al., 2008) and measure the performance by the maximum correlation coefficient (Yeo & Burge, 2004). The original training datas..."
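The 7-position, 4-level windows described in this context are exactly the kind of categorical data that group lasso handles via groups of dummy variables: one group of four indicators per sequence position, 28 features in all. A sketch of that standard encoding (the helper names are ours, not from the paper):

```python
import numpy as np

ALPHABET = "ACGT"  # the 4 levels at each of the 7 positions

def encode(seq):
    """One-hot (dummy-variable) encoding of a DNA window.

    A length-7 sequence becomes a 28-dimensional vector with exactly
    one 1 per position; group lasso then selects whole positions.
    """
    x = np.zeros(len(seq) * len(ALPHABET))
    for pos, base in enumerate(seq):
        x[pos * len(ALPHABET) + ALPHABET.index(base)] = 1.0
    return x

# one group of 4 dummy variables per sequence position
groups = [list(range(p * 4, p * 4 + 4)) for p in range(7)]
```

Selecting or discarding a whole group here means keeping or dropping an entire sequence position, which is the grouped variable selection the experiment relies on.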

59 | Confidence-weighted linear classification - Dredze, Crammer, et al. - 2008

Citation Context: "...ed in the machine learning area in recent years (Zinkevich, 2003; Bottou & LeCun, 2003; Shalev-Shwartz & Singer, 2006; Fink et al., 2006; Amit et al., 2006; Crammer et al., 2006; Bottou & Bousquet, 2007; Dredze et al., 2008; Hu et al., 2009; Zhao et al., 2009). These methods can be cast into different categories. One family of online learning algorithms is based on the criterion of maximum margin (Shalev-Shwartz & Singer..."

59 | Dual averaging methods for regularized stochastic learning and online optimization. JMLR - Xiao - 2010 |

53 | SVM optimization: inverse dependence on training set size - Shalev-Shwartz, Srebro - 2008

Citation Context: "...ojected back to the constraint space if needed. An attractive property of stochastic gradient descent methods is that their runtime may not depend at all on the number of examples (Bottou & Bousquet, 2007; Shalev-Shwartz & Srebro, 2008). Although various online learning algorithms have been proposed, there is no online learning algorithm developed for the group lasso yet. More recently, online learning algorithms on minimizing the..."

45 | Large scale online learning - Bottou, LeCun - 2003

Citation Context: "...Related Work. In the following, we mainly review the related work on online learning algorithms. Online learning has been extensively studied in the machine learning area in recent years (Zinkevich, 2003; Bottou & LeCun, 2003; Shalev-Shwartz & Singer, 2006; Fink et al., 2006; Amit et al., 2006; Crammer et al., 2006; Bottou & Bousquet, 2007; Dredze et al., 2008; Hu et al., 2009; Zhao et al., 2009). These methods can be cas..."

33 | A note on the group Lasso and a sparse group Lasso - Friedman, Hastie, et al. - 2010

Citation Context: "...algorithm. In order to seek the group lasso with more sparsity at both the group level and the individual feature level, we successfully extend the algorithm to solve the sparse group lasso problem (Friedman et al., 2010) and propose the enhanced sparse group lasso model. We further derive closed-form solutions to update the weight vectors in both models. Our algorithm framework can also be easily extended to solve t..."
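The sparse group lasso penalty referenced in this context combines a plain L1 term with the group lasso term. In the usual formulation (notation assumed; some variants additionally weight each group by its size):

```latex
\Omega(\mathbf{w}) \;=\; \lambda_1 \,\|\mathbf{w}\|_1 \;+\; \lambda_2 \sum_{g=1}^{G} \|\mathbf{w}_g\|_2
```

Here λ1 encourages sparsity of individual features while λ2 zeroes out entire groups, which is the two-level sparsity the abstract refers to.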

30 | Multi-task feature learning via efficient ℓ2,1-norm minimization - Liu, Ji, et al. - 2009

Citation Context: "...k coordinate descent (Meier et al., 2008), active set algorithm (Roth & Fischer, 2008), have been proposed. Some batch-mode training methods for group lasso penalties have also been proposed, e.g., (Liu et al., 2009; Kowalski et al., 2009). Interested readers can read the above papers and references therein. 4. Online Learning for Group Lasso Inspired by recently developed first-order methods for optimizing comp..."

20 | Online learning meets optimization in the dual - Shalev-Shwartz, Singer - 2006

Citation Context: "...llowing, we mainly review the related work on online learning algorithms. Online learning has been extensively studied in the machine learning area in recent years (Zinkevich, 2003; Bottou & LeCun, 2003; Shalev-Shwartz & Singer, 2006; Fink et al., 2006; Amit et al., 2006; Crammer et al., 2006; Bottou & Bousquet, 2007; Dredze et al., 2008; Hu et al., 2009; Zhao et al., 2009). These methods can be cast into different categories. On..."

16 | Algorithms for sparse linear classifiers in the massive data setting - Balakrishnan, Madigan - 2008 |

16 | Efficient learning using forward-backward splitting (NIPS) - Duchi, Singer - 2009

Citation Context: "...cently, online learning algorithms on minimizing the summation of data fitting and L1-regularization have been proposed to yield sparse solutions (Balakrishnan & Madigan, 2008; Langford et al., 2009; Duchi & Singer, 2009; Xiao, 2009a). These algorithms are very promising in real-world applications, especially for training large-scale datasets. In (Langford et al., 2009), a truncated gradient method is proposed to trun..."

13 | Accelerated gradient methods for stochastic optimization and online learning - Hu, Kwok, et al. - 2009

Citation Context: "...g area in recent years (Zinkevich, 2003; Bottou & LeCun, 2003; Shalev-Shwartz & Singer, 2006; Fink et al., 2006; Amit et al., 2006; Crammer et al., 2006; Bottou & Bousquet, 2007; Dredze et al., 2008; Hu et al., 2009; Zhao et al., 2009). These methods can be cast into different categories. One family of online learning algorithms is based on the criterion of maximum margin (Shalev-Shwartz & Singer, 2006; Dredze et..."

8 | An efficient projection for ℓ1,∞ regularization - Quattoni, Carreras, et al. - 2009

Citation Context: "...= 1 for all the groups, the group lasso is equivalent to the lasso. Remark 2. To introduce group sparsity, it is also possible to impose other joint regularization on the weight, e.g., the L1,∞-norm (Quattoni et al., 2009). Remark 3. The group lasso regularizer has also been extended for use in Multiple Kernel Learning (MKL) (Bach, 2008). The consistency analysis on the connection between the group lasso and MKL can be..."
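The L1,∞ joint regularizer mentioned in Remark 2 sums, over groups, the largest absolute entry within each group. A small illustrative helper (names ours), assuming groups are given as lists of index lists:

```python
import numpy as np

def l1_inf_norm(w, groups):
    """L1,inf group norm: sum over groups of the max |w_j| in each group.

    Like the L1,2 group lasso norm, it vanishes on a group only when the
    whole group is zero, so it also induces group-level sparsity.
    """
    return sum(np.max(np.abs(w[g])) for g in groups)
```

For example, on w = (1, -3, 2, 0) with groups {0, 1} and {2, 3} the value is max(1, 3) + max(2, 0) = 5.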

1 | DUOL: A double updating approach for online learning - Zhao, Hoi, et al. - 2009

Citation Context: "...years (Zinkevich, 2003; Bottou & LeCun, 2003; Shalev-Shwartz & Singer, 2006; Fink et al., 2006; Amit et al., 2006; Crammer et al., 2006; Bottou & Bousquet, 2007; Dredze et al., 2008; Hu et al., 2009; Zhao et al., 2009). These methods can be cast into different categories. One family of online learning algorithms is based on the criterion of maximum margin (Shalev-Shwartz & Singer, 2006; Dredze et al., 2008), which..."