## An Accelerated Gradient Method for Trace Norm Minimization

Citations: 55 (5 self)

### BibTeX

@MISC{Ji_anaccelerated,
  author = {Shuiwang Ji and Jieping Ye},
  title = {An Accelerated Gradient Method for Trace Norm Minimization},
  year = {2009}
}

### Abstract

We consider the minimization of a smooth loss function regularized by the trace norm of the matrix variable. Such a formulation finds applications in many machine learning tasks, including multi-task learning, matrix classification, and matrix completion. The standard semidefinite programming formulation for this problem is computationally expensive. In addition, due to the non-smooth nature of the trace norm, the optimal first-order black-box method for solving this class of problems converges as O(1/√k), where k is the iteration counter. In this paper, we exploit the special structure of the trace norm, based on which we propose an extended gradient algorithm that converges as O(1/k). We further propose an accelerated gradient algorithm, which achieves the optimal convergence rate of O(1/k²) for smooth problems. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms.
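The algorithms described in the abstract rest on one fact: the proximal operator of the trace norm has a closed form, singular value thresholding, and wrapping it in Nesterov-style momentum gives the O(1/k²) rate. A minimal sketch of that scheme, assuming numpy; the function names and the choice of smooth loss in the check below are illustrative, not from the paper:

```python
import numpy as np

def svt(G, tau):
    """Singular value thresholding: the proximal operator of
    tau * ||.||_* (trace norm) evaluated at the matrix G."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt  # shrink singular values

def accelerated_gradient(grad_f, W0, L, lam, n_iter=300):
    """Accelerated proximal gradient for min_W f(W) + lam * ||W||_*,
    where f is smooth with an L-Lipschitz gradient.  The gradient
    step is taken at the search point Z rather than at W."""
    W_prev = W0.copy()
    Z = W0.copy()
    t = 1.0
    for _ in range(n_iter):
        # Prox-gradient step at the search point Z.
        W = svt(Z - grad_f(Z) / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # Momentum update of the search point.
        Z = W + ((t - 1.0) / t_next) * (W - W_prev)
        W_prev, t = W, t_next
    return W_prev
```

For the simplest smooth loss f(W) = ½‖W − M‖²_F (so grad_f(W) = W − M and L = 1), the minimizer is exactly svt(M, lam), which makes the sketch easy to sanity-check.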

### Citations

741 | Nonlinear Programming. Athena Scientific
- Bertsekas
- 1999
Citation Context ...rate for these algorithms is difficult to guarantee. Due to the non-smooth nature of the trace norm, a simple approach to solve these problems is the subgradient method (Bertsekas, 1999; Nesterov, 2003), which converges as O(1/√k), where k is the iteration counter. It is known from the complexity theory of convex optimization (Nemirovsky & Yudin, 1983; Nesterov, 2003) that this co...

365 | Fast iterative shrinkage-thresholding algorithm for linear inverse problems
- Beck, Teboulle
- 2009
Citation Context ...gence rate for smooth problems. Hence, the nonsmoothness effect of the trace norm regularization is effectively removed. The proposed algorithms extend the algorithms in (Nesterov, 2007; Tseng, 2008; Beck & Teboulle, 2009) to the matrix case. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms in comparison with existing ones. Note that while the present paper was under re...

315 | Exact matrix completion via convex optimization
- Candès, Recht
- 2009
Citation Context ...spectral norm. A number of recent work has shown that the low rank solution can be recovered exactly via minimizing the trace norm under certain conditions (Recht et al., 2008a; Recht et al., 2008b; Candès & Recht, 2008). In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005; Rennie & Srebro, 2005; Weimer et al., 2008a...

252 | Smooth minimization of non-smooth functions
- Nesterov
Citation Context ...remains the same. Note that the sequence of objective values generated by the accelerated scheme may increase. It, however, can be made non-increasing by a simple modification of the algorithm as in (Nesterov, 2005). 4.2. Convergence Analysis We show in the following that by performing the gradient step at the search point Zk instead of at the approximate solution Wk, the convergence rate of the gradient method...
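The "simple modification" mentioned in this excerpt amounts to rejecting any candidate iterate that raises the objective while still feeding the candidate into the momentum update (this is the monotone variant made explicit as MFISTA in Beck & Teboulle, 2009). A generic sketch, assuming numpy; the function names and the scalar test problem are illustrative, not from the paper:

```python
import numpy as np

def monotone_accelerated(F, step, w0, n_iter=200):
    """Accelerated scheme whose objective values never increase:
    the candidate produced by the prox-gradient step `step` is
    kept only if it does not raise F; the momentum update still
    uses the candidate, which preserves the accelerated rate."""
    w_prev, z, t = w0, w0, 1.0
    for _ in range(n_iter):
        cand = step(z)                                  # prox-gradient step at z
        w = cand if F(cand) <= F(w_prev) else w_prev    # monotone fix
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        # MFISTA-style search point: momentum toward the candidate.
        z = w + (t / t_next) * (cand - w) + ((t - 1.0) / t_next) * (w - w_prev)
        w_prev, t = w, t_next
    return w_prev
```

With F(w) = ½(w − 3)² and a plain gradient step of size 0.4, the iterates settle at the minimizer w = 3, and by construction the sequence F(w_k) is non-increasing.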

215 | Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization
- Recht, Fazel, et al.
- 2010
Citation Context ...of the rank function over the unit ball of spectral norm. A number of recent work has shown that the low rank solution can be recovered exactly via minimizing the trace norm under certain conditions (Recht et al., 2008a; Recht et al., 2008b; Candès & Recht, 2008). In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005;...

197 | Introductory lectures on convex optimization: A basic course, volume 87
- Nesterov
- 2004
Citation Context ...rate for these algorithms is difficult to guarantee. Due to the non-smooth nature of the trace norm, a simple approach to solve these problems is the subgradient method (Bertsekas, 1999; Nesterov, 2003), which converges as O(1/√k), where k is the iteration counter. It is known from the complexity theory of convex optimization (Nemirovsky & Yudin, 1983; Nesterov, 2003) that this convergence rate i...

196 | A singular value thresholding algorithm for matrix completion. preprint
- Cai, Candes, et al.
Citation Context ...In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005; Rennie & Srebro, 2005; Weimer et al., 2008a; Cai et al., 2008; Ma et al., 2008) the matrix completion problem was formulated as a trace norm minimization problem. In problems where multiple related tasks are learned simultaneously, the models for different task...

191 | Gradient methods for minimizing composite objective function. Core discussion paper 2007/96
- Nesterov
- 2007
Citation Context ..., which is the optimal convergence rate for smooth problems. Hence, the nonsmoothness effect of the trace norm regularization is effectively removed. The proposed algorithms extend the algorithms in (Nesterov, 2007; Tseng, 2008; Beck & Teboulle, 2009) to the matrix case. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms in comparison with existing ones. Note that...

170 | Problem complexity and method efficiency in optimization. Nauka (published in English by
- Nemirovskii, Yudin
- 1983
Citation Context ...se problems is the subgradient method (Bertsekas, 1999; Nesterov, 2003), which converges as O(1/√k), where k is the iteration counter. It is known from the complexity theory of convex optimization (Nemirovsky & Yudin, 1983; Nesterov, 2003) that this convergence rate is already optimal for nonsmooth optimization under the first-order black-box model, where only the function values and first-order derivatives are used. I...

153 | A rank minimization heuristic with application to minimum order system approximation
- Fazel, Hindi, et al.
- 2001
Citation Context ...e rank function is the trace norm (nuclear norm) [page footer: Appearing in Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009. Copyright 2009 by the author(s)/owner(s).] (Fazel et al., 2001), defined as the sum of the singular values of the matrix, since it is the convex envelope of the rank function over the unit ball of spectral norm. A number of recent work has shown that the low ran...

141 | Convex multi-task feature learning
- Argyriou, Evgeniou, et al.
- 2008
Citation Context ...strained to share certain information. Recently, this constraint has been expressed as the trace norm regularization on the weight matrix in the context of multitask learning (Abernethy et al., 2006; Argyriou et al., 2008; Abernethy et al., 2009; Obozinski et al., 2009), multi-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data suc...

134 | Fast maximum margin matrix factorization for collaborative prediction
- Rennie, Srebro
- 2005
Citation Context ...; Recht et al., 2008b; Candès & Recht, 2008). In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005; Rennie & Srebro, 2005; Weimer et al., 2008a; Cai et al., 2008; Ma et al., 2008) the matrix completion problem was formulated as a trace norm minimization problem. In problems where multiple related tasks are learned simul...

106 | A method for solving a convex programming problem with convergence rate O(1/k²)
- Nesterov
- 1983
Citation Context ...problems. This results in an extended gradient algorithm with the same convergence rate of O(1/k) as that for smooth problems. Following Nesterov's method for accelerating the gradient method (Nesterov, 1983; Nesterov, 2003), we show that the extended gradient algorithm can be further accelerated to converge as O(1/k²), which is the optimal convergence rate for smooth problems. Hence, the nonsmoothness...

91 | Fixed point and Bregman iterative methods for matrix rank minimization
- Ma, Goldfarb, et al.
Citation Context ...trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005; Rennie & Srebro, 2005; Weimer et al., 2008a; Cai et al., 2008; Ma et al., 2008) the matrix completion problem was formulated as a trace norm minimization problem. In problems where multiple related tasks are learned simultaneously, the models for different tasks can be constrai...

81 | Joint covariate selection and joint subspace selection for multiple classification problems
- Obozinski, Taskar, et al.
- 2010
Citation Context ..., this constraint has been expressed as the trace norm regularization on the weight matrix in the context of multitask learning (Abernethy et al., 2006; Argyriou et al., 2008; Abernethy et al., 2009; Obozinski et al., 2009), multi-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data such as images, the matrix classification formulati...

71 | On accelerated proximal gradient methods for convex-concave optimization
- Tseng
- 2008
Citation Context ...ptimal convergence rate for smooth problems. Hence, the nonsmoothness effect of the trace norm regularization is effectively removed. The proposed algorithms extend the algorithms in (Nesterov, 2007; Tseng, 2008; Beck & Teboulle, 2009) to the matrix case. Experiments on multi-task learning problems demonstrate the efficiency of the proposed algorithms in comparison with existing ones. Note that while the pre...

68 | An accelerated proximal gradient algorithm for nuclear norm regularized linear least squares problems. Pacific
- Toh, Yun
- 2010

63 | Uncovering shared structures in multiclass classification
- Amit, Fink, et al.
Citation Context ...m regularization on the weight matrix in the context of multitask learning (Abernethy et al., 2006; Argyriou et al., 2008; Abernethy et al., 2009; Obozinski et al., 2009), multi-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data such as images, the matrix classification formulation (Tomioka & Aihara, 2007; Bach, 2008) applies ...

50 | A new approach to collaborative filtering: operator estimation with spectral regularization
- Abernethy, Bach, et al.
- 2008
Citation Context ...in information. Recently, this constraint has been expressed as the trace norm regularization on the weight matrix in the context of multitask learning (Abernethy et al., 2006; Argyriou et al., 2008; Abernethy et al., 2009; Obozinski et al., 2009), multi-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data such as images, the matrix ...

41 | Consistency of trace norm minimization
- Bach
Citation Context ...n (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data such as images, the matrix classification formulation (Tomioka & Aihara, 2007; Bach, 2008) applies a weight matrix, regularized by its trace norm, on the data. It was shown (Tomioka & Aihara, 2007) that such formulation leads to improved performance over conventional methods. A practical ...

31 | Low-rank matrix factorization with attributes
- Abernethy, Bach, et al.
- 2006
Citation Context ...fferent tasks can be constrained to share certain information. Recently, this constraint has been expressed as the trace norm regularization on the weight matrix in the context of multitask learning (Abernethy et al., 2006; Argyriou et al., 2008; Abernethy et al., 2009; Obozinski et al., 2009), multi-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For t...

28 | COFI RANK - maximum margin matrix factorization for collaborative ranking
- Weimer, Karatzoglou, et al.
Citation Context ...Candès & Recht, 2008). In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005; Rennie & Srebro, 2005; Weimer et al., 2008a; Cai et al., 2008; Ma et al., 2008) the matrix completion problem was formulated as a trace norm minimization problem. In problems where multiple related tasks are learned simultaneously, the models...

16 | Improving maximum margin matrix factorization
- Weimer, Karatzoglou, et al.
Citation Context ...Candès & Recht, 2008). In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005; Rennie & Srebro, 2005; Weimer et al., 2008a; Cai et al., 2008; Ma et al., 2008) the matrix completion problem was formulated as a trace norm minimization problem. In problems where multiple related tasks are learned simultaneously, the models...

9 | Convex optimization methods for dimension reduction and coefficient estimation in multivariate linear regression. arXiv preprint arXiv:0904.0691
- Lu, Monteiro, et al.
- 2009
Citation Context ...(Abernethy et al., 2006; Argyriou et al., 2008; Abernethy et al., 2009; Obozinski et al., 2009), multi-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data such as images, the matrix classification formulation (Tomioka & Aihara, 2007; Bach, 2008) applies a weight matrix, regularized by its trace norm, on the data. It was shown...

9 | Classifying matrices with a spectral regularization
- Tomioka, Aihara
- 2007
Citation Context ...ulti-class classification (Amit et al., 2007), and multivariate linear regression (Yuan et al., 2007; Lu et al., 2008). For two-dimensional data such as images, the matrix classification formulation (Tomioka & Aihara, 2007; Bach, 2008) applies a weight matrix, regularized by its trace norm, on the data. It was shown (Tomioka & Aihara, 2007) that such formulation leads to improved performance over conventional methods. ...

5 | Necessary and sufficient conditions for success of the nuclear norm heuristic for rank minimization
- Recht, Xu, et al.
- 2008
Citation Context ...of the rank function over the unit ball of spectral norm. A number of recent work has shown that the low rank solution can be recovered exactly via minimizing the trace norm under certain conditions (Recht et al., 2008a; Recht et al., 2008b; Candès & Recht, 2008). In practice, the trace norm relaxation has been shown to yield low-rank solutions and it has been used widely in many scenarios. In (Srebro et al., 2005;...