Results 1 - 2 of 2
On optimal probabilities in stochastic coordinate descent methods. arXiv:1310.3438, 2013
Abstract

Cited by 16 (4 self)
We propose and analyze a new parallel coordinate descent method, 'NSync, in which at each iteration a random subset of coordinates is updated in parallel, allowing the subsets to be chosen nonuniformly. We derive convergence rates under a strong convexity assumption and comment on how to assign probabilities to the sets to optimize the bound. In both complexity and practical performance, the method can outperform its uniform variant by an order of magnitude. Surprisingly, the strategy of updating a single randomly selected coordinate per iteration, with optimal probabilities, may require fewer iterations, both in theory and in practice, than the strategy of updating all coordinates at every iteration.
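The iteration described in the abstract can be sketched as follows. This is a minimal illustration on a simple strongly convex quadratic, not the paper's method verbatim: the objective, the stepsizes 1/v_i, the subset size tau, and the heuristic of sampling probabilities proportional to coordinate Lipschitz constants are all assumptions made here for the sake of a runnable example.

```python
import numpy as np

def nsync_step(x, A, b, probs, v, rng, tau=2):
    """One 'NSync-style iteration (sketch): update a random subset of
    coordinates, sampled nonuniformly, with per-coordinate stepsizes 1/v_i."""
    n = len(x)
    # Sample a random subset S of size tau with nonuniform probabilities.
    S = rng.choice(n, size=tau, replace=False, p=probs)
    grad = A @ x - b              # gradient of f(x) = 0.5 x'Ax - b'x
    for i in S:                   # coordinate updates are independent -> parallelizable
        x[i] -= grad[i] / v[i]
    return x

rng = np.random.default_rng(0)
n = 5
A = np.diag(np.arange(1.0, n + 1))   # simple strongly convex (diagonal) quadratic
b = np.ones(n)
x = np.zeros(n)
v = np.diag(A).copy()                # coordinate curvature constants
probs = v / v.sum()                  # sample proportionally to curvature (one heuristic)
for _ in range(200):
    x = nsync_step(x, A, b, probs, v, rng)
# x approaches the minimizer A^{-1} b = (1, 1/2, 1/3, 1/4, 1/5)
```

Because the sampled coordinates are updated independently, the inner loop can run in parallel; choosing the probabilities `probs` well is exactly the question the paper analyzes.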
Fercoq, O.: SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization, 1502
Abstract

Cited by 1 (1 self)
We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA). Our method is dual in nature: in each iteration we update a random subset of the dual variables. However, unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice, sometimes by orders of magnitude. In the special case when an L2 regularizer is used in the primal, the dual problem is a concave quadratic maximization problem plus a separable term. In this regime, SDNA in each step solves a proximal subproblem involving a random principal submatrix of the Hessian of the quadratic function; whence the name of the method.
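The quadratic regime described at the end of the abstract can be sketched as follows. This is a hedged illustration, not the paper's algorithm: it runs block Newton steps on a plain strongly convex quadratic (no separable proximal term), with the test matrix, subset size, and uniform sampling all chosen here for the sake of a self-contained example. The key ingredient it does show is the step through a random principal submatrix of the Hessian.

```python
import numpy as np

def sdna_step(x, Q, c, S):
    """One SDNA-style step (sketch) on f(x) = 0.5 x'Qx - c'x: exactly
    minimize f over the coordinates in S by solving a small linear
    system with the principal submatrix Q[S, S] of the Hessian."""
    g = Q @ x - c                          # full gradient
    d = np.linalg.solve(Q[np.ix_(S, S)], g[S])  # Newton direction on the block
    x = x.copy()
    x[S] -= d
    return x

rng = np.random.default_rng(1)
n, tau = 6, 3
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)                # well-conditioned SPD "Hessian"
c = rng.standard_normal(n)
x = np.zeros(n)
for _ in range(100):
    S = rng.choice(n, size=tau, replace=False)   # random subset of coordinates
    x = sdna_step(x, Q, c, S)
# x converges to the minimizer Q^{-1} c
```

Using the full principal submatrix `Q[S, S]`, rather than only its diagonal as plain stochastic coordinate ascent would, is what lets the step exploit all local curvature within the sampled block.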