Results 1–5 of 5
Stochastic dual coordinate ascent with adaptive probabilities. ICML 2015.
Abstract

Cited by 3 (2 self)
This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is largely of theoretical interest, as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods.
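The adaptive-sampling idea can be illustrated with a minimal SDCA sketch for ridge regression, where coordinates with large dual residuals are sampled more often. The residual-based probability rule below is an illustrative stand-in, not the exact adaptive rule from the paper:

```python
import numpy as np

def sdca_adaptive(X, y, lam=0.1, iters=2000, seed=0):
    """SDCA for ridge regression with heuristic adaptive sampling.

    Primal:  min_w  (1/2n) ||Xw - y||^2 + (lam/2) ||w||^2
    Dual variables alpha relate to the primal via w = X^T alpha / (lam * n).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    sq_norms = np.einsum('ij,ij->i', X, X)   # per-row squared norms
    for _ in range(iters):
        # Heuristic adaptive probabilities: coordinates with a large dual
        # residual are sampled more often (stand-in for the paper's rule).
        resid = np.abs(y - X @ w - alpha) + 1e-8
        p = resid / resid.sum()
        i = rng.choice(n, p=p)
        # Closed-form dual coordinate maximization for the squared loss.
        delta = (y[i] - X[i] @ w - alpha[i]) / (1.0 + sq_norms[i] / (lam * n))
        alpha[i] += delta
        w += delta * X[i] / (lam * n)
    return w
```

With full-support sampling each step still exactly maximizes the dual along one coordinate, so the iterates converge to the ridge solution.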
Accelerated Coordinate Descent with Adaptive Coordinate Frequencies
Abstract

Cited by 3 (2 self)
Coordinate descent (CD) algorithms have become the method of choice for solving a number of machine learning tasks. They are particularly popular for training linear models, including linear support vector machine classification, LASSO regression, and logistic regression. We propose an extension of the CD algorithm, called the adaptive coordinate frequencies (ACF) method. This modified CD scheme does not treat all coordinates equally, in that it does not pick all coordinates equally often for optimization. Instead the relative frequencies of coordinates are subject to online adaptation. The resulting optimization scheme can yield significant speedups. We demonstrate the usefulness of our approach on a number of large-scale machine learning problems.
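A minimal sketch of the frequency-adaptation idea on a quadratic objective: coordinates whose steps make above-average progress are selected more often. The multiplicative update rule and its constants are hypothetical, not the paper's exact scheme:

```python
import numpy as np

def acf_cd(A, b, iters=4000, eta=0.2, pmin=0.05, pmax=20.0, seed=0):
    """Coordinate descent on f(x) = 0.5 x^T A x - b^T x (A symmetric PD)
    with online adaptation of coordinate selection frequencies."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    pref = np.ones(n)          # relative selection preferences
    avg_gain = None            # running average of per-step progress
    for _ in range(iters):
        p = pref / pref.sum()
        j = rng.choice(n, p=p)
        # Exact minimization along coordinate j.
        step = (b[j] - A[j] @ x) / A[j, j]
        gain = 0.5 * A[j, j] * step**2   # decrease in f from this step
        x[j] += step
        # Adapt: coordinates with above-average progress are picked more
        # often (multiplicative rule; constants are hypothetical).
        avg_gain = gain if avg_gain is None else 0.95 * avg_gain + 0.05 * gain
        ratio = min(gain / (avg_gain + 1e-12), 10.0)
        pref[j] = np.clip(pref[j] * np.exp(eta * (ratio - 1.0)), pmin, pmax)
    return x
```

Clipping the preferences to [pmin, pmax] keeps every coordinate sampled with nonzero probability, which preserves convergence of the underlying CD scheme.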
LM-CMA: an Alternative to L-BFGS for Large Scale Black-Box Optimization
Abstract

Cited by 1 (0 self)
The limited memory BFGS method (L-BFGS) of Liu and Nocedal (1989) is often considered to be the method of choice for continuous optimization when first- and/or second-order information is available. However, the use of L-BFGS can be complicated in a black-box scenario where gradient information is not available and therefore must be numerically estimated. The accuracy of this estimation, obtained by finite difference methods, is often problem-dependent and may lead to premature convergence of the algorithm. In this paper, we demonstrate an alternative to L-BFGS, the limited memory Covariance Matrix Adaptation Evolution Strategy (LM-CMA) proposed by Loshchilov (2014). The LM-CMA is a stochastic derivative-free algorithm for numerical optimization of non-linear, non-convex problems. Inspired by L-BFGS, the LM-CMA samples candidate solutions according to a covariance matrix reproduced from m direction vectors selected during the optimization process. The decomposition of the covariance matrix into Cholesky factors reduces the memory complexity to O(mn), where n is the number of decision variables. The time complexity of sampling one candidate solution is also O(mn), but in practice amounts to only about 25 scalar-vector multiplications. The algorithm has an important invariance property w.r.t. strictly increasing transformations of the objective function; such transformations do not compromise its ability to approach the optimum. The LM-CMA outperforms the original CMA-ES and its large-scale versions on non-separable ill-conditioned problems, by a factor that increases with problem dimension. The invariance properties of the algorithm do not prevent it from demonstrating performance comparable to L-BFGS on non-trivial large-scale smooth and non-smooth optimization problems.
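The O(mn) sampling idea can be sketched with a simplified single-parent evolution strategy in which candidate steps are shaped by m stored direction vectors. The mixing weights and step-size rule below are hypothetical stand-ins for LM-CMA's actual Cholesky-factor reconstruction and adaptation:

```python
import numpy as np

def lmcma_sketch(f, x0, sigma=1.0, m=5, pop=8, iters=300, seed=0):
    """Simplified evolution strategy illustrating LM-CMA-style sampling:
    candidates are shaped by m stored direction vectors, so one sample
    costs O(m*n) instead of O(n^2).  Constants are hypothetical."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    mean, dirs = np.asarray(x0, float), []
    a, b = 0.9, 0.6            # hypothetical mixing weights
    best = f(mean)
    for _ in range(iters):
        cands = []
        for _ in range(pop):
            z = rng.standard_normal(n)
            # Shape the step with the m stored directions (stand-in for
            # LM-CMA's Cholesky-factor reconstruction).
            for v in dirs:
                z = a * z + b * (v @ z) * v
            cands.append(mean + sigma * z)
        vals = [f(c) for c in cands]
        i = int(np.argmin(vals))
        if vals[i] < best:
            step = cands[i] - mean
            nrm = np.linalg.norm(step)
            if nrm > 0:
                dirs.append(step / nrm)     # remember successful direction
                dirs = dirs[-m:]            # keep only the m most recent
            mean, best = cands[i], vals[i]
            sigma *= 1.1                    # naive success-based adaptation
        else:
            sigma *= 0.9
    return mean, best
```

Because selection depends only on the ranking of f-values, this sketch inherits the invariance under strictly increasing transformations of the objective mentioned in the abstract.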
Coordinate Descent with Online Adaptation of Coordinate Frequencies
, 2014
A Competitive Divide-and-Conquer Algorithm for Unconstrained Large-Scale Black-Box Optimization
Abstract
This paper proposes a competitive divide-and-conquer algorithm for solving large-scale black-box optimization problems, where there are thousands of decision variables and the algebraic models of the problems are unavailable. We focus on problems that are partially additively separable, since this type of problem can be further decomposed into a number of smaller independent subproblems. The proposed algorithm addresses two important issues in solving large-scale black-box optimization: (1) the identification of the independent subproblems without explicitly knowing the formula of the objective function and (2) the optimization of the identified black-box subproblems. First, a Global Differential Grouping (GDG) method is proposed to identify the independent subproblems. Then, a variant of the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is adopted to solve the subproblems, owing to its rotation-invariance property. GDG and CMA-ES work together under the cooperative co-evolution framework. The resultant algorithm, named CC-GDG-CMAES, is then evaluated on the CEC'2010 large-scale global optimization (LSGO) benchmark functions, which have a thousand decision variables and black-box objective functions. The experimental results show that on most test functions evaluated in this study, GDG manages to obtain an ideal partition of the index set of the decision variables, and CC-GDG-CMAES outperforms the state-of-the-art results. Moreover, the competitive performance of the well-known CMA-ES is extended from low-dimensional
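The interaction test at the heart of differential grouping can be sketched directly: variables i and j interact if the effect of perturbing x_i changes when x_j is also perturbed. The single base point and fixed threshold below are simplifications; GDG chooses them more carefully:

```python
import itertools
import numpy as np

def differential_grouping(f, n, delta=1.0, eps=1e-6):
    """Group decision variables of a black-box f: R^n -> R by detected
    interaction (simplified: one base point at 0 and a fixed threshold)."""
    base = np.zeros(n)
    f0 = f(base)
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        ei = base.copy(); ei[i] += delta
        ej = base.copy(); ej[j] += delta
        eij = ei.copy(); eij[j] += delta
        d1 = f(ei) - f0                 # effect of moving x_i alone
        d2 = f(eij) - f(ej)             # effect of moving x_i after x_j
        if abs(d1 - d2) > eps:          # effects differ => i, j interact
            adj[i].add(j); adj[j].add(i)
    # Connected components of the interaction graph are the subproblems.
    groups, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        comp, stack = set(), [i]
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u); stack.extend(adj[u] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups
```

Each resulting group can then be optimized independently by a subproblem solver such as CMA-ES under the cooperative co-evolution framework.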