Results 1–10 of 18
A dual coordinate descent method for large-scale linear SVM.
 In ICML, 2008
Cited by 207 (20 self)
In many applications, data appear with a huge number of instances as well as features. Linear Support Vector Machines (SVMs) are one of the most popular tools for dealing with such large-scale sparse data. This paper presents a novel dual coordinate descent method for linear SVM with L1- and L2-loss functions. The proposed method is simple and reaches an accurate solution in O(log(1/ε)) iterations. Experiments indicate that our method is much faster than state-of-the-art solvers such as Pegasos, TRON, SVMperf, and a recent primal coordinate descent implementation.
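The coordinate-wise dual update underlying this class of methods can be sketched in a few lines. This is an illustrative reimplementation of dual coordinate descent for the L1-loss (hinge) case only, not the authors' code; the function name and toy scale are my own:

```python
import numpy as np

def dcd_linear_svm(X, y, C=1.0, epochs=50):
    """Sketch of dual coordinate descent for L1-loss linear SVM.

    Minimizes 0.5*||w||^2 - sum_i alpha_i over 0 <= alpha_i <= C,
    where w = sum_i alpha_i * y_i * x_i is maintained incrementally,
    so each coordinate update costs only O(nnz(x_i)).
    """
    n, d = X.shape
    alpha = np.zeros(n)
    w = np.zeros(d)
    Q_ii = np.einsum("ij,ij->i", X, X)  # per-instance squared norms
    for _ in range(epochs):
        for i in np.random.RandomState(0).permutation(n):
            if Q_ii[i] == 0:
                continue
            G = y[i] * w.dot(X[i]) - 1.0           # partial gradient in alpha_i
            a_new = min(max(alpha[i] - G / Q_ii[i], 0.0), C)
            w += (a_new - alpha[i]) * y[i] * X[i]  # incremental O(d) update of w
            alpha[i] = a_new
    return w
```

The key design point is that the single-variable subproblem has a closed-form solution (a projected Newton step), which is what makes each pass so cheap.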
An Asynchronous Parallel Stochastic Coordinate Descent Algorithm
 In Journal of Machine Learning Research
Cited by 27 (6 self)
We describe an asynchronous parallel stochastic coordinate descent algorithm for minimizing smooth unconstrained or separably constrained functions. The method achieves a linear convergence rate on functions that satisfy an essential strong convexity property and a sublinear rate (1/K) on general convex functions. Near-linear speedup on a multicore system can be expected if the number of processors is O(n^{1/2}) in unconstrained optimization and O(n^{1/4}) in the separable-constrained case, where n is the number of variables. We describe results from an implementation on 40-core processors.
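The scheme can be illustrated with threads that read a shared iterate (possibly stale) and write coordinate updates without locks. This is a toy sketch on a least-squares objective, not the paper's implementation, and Python's GIL means it only mimics, rather than achieves, true parallel speedup:

```python
import numpy as np
import threading

def async_scd(A, b, n_threads=4, updates_per_thread=2000, seed=0):
    """Sketch of asynchronous stochastic coordinate descent on the
    smooth unconstrained objective f(x) = 0.5 * ||A x - b||^2.

    Each thread repeatedly picks a random coordinate j, computes the
    partial gradient from the current (possibly stale) shared iterate,
    and writes the update with no locking.
    """
    n = A.shape[1]
    x = np.zeros(n)                      # shared iterate, updated by all threads
    L = np.einsum("ij,ij->j", A, A)      # coordinate-wise Lipschitz constants

    def worker(tid):
        rng = np.random.RandomState(seed + tid)
        for _ in range(updates_per_thread):
            j = rng.randint(n)
            g = A[:, j].dot(A.dot(x) - b)   # may read stale entries of x
            x[j] -= g / L[j]                # unsynchronized write
    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return x
```

The 1/L_j step is the exact minimizer along coordinate j; the convergence analysis in the abstract is about how much staleness such unsynchronized reads can tolerate.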
Asynchronous stochastic coordinate descent: Parallelism and convergence properties
Non-Asymptotic Convergence Analysis of Inexact Gradient Methods for Machine Learning Without Strong Convexity
, 2014
Large-scale randomized coordinate descent methods with nonseparable linear constraints
Multi-core Structural SVM Training
Cited by 2 (1 self)
Many problems in natural language processing and computer vision can be framed as structured prediction problems. Structural support vector machines (SVMs) are a popular approach for training structured predictors, where learning is framed as an optimization problem. Most structural SVM solvers alternate between a model-update phase and an inference phase (which predicts structures for all training examples). As structures become more complex, inference becomes a bottleneck and thus slows down learning considerably. In this paper, we propose a new learning algorithm for structural SVMs, called DEMI-DCD, that extends the dual coordinate descent approach by decoupling the model-update and inference phases into different threads. We take advantage of multi-core hardware to parallelize learning with minimal synchronization between the model-update and the inference phases. We prove that our algorithm not only converges but also fully utilizes all available processors to speed up learning, and validate our approach on two real-world NLP problems: part-of-speech tagging and relation extraction. In both cases, we show that our algorithm utilizes all available processors to speed up learning and achieves competitive performance. For example, it achieves a relative duality gap of 1% on a POS tagging problem in 192 seconds using 16 threads, while a standard implementation of a multi-threaded dual coordinate descent algorithm with the same number of threads requires more than 600 seconds to reach a solution of the same quality.
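The decoupling idea can be illustrated with a much simpler learner: an inference thread predicts with a possibly stale model while a separate learner thread applies updates from a queue. This toy sketch stands in multiclass perceptron for structured prediction and is not the DEMI-DCD algorithm itself; all names and the queue size are my own:

```python
import numpy as np
import queue
import threading

def decoupled_perceptron(X, Y, n_classes, passes=30):
    """Toy illustration of decoupling inference from model updates.

    The inference thread predicts labels using the current (possibly
    stale) shared model and enqueues them; the learner thread dequeues
    predictions and applies perceptron updates when they are wrong.
    """
    n, d = X.shape
    W = np.zeros((n_classes, d))          # shared model, read and written by both threads
    q = queue.Queue(maxsize=64)           # bounded queue gives natural backpressure
    done = threading.Event()

    def inference():
        for _ in range(passes):
            for i in range(n):
                y_hat = int(np.argmax(W @ X[i]))   # prediction may use a stale W
                q.put((i, y_hat))
        done.set()

    def learner():
        while not (done.is_set() and q.empty()):
            try:
                i, y_hat = q.get(timeout=0.1)
            except queue.Empty:
                continue
            if y_hat != Y[i]:                      # standard multiclass perceptron step
                W[Y[i]] += X[i]
                W[y_hat] -= X[i]

    t_inf = threading.Thread(target=inference)
    t_lrn = threading.Thread(target=learner)
    t_inf.start(); t_lrn.start()
    t_inf.join(); t_lrn.join()
    return W
```

In the real algorithm the "inference" step is expensive structured prediction, which is exactly why running it concurrently with model updates pays off.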
Incremental and Decremental Training for Linear Classification
Cited by 2 (0 self)
In classification, if a small number of instances is added or removed, incremental and decremental techniques can be applied to quickly update the model. However, the design of incremental and decremental algorithms involves many considerations. In this paper, we focus on linear classifiers, including logistic regression and linear SVM, because of their simplicity over kernel or other methods. By applying a warm-start strategy, we investigate issues such as using the primal or the dual formulation, choosing optimization methods, and creating practical implementations. Through theoretical analysis and practical experiments, we conclude that a warm-start setting on a high-order optimization method for the primal formulation is more suitable than others for incremental and decremental learning of linear classification.
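The warm-start idea on a high-order primal method is easy to demonstrate: after a few instances are added, Newton's method restarted from the previous solution needs fewer iterations than a cold start. A minimal sketch for L2-regularized logistic regression (my own illustration, not the paper's implementation):

```python
import numpy as np

def train_logreg(X, y, w0=None, lam=1.0, tol=1e-8, max_iter=100):
    """Newton's method for L2-regularized logistic regression with
    labels y in {-1, +1}, optionally warm-started from w0.

    Returns (w, iterations_used) so the effect of warm starting on the
    iteration count can be observed directly.
    """
    n, d = X.shape
    w = np.zeros(d) if w0 is None else w0.copy()
    for it in range(max_iter):
        z = y * (X @ w)
        s = 1.0 / (1.0 + np.exp(z))            # sigma(-z), per instance
        grad = lam * w - X.T @ (y * s)
        if np.linalg.norm(grad) < tol:
            return w, it
        D = s * (1.0 - s)                      # diagonal Hessian weights
        H = lam * np.eye(d) + X.T @ (X * D[:, None])
        w -= np.linalg.solve(H, grad)          # full Newton step
    return w, max_iter
```

After adding a handful of instances, the old optimum is already close to the new one, so a second-order method converges in only a step or two.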
Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation
Cited by 1 (1 self)
In recent years, exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well known as facility location in the operations research literature. While the problem has a well-developed convex relaxation with approximation and recovery guarantees, its number of variables grows quadratically with the number of samples. Therefore, state-of-the-art methods can hardly handle more than 10^4 samples (i.e., 10^8 variables). In this work, we propose an Augmented Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain a closed-form solution for each block subproblem, and exploits a low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show that our approach is orders of magnitude faster than existing approaches and can handle problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm on world-scale facility location, document summarization, and active learning.
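For readers unfamiliar with the objective, here is the uncapacitated facility location cost together with the classic greedy baseline that the convex approaches improve upon. This is standard textbook material for orientation only, not the AL-BCD method from the paper:

```python
import numpy as np

def facility_location_cost(D, S, open_cost):
    """Facility location / exemplar clustering objective: each sample is
    assigned to its nearest open exemplar in S, plus a fixed cost per
    opened exemplar."""
    return D[:, S].min(axis=1).sum() + open_cost * len(S)

def greedy_exemplars(D, open_cost):
    """Classic greedy baseline: repeatedly open the exemplar that most
    decreases the objective; stop when no addition improves it."""
    n = D.shape[0]
    S, best = [], np.inf
    assign = np.full(n, np.inf)            # current cost of each sample's assignment
    while True:
        gains = [(np.minimum(assign, D[:, j]).sum() + open_cost * (len(S) + 1), j)
                 for j in range(n) if j not in S]
        cost, j = min(gains)
        if cost >= best:
            return S
        best, S = cost, S + [j]
        assign = np.minimum(assign, D[:, j])
```

Note the quadratic blow-up the abstract refers to: the convex relaxation introduces one assignment variable per (sample, exemplar) pair, i.e. the full n x n matrix D.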
Proximal Quasi-Newton for Computationally Intensive ℓ1-regularized M-estimators
Cited by 1 (0 self)
We consider the class of optimization problems arising from computationally intensive ℓ1-regularized M-estimators, where the function or gradient values are very expensive to compute. A particular instance of interest is the ℓ1-regularized MLE for learning Conditional Random Fields (CRFs), which are a popular class of statistical models for varied structured prediction problems such as sequence labeling, alignment, and classification with a label taxonomy. ℓ1-regularized MLEs for CRFs are particularly expensive to optimize, since computing the gradient values requires an expensive inference step. In this work, we propose the use of a carefully constructed proximal quasi-Newton algorithm for such computationally intensive M-estimation problems, where we employ an aggressive active-set selection technique. In a key contribution of the paper, we show that the proximal quasi-Newton method is provably superlinearly convergent, even in the absence of strong convexity, by leveraging a restricted variant of strong convexity. In our experiments, the proposed algorithm converges considerably faster than the current state of the art on the problems of sequence labeling and hierarchical classification.
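The proximal framework the paper builds on is easiest to see in its first-order form: a gradient step followed by the soft-thresholding proximal operator of the ℓ1 norm. The sketch below is plain proximal gradient (ISTA) on the Lasso; the paper's method replaces the gradient step with a quasi-Newton model plus an active-set strategy, which is not shown here:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (applied elementwise)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(A, b, lam, n_iter=2000):
    """Proximal gradient (ISTA) for 0.5*||Ax - b||^2 + lam*||x||_1.

    Each iteration takes a gradient step on the smooth part with step
    1/L, then applies the l1 proximal operator, which zeros out small
    coordinates exactly.
    """
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = A.T @ (A @ x - b)
        x = soft_threshold(x - g / L, lam / L)
    return x
```

The exact zeros produced by the prox are what make active-set selection effective: once the support is identified, the expensive gradient only needs to be modeled accurately on a small set of coordinates.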
Primal-Dual Rates and Certificates. ETH Zürich, Switzerland
We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates and corresponding rate-of-convergence guarantees are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as many L1-, Elastic Net-, group Lasso- and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problems in the region of interest.
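For the Lasso case, the standard way to obtain such a globally defined certificate is to rescale the residual into a dual-feasible point and evaluate the gap; a minimal sketch of that construction (illustrative, not the paper's general framework):

```python
import numpy as np

def lasso_duality_gap(A, b, x, lam):
    """Duality gap certificate for the Lasso
    P(x) = 0.5*||Ax - b||^2 + lam*||x||_1.

    The residual is rescaled so that ||A^T theta||_inf <= lam, making
    theta dual feasible for any iterate x; by weak duality the returned
    gap is nonnegative and upper-bounds the suboptimality P(x) - P(x*).
    """
    r = b - A @ x                              # primal residual
    primal = 0.5 * r.dot(r) + lam * np.abs(x).sum()
    scale = min(1.0, lam / np.max(np.abs(A.T @ r)))
    theta = scale * r                          # dual-feasible point
    dual = 0.5 * b.dot(b) - 0.5 * np.sum((b - theta) ** 2)
    return primal - dual
```

Because the gap is computable at any iterate, it serves as the kind of stopping criterion and progress diagnostic the abstract describes.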