Results 1–10 of 23
D-ADMM: A communication-efficient distributed algorithm for separable optimization
IEEE Trans. Sig. Proc.
, 2013
Performance of a Distributed Stochastic Approximation Algorithm
, 2012
Abstract

Cited by 11 (7 self)
In this paper, a distributed stochastic approximation algorithm is studied. Applications of such algorithms include decentralized estimation, optimization, control, or computing. The algorithm consists of two steps: a local step, where each node in a network updates a local estimate using a stochastic approximation algorithm with decreasing step size, and a gossip step, where a node computes a local weighted average between its estimate and those of its neighbors. Convergence of the estimates toward a consensus is established under weak assumptions. The approach relies on two main ingredients: the existence of a Lyapunov function for the mean field in the agreement subspace, and a contraction property of the random weight matrices in the subspace orthogonal to the agreement subspace. A second-order analysis of the algorithm is also carried out in the form of a central limit theorem. The Polyak-averaged version of the algorithm is also considered.
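The two-step scheme described in the abstract can be sketched on a toy problem. Everything concrete below, the ring topology, the doubly stochastic gossip weights, and the scalar mean-estimation task, is an illustrative assumption, not taken from the paper:

```python
import numpy as np

def gossip_sa(n_nodes=4, n_iters=2000, target=3.0, seed=0):
    """Local stochastic-approximation step followed by a gossip step,
    on a ring of nodes estimating the scalar mean `target` from noise."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_nodes)                  # local estimates
    # doubly stochastic gossip weights for a ring: 1/2 self, 1/4 each neighbor
    W = np.zeros((n_nodes, n_nodes))
    for i in range(n_nodes):
        W[i, i] = 0.5
        W[i, (i - 1) % n_nodes] = 0.25
        W[i, (i + 1) % n_nodes] = 0.25
    for t in range(1, n_iters + 1):
        gamma = 1.0 / t                           # decreasing step size
        obs = target + rng.normal(size=n_nodes)   # noisy local observations
        x = x + gamma * (obs - x)                 # local step
        x = W @ x                                 # gossip step
    return x

estimates = gossip_sa()
```

With the 1/t step size the observation noise is averaged out over iterations, while the gossip matrix contracts disagreement between nodes, so all local estimates settle near the common target.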
Multi-Label Learning with PRO Loss
Abstract

Cited by 6 (4 self)
Multi-label learning methods assign multiple labels to one object. In practice, in addition to differentiating relevant labels from irrelevant ones, it is often desirable to rank the relevant labels for an object, whereas the rankings of irrelevant labels are unimportant. Such a requirement, however, cannot be met by most existing methods, which were designed to optimize existing criteria; no existing criterion encodes the aforementioned requirement. In this paper, we present a new criterion, PRO LOSS, concerning the prediction on all labels as well as the rankings of only the relevant labels. We then propose ProSVM, which optimizes PRO LOSS efficiently using the alternating direction method of multipliers. We further improve its efficiency with an upper approximation that reduces the number of constraints from O(T²) to O(T), where T is the number of labels. Experiments show that our proposals are not only superior on PRO LOSS but also highly competitive on existing evaluation criteria.
Efficient Distributed Linear Classification Algorithms via the Alternating Direction Method of Multipliers
Abstract

Cited by 6 (0 self)
Linear classification has demonstrated success in many application areas. Modern algorithms for linear classification can train reasonably good models while going through the data in only tens of rounds. However, large data often does not fit in the memory of a single machine, which makes the bottleneck in large-scale learning the disk I/O rather than the CPU. Following this observation, Yu et al. (2010) made significant progress in reducing disk usage, and their algorithms now outperform LIBLINEAR. In this paper, rather than optimizing algorithms on a single machine, we propose and implement distributed algorithms that achieve parallel disk loading and access the disk only once. Our large-scale learning algorithms are based on the framework of the alternating direction method of multipliers. The framework yields a subproblem that must be solved efficiently, for which we propose dual coordinate descent and a trust-region Newton method. Our experimental evaluations on large datasets demonstrate that the proposed algorithms achieve significant speedup over the classifier of Yu et al. running on a single machine. Our algorithms are also faster than existing distributed solvers, such as the parallel stochastic gradient descent of Zinkevich et al. (2010) and Vowpal Wabbit.
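The consensus-ADMM framework the paper builds on can be illustrated on a small distributed least-squares problem. The closed-form local solve below stands in for the dual coordinate descent and trust-region Newton subproblem solvers the authors actually use; the data, the penalty rho, and the per-machine partition are all synthetic assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_machines, d = 3, 4
# synthetic per-machine data blocks (stand-in for a partitioned dataset)
A = [rng.normal(size=(20, d)) for _ in range(n_machines)]
b = [Ai @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.01 * rng.normal(size=20) for Ai in A]

rho = 1.0
w = [np.zeros(d) for _ in range(n_machines)]  # local primal variables
u = [np.zeros(d) for _ in range(n_machines)]  # scaled dual variables
z = np.zeros(d)                               # global consensus variable

for _ in range(300):
    # local step: each machine minimizes its own loss plus the
    # augmented-Lagrangian proximal term (closed form for least squares)
    for i in range(n_machines):
        w[i] = np.linalg.solve(A[i].T @ A[i] + rho * np.eye(d),
                               A[i].T @ b[i] + rho * (z - u[i]))
    z = np.mean([w[i] + u[i] for i in range(n_machines)], axis=0)
    for i in range(n_machines):
        u[i] += w[i] - z

# reference: centralized least-squares solution on the stacked data
A_full, b_full = np.vstack(A), np.concatenate(b)
w_star, *_ = np.linalg.lstsq(A_full, b_full, rcond=None)
```

The consensus iterate z converges to the minimizer of the summed objective, matching the centralized solution without any machine ever seeing the others' data.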
A stochastic coordinate descent primal-dual algorithm and applications to large-scale composite optimization
, 2014
Abstract

Cited by 5 (2 self)
Based on the idea of randomized coordinate descent of α-averaged operators, a randomized primal-dual optimization algorithm is introduced, where a random subset of coordinates is updated at each iteration. The algorithm builds upon a variant of a recent (deterministic) algorithm proposed by Vũ and Condat that includes the well-known ADMM as a particular case. The obtained algorithm is used to solve a distributed optimization problem asynchronously. A network of agents, each having a separate cost function containing a differentiable term, seeks a consensus on the minimum of the aggregate objective. The method yields an algorithm where, at each iteration, a random subset of agents wake up, update their local estimates, exchange some data with their neighbors, and go idle. Numerical results demonstrate the attractive performance of the method. The general approach adapts naturally to other situations where coordinate descent convex optimization algorithms are used with a random choice of coordinates.
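The random coordinate-selection idea is easiest to see in its simplest form, on a strongly convex quadratic. This plain randomized coordinate descent is only a stand-in for the paper's primal-dual operator scheme, and the quadratic problem is an invented example:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 6
M = rng.normal(size=(d, d))
Q = M.T @ M + np.eye(d)            # symmetric positive definite Hessian
b = rng.normal(size=d)
x_star = np.linalg.solve(Q, b)     # reference minimizer of 0.5 x'Qx - b'x

x = np.zeros(d)
for _ in range(5000):
    i = rng.integers(d)            # a random coordinate "wakes up" ...
    # ... and exactly minimizes the objective along that coordinate,
    # leaving all other coordinates idle
    x[i] += (b[i] - Q[i] @ x) / Q[i, i]
```

Each update touches one coordinate only, mirroring the paper's setting where a random subset of agents updates while the rest stay idle, yet the iterates still reach the global minimizer.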
Solving Large Scale Linear SVM with Distributed Block Minimization
Abstract

Cited by 4 (0 self)
Over recent years we have seen the appearance of huge datasets that do not fit into memory, and do not even fit on the hard disk of a single computer. Moreover, even when processed on a cluster of machines, data are usually stored in a distributed way. The transfer of significant subsets of such datasets from one node to another is very slow. We present a new algorithm for training linear support vector machines over such large datasets. Our algorithm assumes that the dataset is partitioned over several nodes of a cluster and performs distributed block minimization followed by a line search. The communication complexity of our algorithm is independent of the number of training examples. With our MapReduce/Hadoop implementation of this algorithm, accurate training of an SVM over datasets of tens of millions of examples takes less than 11 minutes.
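A minimal sketch of the averaging variant of distributed block minimization, assuming a toy two-Gaussian dataset partitioned across nodes. The local solver is standard dual coordinate descent for the linear SVM; the simple 1/K combination stands in for the line search the paper adds. Note that only the d-dimensional directions are exchanged per round, which is why communication cost is independent of the number of examples:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_per, K, C = 2, 100, 4, 1.0
# linearly separable-ish toy data, partitioned across K nodes (illustrative)
X = [np.r_[rng.normal(1, 1, (n_per // 2, d)), rng.normal(-1, 1, (n_per // 2, d))]
     for _ in range(K)]
y = [np.r_[np.ones(n_per // 2), -np.ones(n_per // 2)] for _ in range(K)]

alpha = [np.zeros(n_per) for _ in range(K)]   # local dual variables
w = np.zeros(d)                               # shared primal model

for _ in range(20):                           # communication rounds
    d_alphas, d_ws = [], []
    for k in range(K):                        # in parallel on each node
        wk, ak = w.copy(), alpha[k].copy()
        for _ in range(5):                    # local dual coordinate descent
            for i in range(n_per):
                g = y[k][i] * (wk @ X[k][i]) - 1.0
                a_new = min(max(ak[i] - g / (X[k][i] @ X[k][i]), 0.0), C)
                wk += (a_new - ak[i]) * y[k][i] * X[k][i]
                ak[i] = a_new
        d_alphas.append(ak - alpha[k])
        d_ws.append(wk - w)
    w = w + sum(d_ws) / K                     # only d numbers per node moved
    for k in range(K):
        alpha[k] += d_alphas[k] / K           # keep duals consistent with w
```

Scaling both the primal and dual updates by 1/K keeps w equal to the weighted sum of support vectors across all nodes, so the rounds can be repeated safely.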
Adding vs. averaging in distributed primal-dual optimization
, 2015
Abstract

Cited by 4 (1 self)
Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework, COCOA+, allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes with convergence guarantees only allow conservative averaging. We give stronger (primal-dual) convergence rate guarantees for both COCOA and our new variants, and generalize the theory for both methods to cover non-smooth convex loss functions. We provide an extensive experimental comparison that shows the markedly improved performance of COCOA+ on several real-world distributed datasets, especially when scaling up the number of machines.
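The adding-vs-averaging distinction is easiest to see on a block-separable toy objective where each machine's local subproblem can be solved exactly; the quadratic objective and the coordinate partition below are illustrative assumptions, not the COCOA+ subproblems themselves:

```python
import numpy as np

# Block-separable toy objective f(w) = 0.5 * ||w - c||^2, with the
# coordinates of w partitioned across K machines. Each machine solves its
# own block exactly, so the only question is how updates are combined.
K, d = 5, 10
c = np.arange(1.0, d + 1)                  # the optimum (illustrative data)
blocks = np.array_split(np.arange(d), K)   # disjoint coordinate blocks

def local_update(w, k):
    """Exact local solve on machine k's coordinate block; zero elsewhere."""
    delta = np.zeros_like(w)
    delta[blocks[k]] = c[blocks[k]] - w[blocks[k]]
    return delta

def run(combine_add, rounds):
    w = np.zeros(d)
    for _ in range(rounds):
        deltas = sum(local_update(w, k) for k in range(K))
        w = w + deltas if combine_add else w + deltas / K
    return w

w_add = run(combine_add=True, rounds=1)    # additive combination
w_avg = run(combine_add=False, rounds=10)  # conservative averaging
```

On this toy problem the additive combination solves the problem in a single round, while averaging moves only a 1/K fraction of the way per round and so converges geometrically with factor (1 - 1/K). The paper's contribution is making additive combination safe, with guarantees, in the general non-separable case.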
A Hypergraph-Partitioned Vertex Programming Approach for Large-scale Consensus Optimization
Abstract

Cited by 1 (1 self)
In modern data science problems, techniques for extracting value from big data require performing large-scale optimization over heterogeneous, irregularly structured data. Much of this data is best represented as multi-relational graphs, making vertex-programming abstractions such as those of Pregel and GraphLab ideal fits for modern large-scale data analysis. In this paper, we describe a vertex-programming implementation of a popular consensus optimization technique known as the alternating direction method of multipliers (ADMM) [1]. ADMM consensus optimization allows the elegant solution of complex objectives such as inference in rich probabilistic models. We also introduce a novel hypergraph partitioning technique that improves over the state-of-the-art vertex-programming framework and significantly reduces the communication cost by cutting the number of replicated nodes by an order of magnitude. We implement our algorithm in GraphLab and measure scaling performance on a variety of realistic bipartite graphs and a large synthetic voter-opinion analysis application. We show a 50% improvement in running time over the current GraphLab partitioning scheme.
Distributed Probabilistic Learning for Camera Networks
, 2012
Abstract

Cited by 1 (0 self)
Probabilistic approaches to computer vision typically assume a centralized setting, with the algorithm granted access to all observed data points. However, many problems in wide-area surveillance can benefit from distributed modeling, because of either physical or computational constraints. In this work we present an approach to the estimation and learning of generative probabilistic models in a distributed context. In particular, we show how traditional centralized models, such as probabilistic principal component analysis (PPCA), can be learned when the data is distributed across a network of sensors. We demonstrate the utility of this approach on the problem of distributed affine structure from motion (SfM). Our experiments suggest that the accuracy of the learned probabilistic structure and motion models rivals that of traditional centralized factorization methods.
A General Analysis of the Convergence of ADMM
Abstract

Cited by 1 (0 self)
We provide a new proof of the linear convergence of the alternating direction method of multipliers (ADMM) when one of the objective terms is strongly convex. Our proof is based on a framework for analyzing optimization algorithms introduced in Lessard et al. (2014), reducing algorithm convergence to verifying the stability of a dynamical system. This approach generalizes a number of existing results and obviates assumptions about specific choices of algorithm parameters. On a numerical example, we demonstrate that minimizing the derived bound on the convergence rate provides a practical approach to selecting algorithm parameters for particular ADMM instances. We complement our upper bound by constructing a nearly matching lower bound on the worst-case rate of convergence.
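The setting analyzed, ADMM with one strongly convex objective term, can be instantiated on a small lasso problem. The scaled-form updates below are the textbook ADMM iteration; the data, the penalty ρ, and the regularization weight are arbitrary synthetic choices (the paper's point is precisely that such parameter choices can instead be guided by the derived rate bound):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, lam, rho = 50, 8, 0.1, 10.0
A = rng.normal(size=(n, d))     # full column rank => strongly convex LS term
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)
AtA, Atb = A.T @ A, A.T @ b
L = np.linalg.cholesky(AtA + rho * np.eye(d))   # factor once, reuse every iter
for _ in range(500):
    rhs = Atb + rho * (z - u)
    x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))  # x-min (smooth term)
    z = soft_threshold(x + u, lam / rho)               # z-min (l1 term)
    u += x - z                                         # scaled dual update
```

At convergence the primal residual x - z vanishes and z satisfies the lasso optimality condition, so the gradient of the smooth term is bounded coordinate-wise by the regularization weight.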