Results 1 -
5 of
5
Optimal distributed online prediction using mini-batches
, 2010
"... Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this wor ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
Online prediction methods are typically presented as serial algorithms running on a single processor. However, in the age of web-scale prediction problems, it is increasingly common to encounter situations where a single processor cannot keep up with the high rate at which inputs arrive. In this work, we present the distributed mini-batch algorithm, a method of converting many serial gradient-based online prediction algorithms into distributed algorithms. We prove a regret bound for this method that is asymptotically optimal for smooth convex loss functions and stochastic inputs. Moreover, our analysis explicitly takes into account communication latencies between nodes in the distributed environment. We show how our method can be used to solve the closely-related distributed stochastic optimization problem, achieving an asymptotically linear speed-up over multiple processors. Finally, we demonstrate the merits of our approach on a web-scale online prediction problem.
Distributed delayed stochastic optimization
, 2011
"... We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimizationalgorithmswhereamasternodeperformsparameterupdateswhile worker nodes compute stoc ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
We analyze the convergence of gradient-based optimization algorithms whose updates depend on delayed stochastic gradient information. The main application of our results is to the development of distributed minimizationalgorithmswhereamasternodeperformsparameterupdateswhile worker nodes compute stochastic gradients based on local information in parallel, which may give rise to delays due to asynchrony. Our main contributionistoshowthatforsmoothstochasticproblems,thedelaysareasymptotically negligible. In application to distributed optimization, we show n-node architectures whose optimization error in stochastic problems—in spite of asynchronous delays—scales asymptotically as O(1 / √ nT), which is known to be optimal even in the absence of delays. 1
Better Mini-Batch Algorithms via Accelerated Gradient Methods
"... Mini-batch algorithms have been proposed as a way to speed-up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a sig ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Mini-batch algorithms have been proposed as a way to speed-up stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up and propose a novel accelerated gradient algorithm, which deals with this deficiency, enjoys a uniformly superior guarantee and works well in practice. 1
Solving Large Scale Linear SVM with Distributed Block Minimization
"... Over recent years we have seen the appearance of huge datasets that do not fit into memory and do not even fit on the hard disk of a single computer. Moreover, even when processed on a cluster of machines, data are usually stored in a distributed way. The transfer of significant subsets of such data ..."
Abstract
- Add to MetaCart
Over recent years we have seen the appearance of huge datasets that do not fit into memory and do not even fit on the hard disk of a single computer. Moreover, even when processed on a cluster of machines, data are usually stored in a distributed way. The transfer of significant subsets of such datasets from one node to another is very slow. We present a new algorithm for training linear Support Vector Machines over such large datasets. Our algorithm assumes that the dataset is partitioned over several nodes on a cluster and performs a distributed block minimization along with the subsequent line search. The communication complexity of our algorithm is independent of the number of training examples. With our Map-Reduce/Hadoop implementation of this algorithm the accurate training of SVM over the datasets of tens of millions of examples takes less than 11 minutes. 1
Distributed Non-Stochastic Experts
"... We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an ex ..."
Abstract
- Add to MetaCart
We consider the online distributed non-stochastic experts problem, where the distributed system consists of one coordinator node that is connected to k sites, and the sites are required to communicate with each other via the coordinator. At each time-step t, one of the k site nodes has to pick an expert from the set {1,..., n}, and the same site receives information about payoffs of all experts for that round. The goal of the distributed system is to minimize regret at time horizon T, while simultaneously keeping communication to a minimum. The two extreme solutions to this problem are: (i) Full communication: This essentially simulates the nondistributed setting to obtain the optimal O ( √ log(n)T) regret bound at the cost of T communication. (ii) No communication: Each site runs an independent copy – the regret is O ( √ log(n)kT) and the communication is 0. This paper shows the difficulty of simultaneously achieving regret asymptotically better than √ kT and communication better than T. We give a novel algorithm that for an oblivious adversary achieves a non-trivial trade-off: regret O ( √ k 5(1+ɛ)/6 T) and communication O(T/k ɛ), for any value of ɛ ∈ (0, 1/5). We also consider a variant of the model, where the coordinator picks the expert. In this model, we show that the label-efficient forecaster of Cesa-Bianchi et al. (2005) already gives us strategy that is near optimal in regret vs communication trade-off. 1

