Results 11  20
of
57
NonAsymptotic Analysis of Stochastic Approximation Algorithms for Machine Learning
"... We consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and leastsquares regression, and is commonly ref ..."
Abstract

Cited by 15 (6 self)
 Add to MetaCart
We consider the minimization of a convex objective function defined on a Hilbert space, which is only available through unbiased estimates of its gradients. This problem includes standard machine learning algorithms such as kernel logistic regression and leastsquares regression, and is commonly referred to as a stochastic approximation problem in the operations research community. We provide a nonasymptotic analysis of the convergence of two wellknown algorithms, stochastic gradient descent (a.k.a. RobbinsMonro algorithm) as well as a simple modification where iterates are averaged (a.k.a. PolyakRuppert averaging). Our analysis suggests that a learning rate proportional to the inverse of the number of iterations, while leading to the optimal convergence rate in the strongly convex case, is not robust to the lack of strong convexity or the setting of the proportionality constant. This situation is remedied when using slower decays together with averaging, robustly leading to the optimal rate of convergence. We illustrate our theoretical results with simulations on synthetic and standard datasets. 1
Hierarchical classification via orthogonal transfer
 In ICML
, 2011
"... We consider multiclass classification problems where the set of labels are organized hierarchically as a category tree. We associate each node in the tree with a classifier and classify the examples recursively from the root to the leaves. We propose a hierarchical Support Vector Machine (SVM) that ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
We consider multiclass classification problems where the set of labels are organized hierarchically as a category tree. We associate each node in the tree with a classifier and classify the examples recursively from the root to the leaves. We propose a hierarchical Support Vector Machine (SVM) that encourages the classifier at each node to be different from the classifiers at its ancestors. More specifically, we introduce regularizations that force the normal vector of the classifying hyperplane at each node to be orthogonal to those at its ancestors as much as possible. We establish conditions under which training such a hierarchical SVM is a convex optimization problem, anddevelop an efficient dualaveraging method for solving it. 1.
Optimal distributed online prediction
 In Proceedings of the 28th International Conference on Machine Learning (ICML11
, 2011
"... Onlinepredictionmethodsaretypicallystudied as serial algorithms running on a single processor. In this paper, we present the distributed minibatch (DMB) framework, a method of converting a serial gradientbased onlinealgorithmintoadistributedalgorithm, and prove an asymptotically optimal regret bou ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
Onlinepredictionmethodsaretypicallystudied as serial algorithms running on a single processor. In this paper, we present the distributed minibatch (DMB) framework, a method of converting a serial gradientbased onlinealgorithmintoadistributedalgorithm, and prove an asymptotically optimal regret bound for smooth convex loss functions and stochastic examples. Our analysis explicitly takes into account communication latencies between computing nodes in a network. We also present robust variants, which are resilient to failures and node heterogeneity in an asynchronous distributed environment. Our method can also be used for distributed stochastic optimization, attaining an asymptotically linear speedup. Finally, we empirically demonstrate the merits of our approach on largescale online prediction problems. 1.
Adaptive bound optimization for online convex optimization (extended version
, 2010
"... We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2squared, and modify it only via a single timedepend ..."
Abstract

Cited by 11 (3 self)
 Add to MetaCart
We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2squared, and modify it only via a single timedependent parameter. Our algorithmâ€™s regret bounds are worstcase optimal, and for certain realistic classes of loss functions they are much better than existing bounds. These bounds are problemdependent, which means they can exploit the structure of the actual problem instance. Critically, however, our algorithm does not need to know this structure in advance. Rather, we prove competitive guarantees that show the algorithm provides a bound within a constant factor of the best possible bound (of a certain functional form) in hindsight. 1
A Stochastic Gradient Method with an Exponential Convergence Rate for StronglyConvex Optimization with Finite Training Sets. arXiv preprint arXiv:1202.6258
, 2012
"... We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training objective and reducing the testing objective quickly. 1
Online Learning for Group Lasso
"... We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Different from traditional batchmode group lasso algorithms, which suffer from the inefficiency and poor scalability, our proposed algorithm performs ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Different from traditional batchmode group lasso algorithms, which suffer from the inefficiency and poor scalability, our proposed algorithm performs in an online mode and scales well: at each iteration one can update the weight vector according to a closedform solution based on the average of previous subgradients. Therefore, the proposed online algorithm can be very efficient and scalable. This is guaranteed by its low worstcase time complexity and memory cost both in the order of O(d), where d is the number of dimensions. Moreover, in order to achieve more sparsity in both the group level and the individual feature level, we successively extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and graph lasso. Finally, we demonstrate the merits of our algorithm by experimenting with both synthetic and realworld datasets. 1.
UltraFast Optimization Algorithm for Sparse Multi Kernel Learning
"... Many stateoftheart approaches for Multi Kernel Learning (MKL) struggle at finding a compromise between performance, sparsity of the solution and speed of the optimization process. In this paper we look at the MKL problem at the same time from a learning and optimization point of view. So, instead ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
Many stateoftheart approaches for Multi Kernel Learning (MKL) struggle at finding a compromise between performance, sparsity of the solution and speed of the optimization process. In this paper we look at the MKL problem at the same time from a learning and optimization point of view. So, instead of designing a regularizer and then struggling to find an efficient method to minimize it, we design the regularizer while keeping the optimization algorithm in mind. Hence, we introduce a novel MKL formulation, which mixes elements of pnorm and elasticnet kind of regularization. We also propose a fast stochastic gradient descent method that solves the novel MKL formulation. We show theoretically and empirically that our method has 1) stateoftheart performance on many classification tasks; 2) exact sparse solutions with a tunable level of sparsity; 3) a convergence rate bound that depends only logarithmically on the number of kernels used, and is independent of the sparsity required; 4) independence on the particular convex loss function used. 1.
New Adaptive Algorithms for Online Classification
"... We propose a general framework to online learning for classification problems with timevarying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, a ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
We propose a general framework to online learning for classification problems with timevarying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, allowing to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that hybridly uses adaptive and fixed second order information. We analyze the properties of the algorithm and illustrate its performance using synthetic dataset. 1
Recent Advances of Largescale Linear Classification
"... Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has shown to be close to that of nonlinear classifiers such as kernel methods, but training and testing speed is much ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has shown to be close to that of nonlinear classifiers such as kernel methods, but training and testing speed is much faster. Recently, many research works have developed efficient optimization methods to construct linear classifiers and applied them to some largescale applications. In this paper, we give a comprehensive survey on the recent development of this active research area.
Better MiniBatch Algorithms via Accelerated Gradient Methods
"... Minibatch algorithms have been proposed as a way to speedup stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a sig ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
Minibatch algorithms have been proposed as a way to speedup stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speedup and propose a novel accelerated gradient algorithm, which deals with this deficiency, enjoys a uniformly superior guarantee and works well in practice. 1