Results 11  20
of
57
Hierarchical classification via orthogonal transfer
 In ICML
, 2011
"... We consider multiclass classification problems where the set of labels are organized hierarchically as a category tree. We associate each node in the tree with a classifier and classify the examples recursively from the root to the leaves. We propose a hierarchical Support Vector Machine (SVM) that ..."
Abstract

Cited by 11 (0 self)
 Add to MetaCart
We consider multiclass classification problems where the set of labels are organized hierarchically as a category tree. We associate each node in the tree with a classifier and classify the examples recursively from the root to the leaves. We propose a hierarchical Support Vector Machine (SVM) that encourages the classifier at each node to be different from the classifiers at its ancestors. More specifically, we introduce regularizations that force the normal vector of the classifying hyperplane at each node to be orthogonal to those at its ancestors as much as possible. We establish conditions under which training such a hierarchical SVM is a convex optimization problem, anddevelop an efficient dualaveraging method for solving it. 1.
Optimal distributed online prediction
 In Proceedings of the 28th International Conference on Machine Learning (ICML11
, 2011
"... Onlinepredictionmethodsaretypicallystudied as serial algorithms running on a single processor. In this paper, we present the distributed minibatch (DMB) framework, a method of converting a serial gradientbased onlinealgorithmintoadistributedalgorithm, and prove an asymptotically optimal regret bou ..."
Abstract

Cited by 11 (1 self)
 Add to MetaCart
Onlinepredictionmethodsaretypicallystudied as serial algorithms running on a single processor. In this paper, we present the distributed minibatch (DMB) framework, a method of converting a serial gradientbased onlinealgorithmintoadistributedalgorithm, and prove an asymptotically optimal regret bound for smooth convex loss functions and stochastic examples. Our analysis explicitly takes into account communication latencies between computing nodes in a network. We also present robust variants, which are resilient to failures and node heterogeneity in an asynchronous distributed environment. Our method can also be used for distributed stochastic optimization, attaining an asymptotically linear speedup. Finally, we empirically demonstrate the merits of our approach on largescale online prediction problems. 1.
Adaptive bound optimization for online convex optimization (extended version
, 2010
"... We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2squared, and modify it only via a single timedepend ..."
Abstract

Cited by 11 (3 self)
 Add to MetaCart
We introduce a new online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions observed so far. This is in contrast to previous algorithms that use a fixed regularization function such as L2squared, and modify it only via a single timedependent parameter. Our algorithmâ€™s regret bounds are worstcase optimal, and for certain realistic classes of loss functions they are much better than existing bounds. These bounds are problemdependent, which means they can exploit the structure of the actual problem instance. Critically, however, our algorithm does not need to know this structure in advance. Rather, we prove competitive guarantees that show the algorithm provides a bound within a constant factor of the best possible bound (of a certain functional form) in hindsight. 1
Online Learning for Group Lasso
"... We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Different from traditional batchmode group lasso algorithms, which suffer from the inefficiency and poor scalability, our proposed algorithm performs ..."
Abstract

Cited by 9 (1 self)
 Add to MetaCart
We develop a novel online learning algorithm for the group lasso in order to efficiently find the important explanatory factors in a grouped manner. Different from traditional batchmode group lasso algorithms, which suffer from the inefficiency and poor scalability, our proposed algorithm performs in an online mode and scales well: at each iteration one can update the weight vector according to a closedform solution based on the average of previous subgradients. Therefore, the proposed online algorithm can be very efficient and scalable. This is guaranteed by its low worstcase time complexity and memory cost both in the order of O(d), where d is the number of dimensions. Moreover, in order to achieve more sparsity in both the group level and the individual feature level, we successively extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and graph lasso. Finally, we demonstrate the merits of our algorithm by experimenting with both synthetic and realworld datasets. 1.
A Stochastic Gradient Method with an Exponential Convergence Rate for StronglyConvex Optimization with Finite Training Sets. arXiv preprint arXiv:1202.6258
, 2012
"... We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in ..."
Abstract

Cited by 9 (4 self)
 Add to MetaCart
We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training objective and reducing the testing objective quickly. 1
New Adaptive Algorithms for Online Classification
"... We propose a general framework to online learning for classification problems with timevarying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, a ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
We propose a general framework to online learning for classification problems with timevarying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, allowing to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that hybridly uses adaptive and fixed second order information. We analyze the properties of the algorithm and illustrate its performance using synthetic dataset. 1
UltraFast Optimization Algorithm for Sparse Multi Kernel Learning
"... Many stateoftheart approaches for Multi Kernel Learning (MKL) struggle at finding a compromise between performance, sparsity of the solution and speed of the optimization process. In this paper we look at the MKL problem at the same time from a learning and optimization point of view. So, instead ..."
Abstract

Cited by 7 (0 self)
 Add to MetaCart
Many stateoftheart approaches for Multi Kernel Learning (MKL) struggle at finding a compromise between performance, sparsity of the solution and speed of the optimization process. In this paper we look at the MKL problem at the same time from a learning and optimization point of view. So, instead of designing a regularizer and then struggling to find an efficient method to minimize it, we design the regularizer while keeping the optimization algorithm in mind. Hence, we introduce a novel MKL formulation, which mixes elements of pnorm and elasticnet kind of regularization. We also propose a fast stochastic gradient descent method that solves the novel MKL formulation. We show theoretically and empirically that our method has 1) stateoftheart performance on many classification tasks; 2) exact sparse solutions with a tunable level of sparsity; 3) a convergence rate bound that depends only logarithmically on the number of kernels used, and is independent of the sparsity required; 4) independence on the particular convex loss function used. 1.
Better MiniBatch Algorithms via Accelerated Gradient Methods
"... Minibatch algorithms have been proposed as a way to speedup stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a sig ..."
Abstract

Cited by 6 (3 self)
 Add to MetaCart
Minibatch algorithms have been proposed as a way to speedup stochastic convex optimization problems. We study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speedup and propose a novel accelerated gradient algorithm, which deals with this deficiency, enjoys a uniformly superior guarantee and works well in practice. 1
Recent Advances of Largescale Linear Classification
"... Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has shown to be close to that of nonlinear classifiers such as kernel methods, but training and testing speed is much ..."
Abstract

Cited by 6 (2 self)
 Add to MetaCart
Linear classification is a useful tool in machine learning and data mining. For some data in a rich dimensional space, the performance (i.e., testing accuracy) of linear classifiers has shown to be close to that of nonlinear classifiers such as kernel methods, but training and testing speed is much faster. Recently, many research works have developed efficient optimization methods to construct linear classifiers and applied them to some largescale applications. In this paper, we give a comprehensive survey on the recent development of this active research area.
Manifold Identification of Dual Averaging Methods for Regularized Stochastic Online Learning
"... Iterative methods that take steps in approximate subgradient directions have proved to be useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, whose purpose is to induce structure (for example, ..."
Abstract

Cited by 5 (3 self)
 Add to MetaCart
Iterative methods that take steps in approximate subgradient directions have proved to be useful for stochastic learning problems over large or streaming data sets. When the objective consists of a loss function plus a nonsmooth regularization term, whose purpose is to induce structure (for example, sparsity) in the solution, the solution often lies on a lowdimensional manifold along which the regularizer is smooth. This paper shows that a regularized dual averaging algorithm can identify this manifold with high probability. This observation motivates an algorithmic strategy in which, once a nearoptimal manifold is identified, we switch to an algorithm that searches only in this manifold, which typically has much lower intrinsic dimension than the full space, thus converging quickly to a nearoptimal point with the desired structure. Computational results are presented to illustrate these claims. 1.