Conditional gradient algorithms for machine learning. NIPS OPT Workshop, 2012

by Z Harchaoui, A Juditsky, A S Nemirovski
Results 1 - 8 of 8

Revisiting Frank-Wolfe: Projection-free sparse convex optimization

by Martin Jaggi - In ICML, 2013
"... We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approxi ..."
Abstract - Cited by 86 (2 self)
We provide stronger and more general primal-dual convergence results for Frank-Wolfe-type algorithms (a.k.a. conditional gradient) for constrained convex optimization, enabled by a simple framework of duality gap certificates. Our analysis also holds if the linear subproblems are only solved approximately (as well as if the gradients are inexact), and is proven to be worst-case optimal in the sparsity of the obtained solutions. On the application side, this allows us to unify a large variety of existing sparse greedy methods, in particular for optimization over convex hulls of an atomic set, even if those sets can only be approximated, including sparse (or structured sparse) vectors or matrices, low-rank matrices, permutation matrices, or max-norm bounded matrices. We present a new general framework for convex optimization over matrix factorizations, where every Frank-Wolfe iteration will consist of a low-rank update, and discuss the broad application areas of this approach.
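The duality-gap certificate and the linear subproblem described above are easy to see in code. Below is a minimal sketch of a Frank-Wolfe iteration over the ℓ1-ball (one of the atomic domains mentioned in the abstract); the function name, the least-squares example, and all parameter values are illustrative placeholders, not taken from the paper.

```python
import numpy as np

def frank_wolfe_l1(grad_f, x0, radius=1.0, tol=1e-6, max_iter=1000):
    """Frank-Wolfe over the l1-ball {x : ||x||_1 <= radius}.

    The duality gap <x_k - s_k, grad f(x_k)> upper-bounds f(x_k) - f(x*),
    so it doubles as the stopping certificate discussed in the abstract.
    """
    x = x0.copy()
    gap = np.inf
    for k in range(max_iter):
        g = grad_f(x)
        # Linear minimization oracle for the l1-ball: a signed, scaled
        # coordinate vertex along the largest-magnitude gradient entry.
        i = int(np.argmax(np.abs(g)))
        s = np.zeros_like(x)
        s[i] = -radius * np.sign(g[i])
        gap = float(g @ (x - s))   # duality gap certificate
        if gap <= tol:
            break
        gamma = 2.0 / (k + 2)      # standard open-loop step size
        x = (1 - gamma) * x + gamma * s
    return x, gap

# Illustrative use: least squares constrained to the l1-ball.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
x_hat, gap = frank_wolfe_l1(lambda x: A.T @ (A @ x - b), np.zeros(20), radius=2.0)
```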

Citation Context

...ration performing a line-search on f towards all “vertices” of the domain. In the machine learning literature, algorithm variants for penalized (instead of constrained) problems were investigated by (Harchaoui et al., 2012; Zhang et al., 2012). For online optimization of non-smooth functions in the low-regret setting, a variant has recently been proposed by (Hazan & Kale, 2012), using randomized smoothing. (Tewari et a...

Conditional gradient algorithms for norm-regularized smooth convex optimization

by Zaid Harchaoui, Anatoli Juditsky, Arkadi Nemirovski , 2013
"... Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖ · ‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimiz ..."
Abstract - Cited by 23 (6 self)
Motivated by some applications in signal processing and machine learning, we consider two convex optimization problems where, given a cone K, a norm ‖ · ‖ and a smooth convex function f, we want either 1) to minimize the norm over the intersection of the cone and a level set of f, or 2) to minimize over the cone the sum of f and a multiple of the norm. We focus on the case where (a) the dimension of the problem is too large to allow for interior point algorithms, and (b) ‖ · ‖ is “too complicated” to allow for the computationally cheap Bregman projections required in first-order proximal gradient algorithms. On the other hand, we assume that it is relatively easy to minimize linear forms over the intersection of K and the unit ‖ · ‖-ball. Motivating examples are given by the nuclear norm with K being the entire space of matrices, or the positive semidefinite cone in the space of symmetric matrices, and the Total Variation norm on the space of 2D images. We discuss versions of the Conditional Gradient algorithm capable of handling our problems of interest, provide the related theoretical efficiency estimates, and outline some applications.
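In symbols, the two problems described above can be written as follows (this is a rendering of the abstract's verbal statement, with δ denoting the level and λ the penalty parameter; both symbols are my notation):

\[
\text{(1)}\quad \min_{x \in K} \ \|x\| \ \ \text{s.t.}\ f(x) \le \delta,
\qquad\qquad
\text{(2)}\quad \min_{x \in K} \ f(x) + \lambda \|x\|.
\]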

The complexity of large-scale convex programming under a linear optimization oracle.

by Guanghui Lan, 2013
"... Abstract This paper considers a general class of iterative optimization algorithms, referred to as linear-optimizationbased convex programming (LCP) methods, for solving large-scale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a., Fra ..."
Abstract - Cited by 11 (1 self)
This paper considers a general class of iterative optimization algorithms, referred to as linear-optimization-based convex programming (LCP) methods, for solving large-scale convex programming (CP) problems. The LCP methods, covering the classic conditional gradient (CG) method (a.k.a. the Frank-Wolfe method) as a special case, can only solve a linear optimization subproblem at each iteration. In this paper, we first establish a series of lower complexity bounds for the LCP methods to solve different classes of CP problems, including smooth, nonsmooth and certain saddle-point problems. We then formally establish the theoretical optimality or near-optimality, in the large-scale case, of the CG method and its variants for solving different classes of CP problems. We also introduce several new optimal LCP methods, obtained by properly modifying Nesterov's accelerated gradient method, and demonstrate their possible advantages over the classic CG method for solving certain classes of large-scale CP problems.

Weakly Supervised Action Labeling in Videos Under Ordering Constraints

by Piotr Bojanowski, Rémi Lajugie, Francis Bach, Ivan Laptev, Jean Ponce, Cordelia Schmid, Josef Sivic, École Normale Supérieure
"... Abstract. We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk ” then “sit ” then “answer phone” extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discrimin ..."
Abstract - Cited by 9 (1 self)
We are given a set of video clips, each one annotated with an ordered list of actions, such as “walk”, then “sit”, then “answer phone”, extracted from, for example, the associated text script. We seek to temporally localize the individual actions in each clip as well as to learn a discriminative classifier for each action. We formulate the problem as a weakly supervised temporal assignment with ordering constraints. Each video clip is divided into small time intervals and each time interval of each video clip is assigned one action label, while respecting the order in which the action labels appear in the given annotations. We show that the action label assignment can be determined together with learning a classifier for each action in a discriminative manner. We evaluate the proposed model on a new and challenging dataset of 937 video clips with a total of 787,720 frames containing sequences of 16 different actions from 69 Hollywood movies.

Citation Context

...n only by optimizing linear functions over the domain. In particular, it does not require any projection steps. It has recently received increased attention in the context of large-scale optimization [11, 17]. 1.2 Problem Statement and Contributions The temporal assignment problem addressed in the rest of this paper and illustrated by Fig. 1 can be stated as follows: We are given a set of N video clips (o...

Conditional gradient sliding for convex optimization

by Guanghui Lan, Yi Zhou, 2014
"... Abstract In this paper, we present a new conditional gradient type method for convex optimization by utilizing a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding ( ..."
Abstract - Cited by 2 (0 self)
In this paper, we present a new conditional gradient type method for convex optimization by utilizing a linear optimization (LO) oracle to minimize a series of linear functions over the feasible set. Different from the classic conditional gradient method, the conditional gradient sliding (CGS) algorithm developed herein can skip the computation of gradients from time to time, and as a result, can achieve the optimal complexity bounds in terms of not only the number of calls to the LO oracle, but also the number of gradient evaluations. More specifically, we show that the CGS method requires O(1/√ε) and O(log(1/ε)) gradient evaluations, respectively, for solving smooth and strongly convex problems, while still maintaining the optimal O(1/ε) bound on the number of calls to the LO oracle. We also develop variants of the CGS method which can achieve the optimal complexity bounds for solving stochastic optimization problems and an important class of saddle point optimization problems. To the best of our knowledge, this is the first time that these types of projection-free optimal first-order methods have been developed in the literature. Some preliminary numerical results have also been provided to demonstrate the advantages of the CGS method.

Citation Context

... f′(x_{k−1}). 2) Call the linear optimization (LO) oracle to compute y_k ∈ Argmin_{x∈X} 〈p_k, x〉. (1.3) 3) Set x_k = (1 − α_k) x_{k−1} + α_k y_k for some α_k ∈ [0, 1]. In addition to the computation of first-order information, each iteration of the CndG method requires only the solution of a linear optimization subproblem (1.3), while most other first-order methods require a projection onto X. Since in some cases it is computationally cheaper to solve (1.3) than to perform the projection onto X, the CndG method has gained much interest recently from both the machine learning and optimization communities (see, e.g., [1, 2, 3, 7, 6, 11, 15, 14, 16, 17, 18, 22, 27, 28]). In particular, much recent research effort has been devoted to the complexity analysis of the CndG method. For example, it has been shown that if the α_k in step 3) of the CndG method are properly chosen, then this algorithm can find an ε-solution of (1.1) (i.e., a point x ∈ X s.t. f(x) − f∗ ≤ ε) in at most O(1/ε) iterations. In fact, such a complexity result has been established for the CndG method under a stronger termination criterion based on the first-order optimality condition of (1.1) (see [17, 18, 11, 14]). Observe that the aforementioned O(1/ε) bound on gradient evaluations is signific...

Hybrid conditional gradient-smoothing algorithms with applications to sparse and low rank regularization

by A. Argyriou, M. Signoretto, J. Suykens - Regularization, Optimization, Kernels, and Support Vector Machines, 2014
"... Conditional gradient methods are old and well studied optimization algorithms. Their origin dates at least to the 50’s and the Frank-Wolfe algorithm for quadratic programming [18] but they apply to much more general optimization problems. General formulations of conditional gradient algorithms have ..."
Abstract - Cited by 1 (0 self)
Conditional gradient methods are old and well studied optimization algorithms. Their origin dates at least to the 50’s and the Frank-Wolfe algorithm for quadratic programming [18] but they apply to much more general optimization problems. General formulations of conditional gradient algorithms have been studied in the ...

Citation Context

... 14, 16]. More recently, interest in the family of conditional gradient algorithms has been revived, especially in theoretical computer science, machine learning, computational geometry and elsewhere [24, 23, 29, 3, 20, 54, 56, 27, 10, 22, 33, 19]. Some of these algorithms have appeared independently in various fields, such as statistics and signal processing, under different names and various guises. For example, it has been observed that con...

Efficient Structured Matrix Rank Minimization

by Adams Wei Yu, et al.
"... We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. In contrast to most known approaches for linearly structured rank minimization, we do not (a) use the full SVD; nor (b) resort to augmented Lagrangian techni ..."
Abstract
We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map. In contrast to most known approaches for linearly structured rank minimization, we do not (a) use the full SVD; nor (b) resort to augmented Lagrangian techniques; nor (c) solve linear systems per iteration. Instead, we formulate the problem differently so that it is amenable to a generalized conditional gradient method, which results in a practical improvement with low per iteration computational cost. Numerical results show that our approach significantly outperforms state-of-the-art competitors in terms of running time, while effectively recovering low rank solutions in stochastic system realization and spectral compressed sensing problems.
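To see why such a conditional gradient step stays cheap per iteration, note that the linear subproblem over a nuclear-norm ball only requires the leading singular pair of the gradient, not a full SVD. Here is a minimal sketch of such a linear-minimization oracle; the function name, the use of SciPy's truncated SVD, and the example matrix are illustrative choices of mine, not the authors' implementation.

```python
import numpy as np
from scipy.sparse.linalg import svds

def nuclear_norm_lmo(grad, radius=1.0):
    """Linear minimization oracle for the nuclear-norm ball {X : ||X||_* <= radius}.

    argmin over {||S||_* <= radius} of <grad, S> is -radius * u1 v1^T,
    where (u1, v1) is the leading singular pair of grad; only a truncated
    SVD is needed, which is why these steps avoid the full SVD.
    """
    u, _, vt = svds(grad, k=1)                    # leading singular triplet
    return -radius * np.outer(u[:, 0], vt[0, :])

# Illustrative use inside one conditional gradient step:
G = np.random.default_rng(0).standard_normal((30, 40))   # stand-in gradient
S = nuclear_norm_lmo(G, radius=5.0)                       # rank-one atom
```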

Grenoble - Rhône-Alpes THEME

by Université Joseph Fourier
"... ..."
Abstract
Abstract not found
(Show Context)

Citation Context

...ontent in the masked areas. Project-Team LEAR, Section 6.2.2: Conditional gradient algorithms for machine learning. Participants: Zaid Harchaoui, Anatoli Juditsky [UJF], Arkadi Nemirovski [Georgia Tech]. In [17] we consider convex optimization problems arising in machine learning in high-dimensional settings. For several important learning problems, such as noisy matrix completion, state-of-the-art opti...
