## Subgradient methods for maximum margin structured learning (2006)

Venue: ICML Workshop on Learning in Structured Output Spaces

Citations: 22 (0 self)

### BibTeX

```bibtex
@INPROCEEDINGS{Ratliff06subgradientmethods,
  author    = {Nathan D. Ratliff and J. Andrew Bagnell and Martin A. Zinkevich},
  title     = {Subgradient methods for maximum margin structured learning},
  booktitle = {ICML Workshop on Learning in Structured Output Spaces},
  year      = {2006}
}
```

### Abstract

Maximum margin structured learning (MMSL) has recently gained recognition within the machine learning community as a tractable method for large scale learning. However, most current methods are limited …

### Citations

733 | Gradient-based learning applied to document recognition
- LeCun, Bottou, et al.
- 1998

Citation Context: ...al., 2006) has considered reinforcement learning based approaches to structured classification. Subgradient methods for (unstructured) margin linear classification were considered in (Zhang, 2004). (LeCun et al., 1998) considers the use of gradient methods for learning using decoding methods such as Viterbi; our approach (if applied to sequence labeling) extends such methods to use notions of structured maximum ma...

436 | Max-margin Markov networks
- Taskar, Guestrin, et al.
- 2003

Citation Context: ...as a tractable method for large scale learning. However, most current methods are limited in terms of scalability, convergence, or memory requirements. The original Structured SMO method proposed in (Taskar et al., 2003) is slow to converge, particularly for Markov networks of even medium treewidth. Similarly, dual exponentiated gradient techniques suffer from sublinear convergence as well as often large memory requ...

371 | Large margin methods for structured and interdependent output variables
- Tsochantaridis, Joachims, et al.
- 2005

Citation Context: ...sting techniques introduce a log b factor for the number of predicted bits. 5. Slack-scaling In principle, we can use these tools to compute subgradients of the slack-scaling formulation detailed in (Tsochantaridis et al., 2005). Disregarding the regularization, under this formulation Equation 4 becomes

$$\tilde{c}(w) = \frac{1}{n}\sum_{i=1}^{n}\beta_i\Bigl(\max_{y\in\mathcal{Y}_i}\mathcal{L}_i(y)\bigl(w^T(f_i(y)-f_i(y_i))+1\bigr)\Bigr)^q$$

Multiplying by the loss inside the maximization mak...

273 | Minimization Methods for Nondifferentiable Functions
- Shor
- 1985

Citation Context: ...ts into the objective to create a convex function in w. This objective is then optimized by a direct generalization of gradient descent, popular in convex optimization, called the subgradient method (Shor, 1985). The abundance of literature on subgradient methods makes this algorithm a decidedly convenient choice. In this case, it is well known that the subgradient method is guaranteed linear convergence wh...
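The subgradient method mentioned in this context generalizes gradient descent to convex nondifferentiable objectives: at each step, move against any element of the subdifferential. A minimal sketch in Python; the one-dimensional regularized hinge objective and the 1/t stepsize below are illustrative assumptions, not the paper's setup:

```python
def subgradient_descent(subgrad, w0, iters=2000):
    """Generic subgradient method: w <- w - alpha_t * g, with g any
    subgradient.  For a strongly convex objective, a 1/t stepsize
    drives the iterates to the minimizer."""
    w = float(w0)
    for t in range(1, iters + 1):
        w -= (1.0 / t) * subgrad(w)
    return w

# Illustrative objective: f(w) = max(0, 1 - w) + 0.5 * w**2,
# nondifferentiable at w = 1 and minimized there.
def hinge_subgrad(w):
    return (-1.0 if w < 1.0 else 0.0) + w

w_star = subgradient_descent(hinge_subgrad, 0.0)
```

Note that individual subgradient steps need not decrease the objective; convergence holds for the sequence under a suitable diminishing stepsize.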

134 | On the generalization ability of on-line learning algorithms
- Cesa-Bianchi, Conconi, Gentile
- 2004

Citation Context: ...a, the expected loss of our algorithm can be bounded, with probability greater than or equal to 1−δ, by the errors it makes at each step of the incremental subgradient method using the techniques of (Cesa-Bianchi et al., 2004):

$$E[L_{T+1}(\bar{w})] \le \frac{1}{T}\sum_{t=1}^{T} r_t(w_t) + \sqrt{\frac{2}{T}\log\frac{1}{\delta}} \qquad (11)$$

This bound is rather similar in form to previous generalization bounds given using covering number techniques (Taskar et al., 2003). Imp...

122 | Logarithmic regret algorithms for online convex optimization
- Hazan, Agarwal, et al.
- 2007

Citation Context: ...ence when the stepsize is chosen to be constant. Furthermore, this algorithm becomes the well-studied Greedy Projection algorithm of (Zinkevich, 2003) in the online setting. Using tools developed in (Hazan et al., 2006), we can show that the risk of this online algorithm with respect to the prediction loss grows only sublinearly in time. Perhaps more importantly, the implementation of this algorithm is simple and h...
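Greedy Projection, referenced in this context, is projected online subgradient descent: step against the current round's loss, then project back onto the feasible set. A sketch under simplifying assumptions (a Euclidean-ball feasible set and a 1/√t stepsize; the loss stream at the end is a made-up illustration):

```python
import numpy as np

def greedy_projection(subgrads, dim, radius=1.0):
    """Online projected subgradient descent over the ball |w| <= radius.
    `subgrads` yields, one per round, a function returning a subgradient
    of that round's convex loss at the current point."""
    w = np.zeros(dim)
    iterates = []
    for t, subgrad in enumerate(subgrads, start=1):
        iterates.append(w.copy())        # play w_t, then observe the loss
        w = w - subgrad(w) / np.sqrt(t)  # subgradient step
        norm = np.linalg.norm(w)
        if norm > radius:                # project back onto the ball
            w *= radius / norm
    return iterates

# Illustrative stream: every round's loss is f(w) = |w[0] - 0.5|.
subgrads = [lambda w: np.array([np.sign(w[0] - 0.5)]) for _ in range(400)]
iterates = greedy_projection(subgrads, dim=1)
```

Against this fixed stream the iterates oscillate around the per-round minimizer with amplitude shrinking like 1/√t, which is what drives the sublinear regret.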

104 | Maximum Margin Planning
- Ratliff, Bagnell, et al.
- 2006

Citation Context: ...avigation. The former problem is well known to MMSL, but the latter is new to this domain. Indeed, although there is a tractable polynomial sized quadratic programming representation for the problem (Ratliff et al., 2006), solving it directly using one of the previously proposed methods would be intractable practically for reasons similar to those that arise in directly solving the linear programming formulation of M...

56 | Solving large scale linear prediction problems using stochastic gradient descent algorithms
- Zhang
- 2004

Citation Context: ...work, (Duame et al., 2006) has considered reinforcement learning based approaches to structured classification. Subgradient methods for (unstructured) margin linear classification were considered in (Zhang, 2004). (LeCun et al., 1998) considers the use of gradient methods for learning using decoding methods such as Viterbi; our approach (if applied to sequence labeling) extends such methods to use notions of...

47 | Convergence rate of incremental subgradient algorithms
- Nedic, Bertsekas
- 2000

Citation Context: ...y)) for each i.
[Fragment of the paper's algorithm listing: 6: Compute g ∈ ∂c(w) as in Equation 6. 7: Update w ← w − α_t g. 8: (Optional) Project w onto any additional constraints. 9: t ← t + 1. 10: end while. 11: return w. 12: end procedure]
...from (Nedic & Bertsekas, 2000), who analyze incremental subgradient algorithms, of which the subgradient method is a special case. Our results require a strong convexity assumption to hold for the objective function. Given W ⊆ R^d...
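The listing steps quoted in this context (compute a subgradient of the structured objective, step, optionally project) can be sketched in Python. This is a toy reading with enumeration over a small output set standing in for loss-augmented decoding such as Viterbi, and with made-up names and a 1/(λt) stepsize; it is not the authors' code:

```python
import numpy as np

def structured_subgrad(w, examples, lam):
    """Subgradient of a regularized structured-hinge objective
    c(w) = (1/n) * sum_i [ max_y ( w.f_i(y) + L_i(y) ) - w.f_i(y_i) ]
           + (lam/2) * |w|^2,
    with the max taken by enumeration over each example's outputs."""
    g = lam * w.copy()
    n = len(examples)
    for feats, loss, yi in examples:   # feats: |Y| x d feature matrix
        ystar = int(np.argmax(feats @ w + loss))   # loss-augmented argmax
        g += (feats[ystar] - feats[yi]) / n
    return g

def train(examples, dim, lam=0.1, iters=500):
    w = np.zeros(dim)
    for t in range(1, iters + 1):
        w -= structured_subgrad(w, examples, lam) / (lam * t)
    return w

# Tiny example: two candidate outputs; output 0 is the true label.
example = (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([0.0, 1.0]), 0)
w = train([example], dim=2)
```

The strong convexity supplied by the λ‖w‖²/2 term is what the quoted analysis of (Nedic & Bertsekas, 2000) exploits for its convergence rate.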

27 | Structured prediction via the extragradient method. NIPS
- Taskar, Lacoste-Julien, et al.
- 2005

Citation Context: ..., particularly for Markov networks of even medium treewidth. Similarly, dual exponentiated gradient techniques suffer from sublinear convergence as well as often large memory requirements. Recently, (Taskar et al., 2006) have looked into saddle-point methods for optimization and have succeeded in efficiently solving several problems that would have otherwise had intractable memory requirements. We propose an alterna...

1 | Preliminary version in Proc. of the 14th Conference on Neural Information Processing Systems (NIPS)
- Duame, Langford, et al.
- 2001