Results 1  10
of
39
Dual averaging methods for regularized stochastic learning and online optimization
 In Advances in Neural Information Processing Systems 23
, 2009
"... We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1norm for promoting sparsity. We develop extensions of Nes ..."
Abstract

Cited by 131 (7 self)
 Add to MetaCart
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as ℓ1norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method, that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
Adaptive online gradient descent
 In Advances in Neural Information Processing Systems 21
, 2007
"... We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive O ..."
Abstract

Cited by 43 (10 self)
 Add to MetaCart
(Show Context)
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al for strongly convex functions, achieving intermediate rates between √ T and log T. Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms. 1
Mind the Duality Gap: Logarithmic regret algorithms for online optimization
"... We describe a primaldual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for FollowTheLeader and for the gradient descent algorithm proposed in Hazan et al. [2006]. We then show that one can ..."
Abstract

Cited by 35 (0 self)
 Add to MetaCart
We describe a primaldual framework for the design and analysis of online strongly convex optimization algorithms. Our framework yields the tightest known logarithmic regret bounds for FollowTheLeader and for the gradient descent algorithm proposed in Hazan et al. [2006]. We then show that one can interpolate between these two extreme cases. In particular, we derive a new algorithm that shares the computational simplicity of gradient descent but achieves lower regret in many practical situations. Finally, we further extend our framework for generalized strongly convex functions. 1
On the equivalence of weak learnability and linear separability: New relaxations and efficient boosting algorithms
 IN: PROCEEDINGS OF THE 21ST ANNUAL CONFERENCE ON COMPUTATIONAL LEARNING THEORY
"... Boosting algorithms build highly accurate prediction mechanisms from a collection of lowaccuracy predictors. To do so, they employ the notion of weaklearnability. The starting point of this paper is a proof which shows that weak learnability is equivalent to linear separability with ℓ1 margin. Whil ..."
Abstract

Cited by 33 (7 self)
 Add to MetaCart
Boosting algorithms build highly accurate prediction mechanisms from a collection of lowaccuracy predictors. To do so, they employ the notion of weaklearnability. The starting point of this paper is a proof which shows that weak learnability is equivalent to linear separability with ℓ1 margin. While this equivalence is a direct consequence of von Neumann’s minimax theorem, we derive the equivalence directly using Fenchel duality. We then use our derivation to describe a family of relaxations to the weaklearnability assumption that readily translates to a family of relaxations of linear separability with margin. This alternative perspective sheds new light on known softmargin boosting algorithms and also enables us to derive several new relaxations of the notion of linear separability. Last, we describe and analyze an efficient boosting framework that can be used for minimizing the loss functions derived from our family of relaxations. In particular, we obtain efficient boosting algorithms for maximizing hard and soft versions of the ℓ1 margin.
Online Learning: Random Averages, Combinatorial Parameters, and Learnability
"... We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and too ..."
Abstract

Cited by 32 (15 self)
 Add to MetaCart
(Show Context)
We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1
30 Regression and Classification on Large Datasets
 Journal of Computer and System Sciences
, 1997
"... We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm FilterBoost, which is based on a logistic regression technique proposed by Collins et al. (2002), requi ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
We study boosting in the filtering setting, where the booster draws examples from an oracle instead of using a fixed training set and so may train efficiently on very large datasets. Our algorithm FilterBoost, which is based on a logistic regression technique proposed by Collins et al. (2002), requires fewer assumptions to achieve bounds equivalent to or better than previous work. Our proofs demonstrate the algorithm’s strong theoretical properties for both classification and conditional probability estimation, and we validate these results through extensive experiments. Empirically, our algorithm proves more robust to noise and overfitting than batch boosters in conditional probability estimation and proves competitive in classification. We give several extensions of FilterBoost to the multiclass case, proving PAC bounds on each. In particular, we make use of the ideas of pseudoloss and ErrorCorrecting Output Codes used by Freund and Schapire (1997) and Schapire (1997) to create easily implementable boosters, and we show that the generalization of Output Codes by Allwein et al. (2000) extends to FilterBoost. This work represents one of the first studies of boostingbyfiltering
Exponentiated gradient algorithms for loglinear structured prediction
 In Proc. ICML, 2007
"... Conditional loglinear models are a commonly used method for structured prediction. Efficient learning of parameters in these models is therefore an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maxi ..."
Abstract

Cited by 31 (5 self)
 Add to MetaCart
(Show Context)
Conditional loglinear models are a commonly used method for structured prediction. Efficient learning of parameters in these models is therefore an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this results in both sequential and parallel update algorithms, where in the sequential algorithm parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for maxmargin models, and leads to a tighter bound on convergence rates. Experiments on a largescale parsing task show that the proposed algorithm converges much faster than conjugategradient and LBFGS approaches both in terms of optimization objective and test error. 1.
On the Universality of Online Mirror Descent
"... We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee. 1 ..."
Abstract

Cited by 23 (7 self)
 Add to MetaCart
(Show Context)
We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee. 1