Results 1-10 of 78,094
Online Learning with Kernels, 2003
Cited by 2794 (123 self)
"Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large-margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple ..."
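The kernelized SGD the abstract alludes to can be sketched as a minimal NORMA-style loop for binary classification with a Gaussian kernel and hinge loss. The function names and hyperparameters below are illustrative assumptions, not the paper's code:

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_sgd(stream, eta=0.5, lam=0.1, gamma=1.0):
    """Online kernel learning with hinge loss (NORMA-style sketch).

    The hypothesis is f(x) = sum_i alpha_i * k(x_i, x).  Each step
    shrinks the old coefficients (the effect of the L2 regularizer)
    and, when the hinge loss is active, adds a new expansion point.
    """
    points, alphas = [], []
    for x, y in stream:
        f = sum(a * gaussian_kernel(p, x, gamma) for p, a in zip(points, alphas))
        # Regularization step: decay all existing coefficients.
        alphas = [(1 - eta * lam) * a for a in alphas]
        if y * f < 1:  # margin violated: store x as a new support point
            points.append(x)
            alphas.append(eta * y)
    return points, alphas
```

Because every margin violation adds a point, practical variants truncate or merge old coefficients to bound memory; the sketch omits that.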
Greedy Function Approximation: A Gradient Boosting Machine
Annals of Statistics, 2000
Cited by 962 (12 self)
"Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additive ..."
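The "gradient descent in function space" idea can be illustrated for squared loss with depth-one regression trees (stumps): each round fits a stump to the current residuals, which are exactly the negative gradient of squared loss. This is a toy sketch under those simplifications, not Friedman's reference implementation:

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on 1-D inputs (squared loss)."""
    best = None
    for s in sorted(set(x)):
        left = [r for xi, r in zip(x, residuals) if xi <= s]
        right = [r for xi, r in zip(x, residuals) if xi > s]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, s, lm, rm)
    _, s, lm, rm = best
    return lambda xi: lm if xi <= s else rm

def gradient_boost(x, y, rounds=50, nu=0.1):
    """Stagewise additive model: each round fits a stump to the
    negative gradient of squared loss (the residuals), then takes a
    damped step of size nu in function space."""
    pred = [0.0] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_stump(x, residuals)
        stumps.append(h)
        pred = [pi + nu * h(xi) for xi, pi in zip(x, pred)]
    return lambda xi: sum(nu * h(xi) for h in stumps)
```

For other losses the same loop applies with the residuals replaced by the loss's negative gradient evaluated at the current predictions.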
Pegasos: Primal Estimated sub-gradient solver for SVM
Cited by 522 (19 self)
"We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a single ..."
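The algorithm described above is easy to sketch for a linear SVM. The step size 1/(λt) follows the paper's analysis, but the code is an illustrative version and omits the optional projection step onto the ball of radius 1/sqrt(λ):

```python
import random

def pegasos(data, lam=0.1, T=1000, seed=0):
    """Stochastic subgradient descent for a linear SVM (Pegasos-style).

    Objective: (lam/2)||w||^2 + mean hinge loss, with step size
    eta_t = 1 / (lam * t) at iteration t.
    """
    rng = random.Random(seed)
    d = len(data[0][0])
    w = [0.0] * d
    for t in range(1, T + 1):
        x, y = rng.choice(data)          # one example per iteration
        eta = 1.0 / (lam * t)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        # The regularizer's gradient always applies...
        w = [(1 - eta * lam) * wi for wi in w]
        # ...the hinge subgradient only when the margin is violated.
        if margin < 1:
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w
```

The paper also analyzes a mini-batch variant; replacing the single drawn example with an averaged subgradient over k examples changes nothing structurally.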
General inefficiency of batch training for gradient descent, 2003
Cited by 1 (0 self)
"... general inefficiency of batch training for gradient descent ..."
Stochastic Gradient Descent Tricks
Cited by 14 (0 self)
"Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the ..."
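A baseline SGD loop with two widely known recommendations from this literature (reshuffle each epoch, decay the step size) might look like the sketch below; the exact schedule eta_t = eta0 / (1 + eta0 * lam * t) is one common option, chosen here as an assumption:

```python
import random

def sgd_linear_regression(data, eta0=0.1, lam=0.01, epochs=20, seed=0):
    """Plain SGD on L2-regularized squared loss for a linear model,
    with per-epoch shuffling and a decaying step size."""
    rng = random.Random(seed)
    examples = list(data)
    d = len(examples[0][0])
    w = [0.0] * d
    t = 0
    for _ in range(epochs):
        rng.shuffle(examples)  # trick: fresh random order every epoch
        for x, y in examples:
            eta = eta0 / (1 + eta0 * lam * t)  # trick: decaying step size
            pred = sum(wi * xi for wi, xi in zip(w, x))
            grad = [(pred - y) * xi + lam * wi for xi, wi in zip(x, w)]
            w = [wi - eta * g for wi, g in zip(w, grad)]
            t += 1
    return w
```

Other standard tricks (feature scaling, averaging the iterates, tuning eta0 on a small subsample) slot into the same loop without changing its shape.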
Parallelized stochastic gradient descent
Advances in Neural Information Processing Systems 23, 2010
Cited by 86 (3 self)
"With the increase in available data, parallel machine learning has become an increasingly pressing problem. In this paper we present the first parallel stochastic gradient descent algorithm including a detailed analysis and experimental evidence. Unlike prior work on parallel optimization algorithms ..."
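The parameter-averaging scheme this line of work is known for can be mimicked sequentially: run independent SGD on disjoint shards of the data, then average the resulting weight vectors. The sequential "workers" below stand in for real parallel processes, and the model and hyperparameters are our own illustration:

```python
import random

def local_sgd(shard, eta=0.1, epochs=10, seed=0):
    """SGD on one worker's shard (squared loss, linear model)."""
    rng = random.Random(seed)
    d = len(shard[0][0])
    w = [0.0] * d
    for _ in range(epochs):
        for x, y in rng.sample(shard, len(shard)):  # shuffled pass
            pred = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi - eta * (pred - y) * xi for wi, xi in zip(w, x)]
    return w

def parallel_sgd(data, workers=4):
    """SimuParallelSGD-style sketch: partition the data, train an
    independent model per shard, average the weights once at the end.
    The only communication is the final averaging step."""
    shards = [data[i::workers] for i in range(workers)]
    models = [local_sgd(s, seed=i) for i, s in enumerate(shards)]
    d = len(models[0])
    return [sum(m[j] for m in models) / workers for j in range(d)]
```

A real deployment would run `local_sgd` in separate processes (e.g. `multiprocessing.Pool`); the averaging step is unchanged.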
Large-scale machine learning with stochastic gradient descent
In COMPSTAT, 2010
Cited by 153 (1 self)
"During the last decade, data sizes have grown faster than the speed of processors. In this context, the capabilities of statistical machine learning methods are limited by the computing time rather than the sample size. A more precise analysis uncovers qualitatively different tradeoffs for the case of small-scale and large-scale learning problems. The large-scale case involves the computational complexity of the underlying optimization algorithm in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale problems ..."
Decoupling the Data Geometry from the Parameter Geometry for Stochastic Gradients
Snowbird Learning Workshop 2012, extended abstract
"Large-scale learning problems require algorithms that scale benignly with respect to the size of the dataset and the number of parameters to be trained, leading numerous practitioners to favor the classic stochastic gradient descent (SGD [1, 2, 3]) over more sophisticated methods. Besides its fast c ..."
Semi-Stochastic Gradient Descent Methods, 2013
"In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs, in each of which a single full gradient and a random number of stochastic gradients ..."
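The epoch structure described above (one full gradient at a snapshot, then a random number of cheap variance-reduced steps) can be sketched for least squares. The step size and the uniform inner-loop length are our simplifying assumptions; the paper draws the inner-loop length from a specific distribution:

```python
import random

def s2gd_least_squares(data, eta=0.05, epochs=10, max_inner=40, seed=0):
    """Semi-stochastic gradient descent sketch (S2GD/SVRG-flavoured).

    Each epoch computes one full gradient at snapshot w_snap, then
    performs a random number of inner steps of the form
        w <- w - eta * (g_i(w) - g_i(w_snap) + full_grad),
    an unbiased, variance-reduced gradient estimate.
    Problem: minimize the mean of squared losses of a linear model.
    """
    rng = random.Random(seed)
    d = len(data[0][0])
    n = len(data)

    def grad_i(w, i):
        x, y = data[i]
        pred = sum(wj * xj for wj, xj in zip(w, x))
        return [(pred - y) * xj for xj in x]

    w = [0.0] * d
    for _ in range(epochs):
        snap = list(w)
        full = [sum(grad_i(snap, i)[j] for i in range(n)) / n for j in range(d)]
        # Random inner-loop length (uniform here is our simplification).
        for _ in range(rng.randint(10, max_inner)):
            i = rng.randrange(n)
            gi, gs = grad_i(w, i), grad_i(snap, i)
            w = [wj - eta * (a - b + f) for wj, a, b, f in zip(w, gi, gs, full)]
    return w
```

Each epoch thus costs one full gradient plus a handful of single-example gradients, which is the "semi-stochastic" tradeoff the abstract describes.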
Keywords: Sequence labeling · Stochastic gradient descent
"... Periodic step-size adaptation in second-order gradient ..."