Results 1-10 of 838
Stochastic Gradient Descent with Only One Projection
"... Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), ..."
Cited by 6 (5 self)
the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last
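The contrast drawn in this abstract can be illustrated with a minimal sketch, assuming a hypothetical toy objective f(x) = 0.5 * ||x - c||^2 with Gaussian-noise gradients and the unit Euclidean ball as the feasible domain (none of these choices come from the paper). `sgd` either projects after every step or runs unconstrained and projects once at the end. This is not the authors' algorithm: for this isotropic quadratic the projected unconstrained solution happens to coincide with the constrained optimum, which does not hold in general, and the paper's methods add machinery that this sketch omits.

```python
import math
import random


def project_to_ball(x, radius=1.0):
    """Euclidean projection onto the ball {x : ||x||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def noisy_grad(x, c, rng, noise=0.1):
    """Stochastic gradient of f(x) = 0.5 * ||x - c||^2 (Gaussian noise is an assumption)."""
    return [xi - ci + rng.gauss(0.0, noise) for xi, ci in zip(x, c)]


def sgd(c, steps=2000, project_every_step=True, seed=0):
    """Return (final iterate, number of projections performed)."""
    rng = random.Random(seed)
    x = [0.0] * len(c)
    projections = 0
    for t in range(1, steps + 1):
        eta = 1.0 / t  # 1/t step size, a common choice for a 1-strongly-convex objective
        g = noisy_grad(x, c, rng)
        x = [xi - eta * gi for xi, gi in zip(x, g)]
        if project_every_step:
            x = project_to_ball(x)  # the per-iteration projection the abstract calls costly
            projections += 1
    if not project_every_step:
        x = project_to_ball(x)  # a single projection at the very end
        projections += 1
    return x, projections
```

Both `sgd([2.0, 0.0], project_every_step=True)` and `sgd([2.0, 0.0], project_every_step=False)` should land near (1, 0); the first performs 2000 projections, the second only one.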
Algorithms for Nonnegative Matrix Factorization
In NIPS, 2001
"... Nonnegative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minim ..."
Cited by 1246 (5 self)
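A minimal pure-Python sketch of the multiplicative update rule for the squared Euclidean (Frobenius) cost: H <- H * (W^T V) / (W^T W H) and W <- W * (V H^T) / (W H H^T), applied entry by entry. The helper names, the random positive initialization, the iteration count, and the small `eps` guard against division by zero are illustrative assumptions; the paper's second rule, which uses a different multiplicative factor, is not shown.

```python
import random


def matmul(A, B):
    """Dense product of two list-of-lists matrices."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sq_error(V, W, H):
    """Squared Frobenius norm ||V - WH||^2."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def elementwise_update(M, num, den, eps):
    """Return M * num / den entry by entry (eps guards against division by zero)."""
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]


def nmf_multiplicative(V, rank, iters=200, seed=0, eps=1e-12):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    # strictly positive random initialization keeps every factor nonnegative
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [sq_error(V, W, H)]
    for _ in range(iters):
        Wt = transpose(W)
        # H <- H * (W^T V) / (W^T W H)
        H = elementwise_update(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        # W <- W * (V H^T) / (W H H^T)
        W = elementwise_update(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
        errors.append(sq_error(V, W, H))
    return W, H, errors
```

On a small matrix with an exact rank-2 nonnegative factorization, the recorded `errors` should be non-increasing, which is the kind of guarantee the paper establishes for this update.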
Policy gradient methods for reinforcement learning with function approximation.
In NIPS, 1999
"... Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly repres ..."
Cited by 439 (20 self)
proportional to the gradient: $\Delta\theta \approx \alpha \frac{\partial \rho}{\partial \theta}$, where α is a positive-definite step size. If the above can be achieved, then θ can usually be assured to converge to a locally optimal policy in the performance measure ρ. Unlike the value-function approach, here small changes in θ can cause only small changes in the policy
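The update Δθ ≈ α ∂ρ/∂θ can be sketched on a hypothetical two-armed bandit with a softmax policy, where the gradient of the expected reward has the closed form ∂ρ/∂θ_b = π_b (r_b - ρ). This illustrates only the gradient-ascent step; the paper is concerned with estimating this gradient using an approximate action-value function, which the sketch does not attempt. The reward values, step size, and function names are assumptions.

```python
import math


def softmax(theta):
    """Numerically stable softmax over a list of action preferences."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(theta, rewards):
    """rho(theta) = sum_a pi_a(theta) * r_a for a one-step (bandit) problem."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))


def grad_expected_reward(theta, rewards):
    """Exact gradient: d rho / d theta_b = pi_b * (r_b - rho) for a softmax policy."""
    pi = softmax(theta)
    rho = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - rho) for p, r in zip(pi, rewards)]


def policy_gradient_ascent(rewards, alpha=1.0, steps=500):
    """Apply theta <- theta + alpha * d rho / d theta; return theta and the rho history."""
    theta = [0.0] * len(rewards)
    history = [expected_reward(theta, rewards)]
    for _ in range(steps):
        g = grad_expected_reward(theta, rewards)
        theta = [t + alpha * gi for t, gi in zip(theta, g)]
        history.append(expected_reward(theta, rewards))
    return theta, history
```

With `rewards = [1.0, 0.2]`, the policy shifts smoothly toward the better arm, and because each step is small, ρ increases monotonically, mirroring the "small changes in θ, small changes in the policy" property noted above.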
Probing the Pareto frontier for basis pursuit solutions
2008
"... The basis pursuit problem seeks a minimum one-norm solution of an underdetermined least-squares problem. Basis pursuit denoise (BPDN) fits the least-squares problem only approximately, and a single parameter determines a curve that traces the optimal trade-off between the least-squares fit and the ..."
Cited by 365 (5 self)
on this curve; the algorithm is suitable for problems that are large-scale and for those that are in the complex domain. At each iteration, a spectral gradient-projection method approximately minimizes a least-squares problem with an explicit one-norm constraint. Only matrix-vector operations are required
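The subproblem named here, least squares under an explicit one-norm constraint, can be sketched with projected gradient. The source describes a spectral gradient-projection method; this sketch swaps in a plain fixed step of 1/||A||_F^2 for simplicity, together with the well-known sort-based Euclidean projection onto the one-norm ball. Function names and the test problem are hypothetical, and dense list-of-lists matrices stand in for the matrix-vector operators the paper relies on.

```python
import math


def project_l1_ball(v, tau):
    """Euclidean projection of v onto {x : ||x||_1 <= tau}, tau > 0 (sort-based)."""
    if sum(abs(x) for x in v) <= tau:
        return list(v)
    u = sorted((abs(x) for x in v), reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        candidate = (cumsum - tau) / j
        if uj > candidate:
            theta = candidate  # keep the threshold of the largest valid j
    # soft-threshold every entry by theta, keeping its sign
    return [math.copysign(max(abs(x) - theta, 0.0), x) for x in v]


def projected_gradient_ls(A, b, tau, iters=500):
    """Minimize 0.5 * ||Ax - b||^2 subject to ||x||_1 <= tau with a fixed step."""
    rows, n = len(A), len(A[0])
    # ||A||_F^2 upper-bounds ||A||_2^2, so 1/L is a safe (if conservative) step
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        r = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        g = [sum(A[i][j] * r[i] for i in range(rows)) for j in range(n)]  # A^T r
        x = project_l1_ball([xj - gj / L for xj, gj in zip(x, g)], tau)
    return x
```

With A equal to the identity, the constrained solution is simply the one-norm-ball projection of b, which makes the sketch easy to check by hand.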
Hogwild!: A lock-free approach to parallelizing stochastic gradient descent
2011
"... Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work a ..."
Cited by 161 (9 self)
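A toy sketch of the lock-free idea, assuming sparse 0/1 features and a squared loss (both hypothetical choices): several threads repeatedly sample an example and write to a shared weight list with no lock, each update touching only the coordinates that example uses. Under CPython's global interpreter lock this shows the unsynchronized access pattern, not a parallel speedup, and it is not the paper's implementation.

```python
import random
import threading


def hogwild_sgd(examples, dim, n_threads=4, updates_per_thread=2000, step=0.1, seed=0):
    """Toy Hogwild!-style SGD: threads update a shared weight list with no locking.

    Each example is (idx, y), where idx lists the nonzero (value 1.0) features, so an
    update touches only those coordinates. Loss per example: 0.5 * (sum_{i in idx} w[i] - y)^2.
    """
    w = [0.0] * dim  # shared parameters; deliberately not protected by a lock

    def worker(thread_seed):
        rng = random.Random(thread_seed)
        for _ in range(updates_per_thread):
            idx, y = rng.choice(examples)        # sample an example uniformly
            resid = sum(w[i] for i in idx) - y   # read possibly stale shared values
            for i in idx:
                w[i] -= step * resid             # unsynchronized write to one coordinate

    threads = [threading.Thread(target=worker, args=(seed + k,)) for k in range(n_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return w
```

Because every example here touches only two of the eight coordinates, most concurrent updates write to disjoint entries, which is the sparsity assumption that makes running without locks tolerable.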
Semi-Stochastic Gradient Descent Methods
2013
"... In this paper we study the problem of minimizing the average of a large number (n) of smooth convex loss functions. We propose a new method, S2GD (Semi-Stochastic Gradient Descent), which runs for one or several epochs in each of which a single full gradient and a random number of stochastic gradien ..."
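A sketch of the epoch structure described above, on a hypothetical one-dimensional least-squares problem: each epoch computes one full gradient g at the anchor point y, then takes a random number t of steps along g + ∇f_i(x) - ∇f_i(y). Drawing t uniformly from 1..m is a simplifying assumption (the paper specifies its own distribution for t), and the step size h, m, and epoch count are illustrative.

```python
import random


def s2gd(a, b, epochs=100, m=20, h=0.1, seed=0):
    """Semi-stochastic sketch for f(x) = (1/n) * sum_i 0.5 * (a[i] * x - b[i])**2 (scalar x)."""
    rng = random.Random(seed)
    n = len(a)

    def grad_i(x, i):
        return a[i] * (a[i] * x - b[i])

    y = 0.0
    for _ in range(epochs):
        g = sum(grad_i(y, i) for i in range(n)) / n  # the single full gradient of this epoch
        t = rng.randint(1, m)  # random inner-loop length; uniform here is an assumption
        x = y
        for _ in range(t):
            i = rng.randrange(n)
            x -= h * (g + grad_i(x, i) - grad_i(y, i))  # variance-reduced stochastic step
        y = x
    return y
```

For this problem the exact minimizer is sum(a_i * b_i) / sum(a_i**2), so the sketch can be checked directly; the correction term ∇f_i(x) - ∇f_i(y) shrinks as x and y approach the optimum, which lets a fixed step size converge.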
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research, 2001
"... Gradient-based approaches to direct policy search in reinforcement learning have received much recent attention as a means to solve problems of partial observability and to avoid some of the problems associated with policy degradation in value-function methods. In this paper we introduce GPOMDP, a si ..."
Cited by 208 (5 self)
simulation-based algorithm for generating a biased estimate of the gradient of the average reward in Partially Observable Markov Decision Processes (POMDPs) controlled by parameterized stochastic policies. A similar algorithm was proposed by Kimura, Yamamura, and Kobayashi (1995). The algorithm’s chief
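The core of a GPOMDP-style estimator can be sketched as two running updates: an eligibility trace z <- beta * z + ∇log μ(u) and a running average Δ <- Δ + (r * z - Δ) / (t + 1). The example below assumes a single-state problem with a fixed softmax policy (so observations carry no information), which is a hypothetical simplification; in this memoryless setting the beta-dependent bias happens to vanish, so the estimate can be checked against the exact gradient. Parameter values and names are illustrative.

```python
import math
import random


def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def gpomdp_gradient(theta, rewards, beta=0.5, steps=100_000, seed=0):
    """GPOMDP-style running gradient estimate for a one-state problem with a softmax policy.

    z: discounted eligibility trace of score vectors (beta in [0, 1)).
    delta: running average of reward * z, i.e. the gradient estimate.
    """
    rng = random.Random(seed)
    k = len(theta)
    pi = softmax(theta)  # theta is held fixed while the gradient is estimated
    z = [0.0] * k
    delta = [0.0] * k
    for t in range(steps):
        u = rng.choices(range(k), weights=pi)[0]                       # sample action u_t
        score = [(1.0 if j == u else 0.0) - pi[j] for j in range(k)]  # grad log pi(u_t)
        z = [beta * zj + sj for zj, sj in zip(z, score)]
        r = rewards[u]                                                 # reward observed after u_t
        delta = [dj + (r * zj - dj) / (t + 1) for dj, zj in zip(delta, z)]
    return delta
```

With theta = [0, 0] and rewards [1, 0], the estimate should approach the exact gradient [0.25, -0.25]; larger beta lengthens the credit-assignment horizon at the cost of higher variance.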
Stochastic Gradient Descent on GPUs
"... Irregular algorithms such as Stochastic Gradient Descent (SGD) can benefit from the massive parallelism available on GPUs. However, unlike in data-parallel algorithms, synchronization patterns in SGD are quite complex. Furthermore, scheduling for scale-free graphs is challenging. This work examines ..."
Stochastic Gradient Descent Algorithm in the Computational Network Toolkit
"... We introduce the stochastic gradient descent algorithm used in the computational network toolkit (CNTK) — a general-purpose machine learning toolkit written in C++ for training and using models that can be expressed as a computational network. We describe the algorithm used to compute the gradients ..."
Bayesian inference of phylogeny and its impact on evolutionary biology.
Science, 2001
"... As a discipline, phylogenetics is becoming transformed by a flood of molecular data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a number of o ..."
Cited by 235 (10 self)
of outstanding issues in evolutionary biology, including the analysis of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences. The idea that species are related through a history of common descent is an old one, predating Darwin. Yet