Results 1 - 10 of 710,198
Stochastic Gradient Descent with Only One Projection
"... Although many variants of stochastic gradient descent have been proposed for largescale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), ..."
Abstract

Cited by 6 (4 self)
 Add to MetaCart
), the projection step can be computationally expensive, making stochastic gradient descent unattractive for largescale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last
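A minimal sketch of the contrast this abstract describes, assuming a toy quadratic objective and the unit L2 ball as the feasible domain (a cheap stand-in for the expensive domains, e.g. the positive semidefinite cone, that motivate the paper). Classic projected SGD projects every iterate; the one-projection idea runs unconstrained and projects only the averaged iterate at the end. This illustrates the idea, not the paper's exact algorithm or guarantees.

    import numpy as np

    def project_unit_ball(w):
        # Euclidean projection onto the unit L2 ball; stands in for an
        # expensive projection such as onto the PSD cone.
        n = np.linalg.norm(w)
        return w if n <= 1.0 else w / n

    def sgd(grad, w0, steps, eta, project_every_step):
        w, avg = w0.copy(), np.zeros_like(w0)
        for _ in range(steps):
            w = w - eta * grad(w)
            if project_every_step:
                w = project_unit_ball(w)      # classic projected SGD
            avg += w
        avg /= steps
        # one-projection variant: project only the averaged iterate
        return avg if project_every_step else project_unit_ball(avg)

    # toy problem: f(w) = 0.5 * ||w - c||^2 with noisy gradients; the
    # optimum c lies outside the ball, so the constraint is active
    rng = np.random.default_rng(0)
    c = np.array([2.0, 0.0])
    grad = lambda w: (w - c) + 0.1 * rng.standard_normal(2)
    print(sgd(grad, np.zeros(2), 2000, 0.05, True))    # project each step
    print(sgd(grad, np.zeros(2), 2000, 0.05, False))   # project once at end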
Learning to rank using gradient descent
In ICML, 2005
"... We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data f ..."
Abstract

Cited by 510 (17 self)
 Add to MetaCart
We investigate using gradient descent methods for learning ranking functions; we propose a simple probabilistic cost function, and we introduce RankNet, an implementation of these ideas using a neural network to model the underlying ranking function. We present test results on toy data and on data
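A hedged sketch of the pairwise probabilistic cost the abstract describes, with a linear scorer standing in for the neural network that RankNet actually uses: the modeled probability that item i ranks above item j is a logistic function of their score difference, trained by gradient descent on a cross-entropy loss. The data and the target ranker w_true are invented for illustration.

    import numpy as np

    def pairwise_loss_grad(w, xi, xj, target):
        # RankNet-style cost for a linear scorer s(x) = w . x; target is
        # the desired probability that item i ranks above item j.
        p = 1.0 / (1.0 + np.exp(-(w @ xi - w @ xj)))
        p = np.clip(p, 1e-12, 1.0 - 1e-12)
        loss = -target * np.log(p) - (1.0 - target) * np.log(1.0 - p)
        grad = (p - target) * (xi - xj)       # d loss / d w
        return loss, grad

    rng = np.random.default_rng(1)
    w_true = np.array([1.0, -2.0, 0.5])       # hypothetical "true" ranker
    w = np.zeros(3)
    for _ in range(2000):
        xi, xj = rng.standard_normal(3), rng.standard_normal(3)
        target = 1.0 if w_true @ xi > w_true @ xj else 0.0
        _, g = pairwise_loss_grad(w, xi, xj, target)
        w -= 0.05 * g                         # plain gradient descent
    print(w / np.linalg.norm(w))              # direction approaches w_true's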
Learning Long-Term Dependencies with Gradient Descent is Difficult
To appear in the Special Issue on Recurrent Networks of the IEEE Transactions on Neural Networks
"... Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in th ..."
Abstract

Cited by 374 (35 self)
 Add to MetaCart
in the input/output sequences span long intervals. We showwhy gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a tradeoff between efficient learning by gradient descent and latching on information
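A small numeric illustration of the difficulty the abstract points at, under assumed ingredients (a tanh recurrence with random weights): the gradient of a late hidden state with respect to an early one is a product of per-step Jacobians, and the norm of that product typically shrinks geometrically as the temporal span grows.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20
    W = 0.9 * rng.standard_normal((n, n)) / np.sqrt(n)  # recurrent weights
    h = rng.standard_normal(n)

    prod = np.eye(n)                 # accumulated product of step Jacobians
    norms = []
    for _ in range(100):
        h = np.tanh(W @ h)
        J_T = W.T * (1.0 - h ** 2)   # transpose of diag(1 - h^2) @ W
        prod = J_T @ prod
        norms.append(np.linalg.norm(prod))
    # sensitivity of the final state to ever-earlier states
    print(norms[0], norms[49], norms[99])   # typically decays geometrically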
Greedy Function Approximation: A Gradient Boosting Machine
Annals of Statistics, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed for additi ..."
Abstract

Cited by 951 (12 self)
 Add to MetaCart
Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest{descent minimization. A general gradient{descent \boosting" paradigm is developed
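A compact sketch of the boosting paradigm the abstract describes, specialized to squared-error loss, where the negative functional gradient is simply the residual: each stage fits a weak learner (here a hypothetical depth-1 regression stump, chosen for brevity) to the current residuals and adds it to the model with shrinkage.

    import numpy as np

    def fit_stump(x, r):
        # least-squares depth-1 regression stump on a 1-D feature
        best = None
        for thr in np.unique(x):
            left, right = r[x <= thr], r[x > thr]
            if len(right) == 0:
                continue
            err = ((left - left.mean()) ** 2).sum() + \
                  ((right - right.mean()) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, thr, left.mean(), right.mean())
        _, thr, lm, rm = best
        return lambda z: np.where(z <= thr, lm, rm)

    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 6.0, 200)
    y = np.sin(x) + 0.1 * rng.standard_normal(200)

    F, learners, nu = np.zeros_like(y), [], 0.1   # nu: shrinkage rate
    for _ in range(200):
        residual = y - F              # negative gradient of squared loss
        h = fit_stump(x, residual)    # weak learner fit to the residuals
        learners.append(h)
        F += nu * h(x)                # stagewise additive update
    predict = lambda z: nu * sum(h(z) for h in learners)
    print(np.mean((predict(x) - y) ** 2))   # training MSE after boosting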
Histograms of Oriented Gradients for Human Detection
In CVPR, 2005
"... We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly out ..."
Abstract

Cited by 3678 (9 self)
 Add to MetaCart
We study the question of feature sets for robust visual object recognition, adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly
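A simplified sketch of the descriptor's core computation, assuming a grayscale image: per-pixel gradients are binned by unsigned orientation into per-cell histograms weighted by gradient magnitude. The full HOG pipeline also interpolates votes between bins and normalizes overlapping blocks; both are omitted here for brevity.

    import numpy as np

    def hog_cells(img, cell=8, nbins=9):
        # gradient magnitude and unsigned orientation (0..180 degrees)
        gy, gx = np.gradient(img.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
        bins = (ang / (180.0 / nbins)).astype(int) % nbins  # hard binning
        ch, cw = img.shape[0] // cell, img.shape[1] // cell
        hist = np.zeros((ch, cw, nbins))
        for i in range(ch * cell):
            for j in range(cw * cell):
                hist[i // cell, j // cell, bins[i, j]] += mag[i, j]
        return hist

    img = np.zeros((64, 64))
    img[:, 32:] = 1.0                      # test image: one vertical edge
    h = hog_cells(img)
    print(h.shape, h.sum(axis=(0, 1)).argmax())  # dominant orientation bin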
Stochastic Perturbation Theory
1988
"... . In this paper classical matrix perturbation theory is approached from a probabilistic point of view. The perturbed quantity is approximated by a firstorder perturbation expansion, in which the perturbation is assumed to be random. This permits the computation of statistics estimating the variatio ..."
Abstract

Cited by 886 (35 self)
 Add to MetaCart
. In this paper classical matrix perturbation theory is approached from a probabilistic point of view. The perturbed quantity is approximated by a firstorder perturbation expansion, in which the perturbation is assumed to be random. This permits the computation of statistics estimating the variation in the perturbed quantity. Up to the higherorder terms that are ignored in the expansion, these statistics tend to be more realistic than perturbation bounds obtained in terms of norms. The technique is applied to a number of problems in matrix perturbation theory, including least squares and the eigenvalue problem. Key words. perturbation theory, random matrix, linear system, least squares, eigenvalue, eigenvector, invariant subspace, singular value AMS(MOS) subject classifications. 15A06, 15A12, 15A18, 15A52, 15A60 1. Introduction. Let A be a matrix and let F be a matrix valued function of A. Two principal problems of matrix perturbation theory are the following. Given a matrix E, pr...
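A worked instance of the approach in this abstract, for the linear-system case F(A) = inverse(A) @ b under an assumed i.i.d. Gaussian perturbation E: the first-order expansion solve(A + E, b) ~ x - inverse(A) @ E @ x yields a closed-form covariance for the perturbed solution, which a Monte Carlo check reproduces up to the ignored higher-order terms.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)
    sigma = 1e-3                       # std dev of each entry of E

    # first-order: covariance of inv(A) @ E @ x with i.i.d. E is
    # sigma^2 * ||x||^2 * inv(A) @ inv(A).T
    Ainv = np.linalg.inv(A)
    cov_first_order = sigma ** 2 * (x @ x) * (Ainv @ Ainv.T)

    # Monte Carlo over random perturbations E
    samples = np.array([
        np.linalg.solve(A + sigma * rng.standard_normal((n, n)), b)
        for _ in range(20000)])
    cov_empirical = np.cov(samples.T)
    print(np.abs(cov_first_order - cov_empirical).max())  # small discrepancy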
The Valuation of Options for Alternative Stochastic Processes
Journal of Financial Economics, 1976
"... This paper examines the structure of option valuation problems and develops a new technique for their solution. It also introduces several jump and diffusion processes which have nol been used in previous models. The technique is applied lo these processes to find explicit option valuation formulas, ..."
Abstract

Cited by 661 (4 self)
 Add to MetaCart
This paper examines the structure of option valuation problems and develops a new technique for their solution. It also introduces several jump and diffusion processes which have nol been used in previous models. The technique is applied lo these processes to find explicit option valuation formulas, and solutions to some previously unsolved problems involving the pricing ofsecurities with payouts and potential bankruptcy. 1.
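The paper itself derives explicit formulas for its particular processes; as a loose, hedged illustration of valuing an option under an alternative (jump-plus-diffusion) process, here is a Monte Carlo sketch with a fixed proportional jump size. All parameters are hypothetical, and this is not the paper's technique.

    import numpy as np

    def mc_call_price(S0, K, T, r, sigma, lam, jump, n_paths=200000, seed=0):
        # European call under a diffusion with Poisson-driven proportional
        # jumps of fixed log-size `jump`, priced by plain Monte Carlo.
        rng = np.random.default_rng(seed)
        z = rng.standard_normal(n_paths)
        n_jumps = rng.poisson(lam * T, n_paths)
        # drift compensated so the discounted price is a martingale
        drift = (r - 0.5 * sigma ** 2 - lam * (np.exp(jump) - 1.0)) * T
        ST = S0 * np.exp(drift + sigma * np.sqrt(T) * z + jump * n_jumps)
        return np.exp(-r * T) * np.maximum(ST - K, 0.0).mean()

    print(mc_call_price(S0=100, K=100, T=1.0, r=0.05,
                        sigma=0.2, lam=0.5, jump=0.1))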
Exponentiated Gradient Versus Gradient Descent for Linear Predictors
Information and Computation, 1995
"... this paper, we concentrate on linear predictors . To any vector u 2 R ..."
Abstract

Cited by 325 (14 self)
 Add to MetaCart
this paper, we concentrate on linear predictors . To any vector u 2 R
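A hedged sketch of the two update rules this paper compares for linear predictors: gradient descent updates the weight vector additively, while exponentiated gradient updates it multiplicatively and renormalizes, keeping the weights on the probability simplex. The problem setup (sparse target u, Gaussian features) is invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    N, eta = 50, 0.01
    u = np.zeros(N)
    u[3] = 1.0                          # sparse target, lies on the simplex
    w_gd = np.zeros(N)                  # gradient descent iterate
    w_eg = np.full(N, 1.0 / N)          # exponentiated gradient iterate

    for _ in range(5000):
        x = rng.standard_normal(N)
        y = u @ x
        w_gd -= eta * (w_gd @ x - y) * x                 # GD: additive
        w_eg = w_eg * np.exp(-eta * (w_eg @ x - y) * x)  # EG: multiplicative
        w_eg /= w_eg.sum()              # renormalize to the simplex

    print(np.abs(w_gd - u).sum(), np.abs(w_eg - u).sum())  # both approach u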
Gradient flows in metric spaces and in the space of probability measures
Lectures in Mathematics ETH Zürich, Birkhäuser Verlag, 2005
"... ..."