## Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation (1992)

Venue: | IEEE Transactions on Automatic Control |

Citations: | 215 - 14 self |

### BibTeX

@ARTICLE{Spall92multivariatestochastic,

author = {James C. Spall},

title = {Multivariate Stochastic Approximation Using a Simultaneous Perturbation Gradient Approximation},

journal = {IEEE Transactions on Automatic Control},

year = {1992},

volume = {37},

pages = {332--341}

}


### Abstract

Consider the problem of finding a root of the multivariate gradient equation that arises in function minimization. When only noisy measurements of the function are available, a stochastic approximation (SA) algorithm of the general Kiefer-Wolfowitz type is appropriate for estimating the root. This paper presents an SA algorithm that is based on a "simultaneous perturbation" gradient approximation instead of the standard finite difference approximation of Kiefer-Wolfowitz type procedures. Theory and numerical experience indicate that the algorithm presented here can be significantly more efficient than the standard finite difference-based algorithms in large-dimensional problems.
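As a hedged illustration of the abstract's claim, here is a minimal sketch of an SPSA-style iteration in Python. The function name `spsa_minimize`, the Bernoulli ±1 perturbations, and the gain exponents (0.602 and 0.101, common in later SPSA practice) are our assumptions, not details taken from this paper:

```python
import numpy as np

def spsa_minimize(loss, theta0, a=0.1, c=0.1, iters=200, rng=None):
    """Minimize a (possibly noisy) loss with simultaneous-perturbation
    gradient estimates. Illustrative sketch, not the paper's exact setup."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta0, dtype=float)
    for k in range(iters):
        ak = a / (k + 1) ** 0.602          # step-size gain a_k -> 0
        ck = c / (k + 1) ** 0.101          # perturbation gain c_k -> 0
        # One mean-zero +/-1 perturbation vector for ALL p coordinates:
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Only two loss measurements per iteration, regardless of dimension p:
        y_plus = loss(theta + ck * delta)
        y_minus = loss(theta - ck * delta)
        # Simultaneous-perturbation gradient estimate (elementwise divide):
        ghat = (y_plus - y_minus) / (2.0 * ck * delta)
        theta = theta - ak * ghat
    return theta
```

By contrast, a two-sided finite-difference (Kiefer-Wolfowitz type) estimate would need 2p measurements per iteration, which is the efficiency gap the abstract refers to.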

### Citations

1455 | An Introduction to Probability Theory - Feller - 1971 |

547 | Stochastic Processes - Doob - 1953 |

61 | Multidimensional stochastic approximation methods - Blum - 1954 |

Citation Context ...ons in the presence of noisy measurements. Perhaps the most important application of SA is in finding extrema of functions as first described in Kiefer and Wolfowitz [18] for the scalar case and Blum [2] for the multivariate case. This type of SA has potential applications in a number of areas relevant to statistical modeling and control, e.g., sequential parameter estimation, adaptive control, exper... |

34 | Accelerated stochastic approximation - Kesten - 1958 |

Citation Context ...ms under special conditions (e.g., second-order algorithms such as that in [26] or adaptive algorithms as in [17] or [19]), but they will not be considered in this paper. Rather, we will focus on the performance of (2.1) with g_k(·) as defined in the following, and contrast (in Sections IV and V) this performance... |

34 | Probability Theory - Laha, Rohatgi - 1979 |

32 | Stochastic quasi-gradient methods and their application to system optimization - Ermoliev - 1983 |

Citation Context ...t approximations similar to (2.2) or (2.3) in the sense that only two measurements are used in the analog to (2.2) have been considered in Kushner and Clark [20, pp. 58-60, 254-256] and Ermoliev [6], [7]. Motivated partly by the problem of weight estimation (learning) in neural networks, Styblinski and Tang [33] consider an algorithm similar to those of Kushner and Clark and Ermoliev for the noise ≡ 0 ... |

24 | Applications of a Kushner and Clark lemma to general classes of stochastic algorithms - Metivier, Priouret - 1984 |

Citation Context ...converges almost surely to θ*. Defining the error term e_k, we can rewrite (2.1) as θ_{k+1} = θ_k - a_k[g(θ_k) + e_k], which is in the form of a generic Robbins-Monro algorithm considered, e.g., in [20, pp. 38-39], [23], or [24]. Let us introduce the following assumptions, which are very similar to those of a number of other authors, as discussed below. A1: a_k, c_k > 0 ∀k; a_k → 0, c_k → 0 as k → ∞; Σ_{k=0}^∞ a_k = ∞, Σ_{k=0}^∞ (a_k/c_k)² < ∞. ²The randomness in the perturb... |

20 | Stochastic estimation of a regression function - Kiefer, Wolfowitz - 1952 |

Citation Context ...cedure for finding roots of equations in the presence of noisy measurements. Perhaps the most important application of SA is in finding extrema of functions as first described in Kiefer and Wolfowitz [18] for the scalar case and Blum [2] for the multivariate case. This type of SA has potential applications in a number of areas relevant to statistical modeling and control, e.g., sequential parameter es... |

14 | Convergence of parameter sensitivity estimates in a stochastic experiment - Cao - 1985 |

Citation Context ...ons in the field of perturbation analysis since it provides insight into the shape of the performance measure without requiring exact derivatives or a large number of function evaluations (see, e.g., [3], [15], or [14]). We now define the "simultaneous perturbation" estimate for g(·). Let Δ_k ∈ R^p be a vector of p mutually independent mean-zero random variables {Δ_k1, Δ_k2, ..., Δ_kp} satisfying condition... |

14 | Stochastic approximation of minima with improved asymptotic speed - Fabian - 1967 |

Citation Context ... the performance of the SA algorithm (relative to using g_k(·) as in (2.2)). Obviously, other averaging methods may also be applicable. It does not appear, however, that the averaging method of Fabian [8] would directly apply since the elements of ĝ_k(·) are (conditionally) dependent, violating a key assumption of the Fabian technique. Gradient approximations similar to (2.2) or (2.3) in the sense that... |

13 | Strong convergence of a stochastic approximation algorithm - Ljung - 1978 |

Citation Context ...which θ_k converges almost surely to θ*. Defining the error term e_k, we can rewrite (2.1) as θ_{k+1} = θ_k - a_k[g(θ_k) + e_k], which is in the form of a generic Robbins-Monro algorithm considered, e.g., in [20, pp. 38-39], [23], or [24]. Let us introduce the following assumptions, which are very similar to those of a number of other authors, as discussed below. A1: a_k, c_k > 0 ∀k; a_k → 0, c_k → 0 as k → ∞; Σ_{k=0}^∞ a_k = ∞, Σ_{k=0}^∞ (a_k/c_k)² < ∞. ²The randomness in th... |

6 | On the method of generalized stochastic gradients and quasi-Fejer sequences - Ermoliev - 1969 |

Citation Context ...adient approximations similar to (2.2) or (2.3) in the sense that only two measurements are used in the analog to (2.2) have been considered in Kushner and Clark [20, pp. 58-60, 254-256] and Ermoliev [6], [7]. Motivated partly by the problem of weight estimation (learning) in neural networks, Styblinski and Tang [33] consider an algorithm similar to those of Kushner and Clark and Ermoliev for the noi... |

6 | On using perturbation analysis to do sensitivity analysis: Derivatives vs. differences - Holtzman - 1989 |

Citation Context ...n the field of perturbation analysis since it provides insight into the shape of the performance measure without requiring exact derivatives or a large number of function evaluations (see, e.g., [3], [15], or [14]). We now define the "simultaneous perturbation" estimate for g(·). Let Δ_k ∈ R^p be a vector of p mutually independent mean-zero random variables {Δ_k1, Δ_k2, ..., Δ_kp} satisfying conditions give... |

6 | Stochastic approximation and sequential search for optimum - Lai - 1985 |

Citation Context ...ictive condition and could be expected to hold in most applications. ³ A4 and A5 are motivated by considering a limiting form of the deterministic version of (2.1), i.e., θ_{k+1} = θ_k - a_k g(θ_k) as k → ∞. Lai [22] presents a brief tutorial on the role of A4 and A5 in SA algorithms. See also the discussion in [23, Section 6]. Slightly weaker assumptions than A4 and A5 (leading to convergence to different roots ... |

4 | Self-organizing control of stochastic systems - Saridis - 1979 |

3 | Perturbation analysis explained - Ho - 1988 |

Citation Context ...ld of perturbation analysis since it provides insight into the shape of the performance measure without requiring exact derivatives or a large number of function evaluations (see, e.g., [3], [15], or [14]). We now define the "simultaneous perturbation" estimate for g(·). Let Δ_k ∈ R^p be a vector of p mutually independent mean-zero random variables {Δ_k1, Δ_k2, ..., Δ_kp} satisfying conditions given in Sect... |

3 | Stochastic approximation method with gradient averaging for unconstrained problems - Ruszczynski, Syski - 1983 |

Citation Context ...or determining optimal SA gain coefficients (a, c); carrying out these calculations would be of some interest. It would also be of interest to examine whether an averaging scheme such as that in [8], [27], or [33], which might enhance the rate of convergence, could be developed for SPSA. Similarly, it may be possible to develop an SPSA-type technique for accelerated SA algorithms (e.g., those with ada... |

3 | A stochastic steepest-descent algorithm - Wardi - 1988 |

Citation Context ...resence of noise and using SA to find the root might be useful if L(·) is approximated in some computationally efficient random manner. Such stochastic optimization techniques have been considered in [34], where it is assumed that the function to be minimized is approximated in an efficient random way. ...that L is a function R^p → R for which third-order (actually any order) derivatives exist continuo... |

2 | Extensions of Kesten's adaptive stochastic approximation method - Kushner, Gavin - 1973 |

Citation Context ... special conditions (e.g., second-order algorithms such as that in [26] or adaptive algorithms as in [17] or [19]), but they will not be considered in this paper. Rather, we will focus on the performance of (2.1) with g_k(·) as defined in the following, and contrast (in Sections IV and V) this performance with th... |

2 | Estimation and tests of hypotheses for the initial mean and covariance in the Kalman filter model - Shumway, Olsen, et al. - 1981 |

Citation Context ... the x_i represent data distributed N(0, Σ + Q_t), and that it has applications, e.g., in Kalman filter model estimation (the author's primary interest) and dose response curve estimation (see, e.g., [29] and [16]). More relevant for our purposes here is the fact ⁷Chin [4] considers a different function L(·) and performs a study similar to that here. His numerical results are qualitatively the same.... |

1 | Convergence analysis of smoothed stochastic gradient-type algorithm - Betman, Feuer, et al. - 1987 |

Citation Context ...lgorithms. See also the discussion in [23, Section 6]. Slightly weaker assumptions than A4 and A5 (leading to convergence to different roots θ* along different sample paths) are discussed in [24] and [1]. Proposition 1: Let A1-A5 and the conditions of Lemma 1 hold. Then as k → ∞, θ_k → θ* for almost all ω ∈ Ω. (3.3) Proof: Given A1 and A3-A5, we know from [20, Lemma 2.2.1 and Theorem 2.3.1] (see a... |

1 | Comparative study of several multivariate stochastic approximation algorithms - Chin - 1990 |

Citation Context ...tions increases enough to nullify the reduced number of measurements per iteration). Some numerical experience described in Section V seems to corroborate this statement; also, theoretical results in [4] comparing asymptotic mean square errors (analogous to results in Section IV here) indicate that random directions SA will not generally be superior to finite difference SA. Note that neither of the g... |

1 | On asymptotic normality in stochastic approximation - Fabian - 1968 |

Citation Context ...Then from (3.4) and A1, ii) has been shown, which completes the proof. Q.E.D. C. Asymptotic Normality: Using a result of Fabian [9], Proposition 2 below establishes asymptotic normality for scaled θ̂_k. Section IV comments on how this result can be used to draw conclusions about the relative efficiency of SPSA and FDSA. Proposition 2... |

1 | On the choice of step size in the Robbins-Monro procedure - Goldstein - 1988 |

Citation Context ...observed superiority for slowly decaying gains in a finite-sample setting is consistent with the author's other experiences with SA (e.g., [30], [31]) although asymptotic theory (e.g., [10], [11], and [13]) indicates that O(1/k) gains are optimal), 2) the norm values are still dropping relatively fast near the terminal iterations for the O(1/k) gain (although the values are dropping only slightly for t... |

1 | Empirical Bayes estimation of rates - Hui, Berger - 1983 |

Citation Context ...represent data distributed N(0, Σ + Q_t), and that it has applications, e.g., in Kalman filter model estimation (the author's primary interest) and dose response curve estimation (see, e.g., [29] and [16]). More relevant for our purposes here is the fact ⁷Chin [4] considers a different function L(·) and performs a study similar to that here. His numerical results are qualitatively the same. ⁸In an M... |

1 | Stochastic Approximation Methods for Constrained and Unconstrained Systems - Kushner, Clark - 1978 |

Citation Context ...lim inf ‖θ̂_k − θ*‖ = 0 a.s., and gives sufficient conditions for this weaker condition to hold. This author found, however, that certain other conditions were not as easily verified as those of Kushner and Clark [20], Lai [22], or Metivier and Priouret [24], which will be used in the proof of Proposition 1. ...where the equality follows by the fact that E(e_k'e_j) = 0... |

1 | Kiefer-Wolfowitz procedure, in Encyclopedia of Statistical Sciences - Ruppert - 1983 |

Citation Context ...r finding θ* (e.g., steepest descent, Newton-Raphson, scoring). In the case where L is observed in the presence of noise, an SA algorithm of the generic Kiefer-Wolfowitz/Blum type is appropriate (see [25] for a general discussion and related references). In contrast to SA algorithms based on finite difference methods, which require 2p (noisy) measurements of L at each iteration, the "simultaneous pert... |
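The 2p-versus-2 measurement contrast in the excerpt above can be made concrete with a short sketch (hedged: `fd_gradient` and `sp_gradient` are illustrative names of ours, and the Bernoulli ±1 perturbation is one common choice of mean-zero perturbation, not necessarily the paper's):

```python
import numpy as np

def fd_gradient(loss, theta, c):
    """Two-sided finite-difference gradient estimate: 2p loss measurements."""
    p = theta.size
    g = np.empty(p)
    for i in range(p):
        e = np.zeros(p)
        e[i] = c                            # perturb one coordinate at a time
        g[i] = (loss(theta + e) - loss(theta - e)) / (2.0 * c)
    return g

def sp_gradient(loss, theta, c, rng):
    """Simultaneous-perturbation gradient estimate: 2 loss measurements total."""
    delta = rng.choice([-1.0, 1.0], size=theta.size)  # one perturbation for all p coordinates
    return (loss(theta + c * delta) - loss(theta - c * delta)) / (2.0 * c * delta)
```

For a noiseless quadratic loss the finite-difference estimate recovers the gradient exactly; the simultaneous-perturbation estimate is noisier per iteration but its two-measurement cost does not grow with the dimension p.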

1 | Bayesian error isolation for models of large-scale systems - Spall - 1988 |

Citation Context ... greater than 5 for the latter and θ̂_k - θ* is O(k^{-1/2}) (this observed superiority for slowly decaying gains in a finite-sample setting is consistent with the author's other experiences with SA (e.g., [30], [31]) although asymptotic theory (e.g., [10], [11], and [13]) indicates that O(1/k) gains are optimal), 2) the norm values are still dropping relatively fast near the terminal iterations for the O(1/... |

1 | Experiments in nonconvex optimization: Stochastic approximation with function smoothing and simulated annealing - Styblinski, Tang - 1990 |

Citation Context ...(2.2) have been considered in Kushner and Clark [20, pp. 58-60, 254-256] and Ermoliev [6], [7]. Motivated partly by the problem of weight estimation (learning) in neural networks, Styblinski and Tang [33] consider an algorithm similar to those of Kushner and Clark and Ermoliev for the noise ≡ 0 setting and compare this algorithm to one based on simulated annealing. The gradient approximations of these