## Model-free control of nonlinear stochastic systems with discrete-time measurements (1998)

Venue: | IEEE Transactions on Automatic Control |

Citations: | 24 (6 self) |

### BibTeX

@ARTICLE{Spall98model-freecontrol,

author = {James C. Spall and John A. Cristion},

title = {Model-free control of nonlinear stochastic systems with discrete-time measurements},

journal = {IEEE Transactions on Automatic Control},

year = {1998},

volume = {43},

pages = {1198--1210}

}

### Abstract

Consider the problem of developing a controller for general (nonlinear and stochastic) systems where the equations governing the system are unknown. Using discrete-time measurements, this paper presents an approach for estimating a controller without building or assuming a model for the system (including such general models as differential/difference equations, neural networks, fuzzy logic rules, etc.). Such an approach has potential advantages in accommodating complex systems with possibly time-varying dynamics. Since control requires some mapping that takes system information and produces control actions, the controller is constructed through use of a function approximator (FA) such as a neural network or polynomial (no FA is used for the unmodeled system equations). Creating the controller involves the estimation of the unknown parameters within the FA. However, since no functional form is being assumed for the system equations, the gradient of the loss function for use in standard optimization algorithms is not available. Therefore, this paper considers the use of the simultaneous perturbation stochastic approximation algorithm, which requires only system measurements (not a system model). Related to this, a convergence result for stochastic approximation algorithms with time-varying objective functions and feedback is established. It is shown that this algorithm can greatly enhance the efficiency over more standard stochastic approximation algorithms based on finite-difference gradient approximations.

Index Terms: Direct adaptive control, gradient estimation, nonlinear systems, simultaneous perturbation stochastic approximation.
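As an illustration of the simultaneous perturbation idea named in the abstract, the following is a minimal Python sketch of a two-measurement SPSA update applied to a static noisy loss. It is not the paper's controller: the quadratic test loss, the function names (`spsa_gradient`, `spsa_minimize`, `make_noisy_quadratic`), and the gain constants are assumptions chosen only for illustration. What it shows is that each gradient estimate uses two loss measurements regardless of the parameter dimension, whereas a two-sided finite-difference estimate would need two measurements per parameter.

```python
import random


def spsa_gradient(loss, theta, c, rng):
    """Two-measurement simultaneous perturbation gradient estimate.

    All components of theta are perturbed at once by a random +/-1
    (symmetric Bernoulli) vector, so only two loss measurements are used.
    """
    delta = [rng.choice((-1.0, 1.0)) for _ in theta]
    theta_plus = [t + c * d for t, d in zip(theta, delta)]
    theta_minus = [t - c * d for t, d in zip(theta, delta)]
    diff = loss(theta_plus) - loss(theta_minus)
    return [diff / (2.0 * c * d) for d in delta]


def spsa_minimize(loss, theta0, iterations=2000, a=0.1, c=0.1,
                  stability=100.0, alpha=0.602, gamma=0.101, seed=0):
    """Stochastic approximation recursion with decaying gains.

    The gain constants here are illustrative assumptions, not values
    taken from the paper.
    """
    rng = random.Random(seed)
    theta = list(theta0)
    for k in range(iterations):
        a_k = a / (k + 1 + stability) ** alpha  # step-size gain
        c_k = c / (k + 1) ** gamma              # perturbation size
        grad = spsa_gradient(loss, theta, c_k, rng)
        theta = [t - a_k * g for t, g in zip(theta, grad)]
    return theta


def make_noisy_quadratic(seed=1, noise_std=0.01):
    """Hypothetical test loss: noisy quadratic with minimum at all ones."""
    rng = random.Random(seed)

    def loss(theta):
        return sum((t - 1.0) ** 2 for t in theta) + rng.gauss(0.0, noise_std)

    return loss


if __name__ == "__main__":
    estimate = spsa_minimize(make_noisy_quadratic(), [0.0] * 5)
    print(estimate)  # each component should end up near 1.0
```

In the control setting of the paper, the loss would instead be measured from the closed-loop system and would generally vary with time, so this static example only conveys the mechanics of the gradient approximation.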

### Citations

699 |
Networks for Approximation and Learning
- Poggio, Girosi
- 1990
Citation Context ...ractice to ensure that the desired level of accuracy with a given FA is being achieved). Each FA technique tends to have advantages and disadvantages, some of which are discussed in Poggio and Girosi [36], Lane et al. [22], and Chen and Chen [8] (e.g., polynomials have a relatively easy physical interpretability, but the number of parameters to be determined grows rapidly with input dimension or polyn... |

557 |
Identification and control of dynamical systems using neural networks
- Narendra, Parthasarathy
- 1990
Citation Context ...ss from sample input–output data on the system prior to operating the system in closed-loop and constructing the controller FA (with neural networks as the FA’s; see, e.g., Narendra and Parthasarathy [31], Pao et al. [34], or Sartori and Antsaklis [45]). In contrast, the direct control approach here does not require any open-l... |

455 | Principles of mathematical analysis - Rudin - 1964 |

383 | Adaptive Algorithms and Stochastic Approximations - Benveniste, Metivier, et al. - 1991 |

322 | Nonlinear Dynamical Control Systems - Nijmeijer, van der Schaft - 1990 |

233 | Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Spall
- 1992
Citation Context ...eriod in which measurements are being collected for one gradient approximation (see Section IV here). We will, therefore, consider an SA algorithm based on a “simultaneous perturbation” method (Spall [46]), which is typically much more efficient than the finite-difference SA algorithms in the amount of data required. In particular, the simultaneous perturbation approximation requires only one or two s... |

165 | Gaussian networks for direct adaptive control
- Sanner, Slotine
- 1992
Citation Context ...roller. (We say “in some cases” because in the stochastic discrete-time setting, there are currently almost no practically useful stability results for adaptive nonlinear systems.) Sanner and Slotine [43], Levin and Narendra [23], [24], Jagannathan et al. [16], Fabri and Kadirkamanathan [13], and Ahmed and Anjum [1] are examples of approaches that rely on controller FA’s but introduce stronger modelin... |

112 |
Acceleration of stochastic approximation by averaging
- Polyak, Juditsky
- 1992
Citation Context .... This averaging method has been shown theoretically to yield asymptotically minimum variance estimates in the general Robbins–Monro SA setting with non-time-varying loss function (Polyak and Juditsky [37]) and to offer improved performance in some SPSA settings (Dippon and Renz [10] and Maryak [27]). By its nature, of course, averaging seems most appropriate for systems that have stationary, or perhaps... |

64 | Functions of several variables - Fleming - 1977 |

59 |
Control of nonlinear dynamical systems using neural networks. Part II: observability, identification and control
- Levin, Narendra
- 1996
Citation Context ...cases” because in the stochastic discrete-time setting, there are currently almost no practically useful stability results for adaptive nonlinear systems.) Sanner and Slotine [43], Levin and Narendra [23], [24], Jagannathan et al. [16], Fabri and Kadirkamanathan [13], and Ahmed and Anjum [1] are examples of approaches that rely on controller FA’s but introduce stronger modeling assumptions (e.g., dete... |

31 |
Optimization of discrete event systems via simultaneous perturbation stochastic approximation
- Fu, Hill
- 1997
Citation Context ...l context. Constraints are usually handled on a problem-dependent basis (such as the wastewater example in Section IV-A), but general approaches with SPSA are described in Sadegh [41] and Fu and Hill [15]; these approaches have yet to be implemented in a control context. Another open problem is one common to many applications of function approximators: namely, to develop guidelines for determining the... |

29 | Universal approximation to nonlinear operators by neural networks with arbitrary activation functions and its application to dynamical systems
- Chen, Chen
- 1995
Citation Context ...f accuracy with a given FA is being achieved). Each FA technique tends to have advantages and disadvantages, some of which are discussed in Poggio and Girosi [36], Lane et al. [22], and Chen and Chen [8] (e.g., polynomials have a relatively easy physical interpretability, but the number of parameters to be determined grows rapidly with input dimension or polynomial order). Since the approach of this ... |

25 |
Convergence of Learning Algorithms with Constant Learning Rates
- Kuan, Hornik
- 1991
Citation Context ...omponents in the system. In fact, because of their ease of use, such constant gains are sometimes applied in SA (or SA-type) algorithms even when (see, e.g., Kushner and Huang [19] or Kuan and Hornik [18]), although it is known that they preclude the formal a.s. convergence of decaying gain algorithms. IV. EMPIRICAL STUDIES This section presents the results of numerical studies on two different nonlin... |

24 |
Constrained optimization via stochastic approximation with a simultaneous perturbation gradient approximation
- Sadegh
- 1997
Citation Context ...we are trying to minimize some cost (without regard to a specific target value) and penalty functions or projections are used for certain values of and/or to reflect problem constraints (e.g., Sadegh [41]). For convenience, however, the remainder of the paper will illustrate points with the target-tracking problem exemplified by (2). Note that although (2) is a one-time-step error function, much of the... |

21 | Weighted means in stochastic approximation for minima
- Dippon, Renz
- 1997
Citation Context ...nimum variance estimates in the general Robbins–Monro SA setting with non-time-varying loss function (Polyak and Juditsky [37]) and to offer improved performance in some SPSA settings (Dippon and Renz [10] and Maryak [27]). By its nature, of course, averaging seems most appropriate for systems that have stationary, or perhaps asymptotically very slowly time-varying, dynamics (e.g., the case of the propos... |

21 | Self-organizing control of stochastic systems, Marcel Dekker - Saridis |

20 |
Dynamic structure neural networks for stable adaptive control of nonlinear systems
- Fabri, Kadirkamanathan
- 1996
Citation Context ...re currently almost no practically useful stability results for adaptive nonlinear systems.) Sanner and Slotine [43], Levin and Narendra [23], [24], Jagannathan et al. [16], Fabri and Kadirkamanathan [13], and Ahmed and Anjum [1] are examples of approaches that rely on controller FA’s but introduce stronger modeling assumptions (e.g., deterministic systems or specific knowledge of how the controller e... |

20 | Human control strategy: abstraction, verification, and replication
- Nechyba, Xu
- 1997
Citation Context ...cations of the simultaneous perturbation optimization method in control are given in Maeda and De Figueiredo [26] (robotics), Koch et al. [17] (integrated transit/traffic control), and Nechyba and Xu [32] (human-machine interface control). Note that the general convergence result presented here is relevant to most of these applications. The remainder of this paper is organized as follows. Section II d... |

16 |
Approximation methods which converge with probability one
- Blum
- 1954
Citation Context ...nce, we will for ease of notation replace the “lim sup” with “lim” without loss of generality. We will show that the event has probability zero in a multivariate extension to scalar arguments in Blum [5] and Evans and Weber [12]. Furthermore, suppose that the limiting quantity of the unbounded elements in is (trivial modifications cover a limiting quantity including limits). For and as in C5), the ev... |

16 | Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation
- Sadegh, Spall
- 1997
Citation Context ...mly or normally distributed. However, taking as symmetrically Bernoulli-distributed to satisfy this condition has proven effective in our numerical studies, and, in fact, is shown in Sadegh and Spall [42] to be the optimal choice of distribution in static optimization problems based on asymptotic principles. C3) ensures that i... |

15 | Back-propagation neural networks for nonlinear self-tuning adaptive control
- Chen
- 1990
Citation Context ...imated by the FA. As demonstrated in Section IV, a very important type of process to which this second method can apply is an affine-nonlinear (i.e., generalized bilinear) system such as that in Chen [7] and Dochain and Bastin [11]. As we will see in Section IV, when reliable prior information is available, the self-tuning method of Fig. 1(b) may yield a controller superior to the direct approximatio... |

15 | Uniqueness of the weights for minimal feedforward nets with a given input-output map - Sussmann - 1992 |

14 |
Adaptive identification and control algorithms for nonlinear bacterial growth systems
- Dochain, Bastin
- 1984
Citation Context ...trated in Section IV, a very important type of process to which this second method can apply is an affine-nonlinear (i.e., generalized bilinear) system such as that in Chen [7] and Dochain and Bastin [11]. As we will see in Section IV, when reliable prior information is available, the self-tuning method of Fig. 1(b) may yield a controller superior to the direct approximation method of Fig. 1(a). B. Fo... |

14 |
A Newton-Raphson version of the multivariate Robbins-Monro procedure. Annals of Statistics 13:236–245
- Ruppert
- 1985
Citation Context ...on $\bar{g}_k(\cdot)$ can be replaced by the true gradient. Then condition C3) resembles the well-known “steepness” condition in stochastic approximation derived from Lyapunov theory (e.g., Lai [21] and Ruppert [40]). Two of the special cases are when the one-measurement form (6) is used for $\hat{g}_k(\cdot)$ or when the system can be “reset” when the two-measurement form (5) is used (i.e., the system can be placed at the ... |

14 |
Traffic-Responsive Signal Timing for system-wide traffic control
- Spall, Chin
- 1997
Citation Context .... The same basic ideas can be applied when is changed less frequently (say, to allow transient effects to decay). An example of such a setting is the vehicle traffic control problem in Spall and Chin [50] where (i.e., the control function) is changed on a daily basis even though traffic-responsive control actions (the control function outputs) are changed much more frequently (perhaps minute by minute)... |

13 |
A More Efficient Global Optimization Algorithm Based on
- Chin
- 1994
Citation Context ...Kushner and Huang [footnote 3] This condition does not preclude $\hat{\theta}_k$ from converging to a local minimizing point $\theta^*$ (the same issue arises, of course, in gradient search algorithms such as back-propagation); Chin [9] discusses a technique by which SPSA can be used as a global optimizer for arbitrary initial conditions. An alternate global optimizing approach that seems likely to apply here is described in Yakowit... |

11 |
A neural network controller for systems with unmodeled dynamics with applications to wastewater treatment
- Spall, Cristion
- 1997
Citation Context ...l case of the control approach here, focusing on the “direct approximator” method (see Section II below), perfect state measurements, and a neural network as the FA, is considered in Spall and Cristion [52]. Some applications of the simultaneous perturbation optimization method in control are given in Maeda and De Figueiredo [26] (robotics), Koch et al. [17] (integrated transit/traffic control), and Nec... |

10 |
A measure of the tracking capability of recursive stochastic algorithms with constant gains
- Benveniste, Ruget
- 1982
Citation Context ...ating $L_k^{(+)}$ and $L_k^{(-)}$, as, say, with some robotic systems). In these special cases $\bar{g}_k(\hat{\theta})$ equals $g_k(\hat{\theta})$ plus an $O(c_k^2)$ bias that can be absorbed into $\delta_k(\rho)$. [19], Benveniste and Ruget [3], Macchi and Eweda [25], or Benveniste et al. [4, pp. 120–164]), which is relevant when is nonconvergent. This is likely to occur, say, when the process or measurement dynamics are perpetually time-va... |

10 |
Neural-net computing and intelligent control systems
- Pao, Phillips, et al.
- 1992
Citation Context ...put–output data on the system prior to operating the system in closed-loop and constructing the controller FA (with neural networks as the FA’s; see, e.g., Narendra and Parthasarathy [31], Pao et al. [34], or Sartori and Antsaklis [45]). In contrast, the direct control approach here does not require any open-loop system identi... |

10 |
Neural approximations for multistage optimal control of nonlinear stochastic systems
- Parisini, Zoppoli
- 1996
Citation Context ...tion available up to time for each ). Suppose our “sliding window” of previous information available at time , say , contains previous measurements and previous controls; akin to Parisini and Zoppoli [35], the choice of and reflects a tradeoff between carrying along a large quantity of potentially relevant information and the corresponding requirements for a more complex FA. Thus, when the system is o... |

10 |
Nonlinear Adaptive Control Using Neural Networks: Estimation Based on a Smoothed Form of Simultaneous Perturbation Gradient Approximation,” Stat
- Spall, Cristion
- 1994
Citation Context ...of the previous and current gradient estimates (analogous to the “momentum” approach in backpropagation); such smoothing can sometimes improve the performance of the algorithm (see Spall and Cristion [51] for a thorough discussion of smoothing in SPSA-based direct adaptive control). A slightly more fundamental modification is to replace the two-measurement gradient approximation in (5) with the onemea... |

9 |
Learning rules for neuro-controller via simultaneous perturbation
- Maeda, De Figueiredo
- 1997
Citation Context ...ements, and a neural network as the FA, is considered in Spall and Cristion [52]. Some applications of the simultaneous perturbation optimization method in control are given in Maeda and De Figueiredo [26] (robotics), Koch et al. [17] (integrated transit/traffic control), and Nechyba and Xu [32] (human-machine interface control). Note that the general convergence result presented here is relevant to mo... |

8 |
Asymptotic properties of stochastic approximations with constant coefficients
- Kushner, Huang
- 1981
Citation Context ...e same state prior to generating $L_k^{(+)}$ and $L_k^{(-)}$, as, say, with some robotic systems). In these special cases $\bar{g}_k(\hat{\theta})$ equals $g_k(\hat{\theta})$ plus an $O(c_k^2)$ bias that can be absorbed into $\delta_k(\rho)$. [19], Benveniste and Ruget [3], Macchi and Eweda [25], or Benveniste et al. [4, pp. 120–164]), which is relevant when is nonconvergent. This is likely to occur, say, when the process or measurement dynami... |

8 |
Stochastic approximation with averaging and feedback: rapidly convergent “on line” algorithms, and applications to adaptive systems
- Kushner, Yang
- 1992
Citation Context ... folding in too many relatively poor values. A similar result is discussed in Wang [54, p. 37] and Maryak [27]. The numerical results here are in contrast to the numerical results of Kushner and Yang [20], where it is shown that the averaging scheme yields significant improvements in a Robbins–Monro (noncontrol) setting. We expect that in certain other control problems, this type of averaging may be m... |

8 | Analysis of Stochastic Approximation and Related Algorithms - Wang - 1996 |

8 |
A Globally Convergent Stochastic Approximation
- Yakowitz
- 1993
Citation Context ...gradient descent). A number of techniques have been proposed to accelerate the convergence of SA algorithms or to enhance convergence to a global minimum (see, e.g., Spall [48], Chin [9], or Yakowitz [55]), and it would be of interest to explore the applicability of such techniques to SPSA in a control context. Constraints are usually handled on a problem-dependent basis (such as the wastewater exampl... |

7 |
Discrete-time model reference adaptive control of nonlinear dynamical systems using neural networks
- Jagannathan, Lewis, et al.
- 1996
Citation Context ...c discrete-time setting, there are currently almost no practically useful stability results for adaptive nonlinear systems.) Sanner and Slotine [43], Levin and Narendra [23], [24], Jagannathan et al. [16], Fabri and Kadirkamanathan [13], and Ahmed and Anjum [1] are examples of approaches that rely on controller FA’s but introduce stronger modeling assumptions (e.g., deterministic systems or specific k... |

6 |
Stochastic approximation and sequential search for optimum
- Lai
- 1985
Citation Context ...itional expectation $\bar{g}_k(\cdot)$ can be replaced by the true gradient. Then condition C3) resembles the well-known “steepness” condition in stochastic approximation derived from Lyapunov theory (e.g., Lai [21] and Ruppert [40]). Two of the special cases are when the one-measurement form (6) is used for $\hat{g}_k(\cdot)$ or when the system can be “reset” when the two-measurement form (5) is used (i.e., the system can ... |

5 |
A forward method for optimal stochastic nonlinear and adaptive control
- Bayard
- 1991
Citation Context ...erates in closed-loop. Usually such algorithms rely on well-known finite-difference approximations to the gradient (for examples of such algorithms in control, see Saridis [44, pp. 375–376] or Bayard [2]). The finite-difference approach, however, can be very costly in terms of the number of system measurements required, especially in high-dimensional problems such as estimating an FA parameter vector... |

5 |
Stability analysis of hybrid composite dynamical systems: descriptions involving operators and difference equations
- Mousa, Miller, et al.
- 1986
Citation Context ...amework. Further, it seems that very little work has been done on such issues for general nonlinear, stochastic, discrete-time systems (although for deterministic systems, one may consult Mousa et al. [30], Nijmeijer and van der Schaft [33, Ch. 14], or references mentioned in Section I). Essentially, we feel that stability analysis should not be a necessary aspect in building all controllers since that... |

4 |
Network-wide approach to optimal signal light timing for integrated transit vehicle and traffic operations
- Koch, Chin, et al.
- 1997
Citation Context ...as the FA, is considered in Spall and Cristion [52]. Some applications of the simultaneous perturbation optimization method in control are given in Maeda and De Figueiredo [26] (robotics), Koch et al. [17] (integrated transit/traffic control), and Nechyba and Xu [32] (human-machine interface control). Note that the general convergence result presented here is relevant to most of these applications. The... |

4 |
Some guidelines for using iterate averaging in stochastic approximation
- Maryak
- 1997
Citation Context ...stimates in the general Robbins–Monro SA setting with non-time-varying loss function (Polyak and Juditsky [37]) and to offer improved performance in some SPSA settings (Dippon and Renz [10] and Maryak [27]). By its nature, of course, averaging seems most appropriate for systems that have stationary, or perhaps asymptotically very slowly time-varying, dynamics (e.g., the case of the proposition). In our p... |

4 |
Implementation of Learning Control Systems Using Neural Networks
- Sartori, Antsaklis
- 1992
Citation Context ...FA (with neural networks as the FA’s; see, e.g., Narendra and Parthasarathy [31], Pao et al. [34], or Sartori and Antsaklis [45]). In contrast, the direct control approach here does not require any open-loop system identification, instead constructing the one (controller) FA while the system is operating in closed-loop. Thus w... |

3 |
On the almost sure convergence of a general stochastic approximation procedure
- Evans, Weber
- 1986
Citation Context ... notation replace the “lim sup” with “lim” without loss of generality. We will show that the event has probability zero in a multivariate extension to scalar arguments in Blum [5] and Evans and Weber [12]. Furthermore, suppose that the limiting quantity of the unbounded elements in is (trivial modifications cover a limiting quantity including limits). For and as in C5), the event of interest can be re... |

3 |
Second order convergence analysis of stochastic adaptive linear filtering
- Macchi, Eweda
- 1983
Citation Context ..., as, say, with some robotic systems). In these special cases $\bar{g}_k(\hat{\theta})$ equals $g_k(\hat{\theta})$ plus an $O(c_k^2)$ bias that can be absorbed into $\delta_k(\rho)$. [19], Benveniste and Ruget [3], Macchi and Eweda [25], or Benveniste et al. [4, pp. 120–164]), which is relevant when is nonconvergent. This is likely to occur, say, when the process or measurement dynamics are perpetually time-varying due to cyclic beh... |

3 |
A One-Measurement Form of Simultaneous Perturbation Stochastic Approximation
- Spall
- 1997
Citation Context ...ing in SPSA-based direct adaptive control). A slightly more fundamental modification is to replace the two-measurement gradient approximation in (5) with the one-measurement form as discussed in Spall [47]. Although [47] shows that (5) remains generally preferable to (6) in terms of overall efficiency of optimization based on loss function measurements (even though (5) uses twice the number of measurem... |
2 |
Neural-net-based self-tuning control of nonlinear plants
- Ahmed, Anjum
- 1997
Citation Context ...ctically useful stability results for adaptive nonlinear systems.) Sanner and Slotine [43], Levin and Narendra [23], [24], Jagannathan et al. [16], Fabri and Kadirkamanathan [13], and Ahmed and Anjum [1] are examples of approaches that rely on controller FA’s but introduce stronger modeling assumptions (e.g., deterministic systems or specific knowledge of how the controller enters the system dynamics... |

2 |
Theory and development of higher order CMAC neural networks
- Lane, Handelman, et al.
- 1992
Citation Context ...that the desired level of accuracy with a given FA is being achieved). Each FA technique tends to have advantages and disadvantages, some of which are discussed in Poggio and Girosi [36], Lane et al. [22], and Chen and Chen [8] (e.g., polynomials have a relatively easy physical interpretability, but the number of parameters to be determined grows rapidly with input dimension or polynomial order). Sinc... |

2 |
Kiefer-Wolfowitz procedure, Encyclopedia of Statistical Science
- Ruppert
- 1983
Citation Context ...asurements are needed (i.e., for the th component of the gradient approximation, the quantity is replaced by a vector with a positive constant in the th place and zeroes elsewhere; see, e.g., Ruppert [39]). A variation on the gradient approximation in (5) is to average several gradient approximations, with each vector in the average being based on a new (independent) value of and a corresponding new p... |

2 |
Accelerated second-order stochastic optimization using only function measurements
- Spall
- 1997
Citation Context ..., including, e.g., standard gradient descent). A number of techniques have been proposed to accelerate the convergence of SA algorithms or to enhance convergence to a global minimum (see, e.g., Spall [48], Chin [9], or Yakowitz [55]), and it would be of interest to explore the applicability of such techniques to SPSA in a control context. Constraints are usually handled on a problem-dependent basis (s... |