## Nonlinear Black-Box Modeling in System Identification: a Unified Overview (1995)

Venue: Automatica

Citations: 161 (17 self)

### BibTeX

@ARTICLE{Sjöberg95nonlinearblack-box,
  author = {Jonas Sjöberg and Qinghua Zhang and Lennart Ljung and Albert Benveniste and Bernard Delyon and Pierre-Yves Glorennec and Håkan Hjalmarsson and Anatoli Juditsky},
  title = {Nonlinear Black-Box Modeling in System Identification: a Unified Overview},
  journal = {Automatica},
  year = {1995},
  volume = {31},
  pages = {1691--1724}
}

### Abstract

A nonlinear black box structure for a dynamical system is a model structure that is prepared to describe virtually any nonlinear dynamics. There has been considerable recent interest in this area with structures based on neural networks, radial basis networks, wavelet networks, hinging hyperplanes, as well as wavelet transform based methods and models based on fuzzy sets and fuzzy rules. This paper describes all these approaches in a common framework, from a user's perspective. It focuses on what are the common features in the different approaches, the choices that have to be made and what considerations are relevant for a successful system identification application of these techniques. It is pointed out that the nonlinear structures can be seen as a concatenation of a mapping from observed data to a regression vector and a nonlinear mapping from the regressor space to the output space. These mappings are discussed separately. The latter mapping is usually formed as a basis function e...

### Citations

9132 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context: ...ite Gaussian noise. Entropy interpretation: when probabilities are being estimated, e.g., in classification problems, it is common to choose a criterion based on the relative entropy. See, e.g., (Cover and Thomas, 1991). This gives the maximum likelihood estimate of the probability (Baum and Wilczek, 1988). The relative entropy is defined as Entropy = p(φ) · log(p(φ)/p̂(φ)) (51), where p̂(φ) = p̂(φ; θ) and p...
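As a quick numeric illustration of the relative-entropy criterion (51), a sketch for discrete class probabilities (the function name and toy distributions are ours, not the paper's):

```python
import numpy as np

def relative_entropy(p, p_hat):
    """Relative entropy of eq. (51), summed over classes:
    sum_phi p(phi) * log(p(phi) / p_hat(phi)), where p is the true
    class probability and p_hat the estimated one (discrete sketch)."""
    p = np.asarray(p, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    return float(np.sum(p * np.log(p / p_hat)))

# A perfect estimate gives zero relative entropy; any mismatch is positive
perfect = relative_entropy([0.5, 0.5], [0.5, 0.5])
mismatch = relative_entropy([0.9, 0.1], [0.5, 0.5])
```

Minimizing this criterion over the parameters θ of p̂(φ; θ) yields the maximum-likelihood estimate mentioned in the snippet.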

4208 | Neural Networks: A Comprehensive Foundation
- Haykin
- 1999
Citation Context: ...on, with kernel methods and nearest-neighbor techniques. There is also a rich literature on the subject. Among many general treatments we may refer to books on neural networks, such as (Kung, 1993) and (Haykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and...

2563 | Density Estimation for Statistics and Data Analysis
- Silverman
- 1986
Citation Context: ...uch as (Kung, 1993), (Haykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets and multi-resolution techniques, like (Daubechies, 1992; Chui, 1992; Ruskai et al., 1992; Meyer, 1990). Organization of this pa...

1431 | System identification: Theory for the user
- Ljung
- 1987
Citation Context: ...s used in practice are all variants of (9), using different ways of picking up "poles" of the system and different ways of describing the noise characteristics. The common models used can all, as in (Ljung, 1987), be summarized by the general family A(q)y(t) = [B(q)/F(q)]u(t) + [C(q)/D(q)]e(t) (10). The special cases of (10) are known as the Box-Jenkins (BJ) model (A = 1), the ARMAX model (F = D = 1), the Outp...
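The general family (10) quoted in the snippet reads more easily typeset; a reconstruction from the surrounding context, with the special cases it lists:

```latex
% General linear black-box model family (eq. (10) of the paper)
A(q)\,y(t) = \frac{B(q)}{F(q)}\,u(t) + \frac{C(q)}{D(q)}\,e(t)
% Special cases: Box-Jenkins (A = 1), ARMAX (F = D = 1),
% Output-Error (A = C = D = 1), ARX (F = C = D = 1)
```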

1133 | Matching Pursuits with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context: ...lassical regression analysis, this method is referred to as the stagewise regression procedure. See, for example, (Draper and Smith, 1981). Recently it has been used in the matching pursuit algorithm of (Mallat and Zhang, 1993) and the adaptive signal representation of (Qian and Chen, 1994). Stepwise selection by orthogonalization (SSO): the RBS method does not explicitly consider the non-orthogonality of the basis functio...
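The residual-based (stagewise) selection described here can be sketched in a few lines; the dictionary and data below are hypothetical, and atoms are assumed unit-norm:

```python
import numpy as np

def matching_pursuit(y, D, n_terms):
    """Greedy residual-based selection: at each stage pick the dictionary
    atom (column of D, assumed unit-norm) most correlated with the current
    residual, in the spirit of Mallat & Zhang's matching pursuit."""
    residual = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_terms):
        corr = D.T @ residual             # inner products <r, g_k>
        k = int(np.argmax(np.abs(corr)))  # best-fitting atom
        coeffs[k] += corr[k]
        residual -= corr[k] * D[:, k]     # remove its contribution
    return coeffs, residual

# Toy dictionary: three orthonormal atoms, so three passes recover
# the spanned part of y exactly
D = np.eye(4)[:, :3]
y = np.array([2.0, -1.0, 0.5, 0.0])
c, r = matching_pursuit(y, D, n_terms=3)
```

With non-orthogonal atoms the same loop still applies, which is exactly the situation the SSO variant in the snippet addresses by orthogonalizing later selections against earlier ones.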

1029 | Numerical Methods for Unconstrained Optimization and Nonlinear Equations
- Dennis, Schnabel
- 1983
Citation Context: ...k of V″_N as constructed by difference approximation of d gradients. The direction (56) is, however, constructed directly, without explicitly forming and inverting V″. It is generally considered (Dennis and Schnabel, 1983) that the Gauss-Newton search direction is to be preferred. For ill-conditioned problems the Levenberg-Marquardt modification is recommended. The ideal step size in (53) would be 1, if the underl...
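A minimal sketch of the damped Gauss-Newton (Levenberg-Marquardt) step discussed above, applied to a hypothetical linear least-squares problem; `delta` plays the role of the damping that handles ill-conditioned Hessians:

```python
import numpy as np

def levenberg_marquardt_step(theta, residual_fn, jac_fn, delta):
    """One damped Gauss-Newton (Levenberg-Marquardt) step for a
    least-squares criterion V(theta) = 0.5 * ||r(theta)||^2.
    The damping delta regularizes the Gauss-Newton Hessian
    approximation J^T J when it is ill-conditioned."""
    r = residual_fn(theta)
    J = jac_fn(theta)
    H = J.T @ J + delta * np.eye(len(theta))  # damped Hessian approximation
    g = J.T @ r                               # gradient of V
    return theta - np.linalg.solve(H, g)

# Toy problem: fit y = a*x + b by iterating LM steps (data are ours)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
res = lambda th: th[0] * x + th[1] - y
jac = lambda th: np.stack([x, np.ones_like(x)], axis=1)
theta = np.zeros(2)
for _ in range(50):
    theta = levenberg_marquardt_step(theta, res, jac, delta=1e-3)
```

For this linear problem a single undamped step would already be exact; the damped iteration converges to the same answer while staying well-posed when J^T J is nearly singular.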

1002 | Fuzzy Identification of Systems and Its Applications to Modeling and Control
- Takagi, Sugeno
- 1985
Citation Context: ...what is the add-on provided by fuzzy modeling. We first introduce fuzzy models as typically used in fuzzy control (Lee, 1990). Several presentations are possible; see for instance (Zadeh, 1994), (Takagi and Sugeno, 1985), (Sugeno and Yasukawa, 1993). The presentation we give here is slightly heterodox, but simple and consistent. 9.1 Introduction to fuzzy logic. Fuzzy sets: consider scalar input variables generica...

913 | Approximation by superposition of a sigmoidal function
- Cybenko
- 1989
Citation Context: ...he question of how many layers to use is, however, not easy. In principle, with many basis functions, one hidden layer is sufficient for modeling most practically reasonable systems. See, for example, (Cybenko, 1989; Barron, 1993). (Sontag, 1993) contains many useful and interesting insights into the importance of second hidden layers in the nonlinear structure. Recurrent networks: another very important concept ...

829 | Learning Representations by Back-Propagating Errors
- Rumelhart, Hinton, et al.
- 1986
Citation Context: ...nnection with neural networks the celebrated back-propagation (BP) algorithm is used to compute this gradient. Backpropagation has been described in several contexts; see e.g., (Werbos, 1974; Rumelhart et al., 1986). For a one-hidden-layer sigmoid neural network (27), it is straightforward to compute the gradient, since (omitting the subscript k) (d/dα) αg(βφ + γ) = g(βφ + γ) and (d/dγ) αg(βφ + γ) = αg′(...
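The per-unit derivatives quoted above can be checked numerically; a sketch for a single sigmoid unit with a scalar regressor (all names and values here are ours, not the paper's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unit_gradient(alpha, beta, gamma, phi):
    """Gradient of one hidden unit f = alpha * g(beta*phi + gamma)
    with respect to (alpha, beta, gamma), using g' = g * (1 - g)
    for the sigmoid -- the per-unit pieces that backpropagation
    assembles for the full network."""
    z = sigmoid(beta * phi + gamma)
    return np.array([z,                          # df/dalpha
                     alpha * z * (1 - z) * phi,  # df/dbeta
                     alpha * z * (1 - z)])       # df/dgamma

# Central finite-difference check of the analytic gradient
alpha, beta, gamma, phi = 0.7, -1.2, 0.3, 0.5
g_analytic = unit_gradient(alpha, beta, gamma, phi)
eps = 1e-6
f = lambda a, b, c: a * sigmoid(b * phi + c)
g_numeric = np.array([
    (f(alpha + eps, beta, gamma) - f(alpha - eps, beta, gamma)) / (2 * eps),
    (f(alpha, beta + eps, gamma) - f(alpha, beta - eps, gamma)) / (2 * eps),
    (f(alpha, beta, gamma + eps) - f(alpha, beta, gamma - eps)) / (2 * eps),
])
```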

731 | Applied regression analysis
- Draper, Smith
- 1998
Citation Context: ... the basis function set G is chosen. Now the problem is, given a set of estimation data as defined in (33), how to select n basis functions from G. This is a classical problem in regression analysis (Draper and Smith, 1981). For a given value of n, selecting n optimal basis functions could in principle be performed via exhaustive search, which would consist in examining all the possible combinations of n basis functions ...

694 | Networks for approximation and learning
- Poggio, Girosi
- 1990
Citation Context: ...n combination with the radial construction for the multi-variable case (25), without any orthogonalization, is found in both wavelet networks (Zhang and Benveniste, 1992) and radial basis neural networks (Poggio and Girosi, 1990). Kernel estimators: another well-known example of the use of local basis functions is kernel estimators (Nadaraya, 1964; Watson, 1969). A kernel function κ(·) is typically a bell-shaped function,...

574 | Bayesian interpolation
- MacKay
- 1992
Citation Context: ...changing the beneficial effects on the variance error. This penalty term corresponds to a prior Gaussian distribution for the parameters, viz. that they have mean θ# and covariance matrix (2/δ)I. In (MacKay, 1992) a Bayesian approach is introduced where the parameters may belong to different Gaussian distributions. This means that the spurious parameters can be excluded from the fit by associating them with a...

562 | Beyond regression: new tools for prediction and analysis in the behavioral sciences
- Werbos
- 1974
Citation Context: ...re (55). In connection with neural networks the celebrated back-propagation (BP) algorithm is used to compute this gradient. Backpropagation has been described in several contexts; see e.g., (Werbos, 1974; Rumelhart et al., 1986). For a one-hidden-layer sigmoid neural network (27), it is straightforward to compute the gradient, since (omitting the subscript k) (d/dα) αg(βφ + γ) = g(βφ + γ) and (d/dγ...

548 | Identification and control of dynamical systems using neural networks
- Narendra, Parthasarathy
- 1990
Citation Context: ...EAR STATE-SPACE models, which use past components of virtual outputs, i.e., signal values at internal nodes of the network (see e.g. Figure 3 below) that do not correspond to the output variable. In (Narendra and Parthasarathy, 1990) another notation is used for the same models when used in conjunction with neural networks: the NARX model is called the Series-Parallel model and the NOE is called the Parallel model. The model structures ...

494 | Multiresolution Approximations and Wavelet Orthonormal Bases of L2(R)
- Mallat
- 1989
Citation Context: ...n vector to the output, (Juditsky et al., 1995) contains extensive discussions. Here we just mention some examples. It is well known that orthonormal wavelets form an orthonormal basis of L2(R^d) (Mallat, 1989; Daubechies, 1992). Several authors have shown that a one-hidden-layer sigmoid network can approximate any continuous function with arbitrary accuracy, provided the number of basis functions used i...

439 | Projection pursuit regression
- Friedman, Stuetzle
- 1981
Citation Context: ...rojection directions which would show clear data patterns in the projected picture. These directions are chosen as the global ones. The approach thus has clear connections with projection pursuit (Friedman and Stuetzle, 1981). The advantage is that higher regression-vector dimensions can be handled, by extrapolation into unsupported data regions. Whether this is reasonable or not depends, of course, on the application. Ex...

427 | Theory and Practice of Recursive Identification
- Ljung, Söderström
- 1983
Citation Context: ...l (F = D = 1), the Output-Error (OE) model (A = C = D = 1) and the ARX model (F = C = D = 1). The predictor associated with (10) can be given in "pseudo-linear" regression form as (see eq (3.114) in (Ljung and Söderström, 1983)) ŷ(t|θ) = θᵀφ(t, θ) (11). The regressors, i.e., the components of φ(t, θ), are in this general case given by: 1. u(t − k) (associated with the B polynomial); 2. y(t − k) (associated with t...
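For the ARX special case the predictor ŷ(t|θ) = θᵀφ(t) is linear in θ, so estimation reduces to ordinary least squares; a sketch on synthetic noise-free data (the system coefficients and signal lengths are ours):

```python
import numpy as np

# ARX special case of the pseudo-linear predictor yhat(t|theta) = theta^T phi(t):
# regressors are y(t-1) and u(t-1), so theta follows from least squares.
rng = np.random.default_rng(0)
a, b = 0.5, 1.5                    # "true" system y(t) = a*y(t-1) + b*u(t-1)
u = rng.standard_normal(200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = a * y[t - 1] + b * u[t - 1]

Phi = np.stack([y[:-1], u[:-1]], axis=1)  # regression matrix, rows phi(t)^T
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
```

With noise-free data the estimate recovers (a, b) exactly; with a noise term e(t) the same code gives the prediction-error estimate.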

397 | Universal Approximation Bounds for Superpositions of a Sigmoidal Function
- Barron
- 1993
Citation Context: ...how many layers to use is, however, not easy. In principle, with many basis functions, one hidden layer is sufficient for modeling most practically reasonable systems. See, for example, (Cybenko, 1989; Barron, 1993). (Sontag, 1993) contains many useful and interesting insights into the importance of second hidden layers in the nonlinear structure. Recurrent networks: another very important concept for applicatio...

312 | Orthogonal least squares learning algorithm for radial basis function networks
- Chen, Cowan, et al.
- 1991
Citation Context: ...ency, later selected basis functions are orthogonalized to earlier selected ones. It has been used in radial basis function (RBF) networks and other nonlinear modeling problems in (Chen et al., 1989; Chen et al., 1991). Backward elimination (BE): in contrast to the previous two methods, the backward elimination method starts by building the model using all the basis functions in G, then eliminates one basis functi...

297 | Regularization algorithms for learning that are equivalent to multilayer networks
- Poggio, Girosi
- 1990
Citation Context: ...n combination with the radial construction for the multi-variable case (25), without any orthogonalization, is found in both wavelet networks (Zhang and Benveniste, 1992) and radial basis neural networks (Poggio and Girosi, 1990). Kernel estimators: another well-known example of the use of local basis functions is kernel estimators (Nadaraya, 1964; Watson, 1969). A kernel function κ(·) is typically a bell-shaped function,...

280 | Ondelettes et opérateurs
- Meyer
- 1990
Citation Context: ...like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets and multi-resolution techniques, like (Daubechies, 1992; Chui, 1992; Ruskai et al., 1992; Meyer, 1990). Organization of this paper: this paper will take the position of a practical user of nonlinear black-box models, describe the essential features of the available approaches, and discuss the ...

265 | Optimal global rates of convergence for nonparametric regression
- Stone
- 1982
Citation Context: ...ral networks, such as (Kung, 1993) and (Haykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets and multi-resolution techniques, like (Daubechies, 1992; Chui, 1992; Ruskai et al., 1992; Meyer, 1990). Orga...

236 | Smooth regression analysis
- Watson
- 1964
Citation Context: ... and Benveniste, 1992) and radial basis neural networks (Poggio and Girosi, 1990). Kernel estimators: another well-known example of the use of local basis functions is kernel estimators (Nadaraya, 1964; Watson, 1969). A kernel function κ(·) is typically a bell-shaped function, and the kernel estimator has the form g(φ) = Σ_{k=1}^n α_k κ((φ − γ_k)/h) (29), where h is a small positive number, γ_k are giv...
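The kernel expansion (29) is direct to evaluate once the centers γ_k, weights α_k and bandwidth h are fixed; a sketch with a Gaussian bell standing in for κ (all values below are hypothetical):

```python
import numpy as np

def kernel_estimate(phi, centers, alphas, h):
    """Kernel expansion of eq. (29):
    g(phi) = sum_k alpha_k * kappa((phi - gamma_k) / h),
    with a Gaussian bell as kappa; centers play the role of gamma_k
    and h is the (small) bandwidth."""
    kappa = lambda x: np.exp(-0.5 * x**2)  # bell-shaped kernel
    return sum(a * kappa((phi - c) / h) for a, c in zip(alphas, centers))

# Hypothetical one-dimensional example: three kernels on a grid
centers = np.array([-1.0, 0.0, 1.0])
alphas = np.array([0.5, 2.0, -0.5])
val = kernel_estimate(0.0, centers, alphas, h=0.5)
```

Shrinking h makes each basis function more local, which is exactly the dilation role the survey assigns to scale parameters.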

186 | Adaptive Fuzzy Systems and Control: Design and Stability Analysis
- Wang
- 1994
Citation Context: ...a rich literature on the subject. Among many general treatments we may refer to books on neural networks, such as (Kung, 1993) and (Haykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets and multi-reso...

182 | Orthogonal least squares methods and their application to non-linear system identification
- Chen, Billings, et al.
- 1989
Citation Context: ...omputational efficiency, later selected basis functions are orthogonalized to earlier selected ones. It has been used in radial basis function (RBF) networks and other nonlinear modeling problems in (Chen et al., 1989; Chen et al., 1991). Backward elimination (BE): in contrast to the previous two methods, the backward elimination method starts by building the model using all the basis functions in G, then eliminat...

181 | A Course in Density Estimation
- Devroye, L.
- 1987
Citation Context: ...aykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets and multi-resolution techniques, like (Daubechies, 1992; Chui, 1992; Ruskai et al., 1992; Meyer, 1990). Organization of this paper: this paper will take the pos...

178 | The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems
- Moody
- 1992
Citation Context: ...r that is important for the model fit will, however, not be very much affected by the second term. Suppose we minimize (45) instead of (34). Then it can be shown, see e.g. (Sjöberg and Ljung, 1992) or (Moody, 1992), that (42) will still hold, with the important change that the number m is reduced to r(m, δ) = Σ_{i=1}^m σ_i²/(σ_i + δ)² (46), where σ_i are the eigenvalues (singular values) of V″(θ), t...
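Equation (46) reduces to a one-liner once the Hessian eigenvalues are known; a sketch with a hypothetical, widely spread spectrum showing how δ acts as a threshold on the eigenvalues:

```python
import numpy as np

def effective_parameters(eigvals, delta):
    """Effective number of parameters from eq. (46):
    r(m, delta) = sum_i sigma_i^2 / (sigma_i + delta)^2,
    where eigvals are the eigenvalues sigma_i of the Hessian V''
    and delta is the regularization parameter."""
    s = np.asarray(eigvals, dtype=float)
    return float(np.sum(s**2 / (s + delta)**2))

# Widely spread eigenvalues (hypothetical): directions with
# sigma_i >> delta count as ~1, those with sigma_i << delta as ~0,
# so r is close to the number of eigenvalues above the threshold.
sigmas = [1e4, 1e2, 1.0, 1e-4, 1e-6]
r = effective_parameters(sigmas, delta=1e-2)
```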

169 | Neurofuzzy Adaptive Modelling and Control
- Brown, Harris
- 1994
Citation Context: ...chniques. There is also a rich literature on the subject. Among many general treatments we may refer to books on neural networks, such as (Kung, 1993) and (Haykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets an...

160 | Bayesian methods for Adaptive Models
- MacKay
- 1992
Citation Context: ... connected to a small prior, receive only a small bias. The additional Gaussian distributions describing the parameters can be estimated together with all the other parameters. This is also described in (MacKay, 1991). Regularization can also be used to include prior knowledge in the black-box model. Instead of penalizing the size of the parameters as in (47), one can add a complexity term which penalizes the dist...

157 | Nonlinear regulation: The piecewise linear approach
- Sontag
- 1981
Citation Context: ...tion as φ. Hence, the hinging hyperplane model is a ridge construction with an additional linear term. Using hinge functions as basis functions yields the kind of piecewise linear model proposed by (Sontag, 1981). Projection pursuit regression: another example of ridge-type basis functions is projection pursuit regression (Huber, 1985; Friedman and Stuetzle, 1981), having the form g(φ; θ) = Σ_k α_k g_k(...

116 | Fuzzy logic, neural networks, and soft computing
- Zadeh
- 1994
Citation Context: ...uss in detail what is the add-on provided by fuzzy modeling. We first introduce fuzzy models as typically used in fuzzy control (Lee, 1990). Several presentations are possible; see for instance (Zadeh, 1994), (Takagi and Sugeno, 1985), (Sugeno and Yasukawa, 1993). The presentation we give here is slightly heterodox, but simple and consistent. 9.1 Introduction to fuzzy logic. Fuzzy sets: consider scala...

109 | Wavelets: A Tutorial in Theory and Applications
- Chui
- 1992
Citation Context: ...gression and density estimation, like (Stone, 1982), (Silverman, 1986), and (Devroye and Gyorfi, 1985), and to background material on wavelets and multi-resolution techniques, like (Daubechies, 1992; Chui, 1992; Ruskai et al., 1992; Meyer, 1990). Organization of this paper: this paper will take the position of a practical user of nonlinear black-box models, describe the essential features of the avai...

107 | Fuzzy systems are universal approximators
- Wang
- 1992
Citation Context: ... at which B_j reaches its maximum value, and the definition of the weight functions w_j(φ) is obvious. If property (88) does not hold, then the above defuzzification formula is modified accordingly (Wang, 1992): y = g(φ) = Σ_{j=1}^p y_j w_j(φ) / Σ_{j=1}^p w_j(φ) (91). A rule basis may be directly built with crisp conclusions, i.e., B_j are ordinary values in (86). In this case no defuzzification is needed. 9...
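The normalized weighted average (91) in code; the rule centers and activation weights below are hypothetical:

```python
import numpy as np

def defuzzify(y_centers, weights):
    """Normalized weighted average of eq. (91): the crisp output is
    y = sum_j y_j * w_j(phi) / sum_j w_j(phi), where w_j(phi) are the
    (already evaluated) rule activation weights and y_j the rule
    conclusion centers."""
    w = np.asarray(weights, dtype=float)
    y = np.asarray(y_centers, dtype=float)
    return float(np.dot(y, w) / np.sum(w))

# Hypothetical two-rule example: conclusions 1.0 and 3.0,
# activated with weights 0.25 and 0.75
out = defuzzify([1.0, 3.0], [0.25, 0.75])
```

This is the same normalized basis-function expansion the survey uses to place fuzzy models in its common framework.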

102 | Hinging hyperplanes for regression, classification, and function approximation
- Breiman
- 1993
Citation Context: ...a ridge basis function (27) and the sigmoid choice (24) for the mother function gives the celebrated one-hidden-layer feed-forward sigmoid neural net. Hinging hyperplanes: the hinging hyperplanes model (Breiman, 1993) is closely related to the neural network, and corresponds to the choice of the hinge function, rather than the sigmoid, for the mother basis function. The hinge function has the form of an "open boo...

102 | System identification using Laguerre models
- Wahlberg
- 1991
Citation Context: ...L_k(q)u(t), k = 1, ..., d rather than u(t − k), where the filters L_k are tailored to the application. Laguerre and Kautz filters have been extensively discussed in these applications, e.g. (Wahlberg, 1991) and (Wahlberg, 1994). In (van den Hof et al., 1994) interesting generalizations of such regressor choices are described. 3.4 Some other structural questions: the actual way that the regressors are co...

87 | The collinearity problem in linear regression. The partial least squares (PLS) approach to generalized inverses
- Wold, Ruhe, et al.
- 1984
Citation Context: ...across these subspaces. Partial least squares: the ridge basis function approaches have a connection, at least conceptually, to the partial least squares (PLS) techniques much used in chemometrics (Wold et al., 1984; Helland, 1990). PLS also employs techniques to select the most significant subspaces of a larger regressor space, so as to reduce the number of parameters to estimate. Fuzzy models: also the so-call...

84 | Projection pursuit (with discussion)
- Huber
- 1985
Citation Context: ...sis functions yields the kind of piecewise linear model proposed by (Sontag, 1981). Projection pursuit regression: another example of ridge-type basis functions is projection pursuit regression (Huber, 1985; Friedman and Stuetzle, 1981), having the form g(φ; θ) = Σ_k α_k g_k(β_k φ + γ_k) (30), where β_k are q × d matrices, φ ∈ R^d, d > q, and g_k : R^q → R are some smooth fitted functions. The c...
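Evaluating the projection pursuit form (30) is straightforward once the projections β_k and ridge functions g_k are fixed; a sketch with hypothetical choices (d = 3, q = 1):

```python
import numpy as np

def ppr_predict(phi, alphas, betas, gammas, ridge_fns):
    """Evaluate the projection pursuit form of eq. (30):
    g(phi) = sum_k alpha_k * g_k(beta_k @ phi + gamma_k), where each
    beta_k projects the d-dimensional regressor onto a q-dimensional
    ridge direction before the smooth function g_k is applied."""
    return sum(a * fn(B @ phi + c)
               for a, B, c, fn in zip(alphas, betas, gammas, ridge_fns))

# Hypothetical example with two ridge terms, d = 3, q = 1
phi = np.array([1.0, 2.0, 3.0])
betas = [np.array([[1.0, 0.0, 0.0]]), np.array([[0.0, 1.0, -1.0]])]
gammas = [np.array([0.0]), np.array([1.0])]
alphas = [2.0, 1.0]
ridge_fns = [np.tanh, lambda z: z**2]
val = ppr_predict(phi, alphas, betas, gammas, ridge_fns)
```

Fixing g_k to a single sigmoid and q = 1 recovers the one-hidden-layer network as a special case, which is the unifying point of the survey.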

73 | Nonlinear system identification using neural networks
- Chen, Billings, et al.
Citation Context: ...ressors are replaced by the last computed ŷ_u(t − k|θ). Following the nomenclature for linear models, it is natural to coin similar names for nonlinear models. This is well in line with, e.g., (Chen et al., 1990; Chen and Billings, 1992). We could thus distinguish between: NFIR models, which use only u(t − k) as regressors; NARX models, which use u(t − k) and y(t − k) as regressors; N...

72 | Pruning algorithms—a survey
- Reed
- 1993
Citation Context: ...r the case of wavelets, where the most spectacular results are obtained. The equivalent of shrinking in connection with neural nets is called pruning, and it has attracted much interest lately. See e.g., (Reed, 1993) for an overview and further references therein. In pruning, in contrast to shrinking, the dilation parameters are also considered and possibly deleted. 7 Estimation Algorithms: Optimization Meth...

70 | Digital neural networks
- Kung
- 1993
Citation Context: ...nsity estimation, with kernel methods and nearest-neighbor techniques. There is also a rich literature on the subject. Among many general treatments we may refer to books on neural networks, such as (Kung, 1993) and (Haykin, 1994), to books on fuzzy models, like (Brown and Harris, 1994; Wang, 1994), and to books and surveys on non-parametric regression and density estimation, like (Stone, 1982), (Silverman, 1986), ...

66 | Using wavelet networks in nonparametric estimation
- Zhang
- 1997
Citation Context: ... are not orthogonal; in order to overcome the combinatorial complexity of the exhaustive search, three different heuristics are reviewed in the following. Details of these algorithms can be found in (Zhang, 1994). The residual based selection (RBS): the idea of this method is to select, for the first stage, the basis function in G that best fits the estimation data, then repeatedly select the basis function ...

58 | System identification using Kautz models
- Wahlberg
- 1994
Citation Context: ...d rather than u(t − k), where the filters L_k are tailored to the application. Laguerre and Kautz filters have been extensively discussed in these applications, e.g. (Wahlberg, 1991) and (Wahlberg, 1994). In (van den Hof et al., 1994) interesting generalizations of such regressor choices are described. 3.4 Some other structural questions: the actual way that the regressors are combined clearly reflec...

54 | Modeling of Dynamic Systems
- Ljung, Glad
- 1994
Citation Context: ...1994). Generally speaking, the numerical minimization of criteria of fit for identification purposes is a well-established topic, treated for general model structures, e.g., in (Ljung, 1987) and (Ljung and Glad, 1994). The general consensus is that one should use a damped Gauss-Newton algorithm with regularization features for ill-conditioned Hessians -- all of this to be defined shortly -- in an off-line manner,...

52 | Neural networks and nonlinear adaptive filtering: unifying concepts and new algorithms
- Nerrand, Roussel-Ragot, et al.
- 1993
Citation Context: ...be possible to obtain a more efficient model with a smaller number of regressors by using a state-space model. State-space models in connection with neural nets are discussed in, e.g., (Rivals, 1995; Nerrand et al., 1993; Matthews, 1992). 3.2 Regressors for Nonlinear Black-Box Dynamical Models: the described regressors give all the necessary freedom for the linear black-box case, and it is natural to use these also ...

51 | Neural networks for nonlinear dynamic system modeling and identification
- Chen, Billings
- 1992
Citation Context: ...d by the last computed ŷ_u(t − k|θ). Following the nomenclature for linear models, it is natural to coin similar names for nonlinear models. This is well in line with, e.g., (Chen et al., 1990; Chen and Billings, 1992). We could thus distinguish between: NFIR models, which use only u(t − k) as regressors; NARX models, which use u(t − k) and y(t − k) as regressors; NOE models, which use u(t ...

49 | Supervised learning of probability distributions by neural networks
- Baum, Wilczek
- 1988
Citation Context: ... in classification problems, it is common to choose a criterion based on the relative entropy. See, e.g., (Cover and Thomas, 1991). This gives the maximum likelihood estimate of the probability (Baum and Wilczek, 1988). The relative entropy is defined as Entropy = p(φ) · log(p(φ)/p̂(φ)) (51), where p̂(φ) = p̂(φ; θ) and p(φ) are the estimated and true probability for φ belonging to class C. The entropy is no...

44 | A Fuzzy Logic Based Approach to Qualitative Modelling
- Sugeno, Yasukawa
Citation Context: ...d by fuzzy modeling. We first introduce fuzzy models as typically used in fuzzy control (Lee, 1990). Several presentations are possible; see for instance (Zadeh, 1994), (Takagi and Sugeno, 1985), (Sugeno and Yasukawa, 1993). The presentation we give here is slightly heterodox, but simple and consistent. 9.1 Introduction to fuzzy logic. Fuzzy sets: consider scalar input variables generically written as φ. A fuzzy se...

43 | Ill-conditioning in neural network training problems
- Saarinen, Bramley, et al.
- 1993
Citation Context: ... (combination) that is not so essential: "a spurious parameter". The regularization parameter δ is thus a threshold for spurious parameters. Since the eigenvalues σ_i often are widely spread (see (Saarinen et al., 1993) for the neural network case), we have r(m, δ) ≈ m# = the number of eigenvalues of V″ that are larger than δ. We can think of m# as "the efficient number of parameters in the parameterization". Regul...

34 | Nonlinear black-box models in system identification: Mathematical foundations
- Juditsky, Hjalmarsson, et al.
- 1995
Citation Context: ...s considerably less than the number of "offered" parameters, by regularization, shrinking, pruning or regressor selection. A more mathematically comprehensive treatment is given in a companion paper (Juditsky et al., 1995). Keywords: Nonlinear Systems, Model Structures, Parameter Estimation, Wavelets, Neural Networks, Fuzzy Modeling. 1 Introduction: the key problem in system identification is to find a suitable model s...

34 | On estimating regression. Theory Prob
- Nadaraya
- 1964
Citation Context: ... networks (Zhang and Benveniste, 1992) and radial basis neural networks (Poggio and Girosi, 1990). Kernel estimators: another well-known example of the use of local basis functions is kernel estimators (Nadaraya, 1964; Watson, 1969). A kernel function κ(·) is typically a bell-shaped function, and the kernel estimator has the form g(φ) = Σ_{k=1}^n α_k κ((φ − γ_k)/h) (29), where h is a small positive number...