### Citations

2383 | Generalized Additive Models
- Hastie, Tibshirani
- 1990
Citation Context ...promise in complexity between multiple linear regression and a fully flexible nonlinear model such as an MLP, a high-order polynomial, or a tensor spline model. In a generalized additive model (GAM) (Hastie and Tibshirani 1990), a nonlinear transformation estimated by a nonparametric smoother is applied to each input, and these values are added together. The TRANSREG procedure fits nonlinear additive models using splines... |
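The GAM idea in the excerpt, estimate a smooth transform of each input and add the results, can be sketched with backfitting. This is a minimal illustration only, with cubic polynomials standing in for the nonparametric smoother and all function names invented for the example; it is not PROC TRANSREG's spline fit.

```python
import numpy as np

def fit_additive_model(X, y, degree=3, n_iter=30):
    """Backfitting for an additive model with identity link.

    Each input gets its own smooth transform; a cubic polynomial
    stands in here for the nonparametric smoother of the text.
    """
    n, p = X.shape
    f = np.zeros((n, p))                      # current additive components
    coefs = [np.zeros(degree + 1) for _ in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: everything the other components miss
            r = y - f.sum(axis=1) + f[:, j]
            coefs[j] = np.polyfit(X[:, j], r, degree)
            f[:, j] = np.polyval(coefs[j], X[:, j])
    return coefs

def predict_additive(coefs, X):
    """Sum the fitted univariate transforms, one per input."""
    return sum(np.polyval(c, X[:, j]) for j, c in enumerate(coefs))
```

On data that truly is additive in cubic polynomials, the backfitting cycle converges geometrically to the exact fit.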

2207 | Clustering Algorithms
- Hartigan
- 1975
Citation Context ...more useful to define the net input as the Euclidean distance between the synaptic weights and the input values, in which case the competitive learning network is very similar to k-means clustering (Hartigan 1975) except that the usual training algorithms are slow and nonconvergent. Many superior clustering algorithms have been developed in statistics, numerical taxonomy, and many other fields, as described i... |
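The k-means procedure referenced here, the convergent statistical counterpart of Euclidean-distance competitive learning, can be sketched as batch Lloyd iteration. A minimal sketch; the function names are invented for the example.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's k-means: assign each case to the nearest centroid by
    Euclidean distance, then move each centroid to its cases' mean.
    This batch scheme converges, unlike the slow online
    competitive-learning rule mentioned in the text."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # distances of every case to every centroid
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break                              # converged
        centers = new
    return centers, labels
```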

2189 | Introduction to the theory of neural computation - Hertz, Krogh, et al. - 1991 |

2150 |
Applied Logistic Regression
- Hosmer, Lemeshow, et al.
- 2000
Citation Context ...[Figure 3: Simple Linear Perceptron = Multivariate Multiple Linear Regression] A perceptron with a logistic activation function is a logistic regression model (Hosmer and Lemeshow 1989) as shown in Figure 4. [Figure 4: Simple Nonlinear Perceptron = Logistic Regression] A perceptron with a thres... |
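The equivalence in the excerpt, a single unit with a logistic activation trained on the cross-entropy criterion is maximum-likelihood logistic regression, can be sketched with plain gradient descent. Names are invented for the example; a statistical package would use Newton/IRLS instead.

```python
import numpy as np

def fit_logistic(X, y, lr=0.5, n_iter=2000):
    """One 'neuron' with a logistic activation, trained by gradient
    descent on the cross-entropy loss -- i.e. logistic regression."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # prepend bias term
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))       # logistic activation
        w -= lr * Xb.T @ (p - y) / len(y)       # cross-entropy gradient
    return w

def predict_logistic(w, X):
    """Predicted class-1 probabilities."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))
```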

1687 | Finite Mixture Models - McLachlan, Peel |

1060 | Cluster Analysis for Applications - Anderberg - 1973 |

938 |
Analysis of complex statistical variables into principal components
- Hotelling
- 1933
Citation Context ...s are predicted by linear regression from the feature variables. Hence, as is well-known from statistical theory, the optimal feature variables are the principal components of the dependent variables (Hotelling 1933; Jackson 1991; Jolliffe 1986; Rao 1964). There are many variations, such as Oja’s rule and Sanger’s rule, that are just inefficient algorithms for approximating principal components. The statistical ... |
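The claim in the excerpt, that the optimal linear feature variables are the principal components, which Oja's and Sanger's rules only approximate iteratively, can be illustrated directly via the SVD. A minimal sketch with invented names.

```python
import numpy as np

def principal_components(X, k):
    """Principal components via SVD of the centred data matrix.
    The first k right singular vectors span the k-dimensional linear
    feature space with minimal squared reconstruction error -- the
    optimum that Oja's and Sanger's rules only approximate."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T      # component scores (feature variables)
    loadings = Vt[:k]           # component directions
    return scores, loadings
```

With k equal to the number of variables the scores reconstruct the centred data exactly, and the scores are mutually uncorrelated.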

885 | Statistical Analysis of Finite Mixture Distributions - Titterington, Smith, et al. - 1985 |

870 | Numerical taxonomy - Sneath, Sokal - 1973 |

644 |
Nonlinear Regression
- Seber, Wild
- 1989
Citation Context ...IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction intervals, diagnostics, and various graphical displays, all of which rarely appear in the NN literature. Unsupervised Le... |

582 |
Discriminant Analysis and Statistical Pattern Recognition
- McLachlan
- 1992
Citation Context ...[Figure 4: Simple Nonlinear Perceptron = Logistic Regression] A perceptron with a threshold activation function is a linear discriminant function (Hand 1981; McLachlan 1992; Weiss and Kulikowski 1991). If there is only one output, it is also called an adaline, as shown in Figure 5. With multiple outputs, the threshold perceptron is a multiple discriminant function. Inst... |

551 |
Classical and Modern Regression with Applications (2nd edition), Boston
- Myers
- 1990
Citation Context ...y attempting to minimize the sum of squared errors, Σ(target − output)², where the summation is over all outputs and over the training set. A perceptron with a linear activation function is thus a linear regression model (Weisberg 1985; Myers 1986), possibly multiple or multivariate, as shown in Figure 3. [Figure 3: Simple Linear Perceptron = Multivariate Multiple Linear Regression] |
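The least-squares equivalence in the excerpt, a linear-activation perceptron minimizing summed squared error is an ordinary (possibly multivariate) linear regression, amounts to one call to a least-squares solver. A sketch with invented names.

```python
import numpy as np

def linear_perceptron(X, y):
    """Fit a linear-activation perceptron by least squares: the weights
    minimizing the summed squared error are exactly the ordinary
    linear-regression coefficients (multivariate if y has columns)."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # bias term
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w                                    # [intercept, slopes...]
```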

547 | Projection pursuit regression
- Friedman, Stuetzle
- 1981
Citation Context ...odel that provides a useful alternative to polynomial regression. With a moderate number of hidden neurons, an MLP can be considered a quasi-parametric model similar to projection pursuit regression (Friedman and Stuetzle 1981). An MLP with one hidden layer is essentially the same as the projection pursuit regression model except that an MLP uses a predetermined functional form for the activation function in the hidden lay... |

534 |
A user’s guide to principal components
- Jackson
- 1991
Citation Context ...by linear regression from the feature variables. Hence, as is well-known from statistical theory, the optimal feature variables are the principal components of the dependent variables (Hotelling 1933; Jackson 1991; Jolliffe 1986; Rao 1964). There are many variations, such as Oja’s rule and Sanger’s rule, that are just inefficient algorithms for approximating principal components. The statistical model of princ... |

451 |
Nonlinear Regression Analysis and its Applications
- Bates, Watts
- 1988
Citation Context ...S/STAT® software, MODEL in SAS/ETS® software, NLP in SAS/OR® software, and the various NLP routines in SAS/IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction inter... |

395 |
Applied Linear Regression
- Weisberg
- 1985
Citation Context ...quares, i.e., by attempting to minimize the sum of squared errors, Σ(target − output)², where the summation is over all outputs and over the training set. A perceptron with a linear activation function is thus a linear regression model (Weisberg 1985; Myers 1986), possibly multiple or multivariate, as shown in Figure 3. [Figure 3: Simple Linear Perceptron = Multivariate Multiple Linear Regression] |

369 | Fuzzy ART: Fast stable learning and categorization of analog patterns by an adaptive resonance system - Carpenter, Grossberg, et al. - 1991 |

359 |
Nonlinear Statistical Models
- Gallant
- 1987
Citation Context ...R® software, and the various NLP routines in SAS/IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction intervals, diagnostics, and various graphical displays, all o... |

314 |
Smooth regression analysis
- Watson
- 1964
Citation Context ...ulation controller (CMAC) (Miller, Glanz and Kraft 1990). Sometimes the hidden layer values are normalized to sum to 1 (Moody and Darken 1988) as is commonly done in kernel regression (Nadaraya 1964; Watson 1964). Then if each observation is taken as an RBF center, and if the weights are taken to be the target values, the outputs are simply weighted averages of the target values, and the network is identical... |
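The construction in the excerpt, normalized Gaussian basis functions with one center per training case and the targets as weights, is the Nadaraya-Watson kernel regression estimator, which fits in a few lines. A sketch; the function name and bandwidth value are invented for the example.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_new, h=0.3):
    """Nadaraya-Watson estimator: a normalized-RBF network with one
    Gaussian 'neuron' per training case and the targets as weights,
    i.e. a kernel-weighted average of the target values."""
    # Gaussian kernel weights between every new point and every center
    w = np.exp(-0.5 * ((x_new[:, None] - x_train[None, :]) / h) ** 2)
    return (w @ y_train) / w.sum(axis=1)   # normalize to sum to 1
```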

303 |
Adaptive Pattern Recognition and Neural Networks
- Pao
- 1989
Citation Context ...of the form shown in Figure 6, in which the arrows from the inputs to the polynomial terms would usually be given a constant weight of 1. In NN terminology, this is a type of functional link network (Pao 1989). In general, functional links can be transformations of any type that do not require extra parameters, and the activation function for the output is the identity, so the model is linear in the param... |

295 | Projection pursuit - Huber - 1985 |

269 | Finding Groups in Data - Kaufman, Rousseeuw - 1990 |

250 |
A general regression neural network
- Specht
- 1991
Citation Context ... NN researchers routinely reinvent methods that have been known in the statistical or mathematical literature for decades or centuries, but they often fail to understand how these methods work (e.g., Specht 1991). The common implementations of NNs are based on biological or engineering criteria, such as how easy it is to fit the net on a chip, rather than on well-established statistical and optimization crit... |

249 |
Nonlinear Multivariate Analysis
- Gifi
- 1990
Citation Context ..., and the various NLP routines in SAS/IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction intervals, diagnostics, and various graphical displays, all of which rar... |

223 | ARTMAP: Supervised real-time learning and classification of nonstationary data by a self-organizing neural network - Carpenter, Grossberg, et al. - 1991 |

218 |
Discrimination and Classification
- Hand
- 1981
Citation Context ...[Figure 4: Simple Nonlinear Perceptron = Logistic Regression] A perceptron with a threshold activation function is a linear discriminant function (Hand 1981; McLachlan 1992; Weiss and Kulikowski 1991). If there is only one output, it is also called an adaline, as shown in Figure 5. With multiple outputs, the threshold perceptron is a multiple discriminan... |

132 |
The use and interpretation of principal component analysis in applied research
- Rao
- 1964
Citation Context ... feature variables. Hence, as is well-known from statistical theory, the optimal feature variables are the principal components of the dependent variables (Hotelling 1933; Jackson 1991; Jolliffe 1986; Rao 1964). There are many variations, such as Oja’s rule and Sanger’s rule, that are just inefficient algorithms for approximating principal components. The statistical model of principal component analysis i... |

110 | What is projection pursuit - Jones, Sibson - 1987 |

108 | Statistical Aspects of Neural Networks - Ripley - 1993 |

82 | CMAC: an associative neural network alternative to backpropagation - Miller, Glanz, et al. - 1990 |

50 | ART1 and pattern clustering - Moore - 1989 |

49 | Cluster Analysis (2nd edition) - Everitt - 1980 |

48 |
Econometric Applications of Maximum Likelihood Methods, Cambridge
- Cramer
- 1986
Citation Context ...® software, NLP in SAS/OR® software, and the various NLP routines in SAS/IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction intervals, diagnostics, and vario... |

48 | The Interpretation of Analytical Chemical Data by the Use of Cluster Analysis - Massart, Kaufman - 1983 |

33 |
Nonlinear estimation
- Ross
- 1990
Citation Context ...nes in SAS/IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction intervals, diagnostics, and various graphical displays, all of which rarely appear in the NN litera... |

28 |
Model discrimination for nonlinear regression models
- Borowiak
- 1989
Citation Context ...ODEL in SAS/ETS® software, NLP in SAS/OR® software, and the various NLP routines in SAS/IML® software. There is extensive statistical theory regarding nonlinear models (Bates and Watts 1988; Borowiak 1989; Cramer 1986; Edwards 1972; Gallant 1987; Gifi 1990; Härdle 1990; Ross 1990; Seber and Wild 1989). Statistical software can be used to produce confidence intervals, prediction intervals, diagnosti... |

26 | How Neural Networks Learn from Experience, Scientific American - Hinton - 1992 |

21 |
The modified kanerva model for automatic speech recognition
- Prager, Fallside
- 1989
Citation Context ...BF neurons are also called localized receptive fields, locally tuned processing units, or potential functions. RBF networks are closely related to regularization networks. The modified Kanerva model (Prager and Fallside 1989) is an RBF network with a threshold activation function. The Restricted Coulomb Energy™ System (Cooper, Elbaum and Reilly 1982) is another threshold RBF network used for classification. There is a ... |

19 |
Mapping neural networks derived from the Parzen window estimator
- Schiøler, Hartmann
- 1992
Citation Context ...ted averages of the target values, and the network is identical to the well-known Nadaraya-Watson kernel regression estimator. This method has been reinvented twice in the NN literature (Specht 1991; Schiøler and Hartmann 1992). Specht has popularized both kernel regression, which he calls a general regression neural network (GRNN), and kernel discriminant analysis, which he calls a probabilistic neural network (PNN). Spech... |

19 |
Curves as parameters, and touch estimation
- Tukey
- 1961
Citation Context ...ounterpropagation is a form of nonparametric regression in which the smoothing parameter is the number of clusters. If training is unidirectional, then counterpropagation is a regressogram estimator (Tukey 1961) with the bins determined by clustering the input cases. With bidirectional training, both the input and target variables are used in forming the clusters; this makes the clusters more adaptive to the l... |
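The unidirectional case in the excerpt, predict by the mean target of the group a case falls in, is Tukey's regressogram. The sketch below uses equal-width bins in place of clusters of the input cases; the function name is invented for the example.

```python
import numpy as np

def regressogram(x_train, y_train, x_new, n_bins=10):
    """Tukey's regressogram: partition the input range into bins and
    predict, for each new point, the mean target of its bin. The number
    of bins plays the role of the smoothing parameter."""
    edges = np.linspace(x_train.min(), x_train.max(), n_bins + 1)
    idx = np.clip(np.digitize(x_train, edges) - 1, 0, n_bins - 1)
    # per-bin target means (NaN for empty bins)
    means = np.array([y_train[idx == b].mean() if np.any(idx == b)
                      else np.nan for b in range(n_bins)])
    new_idx = np.clip(np.digitize(x_new, edges) - 1, 0, n_bins - 1)
    return means[new_idx]
```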

18 | Applied Nonparametric Regression - Härdle - 1990 |

17 | Redundancy Analysis: An Alternative to Canonical Correlation Analysis - Wollenberg - 1977 |

10 | The Outlier process
- Geiger, Pereira
- 1991
Citation Context ... nonparametric smoother is applied to each input, and these values are added together. The TRANSREG procedure fits nonlinear additive models using splines. Topologically distributed encoding (TDE) (Geiger 1990) uses Gaussian basis functions. A nonlinear additive model can also be implemented as a NN as shown in Figure 17. Each input is connected to a small subnetwork to provide the nonlinear transformation... |