## A review of dimension reduction techniques (1997)

Citations: 33 (4 self)

### BibTeX

```bibtex
@MISC{Carreira-perpiñán97areview,
  author = {Miguel Á. Carreira-Perpiñán},
  title  = {A review of dimension reduction techniques},
  year   = {1997}
}
```

### Abstract

The problem of dimension reduction is introduced as a way to overcome the curse of dimensionality when dealing with vector data in high-dimensional spaces and as a modelling tool for such data. It is defined as the search for a low-dimensional manifold that embeds the high-dimensional data. A classification of dimension reduction problems is proposed. A survey of several techniques for dimension reduction is given, including principal component analysis, projection pursuit and projection pursuit regression, principal curves and methods based on topologically continuous maps, such as Kohonen’s maps or the generative topographic mapping. Neural network implementations for several of these techniques are also reviewed, such as the projection pursuit learning network and the BCM neuron with an objective function. Several appendices complement the mathematical treatment of the main text.

### Citations

9215 | Elements of information theory
- Cover, Thomas
- 1991
Citation Context: ...nonlinear manifolds. According to this and the following results: • For fixed variance, the normal distribution has the least information, in both the senses of Fisher information and negative entropy [16]. • For most high-dimensional clouds, most low-dimensional projections are approximately normal (Diaconis and Freedman [24]). We will consider the normal distribution as the least structured (or least...

9033 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context: ...R_in(W, β) = p(x_i|t_n, W, β) = p(t_n|x_i, W, β) / ∑_{i′=1}^{K} p(t_n|x_{i′}, W, β). – G is a diagonal K × K matrix with elements g_ii(W, β) = ∑_{n=1}^{N} R_in(W, β). (5.14) Because the EM algorithm increases the log-likelihood monotonically [23], the convergence of GTM is guaranteed. According to [7], convergence is usually achieved after a few tens of iterations. As initial weights one can take the first L principal components of the sample...
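The responsibilities quoted in the excerpt above are just a softmax over the Gaussian components. A minimal numerical sketch (my illustration, not code from the report; `gtm_responsibilities` and its arguments are hypothetical names), assuming isotropic Gaussian components with equal mixing weights and inverse variance β:

```python
import numpy as np

def gtm_responsibilities(T, Y, beta):
    """E-step of GTM: responsibility R[i, n] of latent point i for data
    point t_n under an isotropic Gaussian mixture with equal mixing
    weights and inverse variance beta.

    T: (N, D) data points t_n.
    Y: (K, D) mixture centres (assumed precomputed from W and the latent grid).
    """
    # Squared distances between every centre and every data point: shape (K, N).
    d2 = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(axis=-1)
    # Unnormalised log-probabilities; subtract the column max for stability.
    log_p = -0.5 * beta * d2
    log_p -= log_p.max(axis=0, keepdims=True)
    p = np.exp(log_p)
    R = p / p.sum(axis=0, keepdims=True)   # responsibilities sum to 1 over i
    g = R.sum(axis=1)                      # diagonal of G: g_ii = sum_n R_in
    return R, g
```

Each column of `R` is a posterior distribution over the K latent points, and `g` collects the diagonal entries g_ii of the matrix G from eq. (5.14).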

5362 | Neural Networks for Pattern Recognition
- Bishop
- 1995
Citation Context: ...(e.g. degree of differentiability or computational effort required). This fact finds a parallel with radial basis functions in neural networks: the actual shape of the RBFs is relatively unimportant [5]. For h depending in some way on the sample size n, pointwise as well as global convergence (in probability for both the uniform and the integration sense) can be proven for kernels satisfying a few v...

4443 | Classification and Regression Trees
- Breiman, Friedman, et al.
- 1984
Citation Context: ...supersmoother (section F.3.4), no huge regression tables are needed. F.3.6 Projection pursuit regression: see section 3.7. F.3.7 Partitioning methods (regression trees): partitioning methods (e.g. CART [10], ID3 [82]) operate as follows: • Partition the input space into regions according to the data during training (typically with hyperplanes parallel to the coordinate axes). Binary partitions are commo...

1682 | Generalized Additive Models
- Hastie, Tibshirani
- 1990
Citation Context: ...therefore less general because there is no interaction between input variables (e.g. the function x1·x2 cannot be modelled), but it is more easily interpretable (the functions g_k(x_k) can be plotted); see [47, 48] for some applications. One could add cross-terms of the form g_kl(x_k, x_l) to achieve greater flexibility but the combinatorial explosion quickly sets in. 3.8.1 Backfitting: the ridge functions {g_k}_{k=1}^{D}...
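The backfitting idea the excerpt leads into, fitting each additive term to the partial residuals given the rest, can be sketched as follows. This is an illustrative sketch, not the report's algorithm; a polynomial fit stands in for a generic scatterplot smoother, and `backfit` is a hypothetical name:

```python
import numpy as np

def backfit(X, y, n_iter=50, deg=3):
    """Backfitting sketch for an additive model y ≈ alpha + sum_k g_k(x_k):
    cycle over coordinates, fitting each g_k to the partial residuals
    (here with a polynomial fit standing in for a generic smoother)."""
    n, D = X.shape
    alpha = y.mean()
    G = np.zeros((n, D))            # current estimates of g_k(x_k) at the data
    for _ in range(n_iter):
        for k in range(D):
            r = y - alpha - G.sum(axis=1) + G[:, k]   # partial residual for g_k
            G[:, k] = np.polyval(np.polyfit(X[:, k], r, deg), X[:, k])
            G[:, k] -= G[:, k].mean()                 # identifiability: E[g_k] = 0
    return alpha, G
```

Centring each g_k after its update keeps the decomposition identifiable (all constants are absorbed into alpha).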

1389 | Multilayer Feedforward Networks are Universal Approximators
- Hornik, Stinchcombe, White
- 1989
Citation Context: ...performance of PPL compared to that of BPL. Although approximation theorems of the form “any noise-free square-integrable function can be approximated to an arbitrary degree of accuracy” exist for both BPL [18, 50, 51, 52] and PPL (see section 6.1), nothing is said about the number of neurons required in practice. Hwang et al. report some empirical results (based on a simulation with D = 2, q = 1): • PPL (with Hermite ...

925 | Approximations by superpositions of a sigmoidal function
- Cybenko
- 1989
Citation Context: ...performance of PPL compared to that of BPL. Although approximation theorems of the form “any noise-free square-integrable function can be approximated to an arbitrary degree of accuracy” exist for both BPL [18, 50, 51, 52] and PPL (see section 6.1), nothing is said about the number of neurons required in practice. Hwang et al. report some empirical results (based on a simulation with D = 2, q = 1): • PPL (with Hermite ...

710 | The cascade-correlation learning architecture
- Fahlman, Lebiere
- 1989
Citation Context: ...with the dimension for fixed accuracy: the curse of dimensionality has been reduced. 6.4 Cascade correlation learning network (CCLN): figure 13 shows a cascade correlation learning network (CCLN). CCLN [30] is a supervised learning architecture that dynamically grows layers of hidden neurons with fixed nonlinear activations (e.g. sigmoids), so that the network topology (size, length) can be efficiently ...

700 | Vector Quantization
- Gray
- 1984
Citation Context: ...as a plane sheet that twists around itself in D dimensions to resemble as much as possible the distribution of the data vectors. 5.1.1 The neighbourhood function: in a SOM, like in vector quantisation [41], we have a set of reference or codebook vectors {µ_i}_{i=1}^{M} in data space R^D, initially distributed at random, but each of them is associated to a node i in a 2-D lattice —unlike in vector quan...
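The codebook-plus-lattice arrangement described in the excerpt drives the SOM update: the winner is found in data space, but the neighbourhood is measured on the fixed 2-D lattice. A minimal sketch of one update step (my illustration, assuming a Gaussian neighbourhood; `som_step` is a hypothetical name):

```python
import numpy as np

def som_step(codebook, grid, x, eta, sigma):
    """One SOM update (modifies `codebook` in place): move every codebook
    vector mu_i towards the input x, weighted by a Gaussian neighbourhood
    around the best-matching unit.

    codebook: (M, D) reference vectors in data space R^D.
    grid:     (M, 2) fixed positions of the nodes in the 2-D lattice.
    """
    bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))   # best-matching unit
    # Neighbourhood h(i, bmu) is computed on the lattice, not in data space.
    d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    codebook += eta * h[:, None] * (x - codebook)
    return bmu
```

Unlike plain vector quantisation, neighbours of the winner on the lattice are dragged along too, which is what produces the topology-preserving "twisting sheet" behaviour.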

505 | Adaptive Control Processes: A Guided Tour
- Bellman
- 1961
Citation Context: ...t information, producing a more economic representation of the data. 1.4 The curse of the dimensionality and the empty space phenomenon: the curse of the dimensionality (term coined by Bellman in 1961 [3]) refers to the fact that, in the absence of simplifying assumptions, the sample size needed to estimate a function of several variables to a given degree of accuracy (i.e. to get a reasonably low-var...
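A one-line illustration of the empty space phenomenon mentioned alongside the curse (my example, not the report's): the unit ball inscribed in the cube [-1, 1]^D occupies a vanishing fraction of the cube's volume as D grows, so uniformly distributed high-dimensional data concentrate in the corners:

```python
import math

def inscribed_ball_fraction(D):
    """Volume of the unit ball {x : ||x|| <= 1} divided by the volume of
    the enclosing cube [-1, 1]^D: pi^(D/2) / Gamma(D/2 + 1) / 2^D."""
    return math.pi ** (D / 2) / math.gamma(D / 2 + 1) / 2 ** D

# The fraction collapses quickly: ~0.79 at D=2, ~0.52 at D=3,
# and below 0.3% by D=10.
```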

443 | Projection Pursuit Regression
- Friedman, Stuetzle
- 1981
Citation Context: ...ality due to the feature extraction step. • Less biased to the training data due to the CART method. 3.7 Projection pursuit regression (PPR): projection pursuit regression (PPR) (Friedman and Stuetzle [36]) is a nonparametric regression approach for the multivariate regression problem (see section F.1) based on projection pursuit. It works by additive composition, constructing an approximation to the d...
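One term of the additive composition that PPR builds, a smooth ridge function g of a projection a^T x, can be sketched with a crude random search. This is an illustrative sketch only (not Friedman and Stuetzle's algorithm, which optimises the direction and uses the supersmoother); a polynomial fit stands in for the smoother and `ppr_one_term` is a hypothetical name:

```python
import numpy as np

def ppr_one_term(X, y, n_dirs=200, deg=3, seed=0):
    """Crude sketch of one projection pursuit regression term: try random
    unit directions a, smooth y against the projection a^T x (polynomial
    stand-in for a scatterplot smoother), and keep the best direction."""
    rng = np.random.default_rng(seed)
    best = (np.inf, None, None)
    for _ in range(n_dirs):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)
        z = X @ a                         # 1-D projection of the data
        c = np.polyfit(z, y, deg)         # "ridge function" g fitted to (z, y)
        rss = ((y - np.polyval(c, z)) ** 2).sum()
        if rss < best[0]:
            best = (rss, a, c)
    return best   # (residual sum of squares, direction, smoother coefficients)
```

A full PPR fit would subtract the fitted term from y and repeat on the residuals, adding ridge functions until the fit stops improving.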

333 | Theory of the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex
- Bienenstock, Cooper, et al.
- 1982
Citation Context: ...en units (hidden layer not retrained). Table 2 compares CCLNs with PPLNs. 6.5 BCM neuron using an objective function. 6.5.1 The BCM neuron: figure 14 shows the BCM neuron (Bienenstock, Cooper and Munro [4]). Let x ∈ R^D be the input to the neuron and c = x^T w ∈ R its output. We define the threshold θ_w = E[(x^T w)²] = E[c²] and the functions φ̂(c, θ_w) = c² − (1/2)·c·θ_w and φ(c, θ_w) = c² − c·θ_w. ...
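The modification function quoted in the excerpt drives the BCM weight update. A minimal sketch of one step (my illustration, not the cited model's full dynamics; the sliding threshold θ_w = E[c²] is passed in, assumed tracked by a running average elsewhere):

```python
import numpy as np

def bcm_step(w, x, eta, theta):
    """One BCM weight update using phi(c, theta) = c^2 - c*theta, where
    c = x^T w is the neuron's output and theta stands in for the sliding
    threshold theta_w = E[c^2]. Returns the new weights and the output c."""
    c = float(x @ w)
    phi = c * c - c * theta       # negative for 0 < c < theta, positive above
    return w + eta * phi * x, c
```

The sign structure of φ is the point: responses below the threshold are depressed, responses above it are potentiated, which is what gives the neuron its selectivity.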

318 | Principal curves
- Hastie, Stuetzle
- 1989
Citation Context: ...4 Principal Curves and Principal Surfaces: principal curves (Hastie and Stuetzle [46]) are smooth 1-D curves that pass through the middle of a p-dimensional data set, providing a nonlinear summary of it. They are estimated in a nonparametric way, i.e. their shape is suggested by the d...

275 | Growing Cell Structures - A Self-Organizing Network for Unsupervised and Supervised Learning
- Fritzke
- 1994
Citation Context: ...ted to dimension reduction that have not been included in this work due to lack of time. These would include the Helmholtz machine [19, 20], some variations of self-organising maps (growing neural gas [39, 40], Bayesian approaches [96, 97], etc.), population codes [100], and curvilinear component analysis [22], among others. A Glossary • Backfitting algorithm: an iterative method to fit additive models,...

262 | A projection pursuit algorithm for exploratory data analysis
- Friedman, Tukey
- 1974
Citation Context: ...on 3.2) is not a good indicator of structure. However, the projection on the plane spanned by e1 and e2 clearly shows both clusters. The term projection pursuit was introduced by Friedman and Tukey [38] in 1974, along with the first projection index. Good reviews of projection pursuit can be found in Huber [53] and Jones and Sibson [65]. Varimax rotation [66] is a procedure that, given a subspace ...

244 | The Use of Faces to Represent Points in k-Dimensional Space Graphically
- Chernoff
- 1973
Citation Context: ...up to about 5-dimensional data sets, using colours, rotation, stereography, glyphs or other devices, but they lack the appeal of a simple plot; a well-known one is the grand tour [1]. Chernoff faces [13] allow even a few more dimensions, but are difficult to interpret and do not produce a spatial view of the data. ...

243 | Projection pursuit
- Huber
- 1985
Citation Context: ...y shows both clusters. The term projection pursuit was introduced by Friedman and Tukey [38] in 1974, along with the first projection index. Good reviews of projection pursuit can be found in Huber [53] and Jones and Sibson [65]. Varimax rotation [66] is a procedure that, given a subspace or projection, selects a new basis for it that maximises the variance but giving large loadings to as few vari...

235 | Approximation capabilities of multilayer feedforward networks
- Hornik
- 1991
Citation Context: ...performance of PPL compared to that of BPL. Although approximation theorems of the form “any noise-free square-integrable function can be approximated to an arbitrary degree of accuracy” exist for both BPL [18, 50, 51, 52] and PPL (see section 6.1), nothing is said about the number of neurons required in practice. Hwang et al. report some empirical results (based on a simulation with D = 2, q = 1): • PPL (with Hermite ...

200 | The Helmholtz machine
- Dayan, Hinton, et al.
- 1995
Citation Context: ...would like to conclude by mentioning a number of further techniques related to dimension reduction that have not been included in this work due to lack of time. These would include the Helmholtz machine [19, 20], some variations of self-organising maps (growing neural gas [39, 40], Bayesian approaches [96, 97], etc.), population codes [100], and curvilinear component analysis [22], among others. A Glossar...

195 | Neural networks for principal component analysis: learning from examples without local minima
- Baldi, Hornik
- 1989
Citation Context: ...en units and n outputs, trained to replicate the input in the output layer minimising the squared sum of errors, and typically trained with backpropagation. Bourlard and Kamp [9] and Baldi and Hornik [2] showed that this network finds a basis of the subspace spanned by the first h PCs, not necessarily coincident with them; see [11, 15] for applications. • Networks based on Oja’s rule [77] with so...
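The subspace-not-components result quoted above is easy to demonstrate numerically: reconstruction through the first h principal directions is unchanged when the basis is rotated, which is why a linear autoassociator need not recover the PCs themselves. A sketch (my illustration, not the cited networks; `pca_basis` and `reconstruct` are hypothetical names):

```python
import numpy as np

def pca_basis(X, h):
    """Orthonormal basis of the first h principal directions of X, via SVD
    of the centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:h].T                       # shape (D, h)

def reconstruct(X, V):
    """Project the centred data onto span(V) and add the mean back: the
    map an optimal linear autoassociator with h hidden units computes."""
    mu = X.mean(axis=0)
    return mu + (X - mu) @ V @ V.T
```

Because V @ V.T depends only on span(V), any invertible remixing of the hidden units yields the same reconstruction error, so backpropagation can stop at any basis of the principal subspace.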

186 | The Grand Tour: A Tool for Viewing Multidimensional Data
- Asimov
- 1985
Citation Context: ...that allow one to visualise up to about 5-dimensional data sets, using colours, rotation, stereography, glyphs or other devices, but they lack the appeal of a simple plot; a well-known one is the grand tour [1]. Chernoff faces [13] allow even a few more dimensions, but are difficult to interpret and do not produce a spatial view of the data. ...

168 | Neural Networks: A Review from a Statistical Perspective
- Cheng, Titterington
- 1994
Citation Context: ...the activation of the hidden layer, v, and the weights of the second layer, β: y = f(ϕ(v, β)). The following particular cases of this network implement several of the techniques previously mentioned [12]: • Projection pursuit regression (cf. eq. (3.8)): ϕ(v, β) = v^T 1, v_k = g_k(w_k^T x) ⇒ y = ∑_{k=1}^{j} g_k(w_k^T x). The activation functions g_k are determined from the data during training; w_k represent th...

168 | Curvilinear component analysis: A self-organizing neural network for nonlinear mapping of data sets
- Demartines, Herault
- 1997
Citation Context: ...de the Helmholtz machine [19, 20], some variations of self-organising maps (growing neural gas [39, 40], Bayesian approaches [96, 97], etc.), population codes [100], and curvilinear component analysis [22], among others. A Glossary • Backfitting algorithm: an iterative method to fit additive models, by fitting each term to the residuals given the rest (see section 3.8.1). It is a version of the Gaus...

144 | An Introduction to Latent Variable Models
- Everitt
- 1984
Citation Context: ...the system. Sometimes, a phenomenon which is in appearance high-dimensional, and thus complex, can actually be governed by a few simple variables (sometimes called “hidden causes” or “latent variables” [29, 20, 21, 6, 74]). Dimension reduction can be a powerful tool for modelling such phenomena and improving our understanding of them (as often the new variables will have an interpretation). For example: • Genome sequenc...

130 | Smoothing Techniques: With Implementation in S
- Härdle
- 1991
Citation Context: ...Problems: the choice of origin can affect the estimate. The next estimators are independent of origin choice. This appendix is mainly based on the books by Silverman [93], Scott [89] and Härdle [45]. It is also possible to estimate the cdf F(x) and then obtain from it the pdf f = dF/dx; the empirical cdf is F̂(x) = (1/n) ∑_{i=1}^{n} I_{(−∞,x]}(X_i). It can be shown that this is an unbiased estimator and has the smallest variance of all ...
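The origin-free alternative to the histogram that this appendix builds up to, the kernel estimator f̂(x) = (1/nh) ∑ᵢ K((x − Xᵢ)/h), can be sketched with a Gaussian kernel. An illustrative sketch (mine, not code from the cited books; `kde` is a hypothetical name):

```python
import numpy as np

def kde(grid, data, h):
    """Gaussian kernel density estimate evaluated on `grid`:
    f_hat(x) = (1/(n*h)) * sum_i K((x - X_i)/h), K the standard normal pdf.
    Unlike a histogram, the estimate involves no choice of origin."""
    u = (grid[:, None] - data[None, :]) / h        # (n_grid, n_data)
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return K.mean(axis=1) / h
```

As the surrounding text notes for smoothers generally, the exact kernel shape matters far less than the bandwidth h.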

121 | Principal Component Neural Networks: Theory and Applications
- Diamantaras, Kung
- 1996
Citation Context: ...on regularisation and wavelets. 6.2 Principal component analysis networks: there exist several neural network architectures capable of extracting PCs (see fig. 9), which can be classified as follows (see [26] or [11] for more details): • Autoassociators (also called autoencoders, bottlenecks or n-h-n networks), which are linear two-layer perceptrons with n inputs, h hidden units and n outputs, trained to ...

95 | Asymptotics of graphical projection pursuit
- Diaconis, Freedman
- 1984
Citation Context: ...t information, in both the senses of Fisher information and negative entropy [16]. • For most high-dimensional clouds, most low-dimensional projections are approximately normal (Diaconis and Freedman [24]). We will consider the normal distribution as the least structured (or least interesting) density. For example, figure 5 shows two 2-D projections of a 3-D data set consisting of two clusters. The pr...
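The Diaconis and Freedman observation quoted above is easy to check empirically (my demo, not theirs): project a decidedly non-Gaussian high-dimensional cloud onto a random direction and the standardised projection has nearly Gaussian skewness and kurtosis, which is exactly why projection pursuit treats normality as "no structure":

```python
import numpy as np

def projection_moments(X, a):
    """Skewness and excess kurtosis of the 1-D projection of X onto a
    (both are 0 for an exact normal distribution)."""
    z = X @ (a / np.linalg.norm(a))
    z = (z - z.mean()) / z.std()
    return (z ** 3).mean(), (z ** 4).mean() - 3.0
```

With coordinate-wise uniform data in 50 dimensions, a random projection is a weighted sum of many independent terms, so the central limit theorem pushes it towards normality.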

83 | Auto-association by multilayer perceptrons and singular value decomposition
- Bourlard, Kamp
- 1988
Citation Context: ...perceptrons with n inputs, h hidden units and n outputs, trained to replicate the input in the output layer minimising the squared sum of errors, and typically trained with backpropagation. Bourlard and Kamp [9] and Baldi and Hornik [2] showed that this network finds a basis of the subspace spanned by the first h PCs, not necessarily coincident with them; see [11, 15] for applications. • Networks based o...

69 | Regression Modeling in Back-Propagation and Projection Pursuit Learning
- Hwang, Lay, et al.
- 1994
Citation Context: ...[Figure 10: autoencoder, implemented as a four-layer nonlinear perceptron where L < D and x̂ = G(R(x)).] 6.3 Projection pursuit learning network (PPLN): Hwang et al. [54] propose the two-layer perceptron depicted in fig. 11 with a projection pursuit learning (PPL) algorithm to solve the multivariate nonparametric regression problem of section F.1. The outputs ...

66 | Adaptive network for optimal linear feature extraction
- Földiák
- 1989
Citation Context: ...necessarily coincident with them; see [11, 15] for applications. • Networks based on Oja’s rule [77] with some kind of decorrelating device (e.g. Kung and Diamantaras’ APEX [72], Földiák’s network [32], Sanger’s Generalised Hebbian Algorithm [88]). [Figure 9: two examples of neural networks for PCA: a linear autoassociator (n input units, h < n hidden units, n output units) and an APEX network.]

65 | Projection pursuit density estimation
- Friedman, Stuetzle, et al.
- 1984
Citation Context: ...continue until j = D or D(f||g) = 0, when f̂ will be the Gaussian density. 3.7.3 Projection pursuit density estimation (PPDE): projection pursuit density estimation (PPDE; Friedman, Stuetzle and Schroeder [37]) is appropriate when the variation of densities is concentrated in a linear manifold of the high-dimensional space. Given the data sample in R^D, it operates as follows: 1. Sphere the data. 2. Take ...
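Step 1 of the quoted procedure, sphering, transforms the sample to zero mean and identity covariance. A sketch using the symmetric inverse square root of the covariance (my illustration; `sphere` is a hypothetical name):

```python
import numpy as np

def sphere(X):
    """Sphere (whiten) the data: subtract the mean and multiply by
    C^{-1/2}, so the result has zero mean and identity covariance."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(C)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T   # symmetric C^{-1/2}
    return Xc @ W
```

After sphering, all linear structure has been removed, so any remaining departure from normality found by the projection index is genuinely non-linear structure.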

59 | A variable span smoother
- Friedman
- 1984
Citation Context: ...l network implementation of PPR; see section 6.3 for details. We consider only one component of the vector function f of section F.1. Friedman and Stuetzle [36] use Friedman’s supersmoother [33]; see section E for other smoothers. 3.7.1 Penalty terms in PPR: expressing the smoothness constraint on the ridge functions g_j by some smoothness measure C, we can merge steps 2 and 3 in the PPR al...

54 | Competition and multiple cause models
- Dayan, Zemel
- 1995
Citation Context: ...the system. Sometimes, a phenomenon which is in appearance high-dimensional, and thus complex, can actually be governed by a few simple variables (sometimes called “hidden causes” or “latent variables” [29, 20, 21, 6, 74]). Dimension reduction can be a powerful tool for modelling such phenomena and improving our understanding of them (as often the new variables will have an interpretation). For example: • Genome sequenc...

46 | Categorization of faces using unsupervised feature extraction
- Fleming, Cottrell
- 1990
Citation Context: ...weights would be exceedingly large. Therefore we need to reduce the dimension. A crude solution would be to simply scale down the images to a manageable size. More elaborate approaches exist, e.g. in [31], a first neural network is used to reduce the vector dimension (which actually performs a principal component analysis of the training data) from the original dimension of 63 × 61 = 3843 to 80 compon...

39 | On non-linear functions of linear combinations
- Diaconis, Shahshahani
- 1984
Citation Context: ...ections. Incidentally, this shows that continuous functions g_k can be uniformly approximated by a sigmoidal MLP of one input. Therefore, the approximation capabilities of MLPs and PP are very similar [25, 63]. This architecture admits generalisations to several output variables [84] depending on whether the outputs share the common “basis functions” g_k and, if not, whether the separate g_k share common proj...

37 | On polynomial-based projection indices for exploratory projection pursuit
- Hall
- 1989
Citation Context: ...All the previous class III indices satisfy property (3.2). The following two indices are computed via an orthogonal series estimator of f using Hermite polynomials: • I_H = ∫ (f(x) − φ(x))² dx (Hall [42]). • I_N = ∫ (f(x) − φ(x))² φ(x) dx (Cook et al. [14]), where φ(x) is the normal density of eq. (B.6). Other indices are: • Eslava and Marriott [28] propose two two-dimensional indices designed to di...

21 | The cascade-correlation learning: a projection pursuit learning perspective
- Hwang, You, et al.
Citation Context: ...(Table 2: CCLNs vs PPLNs.) 2. (Re)train output-to-hidden (output layer) units from the new and all the previous hidden units using the MSE criterion and speedy quickprop. Hwang et al. [55] give a comparison between CCLN and PPLN: • Each new hidden unit (from a pool of candidate units) receives connections from the input layer but also from all the previous hidden units, which provide w...

19 | Projection Pursuit Indexes Based on Orthonormal Function Expansions
- Cook, Buja, et al.
- 1993
Citation Context: ...(3.2). The following two indices are computed via an orthogonal series estimator of f using Hermite polynomials: • I_H = ∫ (f(x) − φ(x))² dx (Hall [42]). • I_N = ∫ (f(x) − φ(x))² φ(x) dx (Cook et al. [14]), where φ(x) is the normal density of eq. (B.6). Other indices are: • Eslava and Marriott [28] propose two two-dimensional indices designed to display all clusters: – Minimise the polar nearest neigh...

19 | New Developments in Electropalatography: A State-of-the-Art Report
- Hardcastle, Jones, et al.
- 1989
Citation Context: ...sequences or continuous speech. A further categorisation can be done attending to the discrete or continuous nature of the data vectors. An example of discrete data is found in electropalatography (EPG) [44], where each vector is a sequence of about 100 binary values which indicate the presence or absence of tongue–palate contact in coarticulation studies. Another example of discrete data are genome s...

18 | EM optimization of latent-variable density models
- Bishop, Svensén, et al.
- 1996
Citation Context: ...the system. Sometimes, a phenomenon which is in appearance high-dimensional, and thus complex, can actually be governed by a few simple variables (sometimes called “hidden causes” or “latent variables” [29, 20, 21, 6, 74]). Dimension reduction can be a powerful tool for modelling such phenomena and improving our understanding of them (as often the new variables will have an interpretation). For example: • Genome sequenc...

18 | Image compression by back propagation: A demonstration of extensional programming
- Cottrell, Munro, et al.
- 1987
Citation Context: ...trained with backpropagation. Bourlard and Kamp [9] and Baldi and Hornik [2] showed that this network finds a basis of the subspace spanned by the first h PCs, not necessarily coincident with them; see [11, 15] for applications. • Networks based on Oja’s rule [77] with some kind of decorrelating device (e.g. Kung and Diamantaras’ APEX [72], Földiák’s network [32], Sanger’s Generalised Hebbian Algorithm [88]...

16 | EPG data reduction methods and their implications for studies of lingual coarticulation
- Hardcastle, Gibbon, et al.
- 1991
Citation Context: ...and some improvements to them. The report is concluded with a short discussion of the material presented and of further work. Several appendices complement the mathematical part of the main text. See [43] for a discussion of several ad-hoc dimension reduction methods in use in electropalatography. 2 Principal Component Analysis: principal component analysis (PCA) is possibly the dimension reduction ...

9 | Some criteria for projection pursuit
- Eslava, Marriott
- 1994
Citation Context: ...Hermite polynomials: • I_H = ∫ (f(x) − φ(x))² dx (Hall [42]). • I_N = ∫ (f(x) − φ(x))² φ(x) dx (Cook et al. [14]), where φ(x) is the normal density of eq. (B.6). Other indices are: • Eslava and Marriott [28] propose two two-dimensional indices designed to display all clusters: – Minimise the polar nearest neighbour index I = E{min(|θ_i − θ_{i−1}|, |θ_{i+1} − θ_i|)}, where {θ_i}_{i=1}^{n} are the polar angles of the...

5 | Compression neural networks for feature extraction: Application to human recognition from ear images (MSc thesis)
- Carreira-Perpiñán
- 1995
Citation Context: ...regularisation and wavelets. 6.2 Principal component analysis networks: there exist several neural network architectures capable of extracting PCs (see fig. 9), which can be classified as follows (see [26] or [11] for more details): • Autoassociators (also called autoencoders, bottlenecks or n-h-n networks), which are linear two-layer perceptrons with n inputs, h hidden units and n outputs, trained to replicat...

5 | Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks
- Hornik, Stinchcombe, White
- 1990

2 | GTM: A principled alternative to the self-organising map
- Bishop, Svensén, et al.
- 1997
Citation Context: ...feature extraction, plays no role in the probabilistic model (see fig. 10). 5.2.3 Generative topographic mapping (GTM): the generative topographic mapping (GTM), put forward by Bishop, Svensén and Williams [7] as a principled view of Kohonen’s SOMs, is a density network based on a constrained Gaussian mixture model and trained with the EM algorithm. No biological motivation is intended. In GTM, the dimensi...

1 | Principal Component Analysis, no. 07–069
- Dunteman
- 1989
Citation Context: ...numerical techniques exist for finding all or the first few eigenvalues and eigenvectors of a square, symmetric, positive semidefinite matrix (the covariance matrix) in O(D³): singular value decomposition, Cholesky decomposition, etc.; see [81] or [99] (and see [27, 61, 62] for a more comprehensive treatment, and section C.2 for a comparison with other transformations of the covariance matrix). When ...

1 | Generalized additive models: Some applications
- Hastie, Tibshirani
- 1987
Citation Context: ...therefore less general because there is no interaction between input variables (e.g. the function x1·x2 cannot be modelled), but it is more easily interpretable (the functions g_k(x_k) can be plotted); see [47, 48] for some applications. One could add cross-terms of the form g_kl(x_k, x_l) to achieve greater flexibility but the combinatorial explosion quickly sets in. 3.8.1 Backfitting: the ridge functions {g_k}_{k=1}^{D}...

1 | Generalized Additive Models, no. 43
- Hastie, Tibshirani
- 1990
Citation Context: ...Take as starting estimate the standard normal in D dimensions, N(0, I) (B.4). 3. Apply the synthetic PPDA. 3.8 Generalised additive models: a generalised additive model (GAM) (Hastie and Tibshirani [49]) for a D-dimensional density f(x) is: f̂(x) = α + ∑_{k=1}^{D} g_k(x_k) (3.11). GAMs are a particular case of PPR (and as such, they have a neural network implementation; see section 6.1) with a_k = e_k and E{g_k(x...

1 | Localized exploratory projection pursuit
- Intrator
- 1991
Citation Context: ...is small there) and is therefore sensitive to outliers and scaling. Some algorithms exist that try to avoid nonglobal optima (e.g. EPP, section 3.6). 3.6.1 Localised EPP: localised EPP (Intrator [56]) is a nonparametric classification method for high-dimensional spaces. A recursive partitioning method is applied to the high-dimensional space and low-dimensional features are extracted via EPP in e...