## A Survey of Dimension Reduction Techniques (2002)

Citations: 88 (0 self)

### BibTeX

@TECHREPORT{Fodor02asurvey,
    author = {Imola Fodor},
    title = {A Survey of Dimension Reduction Techniques},
    institution = {},
    year = {2002}
}

### Abstract

In this paper, we assume that we have $n$ observations, each being a realization of the $p$-dimensional random variable $x = (x_1, \ldots, x_p)^T$ with mean $E(x) = \mu = (\mu_1, \ldots, \mu_p)^T$ and covariance matrix $E\{(x - \mu)(x - \mu)^T\} = \Sigma_{p \times p}$. We denote such an observation matrix by $X = \{x_{i,j} : 1 \le i \le p,\ 1 \le j \le n\}$. If $\mu_i$ and $\sigma_i = \sqrt{\Sigma_{(i,i)}}$ denote the mean and the standard deviation of the $i$th random variable, respectively, then we will often standardize the observations $x_{i,j}$ by $(x_{i,j} - \hat{\mu}_i)/\hat{\sigma}_i$, where $\hat{\mu}_i = \bar{x}_i = \frac{1}{n} \sum_{j=1}^{n} x_{i,j}$, and $\hat{\sigma}_i = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (x_{i,j} - \bar{x}_i)^2}$.
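The standardization described in the abstract can be illustrated with a minimal sketch. It assumes NumPy, a p × n layout (variables in rows, observations in columns, as in the abstract), the 1/n form of the standard deviation, and that no variable is constant; the function name `standardize` and the sample matrix are hypothetical and do not come from the report.

```python
import numpy as np


def standardize(X):
    """Center and scale each row (variable) of a p x n observation matrix.

    Assumes every variable has non-zero variance; a constant row would
    cause a division by zero.
    """
    X = np.asarray(X, dtype=float)
    # Per-variable sample mean over the n observations (columns).
    mean = X.mean(axis=1, keepdims=True)
    # Per-variable standard deviation with the 1/n normalization (ddof=0).
    std = X.std(axis=1, ddof=0, keepdims=True)
    return (X - mean) / std


# Hypothetical data: p = 2 variables, n = 4 observations.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
Z = standardize(X)
```

After this step each row of `Z` has zero mean and unit standard deviation, so variables measured on different scales contribute comparably to covariance-based methods such as PCA.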

### Citations

7411 | Genetic Algorithms
- Goldberg
- 1989
Citation Context ...utionary algorithms (GEAs) are optimization techniques based on Darwinian theory of evolution that use natural selection and genetics to find the best solution among members of a competing population [16]. There are many references describing how GEAs can be used in dimension reduction. In essence, given a set of candidate solutions, an objective function to evaluate the fitness of candidates, and the...

4978 | C4.5: Programs for machine learning
- Quinlan
- 1993
Citation Context ...ture. There are numerous books and articles [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, including principal components analysis, projection pursuit, principal curves, self-organizing maps, as well as provides neural network implementa...

4872 | Neural networks for pattern recognition
- Bishop
- 1995
Citation Context ...ture. There are numerous books and articles [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, including principal components analysis, projection pursuit, principal curves, self-organizing maps, as well as provides neural network implementa...

3945 | Classification and regression trees
- Breiman, Friedman, et al.
- 1984
Citation Context ...r, we review traditional and current state-of-the-art dimension reduction methods published in the statistics, signal processing and machine learning literature. There are numerous books and articles [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, in...

3375 | Induction of decision trees
- Quinlan
- 1986
Citation Context ...ture. There are numerous books and articles [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, including principal components analysis, projection pursuit, principal curves, self-organizing maps, as well as provides neural network implementa...

3276 | The self-organizing map
- Kohonen
- 1990
Citation Context ...er to these methods collectively as methods that use topologically continuous maps. 8.4.1 Kohonen’s self-organizing maps Given the data vector $\{t_n\}_{n=1}^{N} \in \mathbb{R}^D$, Kohonen’s self-organizing maps (KSOM) [36] learn, in an unsupervised way, a map between the data space and a 2-dimensional lattice. The method can be extended to L-dimensional topological arrangements as well. Let $d_L$ and $d_D$ denote distances (...

2675 | Introduction to statistical pattern recognition, 2nd edn
- Fukunaga
- 1990
Citation Context ...r, we review traditional and current state-of-the-art dimension reduction methods published in the statistics, signal processing and machine learning literature. There are numerous books and articles [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, in...

2037 | Principal component analysis
- Jolliffe
- 1986
Citation Context ...ions and non-linear dimension reduction techniques. 2 Principal component analysis Principal component analysis (PCA) is the best, in the mean-square error sense, linear dimension reduction technique [25, 28]. Being based on the covariance matrix of the variables, it is a second-order method. In various fields, it is also known as the singular value decomposition (SVD), the Karhunen-Loève transform, the H...

1515 | Independent Component Analysis
- Hyvärinen, Karhunen, et al.
- 2001
Citation Context ...mation (and software) on this currently very popular method can be found at various websites, including [6, 24, 49]. Books summarizing the recent advances in the theory and application of ICA include [1, 48, 15, 38]. ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each other, that are as nearly statistically independent as possible. Statistical independence is a much str...

1440 | Random forests
- Breiman
- 2001
Citation Context ...mensional datasets is that, in many cases, not all the measured variables are “important” for understanding the underlying phenomena of interest. While certain computationally expensive novel methods [4] can construct predictive models with high accuracy from high-dimensional data, it is still of interest in many applications to reduce the dimension of the original data prior to any modeling of the d...

1388 | Generalized linear models
- McCullagh, Nelder
- 1989
Citation Context ...inal features, is called the wrapper method in the machine learning community [34]. Dimension reduction methods related to regression include projection pursuit regression [20, 7], generalized linear [42, 10] and additive [19] models, neural network models, and sliced inverse regression and principal Hessian directions [39]. 9 Summary In this paper, we described several dimension reduction methods. Acknow...

1122 | Pattern recognition and neural networks
- Ripley
- 1996
Citation Context ...r, we review traditional and current state-of-the-art dimension reduction methods published in the statistics, signal processing and machine learning literature. There are numerous books and articles [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, in...

521 | Fast and robust fixed-point algorithm for independent component analysis
- Hyvärinen
- 1999
Citation Context ...eed for adaptation. The FastICA is such a batch-mode algorithm using fixed-point iteration. It was introduced in [23] using the kurtosis, but was subsequently extended to general contrast functions in [21]. A MATLAB implementation is available from [24]. It can also be used for projection pursuit analysis described in Section 4. 6 Non-linear principal component analysis Non-linear PCA introduces non-li...

438 | A fast fixed-point algorithm for independent component analysis
- Hyvärinen, Oja
- 1997
Citation Context ...tive algorithms, and are more desirable in many practical situations where there is no need for adaptation. The FastICA is such a batch-mode algorithm using fixed-point iteration. It was introduced in [23] using the kurtosis, but was subsequently extended to general contrast functions in [21]. A MATLAB implementation is available from [24]. It can also be used for projection pursuit analysis described ...

411 | FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets
- Faloutsos, Lin
- 1995
Citation Context ...e principal coordinates of X in k dimensions as the solution, which are equivalent to the first k principal components of X (without re-scaling to correlations) [41]. An alternative to MDS is FastMap [12], a computationally efficient algorithm that maps high-dimensional data into lower-dimensional spaces, while preserving distances between objects. 8.4 Topologically continuous maps There are several m...

408 | Multidimensional scaling
- Cox

335 | Modern applied statistics with S-Plus
- Venables, Ripley
- 1999
Citation Context ...ion of k, one can select the appropriate number of PCs to keep in order to explain a given percentage of the overall variation. Such plots are called scree diagram plots in the statistical literature [53]. The number of PCs to keep can also be determined by first fixing a threshold $\lambda_0$, then only keeping the eigenvectors such that their corresponding eigenvalues are greater than $\lambda_0$. This latter method ...

301 | A user’s guide to principal components
- Jackson
- 1991
Citation Context ...ions and non-linear dimension reduction techniques. 2 Principal component analysis Principal component analysis (PCA) is the best, in the mean-square error sense, linear dimension reduction technique [25, 28]. Being based on the covariance matrix of the variables, it is a second-order method. In various fields, it is also known as the singular value decomposition (SVD), the Karhunen-Loève transform, the H...

299 | Principal curves
- Hastie, Stuetzle
- 1989
Citation Context ...od and a Bayesian ensemble learning method for estimation can be found in [30]. 8.2 Principal curves Principal curves are smooth curves that pass through the “middle” of multidimensional data sets [18, 40, 7]. Linear principal curves are in fact principal components, while non-linear principal curves generalize the concept. Given the p-dimensional random vector $y = (y_1, \ldots, y_p)$ with density $g(y)$, and ...

213 | Nonlinear principal component analysis using autoassociative neural networks
- Kramer
- 1991
Citation Context ...thogonal ($E\{yy^T\} = W^T E\{vv^T\} W = I$). Under this condition, it can be shown [31] that $J(W) = E\{\|v - W g(W^T v)\|^2\} = E\{\|y - g(y)\|^2\} = \sum_{i=1}^{p} E\{[y_i - g(y_i)]^2\}$ (54). As indicated in Section 8.5, [37] proposed a neural network architecture with non-linear activation functions in the hidden layers to estimate non-linear PCAs. 7 Random projections The method of random projections is a simple yet pow...

154 | Self-organizing semantic maps
- Ritter, Kohonen
- 1989
Citation Context ...As. 7 Random projections The method of random projections is a simple yet powerful dimension reduction technique that uses random projection matrices to project the data into lower dimensional spaces [47, 32, 33, 35]. The original data $X \in \mathbb{R}^p$ is transformed to the lower dimensional $S \in \mathbb{R}^k$, with $k \ll p$, via $S = RX$ (55), where the columns of R are realizations of independent and identically distributed (i.i.d...

143 | Independent Component Analysis - Theory and Applications
- Lee
- 1998
Citation Context ...mation (and software) on this currently very popular method can be found at various websites, including [6, 24, 49]. Books summarizing the recent advances in the theory and application of ICA include [1, 48, 15, 38]. ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each other, that are as nearly statistically independent as possible. Statistical independence is a much str...

140 | Discrimination and Classification
- Hand
- 1981

117 | Dimensionality reduction by random mapping: Fast similarity computation for clustering
- Kaski
Citation Context ...As. 7 Random projections The method of random projections is a simple yet powerful dimension reduction technique that uses random projection matrices to project the data into lower dimensional spaces [47, 32, 33, 35]. The original data $X \in \mathbb{R}^p$ is transformed to the lower dimensional $S \in \mathbb{R}^k$, with $k \ll p$, via $S = RX$ (55), where the columns of R are realizations of independent and identically distributed (i.i.d...

112 | Principal Component Neural Networks: Theory and Applications
- Diamantaras, Kung
- 1996
Citation Context ...utput layers, and are used, for example, in classification. As summarized in [7], there are many types of NN architectures that can extract principal components. More complete details can be found in [9]. For example, a linear, one hidden layer auto-associative perceptron with p input units, k < p hidden units, and p output units, can be trained with back-propagation to find a basis of the subspace s...

111 | Introduction to Generalized Linear Models
- Dobson
Citation Context ...inal features, is called the wrapper method in the machine learning community [34]. Dimension reduction methods related to regression include projection pursuit regression [20, 7], generalized linear [42, 10] and additive [19] models, neural network models, and sliced inverse regression and principal Hessian directions [39]. 9 Summary In this paper, we described several dimension reduction methods. Acknow...

102 | On Automatic Feature Selection
- Siedlecki, Sklansky
- 1988

97 | Data Exploration Using Self-Organizing Maps
- Kaski
- 1997
Citation Context ...As. 7 Random projections The method of random projections is a simple yet powerful dimension reduction technique that uses random projection matrices to project the data into lower dimensional spaces [47, 32, 33, 35]. The original data $X \in \mathbb{R}^p$ is transformed to the lower dimensional $S \in \mathbb{R}^k$, with $k \ll p$, via $S = RX$ (55), where the columns of R are realizations of independent and identically distributed (i.i.d...

92 | High-dimensional data analysis: The curses and blessings of dimensionality
- Donoho
- 2000
Citation Context ...riables that are measured on each observation. High-dimensional datasets present many mathematical challenges as well as some opportunities, and are bound to give rise to new theoretical developments [11]. One of the problems with high-dimensional datasets is that, in many cases, not all the measured variables are “important” for understanding the underlying phenomena of interest. While certain comput...

51 | Principal Curves Revisited
- Tibshirani
- 1992
Citation Context ...λ||² is less than a threshold. An alternative formulation of the principal curves method, along with a generalized EM algorithm for its estimation under Gaussian distribution of g(), is presented in [52]. In general, for the model $y = f(\lambda) + \epsilon$, where $f$ is smooth and $E(\epsilon) = 0$, $f$ is not necessarily a principal curve. Except for a few special cases, it is not known in general for what type of distributi...

39 | Multivariate Analysis. Probability and Mathematical Statistics
- Mardia, Kent, et al.
- 1979

35 | Discarding Variables in a Principal Component Analysis
- Jolliffe
- 1972
Citation Context ...eep can also be determined by first fixing a threshold $\lambda_0$, then only keeping the eigenvectors such that their corresponding eigenvalues are greater than $\lambda_0$. This latter method was found preferable in [26, 27], where the author also suggested keeping at least four variables. The interpretation of the PCs can be difficult at times. Although they are uncorrelated variables constructed as linear combinations ...

31 | A review of dimension reduction techniques
- Carreira-Perpinan
- 1997
Citation Context ...les [41, 17, 5, 14, 19, 46, 13] in the statistical literature on techniques for analyzing multivariate datasets. Advances in computer science, machine learning [43, 50, 44, 2]. Earlier survey papers. [7] reviews several methods, including principal components analysis, projection pursuit, principal curves, self-organizing maps, as well as provides neural network implementations of some of the reviewe...

31 | Generalized Additive Models, volume 43
- Hastie, Tibshirani
- 1990

24 | The wrapper approach
- Kohavi, John
- 1998
Citation Context ...h the reduced dimension. A similar approach, selecting the most relevant features by evaluating random subsets of the original features, is called the wrapper method in the machine learning community [34]. Dimension reduction methods related to regression include projection pursuit regression [20, 7], generalized linear [42, 10] and additive [19] models, neural network models, and sliced inverse regre...

19 | The nonlinear PCA criterion in blind source separation: relations with other approaches
- Karhunen, Pajunen, et al.
- 1998
Citation Context ...he resulting components are still linear combinations of the original variables. This method can also be thought of as a special case of independent component analysis, Section 5.1.4. As indicated in [31], there are different formulations of the non-linear PCA. A non-linear PCA criterion for the data vector $x = (x_1, \ldots, x_p)^T$ searches for the components $s = (s_1, \ldots, s_p)^T$ in the form of s = W ...

18 | EM optimization of latent-variable density models
- Bishop, Svensén, et al.
- 1996
Citation Context ...ial density network based on constrained Gaussian mixtures that uses the expectation-maximization (EM) algorithm to estimate the parameters by maximizing the likelihood function. It was introduced in [3], and, unlike the KSOMs in Section 8.4.1, it provides a rigorous treatment of SOMs under certain assumptions. 8.5 Neural networks Neural networks (NNs) model the set of output variables {yj} d p j=1 i...

14 | Self-organization of a massive document collection
- Kohonen, Kaski, et al.

11 | High dimensional data analysis via the sir/phd approach. Unpublished manuscript dated
- Li
- 2000
Citation Context ...o regression include projection pursuit regression [20, 7], generalized linear [42, 10] and additive [19] models, neural network models, and sliced inverse regression and principal Hessian directions [39]. 9 Summary In this paper, we described several dimension reduction methods. Acknowledgments UCRL-ID-148494. This work was performed under the auspices of the U.S. Department of Energy by University o...

10 | Fast non-linear dimension reduction
- Kambhatla, Leen
- 1994
Citation Context ... and five layer neural networks, for reducing the dimension of images. It also provides a C software package called NeuralCam implementing those methods. 8.6 Vector quantization As explained in [51], [29] introduced a hybrid non-linear dimension reduction technique based on combining vector quantization for first clustering the data, then applying local PCA on the resulting Voronoi cell clusters. On t...

10 | Some theoretical results on nonlinear principal component analysis
- Malthouse, Mah, et al.
- 1995
Citation Context ...od and a Bayesian ensemble learning method for estimation can be found in [30]. 8.2 Principal curves Principal curves are smooth curves that pass through the “middle” of multidimensional data sets [18, 40, 7]. Linear principal curves are in fact principal components, while non-linear principal curves generalize the concept. Given the p-dimensional random vector $y = (y_1, \ldots, y_p)$ with density $g(y)$, and ...

8 | Dimensionality Reduction Using Genetic Algorithms
- Raymer, Punch, et al.
Citation Context ...ective function to evaluate the fitness of candidates, and the values for the parameters of the chosen algorithm, GEAs search the candidate space for the member with the optimal fitness. For example, [45] use GAs in combination with a k-nearest neighbor (knn) classifier to reduce the dimension of a feature set: starting with a population of random transformation matrices {Wk×p} (i), they use GAs to f...

2 |
ICA website of J.-F
- Cardoso
Citation Context ...is section is based on [22], a recent survey on independent component analysis (ICA). More information (and software) on this currently very popular method can be found at various websites, including [6, 24, 49]. Books summarizing the recent advances in the theory and application of ICA include [1, 48, 15, 38]. ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each oth...

1 | ICA website at the
- Hyvärinen, et al.
Citation Context ...is section is based on [22], a recent survey on independent component analysis (ICA). More information (and software) on this currently very popular method can be found at various websites, including [6, 24, 49]. Books summarizing the recent advances in the theory and application of ICA include [1, 48, 15, 38]. ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each oth...

1 | Independent Component Analysis: Principles and Practice
- Roberts, Everson (editors)
- 2000
Citation Context ...mation (and software) on this currently very popular method can be found at various websites, including [6, 24, 49]. Books summarizing the recent advances in the theory and application of ICA include [1, 48, 15, 38]. ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each other, that are as nearly statistically independent as possible. Statistical independence is a much str...

1 | ICA website at The Salk Institute. http://www.cnl.salk.edu/~tewon/ica_cnl.html
- Sejnowski, et al.
Citation Context ...is section is based on [22], a recent survey on independent component analysis (ICA). More information (and software) on this currently very popular method can be found at various websites, including [6, 24, 49]. Books summarizing the recent advances in the theory and application of ICA include [1, 48, 15, 38]. ICA is a higher-order method that seeks linear projections, not necessarily orthogonal to each oth...

1 | Dimension reduction of images using neural networks
- Spierenburg
- 1997
Citation Context ...or function [2]. Many, traditional and more recent, linear and non-linear, dimension reduction techniques can be implemented using neural networks with different architectures and learning algorithms [2, 46, 40, 51, 7]. The simplest NN has three layers: the input layer, one hidden (bottleneck) layer, and the output layer. First, to obtain the data at node h of the hidden layer, the inputs $x_i$ are combined through we...