
## Feature Extraction and Dimension Reduction with Applications to Classification and the Analysis of Co-occurrence Data (2001)

### Citations

5960 | Classification and regression trees - Breiman, Friedman, et al. - 1984
Citation Context: ... confined to be a univariate problem. Consequently, our illustrations will be focused on these two algorithms as well. Example 12 (Waveform) The waveform data, originally taken from [2] and available from the UCI machine-learning repository FTP site, is a famous dataset in the machine-learning community. There are 3 classes and 21 predictors. ...

3797 | Density Estimation for Statistics and Data Analysis - Silverman - 1986
Citation Context: ... density and its first two derivatives. This is a relatively easy problem when pk belongs to a specific parametric family. When pk is modeled non-parametrically, it is also a manageable problem (see [36] and Appendix C). We do this once conditionally on each class to obtain fk, fk′, fk′′, and once unconditionally over the entire training sample to obtain f, f′, f′′. There are many different met...
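The kernel estimates of a density and its first two derivatives mentioned here can be sketched with a Gaussian kernel. This is a minimal illustration of the standard formulas (see also Appendix C), not the thesis's Splus/Locfit implementation; the function name and bandwidth choice are our own:

```python
import numpy as np

def kde_with_derivatives(z, data, b):
    """Gaussian-kernel estimates of f(z), f'(z), f''(z) at a single point z.

    f_hat(z)   = (1/(n b))   * sum_i w((z - z_i)/b)
    f_hat'(z)  = (1/(n b^2)) * sum_i w'((z - z_i)/b)
    f_hat''(z) = (1/(n b^3)) * sum_i w''((z - z_i)/b)
    with w the standard normal density (a sketch; bandwidth b is user-chosen).
    """
    u = (z - data) / b
    w = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)   # w(u)
    w1 = -u * w                                       # w'(u)
    w2 = (u ** 2 - 1) * w                             # w''(u)
    n = len(data)
    return w.sum() / (n * b), w1.sum() / (n * b ** 2), w2.sum() / (n * b ** 3)
```

For a large N(0, 1) sample, the estimates at z = 0 approach the true values φ(0) ≈ 0.399, 0, and −0.399.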

3169 | Generalized Linear Models - McCullagh, Nelder - 1989

2451 | Generalized Additive Models - Hastie, Tibshirani - 1990
Citation Context: ... standardize z. 4. Alternate between steps (2) and (3) until convergence. ... of this iteration procedure is the well-known power method to find the solutions to the eigen-equations above. For more details, see [19], pp. 197-198. Example 2 (Ecological Ordination and Gaussian Response Model) In the ecological ordination problem, Y = {yik} is a matrix whose element measures the abundance of species k at site i. Nat...
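The alternate-and-standardize loop in steps (2)-(4) is, as the context notes, the power method. A generic sketch, assuming a symmetric matrix with a dominant positive eigenvalue (the helper name is ours):

```python
import numpy as np

def power_method(A, tol=1e-10, max_iter=1000):
    """Dominant eigenvalue/eigenvector of a symmetric matrix A by power
    iteration: repeatedly multiply by A and re-standardize, the same
    alternate-and-standardize pattern as the steps quoted above.
    Assumes the dominant eigenvalue is positive (a sketch, not a library)."""
    v = np.ones(A.shape[0])
    for _ in range(max_iter):
        v_new = A @ v
        v_new /= np.linalg.norm(v_new)       # the "standardize" step
        if np.linalg.norm(v_new - v) < tol:  # stop once the direction is fixed
            break
        v = v_new
    lam = v @ A @ v                          # Rayleigh quotient
    return lam, v
```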

1982 | Practical Optimization - Gill, Murray, et al. - 1981
Citation Context: ... computationally expensive or simply impossible. There are different algorithms for constructing the matrix G at each iteration. However, we will not go into any of the details here. More details can be found in [16]. B.3 Choice of Step Size Once a descent direction is fixed, finding the step size is a (relatively simple) univariate problem. Various line-search methods can be used to solve this problem, such as t...
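One standard line-search method for the step-size subproblem is Armijo backtracking; a sketch of the general idea (not necessarily the variant described in [16]; names are ours):

```python
import numpy as np

def backtracking_line_search(f, grad, x, d, alpha=1.0, rho=0.5, c=1e-4):
    """Armijo backtracking along a fixed descent direction d:
    shrink alpha until f(x + alpha d) <= f(x) + c * alpha * grad(x)^T d.
    A sketch of one common step-size rule."""
    fx, g = f(x), grad(x)
    while f(x + alpha * d) > fx + c * alpha * (g @ d):
        alpha *= rho                 # halve the step until sufficient decrease
    return alpha
```

For the quadratic f(x) = xᵀx from x = (1, 0) along the steepest-descent direction, the first halving already satisfies the Armijo condition.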

1746 | Additive logistic regression: A statistical view of boosting (with discussion) - Friedman, Hastie, et al. - 2000
Citation Context: ... prototype methods in chapter 7. Recently, majority-vote classifiers have received a tremendous amount of attention, such as Bagging (see, e.g., [3]) and Boosting (see, e.g., [13]). These methods work by iteratively resampling the data B times, building a separate classifier each time, and taking a majority vote (among all B classifiers) in the end. Bagging resamples the data ...
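The resample-fit-vote loop described here can be sketched as follows. The 1-nearest-neighbour base learner is a stand-in of our own choosing (any classifier with the same fit-and-predict shape would do), not one used in the thesis:

```python
import numpy as np

def bagging_predict(X_train, y_train, X_test, fit_predict, B=25, seed=0):
    """Bagging sketch: B bootstrap resamples, one classifier per resample,
    then a majority vote among the B classifiers."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    votes = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)   # bootstrap sample (with replacement)
        votes.append(fit_predict(X_train[idx], y_train[idx], X_test))
    votes = np.asarray(votes)              # shape (B, n_test)
    # majority vote, column by column
    return np.array([np.bincount(col).argmax() for col in votes.T])

def one_nn(Xb, yb, X_test):
    """A deliberately simple base learner: 1-nearest-neighbour."""
    d = ((X_test[:, None, :] - Xb[None, :, :]) ** 2).sum(-1)
    return yb[d.argmin(axis=1)]
```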

1476 | Multivariate Analysis - Mardia, Kent, et al. - 1979
Citation Context: ... eigenvalue of 1 is the trivial vector of ones, i.e., 1 = (1, 1, ..., 1)T. Hence one identifies the eigenvectors corresponding to the next largest eigenvalue as the best solution. For more details, see [30], pp. 237-239, and [17], section 4.2. This technique is known as correspondence analysis. The scores zi and uk are scores on a (latent) ordination axis, a direction in which t...

1472 | Pattern Recognition and Neural Networks - Ripley - 1996
Citation Context: ... direct multivariate approach that considers all the variables simultaneously. 4.3 Feature Extraction The best definition of the feature extraction problem that we can find is given by Brian Ripley in [33], section 10.4: "Feature extraction is generally used to mean the construction of linear combinations αT x of continuous features which have good discriminatory power between classes." Every textbook i...

489 | Theory and Applications of Correspondence Analysis - Greenacre - 1984
Citation Context: ... the trivial vector of ones, i.e., 1 = (1, 1, ..., 1)T. Hence one identifies the eigenvectors corresponding to the next largest eigenvalue as the best solution. For more details, see [30], pp. 237-239, and [17], section 4.2. This technique is known as correspondence analysis. The scores zi and uk are scores on a (latent) ordination axis, a direction in which the items can be ordered...

465 | Regularized discriminant analysis - Friedman - 1989

396 | Canonical correspondence analysis: a new eigenvector method for multivariate direct gradient analysis. Ecology 67 - ter Braak - 1986
Citation Context: ... one being a linear function of the form zi = α1 xi1 + ... + αd xid, or, in matrix form, z = Xα. This is known as canonical correspondence analysis (CCA) and was first developed by ter Braak in [38]. The vector α defines (concretely) an ordination axis in the space spanned by the covariates and is a direction in which the ηk's and ξi's can be easily differentiated. An immediate advantage of this...

381 | Sliced inverse regression for dimension reduction - Li - 1991
Citation Context: ... using the Sliced Average Variance Estimator (SAVE) to find discriminant directions for QDA. SAVE is a variant of a well-known dimension reduction technique called Sliced Inverse Regression (SIR) (see [28]), and was first proposed by Cook in [4]. In particular, SAVE works with standardized variables so that S ≜ var(x) = I. Let Sk be the covariance matrix for class k, as above; SAVE then applies an ei...
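Our reading of the SAVE recipe in this context — whiten so that var(x) = I, then eigen-analyse the class-weighted average of (I − Sk)² — can be sketched as follows. Helper names are hypothetical and this is not a reference implementation:

```python
import numpy as np

def save_directions(X, y):
    """SAVE sketch: whiten x so var(z) = I, eigen-analyse
    M = sum_k pi_k (I - S_k)^2, where S_k is the within-class covariance of
    the whitened data, and map leading eigenvectors back to x-coordinates."""
    X = np.asarray(X, float)
    mu = X.mean(axis=0)
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    A = np.linalg.inv(L).T                 # whitening map: z = inv(L) (x - mu)
    Z = (X - mu) @ A
    d = X.shape[1]
    M = np.zeros((d, d))
    for k in np.unique(y):
        Zk = Z[y == k]
        Sk = np.cov(Zk, rowvar=False)
        D = np.eye(d) - Sk
        M += (len(Zk) / len(Z)) * (D @ D)  # pi_k-weighted (I - S_k)^2
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]
    # express the discriminant directions in the original coordinates
    return A @ evecs[:, order], evals[order]
```

On data where only the first coordinate's within-class variance differs between classes, the leading SAVE direction points essentially along that coordinate.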

320 | Exploratory projection pursuit - Friedman - 1987
Citation Context: ... beginning to get into a local neighborhood of the true maximum and use Newton steps only in the local neighborhood. This strategy is adopted, for example, by Friedman in exploratory projection pursuit [11]. We will come back to projection pursuit later. 4.6 Finding Multiple Features Fisher's LDA finds more than one discriminant direction. Since it is an eigen-decomposition problem, one usually goes b...

299 | Projection pursuit - Huber - 1985
Citation Context: ... pm(x) = pm−1(x) p(α)(αT x) / p(α)m−1(αT x) ... quite cumbersome. A computationally more feasible method appeared three years later in [11]. 6.1.2 Backward Algorithm and Its Difficulties In 1985, Huber (see [25]) discussed an alternative approach. Instead of constructing p(x) in a forward fashion (called "synthetic" in [25]) from an initial guess p0(x), the whole process can be turned backwards (called "anal...

271 | Local Regression and Likelihood - Loader - 1999
Citation Context: ... module is needed for non-parametric density estimation (as well as the derivatives). In our Splus implementation, we use the Locfit library provided by Clive Loader. More details are in Appendix C and [29]. 4.5 Illustration Therefore the basic feature extraction problem for us is the following optimization problem: maxα LR(α). This is the basis of all numerical procedures in this thesis. In section ...

229 | Sparse discriminant analysis - Clemmensen, Hastie, et al.

214 | Discriminant analysis by Gaussian mixtures - Hastie, Tibshirani - 1996
Citation Context: ... which constitute the main segments of the thesis. In chapter 7, we revisit a related approach in discriminant analysis: namely, mixture discriminant analysis (MDA), developed by Hastie and Tibshirani in [20]. We focus on an extension to MDA, previously outlined in [22] but not fully studied. It turns out that this extension actually corresponds to a natural generalization of Hofmann's aspect model when w...

142 | Flexible Discriminant Analysis by Optimal Scoring - Hastie, Tibshirani, et al. - 1993 |

140 | The Use of Multiple Measurements in Taxonomic Problems - Fisher - 1936
Citation Context: ... is known as the Mahalanobis distance from a point x to class k. 2.2.2 Linear Discriminant Directions There is another way to formulate the LDA problem, first introduced by Fisher in [8]. Given data {yi, xi}, i = 1, ..., n, where yi ∈ {1, 2, ..., K} is the class label and xi ∈ Rd is a vector of predictors, we look for a direction α ∈ Rd in the predictor space in which the classes are separated...
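Fisher's formulation looks for the direction α maximizing the ratio of between-class to within-class scatter, which leads to the leading eigenvector of W⁻¹B. A compact sketch of this standard reading of [8] (helper name ours):

```python
import numpy as np

def fisher_direction(X, y):
    """First Fisher discriminant direction: argmax_a (a^T B a) / (a^T W a),
    via the leading eigenvector of W^{-1} B, where B is the between-class and
    W the within-class scatter matrix (a sketch of the classical recipe)."""
    X = np.asarray(X, float)
    mu = X.mean(axis=0)
    d = X.shape[1]
    B = np.zeros((d, d))
    W = np.zeros((d, d))
    for k in np.unique(y):
        Xk = X[y == k]
        mk = Xk.mean(axis=0)
        B += len(Xk) * np.outer(mk - mu, mk - mu)   # between-class scatter
        W += (Xk - mk).T @ (Xk - mk)                # within-class scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(W, B))
    a = np.real(evecs[:, np.argmax(np.real(evals))])
    return a / np.linalg.norm(a)
```

For two spherical Gaussian classes separated along the first coordinate, the recovered direction is essentially that coordinate axis.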

125 | Common Principal Components and Related Multivariate Methods - Flury - 1988
Citation Context: ... can then be achieved by forcing some (but not all) of these elements to be identical across classes. This work is based on an earlier predecessor called the common principal component (CPC) model (see [9]). For K groups, the CPC model treats them as N(µk, Σk), where Σk = B Dk BT and Dk is a diagonal matrix, so that the column vectors of B define the set of common principal components. A detailed description o...

113 | Local likelihood estimation - Tibshirani, Hastie - 1987
Citation Context: ... respectively. ∫_{−∞}^{∞} z2 w(z) dz < ∞; and f̂′′(z) = (1/(n b3)) Σ_{i=1}^{n} w′′((z − zi)/b). C.2 The Locfit Package in Splus Based on the theory of local likelihood (see [39]), the Locfit package in Splus is a recent development due to Clive Loader [29], and it is the package that we used in our implementation. Local likelihood estimates have some advantages over plain-va...

74 | Correspondence analysis: A neglected multivariate method - Hill - 1974
Citation Context: ... Here, zi = (Σk yik uk) / (Σk yik) and uk = (Σi yik zi) / (Σi yik). It is well known that the solution for the reciprocal averaging equations above can be obtained from a simple eigen-analysis (see [23]). In matrix notation, we can write the above equations as z ∝ A−1 Y u and u ∝ B−1 YT z, where A = diag(yi·) with yi· = Σk yik, and B = diag(y·k) with y·k = Σi yik. Thus, we have z ∝ A−1...
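The reciprocal averaging iteration, with the trivial constant solution deflated out each pass so that the scores converge to the first non-trivial axis, can be sketched as follows (helper name ours):

```python
import numpy as np

def reciprocal_averaging(Y, n_iter=500):
    """Correspondence-analysis scores by reciprocal averaging:
    alternate u_k = weighted mean of z over column k and
    z_i = weighted mean of u over row i, re-standardizing z each pass.
    Removing the weighted mean deflates the trivial z = (1, ..., 1) solution."""
    Y = np.asarray(Y, float)
    r = Y.sum(axis=1)                       # row totals  y_i.
    c = Y.sum(axis=0)                       # column totals y_.k
    z = np.arange(Y.shape[0], dtype=float)  # arbitrary non-constant start
    for _ in range(n_iter):
        u = (Y.T @ z) / c                   # u ∝ B^{-1} Y^T z
        z = (Y @ u) / r                     # z ∝ A^{-1} Y u
        z -= np.average(z, weights=r)       # deflate the trivial solution
        z /= np.sqrt(np.average(z ** 2, weights=r))
    u = (Y.T @ z) / c                       # scores consistent with final z
    return z, u
```

On a two-block abundance matrix, the row scores separate the two blocks by sign, ordering the sites along the latent axis.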

73 | Projection pursuit density estimation. - Friedman, Stuetzle, et al. - 1984 |

54 | Regularized Gaussian discriminant analysis through eigenvalue decomposition - Bensmail, Celeux - 1996
Citation Context: ... correcting this problem. Another (more classical) approach for regularization is to impose some structure on the covariance matrices. A systematic treatment is given in [1], where structures of various levels of flexibility are imposed on the covariance matrices based on their eigen-decompositions: Σk = λk Bk Dk BkT. Here λk normalizes Dk so that det(Dk) = 1. This scheme...
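The parameterization Σk = λk Bk Dk BkT with det(Dk) = 1 separates volume (λk), shape (Dk), and orientation (Bk). A sketch of extracting the three pieces from a single covariance matrix (helper name ours):

```python
import numpy as np

def volume_shape_orientation(Sigma):
    """Decompose a covariance as Sigma = lam * B @ D @ B.T with det(D) = 1:
    lam carries the volume, the diagonal D the shape, B the orientation.
    Dividing the eigenvalues by their geometric mean forces det(D) = 1."""
    evals, B = np.linalg.eigh(Sigma)
    d = len(evals)
    lam = np.prod(evals) ** (1.0 / d)   # geometric mean of eigenvalues
    D = np.diag(evals / lam)
    return lam, B, D
```

For Σ = diag(4, 1) this yields λ = 2 and D = diag(0.5, 2) (up to eigenvalue ordering), and the product λ B D Bᵀ reproduces Σ.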

44 | Dimension reduction and visualization in discriminant analysis (with discussion) - Cook, Yin - 1994
Citation Context: ... we will focus on the special case of pk(x) ∼ N(µk, Σk). Throughout our presentation, we also compare our special variation with a competitive approach, recently proposed by Cook and others (see [4] and [5]), called the Sliced Average Variance Estimator (SAVE). We show that SAVE is a special approximation to our approach and, in fact, does not work as well as ours. In particular, SAVE, as a generalization to...

42 | On nonlinear functions of linear combinations - Diaconis, Shahshahani - 1984
Citation Context: ... loop into a back-fitting loop, which is then embedded into the standard IRLS loop for logistic regression. See [34] for more details. Theoretically, this model is important because it has been shown in [7] that the right-hand side is flexible enough to approximate any smooth function provided that M is large enough. The work which we shall present in Chapter 6 is closely related to a special case of thi...

38 | Correspondence Analysis of Incidence and Abundance Data: Properties in Terms of a Unimodal Response Model - ter Braak - 1985
Citation Context: ... the relationship between environment and species. Let zi be the environmental score that site i receives and uk be the optimal environmental score for species k. Then the Gaussian response model (see [37]) says that yik ∼ Poisson(λik), where λik depends on the scores zi and uk through log λik = ak − (zi − uk)2 / (2tk2). The parameter tk is called the tolerance of species k. So the rate of occurrence of speci...
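The Gaussian response model can be simulated directly. A sketch under the standard form log λik = ak − (zi − uk)² / (2tk²), with tk the tolerance; the function name and parameter layout are our own:

```python
import numpy as np

def simulate_abundance(z, u, a, t, seed=0):
    """Simulate the Gaussian response model:
    y_ik ~ Poisson(lambda_ik), log lambda_ik = a_k - (z_i - u_k)^2 / (2 t_k^2).
    Abundance of species k peaks at sites whose score z_i is near its optimum
    u_k and falls off at a rate set by the tolerance t_k (a sketch)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(z, float)[:, None]   # site scores, shape (I, 1)
    u = np.asarray(u, float)[None, :]   # species optima, shape (1, K)
    lam = np.exp(a - (z - u) ** 2 / (2 * np.asarray(t, float) ** 2))
    return rng.poisson(lam), lam
```

With sites at z = 0, 1, 2 and a single species with optimum u = 1, the expected abundance peaks at the middle site, where λ = exp(a).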

35 | Bagging predictors - Breiman - 1996
Citation Context: ... shall say more about prototype methods in chapter 7. Recently, majority-vote classifiers have received a tremendous amount of attention, such as Bagging (see, e.g., [3]) and Boosting (see, e.g., [13]). These methods work by iteratively resampling the data B times, building a separate classifier each time, and taking a majority vote (among all B classifiers) in the en...

34 | Discussion of Li - Cook, Weisberg - 1991
Citation Context: ... chapter, we will focus on the special case of pk(x) ∼ N(µk, Σk). Throughout our presentation, we also compare our special variation with a competitive approach, recently proposed by Cook and others (see [4] and [5]), called the Sliced Average Variance Estimator (SAVE). We show that SAVE is a special approximation to our approach and, in fact, does not work as well as ours. In particular, SAVE, as a generaliz...

22 | Flexible Discriminant by Mixture Models - Hastie, Tibshirani - 1996
Citation Context: ... we revisit a related approach in discriminant analysis: namely, mixture discriminant analysis (MDA), developed by Hastie and Tibshirani in [20]. We focus on an extension to MDA, previously outlined in [22] but not fully studied. It turns out that this extension actually corresponds to a natural generalization of Hofmann's aspect model when we have covariates on each ξi ∈ X. ...

14 | Projection pursuit discriminant analysis - Polzehl - 1995
Citation Context: ... Table 6.1 (misclassification error of algorithm 6.4 on 15,000 new observations: 12.9%, 14.2%, and 14.9%; the Bayes error for this problem is 12.5%) ... of the problem! Remark 6.4 We later discovered in [32] a similar attempt to use projection pursuit density estimation for non-parametric discriminant analysis. However, there are some differences. To choose the best ridge modification, we rely on the mat...

8 | The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edn - Hastie, Tibshirani, et al. - 2009
Citation Context: ... that of basis expansion is used to produce non-linear decision boundaries. Though popular and fascinating on its own, SVM is not as relevant to this thesis. More details can be found, for example, in [14]. 3.2.4 Non-Parametric Discrimination More generally, one does not have to assume that the class densities are Gaussian. With more flexible density functions, the decision boundaries also become mor...

7 | Statistical models for co-occurrence data - Hofmann, Puzicha
Citation Context: ... in products 3, 4, and 10 as well. 1.3 Aspect Model Instead of assigning scores to the categories, one can model the probabilities of co-occurrence directly. One such model is the aspect model (see [24]), which is based on partitioning the space X × Y into disjoint regions indexed by several latent classes, cα ∈ {c1, c2, ..., cJ}. Conditional on the latent class, the occurrence probabilities of ξi ...
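The aspect model's conditional-independence structure gives the co-occurrence probabilities P(ξi, yk) = Σj P(cj) P(ξi | cj) P(yk | cj); a one-line sketch of assembling that table (helper name ours):

```python
import numpy as np

def aspect_model_probs(p_c, p_x_given_c, p_y_given_c):
    """Aspect model co-occurrence table: P(xi, y) = sum_j P(c_j) P(xi|c_j) P(y|c_j).
    Conditional on the latent class, xi and y occur independently.
    p_c: (J,), p_x_given_c: (I, J), p_y_given_c: (K, J)."""
    return np.einsum('j,ij,kj->ik', p_c, p_x_given_c, p_y_given_c)
```

With two latent classes that each deterministically pick one x and one y, the table is block-diagonal with mass 0.5 on each block.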

5 | Logistic response projection pursuit - Roosen, Hastie - 1993
Citation Context: ... needed to estimate f′km as well. Therefore this model is fitted by embedding a Gauss-Newton loop into a back-fitting loop, which is then embedded into the standard IRLS loop for logistic regression. See [34] for more details. Theoretically, this model is important because it has been shown in [7] that the right-hand side is flexible enough to approximate any smooth function provided that M is large enough...

3 | Canonical correspondence analysis as an approximation to Gaussian ordination - Johnson, Altman - 1999
Citation Context: ...ble? This will be our main topic. To appreciate the practical value of this problem, we notice that in ecological ordination applications, for example, there is already some empirical evidence (e.g., [27]) that the response curves for various botanical species can be multi-modal. The phenomenon of multi-modality is, in fact, quite common, as we illustrate through the example below. Example 4 (Targeted...

2 | Majority-Vote Classifiers: Theory and Applications - James - 1998
Citation Context: ... resamples the data by taking simple Bootstrap samples, whereas Boosting adaptively puts more weight on the easily misclassified training points with each iteration. A good review can be found in [14] and [26]. These wonderful classifiers, however, do not bear a direct relationship with this thesis. (Chapter 4: A General Methodology for Feature Extraction) In section 2.2, we saw that LDA can find features tha...

1 | Discriminative vs. Informative Classification - Rubenstein - 1998
Citation Context: ... considerations. 3.3 Connections The materials in sections 3.1 and 3.2 are closely related. Different models of pk will lead to different models of gk. A detailed study of these connections can be found in [35]. Example 9 (LDA and Logit) In LDA, pk ∼ N(µk, Σ). One can easily work out the implied posterior odds: log [P(y = k|x) / P(y = K|x)] = log(πk/πK) + log(pk(x)/pK(x)) ≜ βk0 + βT x, with βk0 and β involving πk/πK and (µk − µK)T ...
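For pk ∼ N(µk, Σ) with a shared covariance, the posterior log-odds are linear in x. A sketch computing the implied intercept and slope (helper name ours):

```python
import numpy as np

def lda_logit_coeffs(mu_k, mu_K, Sigma, pi_k, pi_K):
    """For pk ~ N(mu_k, Sigma) with a shared Sigma, the posterior log-odds
    log P(y=k|x)/P(y=K|x) is linear in x: beta0 + beta^T x, with
    beta  = Sigma^{-1} (mu_k - mu_K) and
    beta0 = log(pi_k/pi_K) - (1/2)(mu_k + mu_K)^T Sigma^{-1} (mu_k - mu_K)."""
    Si = np.linalg.inv(Sigma)
    beta = Si @ (mu_k - mu_K)
    beta0 = np.log(pi_k / pi_K) - 0.5 * (mu_k + mu_K) @ Si @ (mu_k - mu_K)
    return beta0, beta
```

With equal priors, Σ = I, and means (1, 0) and (0, 0), the log-odds vanish exactly at the midpoint x = (0.5, 0), as the symmetry of LDA requires.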