## Nonparametric Weighted Feature Extraction for Classification (2004)

Venue: IEEE Transactions on Geoscience and Remote Sensing

Citations: 19 (2 self)

### BibTeX

```bibtex
@ARTICLE{Kuo04nonparametricweighted,
  author  = {Bor-Chen Kuo and David A. Landgrebe},
  title   = {Nonparametric Weighted Feature Extraction for Classification},
  journal = {IEEE Transactions on Geoscience and Remote Sensing},
  year    = {2004},
  volume  = {42},
  number  = {5},
  pages   = {1096--1105}
}
```


### Citations

2865 | Introduction to Statistical Pattern Recognition (Fukunaga, 1990)

Citation Context: ...tric Feature Extraction Discriminant Analysis Feature Extraction (DAFE) is often used for dimension reduction in classification problems. It is also called the parametric feature extraction method in [10], since DAFE uses the mean vector and covariance matrix of each class. Usually within-class, between-class, and mixture scatter matrices are used to formulate the criteria of class separability. A with...
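
The DAFE/LDA procedure described in this excerpt reduces to forming within- and between-class scatter matrices from class means and covariances, then solving a generalized eigenproblem. A minimal NumPy sketch (not the paper's code; function and variable names are my own):

```python
import numpy as np

def dafe_features(X, y, n_features):
    """Illustrative DAFE/LDA sketch. X: (N, d) samples, y: (N,) labels."""
    classes = np.unique(y)
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        prior = len(Xc) / len(X)
        mc = Xc.mean(axis=0)
        Sw += prior * np.cov(Xc, rowvar=False)
        diff = (mc - mean_all)[:, None]
        Sb += prior * diff @ diff.T
    # Solve Sw^{-1} Sb v = lambda v and keep the leading eigenvectors.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(-evals.real)
    W = evecs[:, order[:n_features]].real
    return X @ W
```

As the excerpt notes, this parametric form depends entirely on per-class means and covariances, which is what NWFE later relaxes.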

334 | Regularized discriminant analysis (Friedman, 1987)

Citation Context: ... A. Dimensionality reduction by feature extraction or feature selection. B. Regularization of the class sample covariance matrices (e.g. [5], [6], [7], [8], [9]). C. Structurization of a true covariance matrix described by a small number of parameters [4]. Group C is useful when the property and structure of the true covariance are known;...

180 | On the mean accuracy of statistical pattern recognizers (Hughes, 1968)

Citation Context: ...ning samples are available to accurately estimate the class quantitative description is frequently not satisfied for high-dimensional data. Small training sets usually result in the Hughes phenomenon [3] and singularity problems. There are several ways to overcome these problems. In [4], these techniques are categorized into three groups...

150 | An algorithm for generalized matrix eigenvalue problems (Moler, 1973)

72 | Multiclass Linear Dimension Reduction by Weighted Pairwise Fisher Criteria (Loog, Duin, et al., 2001)

Citation Context: ...ue Perturbation to the within-class scatter matrix to solve the generalized eigenvalue problem [17]. Approximated pairwise accuracy criterion Linear Dimension Reduction (aPAC-LDR) [18] can be seen as DAFE with the contributions of individual class pairs weighted according to the Euclidean distance of the respective class means. The major difference between DAFE and aPAC-LDR is that the Fisher c...

66 | Nonparametric Discriminant Analysis (Fukunaga, Mantock, 1997)

Citation Context: ...ormulate the scatter matrices, hence it still suffers from those three major disadvantages of DAFE. 2.2 Nonparametric Discriminant Analysis Nonparametric Discriminant Analysis (NDA) [10][20] was proposed to solve the problems of DAFE. In NDA, the between-class scatter matrix is redefined as a new nonparametric between-class scatter matrix (for the 2-class problem), denoted $S_b^{NDA} = P\dots$
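
The nonparametric between-class scatter idea quoted above replaces the other class's global mean with the local mean of a sample's k nearest neighbors in that class. A hedged 2-class sketch (the sample-weighting function is omitted because the excerpt truncates it; all names are illustrative):

```python
import numpy as np

def knn_local_mean(x, X_other, k):
    """Mean of the k nearest neighbors of x within X_other."""
    dists = np.linalg.norm(X_other - x, axis=1)
    idx = np.argsort(dists)[:k]
    return X_other[idx].mean(axis=0)

def nonparametric_Sb(X1, X2, k=3):
    """Unweighted sketch of a nonparametric between-class scatter matrix."""
    N = len(X1) + len(X2)
    d = X1.shape[1]
    Sb = np.zeros((d, d))
    for X_own, X_oth in ((X1, X2), (X2, X1)):
        prior = len(X_own) / N
        for x in X_own:
            # Deviation from the local kNN mean in the *other* class.
            diff = (x - knn_local_mean(x, X_oth, k))[:, None]
            Sb += (prior / len(X_own)) * (diff @ diff.T)
    return Sb
```

Because every sample contributes its own local difference vector, this scatter matrix captures boundary structure that a single class-mean difference cannot.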

56 | An optimal set of discriminant vectors (Foley, 1975)

Citation Context: ...hat if the within-class covariance is singular, which often occurs in high dimensional problems, DAFE will have a poor performance on classification. Foley-Sammon feature extraction and its extension [13][14][15][19] can help to extract more than L-1 orthogonal features from n-dimensional space based on the following: $r_i = \arg\max_r \frac{r^T S_b^{DA} r}{r^T S_w^{DA} r}$, $i = 1, 2, \dots, n-1$, subject to $r_i^T S_w^{DA} r_j = 0$, ...

46 | Discriminant analysis with singular covariance matrices: methods and applications to spectroscopic data (Krzanowski, Jonathan, et al., 1995)

Citation Context: ...A. Dimensionality reduction by feature extraction or feature selection. B. Regularization of the class sample covariance matrices (e.g. [5], [6], [7], [8], [9]). C. Structurization of a true covariance matrix described by a small number of parameters [4]. Group C is useful when the property and structure of the true covariance are known; otherwise, methods ...

42 | Information Extraction Principles and Methods for Multispectral and Hyperspectral Image Data (Landgrebe, 2000)

Citation Context: ...Among the ways to approach high dimensional data classification, a useful processing model that has evolved in the last several years [1,2] is shown schematically in Figure 1. Given the availability of data (box 1), the process begins by the analyst specifying what classes are desired, usually by labeling training samples for each class ...

30 | Optimal discriminant plane for a small number of samples and design method of classifier on the plane (Hong, Yang, 1991, Pattern Recognition)

Citation Context: ... estimators in the estimating procedure of the within-class scatter matrix [16] or by adding Singular Value Perturbation to the within-class scatter matrix to solve the generalized eigenvalue problem [17]. Approximated pairwise accuracy criterion Linear Dimension Reduction (aPAC-LDR) [18] can be seen as DAFE with the contributions of individual class pairs weighted according to the Euclidia...

28 | An optimal orthonormal system for discriminant analysis (Okada, Tomita, 1985)

Citation Context: ...if the within-class covariance is singular, which often occurs in high dimensional problems, DAFE will have a poor performance on classification. Foley-Sammon feature extraction and its extension [13][14][15][19] can help to extract more than L-1 orthogonal features from n-dimensional space based on the following: $r_i = \arg\max_r \frac{r^T S_b^{DA} r}{r^T S_w^{DA} r}$, $i = 1, 2, \dots, n-1$, subject to $r_i^T S_w^{DA} r_j = 0$, ...

21 | Classification of high dimensional data with limited training samples (Tadjudin, Landgrebe, 1998). Available online: http://dynamo.ecn.purdue.edu/landgreb/Saldju_TR.pdf

Citation Context: ...A. Dimensionality reduction by feature extraction or feature selection. B. Regularization of the class sample covariance matrices (e.g. [5], [6], [7], [8], [9]). C. Structurization of a true covariance matrix described by a small number of parameters [4]. Group C is useful when the property and structure of the true covariance are known; otherwise, met...

16 | Covariance pooling and stabilization for classification (Rayens, Greene, 1991)

Citation Context: ...A. Dimensionality reduction by feature extraction or feature selection. B. Regularization of the class sample covariance matrices (e.g. [5], [6], [7], [8], [9]). C. Structurization of a true covariance matrix described by a small number of parameters [4]. Group C is useful when the property and structure of the true covariance are known; othe...

15 | Linear discriminant analysis for two classes via removal of classification structure (Aladjem, 1997)

Citation Context: ...rmances of using DAFE, NDA, DAM-P, DAM-NP, DAM-N and NWFE features applied to Gaussian, 2NN, and Parzen classifiers. In Experiment 1, only the simultaneous orthogonal feature extraction method (ORTH, [27]) is used. In Experiment 2, ORTH and the successive extracting method (REM, [27]) are used. Euclidean distance and 2NN are used in NDA, DAM-P, DAM-NP, DAM-N, and the kNN classifier. The grid method is use...

13 | Covariance matrix estimation and classification with limited training data (Hoffbeck, Landgrebe, 1996)

Citation Context: ...A. Dimensionality reduction by feature extraction or feature selection. B. Regularization of the class sample covariance matrices (e.g. [5], [6], [7], [8], [9]). C. Structurization of a true covariance matrix described by a small number of parameters [4]. Group C is useful when the property and structure of the true covariance are known; otherwise...

8 | Improved Statistics Estimation And Feature Extraction For Hyperspectral Data Classification (Kuo, Landgrebe, 2001)

Citation Context: ...subject to $r_i^T S_w^{DA} r_j = 0$. This third limitation can be relieved by using regularized covariance estimators in the estimating procedure of the within-class scatter matrix [16] or by adding Singular Value Perturbation to the within-class scatter matrix to solve the generalized eigenvalue problem [17]. Approximated pairwise accuracy criterion Linear Dimen...

7 | On an extended Fisher criterion for feature selection (Malina, 1981)

Citation Context: ...e is small, NDA will have the singularity problem. 2.3 Discriminant Analysis Using Malina's Criterion In [21] and [22], the criterion function of DAFE and NDA was modified based on Malina's criterion [23],[24],[25], and DAM is used for representing it. For a 2-class classification problem, the following criterion was proposed: $J_{DAM} = (1-\beta)\, r^T S_{(i-j)}\, r + \beta\, r^T S_b\, r$, where r is the vector fea...
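
The DAM-style criterion quoted in this excerpt is a convex combination of two quadratic forms in the projection vector r. A small illustrative sketch, where `S_pair` stands in for the class-pair scatter term $S_{(i-j)}$ (names are mine, not from the paper):

```python
import numpy as np

def j_dam(r, S_pair, Sb, beta):
    """Illustrative DAM-style criterion:
    J(r) = (1 - beta) * r^T S_pair r + beta * r^T Sb r,
    evaluated on the normalized direction r."""
    r = r / np.linalg.norm(r)  # only the direction matters
    return (1.0 - beta) * (r @ S_pair @ r) + beta * (r @ Sb @ r)
```

Sweeping beta between 0 and 1 trades off the two scatter terms, which is the role of the control parameter in the excerpt.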

6 | A note on the orthonormal discriminant vector method for feature extraction (Hamamoto, Matsuura, et al., 1991, Pattern Recognition)

Citation Context: ...ithin-class covariance is singular, which often occurs in high dimensional problems, DAFE will have a poor performance on classification. Foley-Sammon feature extraction and its extension [13][14][15][19] can help to extract more than L-1 orthogonal features from n-dimensional space based on the following: $r_i = \arg\max_r \frac{r^T S_b^{DA} r}{r^T S_w^{DA} r}$, $i = 1, 2, \dots, n-1$, subject to $r_i^T S_w^{DA} r_j = 0$. This thir...

6 | Parametric and nonparametric linear mappings of multidimensional data (Aladjem, 1991)

Citation Context: ...e within-class scatter matrix in NDA still has a parametric form. When the training set size is small, NDA will have the singularity problem. 2.3 Discriminant Analysis Using Malina's Criterion In [21] and [22], the criterion function of DAFE and NDA was modified based on Malina's criterion [23],[24],[25], and DAM is used for representing it. For a 2-class classification problem, the following crit...

6 | Multiclass discriminant mappings (Aladjem, 1994)

Citation Context: ...where $\alpha$ is a control parameter between zero and infinity, and $d(x_l^{(i)}, x_{kNN}^{(j)})$ is the Euclidean distance from $x_l^{(i)}$ to its kNN point in class j. Based on [18] and [26], the final discrete forms of the within- and between-class scatter matrices $S_w^{NDA}$ and $S_b^{NDA}$ for the multi-class problem are expressed by sums of the form $\sum_{i=1}^{L} P_i \sum_{j=1,\, j \neq i}^{L} \sum_{l=1}^{N_i} \dots$

5 | Canonical variate analysis—a general model formulation (Campbell, 1984, Australian Journal of Statistics)

3 | Signal Theory Methods (Landgrebe, 2003)

Citation Context: ...Among the ways to approach high dimensional data classification, a useful processing model that has evolved in the last several years [1,2] is shown schematically in Figure 1. Given the availability of data (box 1), the process begins by the analyst specifying what classes are desired, usually by labeling training samples for each class ...

2 | PNM: A program for parametric and nonparametric mapping of multidimensional data (Aladjem, 1991)

Citation Context: ...class scatter matrix in NDA still has a parametric form. When the training set size is small, NDA will have the singularity problem. 2.3 Discriminant Analysis Using Malina's Criterion In [21] and [22], the criterion function of DAFE and NDA was modified based on Malina's criterion [23],[24],[25], and DAM is used for representing it. For a 2-class classification problem, the following criterion was...

2 | Regularized feature extractions for hyperspectral data classification (Kuo, Ko, et al., 2003)

Citation Context: ...with the largest f eigenvalues of the matrix $(S_w^{NW})^{-1} S_b^{NW}$. To reduce the effect of the cross products of within-class distances and prevent singularity, some regularized techniques [5, 29] can be applied to the within-class scatter matrix. In this study, the within-class scatter matrix is regularized by $S_w^{NW} = 0.5\, S_w^{NW} + 0.5\, \mathrm{diag}(S_w^{NW})$, where diag(A) means ... Finally, the NWFE algorithm is ...
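
The regularization step quoted above shrinks the within-class scatter matrix toward its own diagonal, which makes the matrix invertible whenever the diagonal entries are positive. A hedged NumPy sketch of this step and the subsequent eigen-extraction (function names are mine, not from the paper):

```python
import numpy as np

def regularize_within(Sw):
    """Shrink Sw toward its diagonal: 0.5*Sw + 0.5*diag(Sw)."""
    return 0.5 * Sw + 0.5 * np.diag(np.diag(Sw))

def nwfe_directions(Sw, Sb, n_features):
    """Leading eigenvectors of inv(Sw_reg) @ Sb, as in the excerpt."""
    Sw_reg = regularize_within(Sw)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw_reg, Sb))
    order = np.argsort(-evals.real)
    return evecs[:, order[:n_features]].real
```

Even a rank-deficient within-class scatter matrix becomes full rank after this shrinkage, so the generalized eigenproblem stays solvable with small training sets.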

1 | An optimal transformation for discriminant analysis and principal component analysis (Duchene, Leclercq, 1988)

Citation Context: ...he within-class covariance is singular, which often occurs in high dimensional problems, DAFE will have a poor performance on classification. Foley-Sammon feature extraction and its extension [13][14][15][19] can help to extract more than L-1 orthogonal features from n-dimensional space based on the following: $r_i = \arg\max_r \frac{r^T S_b^{DA} r}{r^T S_w^{DA} r}$, $i = 1, 2, \dots, n-1$, subject to $r_i^T S_w^{DA} r_j = 0$. This ...

1 | Some multiclass Fisher feature selection algorithms and their comparison with Karhunen-Loeve algorithms (Malina, 1987)

Citation Context: ...small, NDA will have the singularity problem. 2.3 Discriminant Analysis Using Malina's Criterion In [21] and [22], the criterion function of DAFE and NDA was modified based on Malina's criterion [23],[24],[25], and DAM is used for representing it. For a 2-class classification problem, the following criterion was proposed: $J_{DAM} = (1-\beta)\, r^T S_{(i-j)}\, r + \beta\, r^T S_b\, r$, where r is the vector feature ...

1 | Two-parameter Fisher criterion (Malina)

Citation Context: ..., NDA will have the singularity problem. 2.3 Discriminant Analysis Using Malina's Criterion In [21] and [22], the criterion function of DAFE and NDA was modified based on Malina's criterion [23],[24],[25], and DAM is used for representing it. For a 2-class classification problem, the following criterion was proposed: $J_{DAM} = (1-\beta)\, r^T S_{(i-j)}\, r + \beta\, r^T S_b\, r$, where r is the vector feature that ...

1 | PRTools, a Matlab Toolbox for Pattern Recognition (R., August 2002). Available for download from http://www.ph.tn.tudelft.nl/prtools

Citation Context: ...n NDA, DAM-P, DAM-NP, DAM-N, and the kNN classifier. The grid method is used for successively finding the optimal $\beta_1$ (first feature) and $\beta_2$ (second feature) in the DAM cases. All classifiers are from [28]. There are four different real data sets: Cuprite, a site of geologic interest in western Nevada; Jasper Ridge, a site of ecological interest in California; Indian Pine, a mixed forest/agricultural s...
Citation Context ...n NDA, DAM-P, DAM-NP, DAM-N, and kNN classifier. The grid method is used for successively finding optimal β 1 (first feature) and β 2 (second feature) in DAM cases . Besides, all classifiers are from =-=[28]-=-. There are four different real data sets, Cuprite: a site of geologic interest in western Nevada, Jasper Ridge: a site of ecological interest in California, Indian Pine: a mixed forest/agricultural s... |