## Worst-Case Linear Discriminant Analysis

Citations: 2 (0 self)

### BibTeX

```bibtex
@MISC{Zhang_worst-caselinear,
  author = {Yu Zhang and Dit-yan Yeung},
  title  = {Worst-Case Linear Discriminant Analysis},
  year   = {}
}
```

### Abstract

Dimensionality reduction is often needed in many applications due to the high dimensionality of the data involved. In this paper, we first analyze the scatter measures used in the conventional linear discriminant analysis (LDA) model and note that the formulation is based on the average-case view. Based on this analysis, we then propose a new dimensionality reduction method called worst-case linear discriminant analysis (WLDA) by defining new between-class and within-class scatter measures. This new model adopts the worst-case view, which is arguably more suitable for applications such as classification. When the number of training data points or the number of features is not very large, we relax the optimization problem involved and formulate it as a metric learning problem. Otherwise, we take a greedy approach by finding one direction of the transformation at a time. Moreover, we also analyze a special case of WLDA to show its relationship with conventional LDA. Experiments conducted on several benchmark datasets demonstrate the effectiveness of WLDA when compared with some related dimensionality reduction methods.
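The worst-case scatter measures described in the abstract can be sketched numerically. Below is a hedged illustration, assuming the worst-case between-class scatter is the smallest squared distance between projected class means and the worst-case within-class scatter is the largest per-class average scatter; the function name and exact definitions are illustrative, not taken from the paper:

```python
import numpy as np

def worst_case_scatters(X, y, W):
    """Illustrative worst-case scatter measures: the minimum pairwise
    between-class distance and the maximum per-class within-class scatter
    in the projected space (assumed definitions, not the paper's exact ones)."""
    classes = np.unique(y)
    # projected class means
    means = {c: X[y == c].mean(axis=0) @ W for c in classes}
    # worst-case between-class scatter: smallest squared distance
    # between any two projected class means
    between = min(np.sum((means[a] - means[b]) ** 2)
                  for i, a in enumerate(classes) for b in classes[i + 1:])
    # worst-case within-class scatter: largest average squared deviation
    # from the projected class mean, over all classes
    within = max(np.mean(np.sum((X[y == c] @ W - means[c]) ** 2, axis=1))
                 for c in classes)
    return between, within
```

Intuitively, maximizing `between` while minimizing `within` guards against the hardest class pair and the most spread-out class, instead of averaging over all of them as conventional LDA does.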

### Citations

4137 | Convex Optimization
- Boyd, Vandenberghe
- 2004

2326 | Principal Component Analysis
- Jolliffe
- 1986
Citation Context: ...underlying many dimensionality reduction techniques is that the most useful information in many high-dimensional datasets resides in a low-dimensional latent space. Principal component analysis (PCA) [8] and linear discriminant analysis (LDA) [7] are two classical dimensionality reduction methods that are still widely used in many applications. PCA, as an unsupervised linear dimensionality reduction ...
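As a reference point for the PCA description in this excerpt, here is a minimal sketch of PCA as an eigendecomposition of the sample covariance (illustrative only, not code from the paper):

```python
import numpy as np

def pca(X, k):
    """Minimal PCA sketch: project centered data onto the top-k
    eigenvectors of the sample covariance matrix."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = Xc.T @ Xc / (len(X) - 1)             # sample covariance
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    return Xc @ vecs[:, ::-1][:, :k]           # top-k principal components
```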

1677 | Eigenfaces vs. fisherfaces: recognition using class specific linear projection
- Belhumeur, Hespanha, et al.
- 1997
Citation Context: ...of the ambient image space. Fisherface (based on LDA) [2] is one representative dimensionality reduction method. We use three face databases, ORL [2], PIE [17] and AR [13], and one object database, COIL [15], in our experiments. In the AR face database, 2,6...

728 | UCI Machine Learning Repository
- Asuncion, Newman
- 2007
Citation Context: ...than 100, the optimization method in Section 2.2 or 2.3 is used depending on which one is smaller; otherwise, we use the greedy method in Section 2.4. 4.1 Experiments on UCI Datasets Ten UCI datasets [1] are used in the first set of experiments. For each dataset, we randomly select 70% to form the training set and the rest for the test set. We perform 10 random splits and report in Table 2 the averag...

552 | Distance metric learning, with application to clustering with side-information
- Xing, Ng, et al.
- 2002
Citation Context: ...ong all columns of W. 2.2 Optimization Procedure Since problem (4) is not easy to optimize with respect to W, we resort to formulate this dimensionality reduction problem as a metric learning problem [22, 21, 4]. We define a new variable Σ = WW ...
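The substitution Σ = WWᵀ quoted in this excerpt is what turns the projection into a learnable metric: squared distances in the projected space equal Mahalanobis-style distances under Σ. A quick numerical check of that identity (W here is an arbitrary random matrix, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((5, 2))          # arbitrary transformation
x1, x2 = rng.standard_normal(5), rng.standard_normal(5)

Sigma = W @ W.T                          # the new variable Σ = W Wᵀ
d_proj = np.sum((W.T @ (x1 - x2)) ** 2)  # ‖Wᵀx₁ − Wᵀx₂‖² in projected space
d_metric = (x1 - x2) @ Sigma @ (x1 - x2) # (x₁−x₂)ᵀ Σ (x₁−x₂) in input space
```

Because the two quantities coincide, an objective written in terms of W can be rewritten in terms of the positive semidefinite matrix Σ, which is the standard route into metric learning formulations.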

366 | Distance metric learning for large margin nearest neighbor classification
- Weinberger, Blitzer, et al.
- 2006
Citation Context: ...ong all columns of W. 2.2 Optimization Procedure Since problem (4) is not easy to optimize with respect to W, we resort to formulate this dimensionality reduction problem as a metric learning problem [22, 21, 4]. We define a new variable Σ = WW ...

246 | The CMU pose, illumination, and expression database
- Sim, Baker, et al.
Citation Context: ...of the ambient image space. Fisherface (based on LDA) [2] is one representative dimensionality reduction method. We use three face databases, ORL [2], PIE [17] and AR [13], and one object database, COIL [15], in our experiments. In the AR face database, 2,600 images of 100 persons (50 men and 50 women) are used. Before the experiment, each image is converte...

167 | Information-theoretic metric learning
- Davis, Kulis, et al.
- 2007
Citation Context: ...ong all columns of W. 2.2 Optimization Procedure Since problem (4) is not easy to optimize with respect to W, we resort to formulate this dimensionality reduction problem as a metric learning problem [22, 21, 4]. We define a new variable Σ = WW ...

141 | Applications of second-order cone programming
- Lobo, Vandenberghe, et al.
- 1998

106 | Columbia object image library (COIL-20)
- Nene, Nayar, et al.
- 1996
Citation Context: ...the ambient image space. Fisherface (based on LDA) [2] is one representative dimensionality reduction method. We use three face databases, ORL [2], PIE [17] and AR [13], and one object database, COIL [15], in our experiments. In the AR face database, 2,600 images of 100 persons (50 men and 50 women) are used. Before the experiment, each image is converted to gray scale and normalized to a size of 33 ×...

103 | The concave-convex procedure
- Yuille, Rangarajan
- 2003
Citation Context: ... a metric learning problem. In case both the number of training data points and the number of features are large, we propose a greedy approach based on the constrained concave-convex procedure (CCCP) [24, 18] to find one direction of the transformation at a time with the other directions fixed. Moreover, we also analyze a special case of WLDA to show its relationship with conventional LDA. We will report ...
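The concave-convex procedure referenced in this excerpt minimizes a difference of convex functions by repeatedly linearizing the concave part and solving the resulting convex surrogate. A toy one-dimensional sketch (not the paper's WLDA subproblem): minimize f(x) = x⁴ − 2x², taking u(x) = x⁴ and v(x) = 2x², both convex.

```python
import numpy as np

# CCCP iteration for f(x) = u(x) - v(x) with u(x) = x^4, v(x) = 2x^2.
# Each step linearizes v at the current iterate x_t and solves
#   x_{t+1} = argmin_x  x^4 - v'(x_t) * x,
# whose stationarity condition 4x^3 = v'(x_t) has a closed-form root.
x = 2.0
for _ in range(50):
    grad_v = 4.0 * x               # v'(x_t) = 4 x_t
    x = np.cbrt(grad_v / 4.0)      # closed-form solution of the surrogate
# the iterates converge to x = 1, a local minimizer of f
```

Each surrogate upper-bounds f and touches it at the current iterate, so the objective decreases monotonically; in WLDA the same principle is applied with constraints, one transformation direction at a time.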

62 | Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices
- Overton, Womersley
- 1993
Citation Context: ...nt in problem (4) is non-convex with respect to W and cannot be expressed in terms of Σ. We define a set ℳ...

56 | An optimal set of discriminant vectors
- Foley
- 1975
Citation Context: ...blems but not more general multi-class problems. The orthogonality constraint on the transformation matrix W has been widely used by dimensionality reduction methods, such as Foley-Sammon LDA (FSLDA) [6, 5] and orthogonal LDA [23]. The orthogonality constraint can help to eliminate the redundant information in W. This has been shown to be effective for dimensionality reduction. 4 Experimental Validation...

56 | Kernel methods for missing variables
- Smola, Vishwanathan, et al.
- 2005
Citation Context: ... a metric learning problem. In case both the number of training data points and the number of features are large, we propose a greedy approach based on the constrained concave-convex procedure (CCCP) [24, 18] to find one direction of the transformation at a time with the other directions fixed. Moreover, we also analyze a special case of WLDA to show its relationship with conventional LDA. We will report ...

50 | Constructing Descriptive and Discriminative Nonlinear Features: Rayleigh Coefficients in Kernel Feature Spaces
- Mika, Rätsch, et al.
- 2003
Citation Context: ...ller than the dimensionality ...

49 | Efficient and robust feature extraction by maximum margin criterion
- Li, Jiang, et al.
- 2006
Citation Context: ...iterion of conventional LDA while controlling the complexity of the within-class scatter matrix as reflected by the second and third terms of the objective function in problem (18). 3 Related Work In [11], Li et al. proposed a maximum margin criterion for dimensionality reduction by changing the optimization problem of conventional LDA to: maxW tr W...
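The maximum margin criterion quoted in this excerpt, max over W of tr(Wᵀ(S_b − S_w)W) with orthonormal columns, is maximized by the top eigenvectors of S_b − S_w. A hedged sketch (function name and details are illustrative, not the authors' code):

```python
import numpy as np

def mmc_directions(X, y, k):
    """Maximum margin criterion sketch: return the top-k eigenvectors
    of S_b - S_w, which maximize tr(Wᵀ(S_b - S_w)W) over orthonormal W."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))                     # between-class scatter
    Sw = np.zeros((d, d))                     # within-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mean, mc - mean)
        Sw += (Xc - mc).T @ (Xc - mc)
    vals, vecs = np.linalg.eigh(Sb - Sw)      # ascending eigenvalues
    return vecs[:, ::-1][:, :k]               # top-k eigenvectors
```

Unlike the ratio-based LDA objective, this difference form needs no inversion of S_w, which is why it behaves well when S_w is singular.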

33 | An optimal transformation for discriminant and principal component analysis
- Duchene, Leclercq
- 1988
Citation Context: ...blems but not more general multi-class problems. The orthogonality constraint on the transformation matrix W has been widely used by dimensionality reduction methods, such as Foley-Sammon LDA (FSLDA) [6, 5] and orthogonal LDA [23]. The orthogonality constraint can help to eliminate the redundant information in W. This has been shown to be effective for dimensionality reduction. 4 Experimental Validation...

29 | Introduction to Statistical Pattern Recognition
- Fukunaga
- 1991
Citation Context: ...chniques is that the most useful information in many high-dimensional datasets resides in a low-dimensional latent space. Principal component analysis (PCA) [8] and linear discriminant analysis (LDA) [7] are two classical dimensionality reduction methods that are still widely used in many applications. PCA, as an unsupervised linear dimensionality reduction method, finds a low-dimensional subspace tha...

19 | Trace ratio vs. ratio trace for dimensionality reduction
- Wang, Yan, et al.
- 2007
Citation Context: ...eduction. 4 Experimental Validation In this section, we evaluate WLDA empirically on some benchmark datasets and compare WLDA with several related methods, including conventional LDA, trace-ratio LDA [20], FSLDA [6, 5], and MarginLDA [11]. For fair comparison with conventional LDA, we set the reduced dimensionality of each method compared to ...

16 | Robust Fisher discriminant analysis
- Kim, Magnani, et al.
- 2006
Citation Context: ...sionality reduction. The objective function in [10] is identical to that of support vector machine (SVM) and it treats the decision function in SVM as one direction in the transformation matrix W. In [9], Kim et al. proposed a robust LDA algorithm to deal with data uncertainty in classification applications by formulating the problem as a convex problem. However, in many applications, it is not easy ...

15 | Computational and theoretical analysis of null space and orthogonal linear discriminant analysis
- Ye, Xiang
- 2006
Citation Context: ... multi-class problems. The orthogonality constraint on the transformation matrix W has been widely used by dimensionality reduction methods, such as Foley-Sammon LDA (FSLDA) [6, 5] and orthogonal LDA [23]. The orthogonality constraint can help to eliminate the redundant information in W. This has been shown to be effective for dimensionality reduction. 4 Experimental Validation In this section, we eva...

13 | The AR database
- Martínez, Benavente
- 1998
Citation Context: ...of the ambient image space. Fisherface (based on LDA) [2] is one representative dimensionality reduction method. We use three face databases, ORL [2], PIE [17] and AR [13], and one object database, COIL [15], in our experiments. In the AR face database, 2,600 images of 100 persons (50 men and 50 women) are used. Before the experiment, each image is converted to gray sc...

7 | Margin maximizing discriminant analysis
- Kocsor, Kovács, et al.
- 2004
Citation Context: ...mall within-class scatter measure. However, similar to LDA, the maximum margin criterion also uses the average distances to describe the between-class and within-class scatter measures. Kocsor et al. [10] proposed another maximum margin criterion for dimensionality reduction. The objective function in [10] is identical to that of support vector machine (SVM) and it treats the decision function in SVM ...

1 | Semidefinite programming
- Vandenberghe, Boyd
- 1996