## Semi-Supervised Discriminant Analysis Using Robust Path-Based Similarity (2008)

### Download Links

- [www.cs.ust.hk]
- [mplab.ucsd.edu]
- DBLP

### Other Repositories/Bibliography

Venue: Proc. IEEE Conf. Computer Vision and Pattern Recognition

Citations: 8 (2 self)

### BibTeX

@INPROCEEDINGS{Zhang08semi-superviseddiscriminant,
  author = {Yu Zhang and Dit-Yan Yeung},
  title = {Semi-Supervised Discriminant Analysis Using Robust Path-Based Similarity},
  booktitle = {Proc. IEEE Conf. Computer Vision and Pattern Recognition},
  year = {2008}
}

### Abstract

Linear Discriminant Analysis (LDA), which works by maximizing the within-class similarity and minimizing the between-class similarity simultaneously, is a popular dimensionality reduction technique in pattern recognition and machine learning. In real-world applications where labeled data are limited, LDA does not work well. Under many situations, however, it is easy to obtain unlabeled data in large quantities. In this paper, we propose a novel dimensionality reduction method, called Semi-Supervised Discriminant Analysis (SSDA), which can utilize both labeled and unlabeled data to perform dimensionality reduction in the semi-supervised setting. Our method uses a robust path-based similarity measure to capture the manifold structure of the data and then uses the obtained similarity to maximize the separability between different classes. A kernel extension of the proposed method for nonlinear dimensionality reduction in the semi-supervised setting is also presented. Experiments on face recognition demonstrate the effectiveness of the proposed method.

### Citations

2802 | Eigenfaces for recognition
- Turk, Pentland
- 1991
Citation Context: ...xperimental results based on two face databases to evaluate the performance of SSDA. 4.1. Experimental Setup Subspace-based methods have achieved great successes in many face recognition applications [23, 1]. Previous research found that face images usually lie in some low-dimensional subspace within the ambient image space. Two famous methods are Eigenface [23] (based on PCA) and Fisherface [1] (based on...

2029 | Principal Component Analysis
- Jolliffe
- 1986
Citation Context: ...LDA [27], NullLDA [11], and DualLDA [24]. PseudoLDA overcomes the singularity problem by substituting the inverse of the within-class scatter matrix with its pseudo-inverse. PCA+LDA first applies PCA [17] to project the data into a lower-dimensional space so that the within-class scatter matrix computed there is nonsingular, and then applies LDA in the lower-dimensional space. Direct-LDA projects data ...
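The PCA+LDA recipe described in this context is easy to sketch. The example below is an illustration, not the authors' code: it uses scikit-learn and random toy data standing in for face images, with 40 samples in a 100-dimensional space to mimic the small-sample-size regime.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# Toy data: 40 samples in a 100-dimensional space (n << d, the SSS regime).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 100))
y = np.repeat(np.arange(4), 10)  # 4 classes, 10 samples each

# PCA+LDA: project to a low-dimensional subspace first so the
# within-class scatter matrix seen by LDA is nonsingular.
pca_lda = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
pca_lda.fit(X, y)
Z = pca_lda[:-1].transform(X)  # PCA features fed to LDA
print(Z.shape)  # (40, 20)
```

With 40 samples projected into 20 PCA dimensions, the within-class scatter matrix is generically full-rank, so LDA's matrix inverse is well defined.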

1509 | Eigenfaces vs. fisherfaces: recognition using class specific linear projection
- Belhumeur, Hespanha, et al.
- 1997
Citation Context: ...such situations because there may not be enough data to make the within-class scatter matrix nonsingular. Several methods have been proposed to overcome the SSS problem, e.g., PseudoLDA [18], PCA+LDA [1], Direct-LDA [27], NullLDA [11], and DualLDA [24]. PseudoLDA overcomes the singularity problem by substituting the inverse of the within-class scatter matrix with its pseudo-inverse. PCA+LDA first app...

1249 | Combining labeled and unlabeled data with co-training
- Blum, Mitchell
- 1998
Citation Context: ...classification or regression tasks when labeled data are scarce. A good survey of semi-supervised learning methods can be found in [29]. Some early semi-supervised learning methods include Co-Training [4] and Transductive SVM [3]. More recently, graph-based semi-supervised learning methods [2, 28, 30] have aroused the interest of many researchers. These methods model the relationships between data poi...

991 | Spectral graph theory
- Chung
- 1997
Citation Context: ...$\sum_{ij} S^{uu}_{ij} x_i x_j^T = 2 w^T X_u L_2 X_u^T w$, (9) where $S^{uu}_{ij}$ is the (i, j) element of $S^{uu}$, $D^{uu}$ is a diagonal matrix whose entries are the column sums of $S^{uu}$, and $L_2 = D^{uu} - S^{uu}$ is the Laplacian matrix [12] of $S^{uu}$. Finally, we combine Eqs. (8) and (9) to get the objective function for the optimization problem of our SSDA algorithm and maximize it with respect to w: $\max_w \, w^T S_b w \big/ w^T (S_w + L_1 + \alpha X_u L_2 X_u^T$ ...
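The Laplacian construction mentioned in this context (L = D − S, with D a diagonal matrix of the column sums of S) is standard and takes a few lines. The matrix S below is a made-up 3×3 similarity matrix, not the paper's S^{uu}:

```python
import numpy as np

def graph_laplacian(S):
    """Unnormalized graph Laplacian L = D - S, where D is the diagonal
    degree matrix holding the column (= row, for symmetric S) sums of S."""
    D = np.diag(S.sum(axis=0))
    return D - S

# Small symmetric similarity matrix over 3 points (illustrative values).
S = np.array([[0.0, 0.8, 0.1],
              [0.8, 0.0, 0.5],
              [0.1, 0.5, 0.0]])
L = graph_laplacian(S)
print(L @ np.ones(3))  # rows of L sum to 0 for symmetric S
```

The zero row sums are the defining property exploited in graph-based semi-supervised objectives: w^T X L X^T w expands to a weighted sum of squared pairwise differences of projections.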

971 | The Use of Multiple Measurements in Taxonomic Problems
- Fisher
- 1936
Citation Context: ...ty reduction in the semi-supervised setting is also presented. Experiments on face recognition demonstrate the effectiveness of the proposed method. 1. Introduction Linear Discriminant Analysis (LDA) [13, 21] is a popular dimensionality reduction technique in pattern recognition and machine learning. It aims to maximize the within-class similarity while minimizing the between-class similarity simultaneousl...

495 | Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions
- Zhu, Ghahramani, et al.
Citation Context: ...upervised learning methods can be found in [29]. Some early semi-supervised learning methods include Co-Training [4] and Transductive SVM [3]. More recently, graph-based semi-supervised learning methods [2, 28, 30] have aroused the interest of many researchers. These methods model the relationships between data points in the form of a graph, in which label information from the labeled data points is propagat...

454 | Semi-supervised learning literature survey
- Zhu
- 2008
Citation Context: ...form of semi-supervised learning is to utilize unlabeled data to aid classification or regression tasks when labeled data are scarce. A good survey of semi-supervised learning methods can be found in [29]. Some early semi-supervised learning methods include Co-Training [4] and Transductive SVM [3]. More recently, graph-based semi-supervised learning methods [2, 28, 30] have aroused the interest of many ...

437 | Learning with local and global consistency
- Zhou, Bousquet, et al.
- 2003
Citation Context: ...upervised learning methods can be found in [29]. Some early semi-supervised learning methods include Co-Training [4] and Transductive SVM [3]. More recently, graph-based semi-supervised learning methods [2, 28, 30] have aroused the interest of many researchers. These methods model the relationships between data points in the form of a graph, in which label information from the labeled data points is propagat...

334 | Manifold regularization: A geometric framework for learning from labeled and unlabeled examples
- Belkin, Niyogi, et al.
Citation Context: ...upervised learning methods can be found in [29]. Some early semi-supervised learning methods include Co-Training [4] and Transductive SVM [3]. More recently, graph-based semi-supervised learning methods [2, 28, 30] have aroused the interest of many researchers. These methods model the relationships between data points in the form of a graph, in which label information from the labeled data points is propagat...

304 | Regularized discriminant analysis
- Friedman
- 1989
Citation Context: ...$X_u^T$ in Eq. (11) may be singular and hence the eigen-decomposition problem becomes unstable. To avoid this problem, we adopt the idea of Tikhonov regularization as in regularized discriminant analysis [14]. So the generalized eigenvalue problem in Eq. (11) becomes: $S_b w = \lambda (S_w + L_1 + \alpha X_u L_2 X_u^T + \beta I) w$, (12) where $\beta > 0$ and $I$ is the identity matrix. 3.3. The Algorithm The SSDA algorithm can be summarized...
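The Tikhonov-regularized generalized eigenproblem in Eq. (12) can be solved directly once βI is added. The sketch below uses SciPy with random stand-in matrices: `Sdef` is a deliberately rank-deficient placeholder for $S_w + L_1 + \alpha X_u L_2 X_u^T$, not data from the paper.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
d = 10
A = rng.normal(size=(d, d))
Sb = A @ A.T                    # stand-in for the between-class scatter S_b
B = rng.normal(size=(d, 3))
Sdef = B @ B.T                  # rank-3, hence singular: regularization is needed
beta = 1e-3                     # Tikhonov regularization parameter

# Solve S_b w = lambda (Sdef + beta I) w; adding beta*I makes the
# right-hand matrix positive definite, so eigh() applies directly.
vals, vecs = eigh(Sb, Sdef + beta * np.eye(d))
w = vecs[:, -1]                 # projection direction with the largest eigenvalue
```

`scipy.linalg.eigh(a, b)` requires the right-hand matrix `b` to be positive definite, which is exactly what the β I term guarantees here.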

217 | The CMU Pose, Illumination, and Expression Database
- Sim, Baker, et al.
- 2003
Citation Context: ...image space. For Fisherface, we use PCA to preserve 95% of the variance of the data. The parameter β in Eq. (12) is fixed to $10^{-3}$ for SSDA, and likewise for SDA. 4.2. PIE Face Database We use the PIE face database [22] for the first set of experiments. The database contains 41,368 face images from 68 individuals. The face images were captured under varying pose, illumination and expression conditions. For our exper...

188 | Face recognition using laplacianfaces
- He, Yan, et al.
- 2005
Citation Context: ...point from the jth class, then $x_{l+i}$ is more likely to belong to the jth class. Let Q denote the similarity matrix between unlabeled data points and class means, with $Q_{ij}$ being its elements. Similar to [16], a Laplacian-style measure is defined as follows: $J_1(w) = \sum_{i=1}^{m}\sum_{j=1}^{C} (w^T x_{l+i} - w^T \bar{m}_j)^2 Q_{ij} = w^T \big[ \sum_{i=1}^{m} \big( \sum_{j=1}^{C} Q_{ij} \big) x_{l+i} x_{l+i}^T - 2 \sum_{i=1}^{m} \sum_{j=1}^{C} Q_{ij} x_{l+i} \bar{m}_j^T + \cdots \big] w = w^T (X$...

184 | A direct LDA algorithm for high-dimensional data-with application to face recognition
- Yu, Yang
- 2001
Citation Context: ...because there may not be enough data to make the within-class scatter matrix nonsingular. Several methods have been proposed to overcome the SSS problem, e.g., PseudoLDA [18], PCA+LDA [1], Direct-LDA [27], NullLDA [11], and DualLDA [24]. PseudoLDA overcomes the singularity problem by substituting the inverse of the within-class scatter matrix with its pseudo-inverse. PCA+LDA first applies PCA [17] to ...

174 | Semi-supervised support vector machines
- Bennett, Demiriz
- 1999
Citation Context: ...sion tasks when labeled data are scarce. A good survey of semi-supervised learning methods can be found in [29]. Some early semi-supervised learning methods include Co-Training [4] and Transductive SVM [3]. More recently, graph-based semi-supervised learning methods [2, 28, 30] have aroused the interest of many researchers. These methods model the relationships between data points in the form of a grap...

157 | A new LDA-based face recognition system which can solve the small sample size problem
- Chen, Liao, et al.
- 2000
Citation Context: ...tion, speech recognition, and character recognition, due to its effectiveness and computational efficiency. However, in some applications, LDA encounters the so-called small sample size (SSS) problem [11], which arises when the sample size is much smaller than the dimensionality of the feature space. The performance of LDA will seriously deteriorate under such situations because there may not be enough...

154 | B.: Cluster kernels for semi-supervised learning
- Chapelle, Weston, et al.
- 2003
Citation Context: ...ut in [10], the similarity measure defined in Eq. (5) is a density-sensitive similarity measure. So our method can be regarded as adopting the so-called cluster assumption in semi-supervised learning [9, 10], which says that two points are likely to have the same class label if there exists a path connecting them by passing through regions of high density only. Here we interpret the cluster assumption in...

122 | Semi-supervised classification by low density separation
- Chapelle, Zien
Citation Context: ...onzero eigenvalues. 5. Let $W = [w_1, w_2, \ldots, w_{C-1}]$. Data points can be embedded into the lower-dimensional space via the following transformation: $x \to y = W^T x$. 3.4. Discussions As pointed out in [10], the similarity measure defined in Eq. (5) is a density-sensitive similarity measure. So our method can be regarded as adopting the so-called cluster assumption in semi-supervised learning [9, 10], w...

67 | Discriminative common vectors for face recognition
- Cevikalp, Neamtu, et al.
- 2005
Citation Context: ...lLDA first projects the data to the null space of the within-class scatter matrix and then maximizes the between-class scatter in this space. It is similar to the Discriminative Common Vectors method [6]. DualLDA, which combines the ideas from PCA+LDA and NullLDA, applies LDA in the range space and the null space of the within-class scatter matrix separately and then integrates the two parts together...

56 | The utilization of multiple measurements in problems of biological classification
- Rao
- 1948
Citation Context: ...ty reduction in the semi-supervised setting is also presented. Experiments on face recognition demonstrate the effectiveness of the proposed method. 1. Introduction Linear Discriminant Analysis (LDA) [13, 21] is a popular dimensionality reduction technique in pattern recognition and machine learning. It aims to maximize the within-class similarity while minimizing the between-class similarity simultaneousl...

52 | Dual-space linear discriminant analysis for face recognition
- Wang, Tang
- 2004
Citation Context: ...data to make the within-class scatter matrix nonsingular. Several methods have been proposed to overcome the SSS problem, e.g., PseudoLDA [18], PCA+LDA [1], Direct-LDA [27], NullLDA [11], and DualLDA [24]. PseudoLDA overcomes the singularity problem by substituting the inverse of the within-class scatter matrix with its pseudo-inverse. PCA+LDA first applies PCA [17] to project the data into a lower-di...

49 | 2-D linear discriminant analysis
- Ye, Janardan, et al.
- 2004
Citation Context: ...ace of the within-class scatter matrix separately and then integrates the two parts together to get the final transformation. There also exists another approach to address the SSS problem, with 2DLDA [26] being the representative of this approach. The major difference between 2DLDA and the above algorithms is in the data representation. Specifically, 2DLDA works on data represented as matrices instead...

44 | Discriminant analysis with singular covariance matrices: Methods and applications to spectroscopic data
- Krzanowski, Jonathan, et al.
- 1995
Citation Context: ...riorate under such situations because there may not be enough data to make the within-class scatter matrix nonsingular. Several methods have been proposed to overcome the SSS problem, e.g., PseudoLDA [18], PCA+LDA [1], Direct-LDA [27], NullLDA [11], and DualLDA [24]. PseudoLDA overcomes the singularity problem by substituting the inverse of the within-class scatter matrix with its pseudo-inverse. PCA+...
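PseudoLDA's substitution of the Moore-Penrose pseudo-inverse for inv(S_w) can be sketched in a few lines. This is a minimal illustration with toy, deliberately singular scatter matrices, not a reference implementation:

```python
import numpy as np

def pseudo_lda_directions(Sw, Sb, n_components):
    """PseudoLDA sketch: when S_w is singular, replace its inverse with the
    pseudo-inverse and take the leading eigenvectors of pinv(S_w) @ S_b
    as the discriminant directions."""
    vals, vecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]   # sort by decreasing eigenvalue
    return vecs[:, order[:n_components]].real

rng = np.random.default_rng(0)
d = 6
U = rng.normal(size=(d, 2))
Sw = U @ U.T                              # rank-2, so ordinary inv(Sw) would fail
V = rng.normal(size=(d, d))
Sb = V @ V.T
W = pseudo_lda_directions(Sw, Sb, 2)
print(W.shape)  # (6, 2)
```

Unlike a plain inverse, `np.linalg.pinv` is defined for any matrix, which is exactly why this substitution sidesteps the SSS singularity.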

43 | Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces
- Mika, Rätsch, et al.
- 2003
Citation Context: ...on term $J_1^\phi$: $J_1^\phi = \Phi_u D_1 \Phi_u^T + \Psi D_2 \Psi^T - \Phi_u Q \Psi^T - \Psi Q^T \Phi_u^T$. So the problem in Eq. (11) in $\mathcal{F}$ can be written as follows: $\Psi D \Psi^T v = \lambda (\Phi_l \Phi_l^T - \Psi D \Psi^T + J_1^\phi + \alpha \Phi_u L_2 \Phi_u^T) v$. (13) From the analysis of [20], the eigenvector of Eq. (13) is a linear combination of $\phi(x_1), \ldots, \phi(x_n)$. Thus there exist coefficients $\gamma_i$, $i = 1, \ldots, n$, such that $v = \sum_{i=1}^{n} \gamma_i \phi(x_i) = \Phi a$, where $a = (\gamma_1, \ldots, \gamma_n)^T$. With...

42 | Semi-supervised discriminant analysis
- Cai, He, et al.
Citation Context: ...propose a novel semi-supervised dimensionality reduction algorithm called Semi-Supervised Discriminant Analysis (SSDA). Even though there already exists another semi-supervised LDA algorithm called SDA [5], which exploits the local neighborhood information of data points in performing dimensionality reduction, our SSDA algorithm exploits the global structure of the data and is robust against noise in de...

25 | Introduction to Statistical Pattern Recognition
- Fukunaga
- 1991
Citation Context: ...$S_w = \sum_{k=1}^{C} \sum_{x_i \in \Pi_k} (x_i - \bar{m}_k)(x_i - \bar{m}_k)^T$ (2) (3), where $\bar{m} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the sample mean of the whole data set D and $\bar{m}_k = \frac{1}{n_k} \sum_{x_i \in \Pi_k} x_i$ is the class mean of $\Pi_k$. ... $S_w^{-1} S_b$, where $S_w^{-1}$ denotes the matrix inverse of $S_w$ [15]. According to [25], $W^*$ computed as above may not be optimal with respect to the optimality criterion in Eq. (1), but it is a computationally simple and good approximate solution sufficient for many ...

12 | The AR-face database
- Martínez, Benavente
- 1998
Citation Context: ...50 Eigenface 0.6556±0.0163 0.6658±0.0148 Fisherface 0.3159±0.0254 0.3303±0.0264 SDA 0.4960±0.0282 0.5088±0.0330 SSDA 0.1809±0.0225 0.1972±0.0217 4.3. AR Face Database We next use the AR face database [19] for the second set of experiments. The database contains over 4,000 color face images from 126 persons, which include 70 men and 56 women. The face images are all frontal view images with different e...

10 | Robust path-based spectral clustering with application to image segmentation
- Chang, Yeung
- 2005
Citation Context: ...loit the underlying manifold structure of the whole data set based on a robust path-based similarity measure as described below. We first define a path-based similarity measure as in our previous work [7, 8]. Let $P_{ij}$ denote the set of all paths connecting vertices i and j. For each path $p \in P_{ij}$, the effective similarity $s^p_{ij}$ is the minimum edge weight along the path. The path-based similarity measure $s'_{ij}$ is ...
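The basic path-based (maximin) similarity sketched in this context — over all paths between two points, take the maximum of the minimum edge similarity along each path — can be computed with a Floyd-Warshall-style recursion. The sketch below implements only this basic measure on a made-up 3-point similarity matrix, not the robust variant the paper builds on top of it:

```python
import numpy as np

def path_based_similarity(S):
    """Path-based similarity sketch: s'_ij is the maximum over all paths
    from i to j of the minimum edge similarity along the path, computed
    with a Floyd-Warshall-style maximin update over intermediate nodes."""
    P = S.copy()
    n = P.shape[0]
    for k in range(n):
        # Allow paths routed through node k: bottleneck is min of the two legs.
        P = np.maximum(P, np.minimum(P[:, k:k+1], P[k:k+1, :]))
    return P

S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.8],
              [0.1, 0.8, 1.0]])
print(path_based_similarity(S)[0, 2])  # 0.8: via point 1, min(0.9, 0.8)
```

Points 0 and 2 are directly dissimilar (0.1) but become similar (0.8) through the intermediate point, which is how the measure captures elongated manifold structure.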

8 | Trace quotient problems revisited
- Yan, Tang
- 2006
Citation Context: ...$\bar{m}_k = \frac{1}{n_k} \sum_{x_i \in \Pi_k} x_i$, $S_w = \sum_{k=1}^{C} \sum_{x_i \in \Pi_k} (x_i - \bar{m}_k)(x_i - \bar{m}_k)^T$ (2) (3), where $\bar{m} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the sample mean of the whole data set and $\bar{m}_k$ is the class mean of $\Pi_k$. ... $S_w^{-1} S_b$, where $S_w^{-1}$ denotes the matrix inverse of $S_w$ [15]. According to [25], $W^*$ computed as above may not be optimal with respect to the optimality criterion in Eq. (1), but it is a computationally simple and good approximate solution sufficient for many applications. Thus ...

4 | Graph Laplacian kernels for object classification from a single example
- Chang, Yeung
- 2006
(Show Context)
Citation Context ...loit the underlying manifold structure of the whole data set based on a robust pathbased similarity measure as described below. We first define a path-based similarity measure as in our previous work =-=[7, 8]-=-. Let Pij denote the set of all paths connecting vertices i and j. For each path p ∈ Pij, the effective similarity s p ij the path. The path-based similarity measure s ′ ij is the minimum edge weight ... |