Results 1–10 of 21
Robust face recognition via sparse representation
 IEEE Trans. Pattern Analysis and Machine Intelligence
, 2008
Abstract

Cited by 828 (37 self)
We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models, and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as Eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly, by exploiting the fact that these errors are often sparse w.r.t. the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm, and corroborate the above claims.
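The classification scheme this abstract describes can be sketched in a few lines: solve the ℓ1-minimization as a linear program, then assign the test sample to the class whose training columns give the smallest reconstruction residual. This is a minimal illustration, not the authors' implementation; the `scipy.optimize.linprog` formulation and the helper name `src_classify` are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linprog

def src_classify(A, labels, y):
    """Sparse-representation classification (SRC) sketch.

    Solves min ||x||_1  s.t.  A x = y  as a linear program, then assigns
    y to the class whose training columns give the smallest residual.
    """
    m, n = A.shape
    # LP variables z = [x; u]: minimize sum(u) with |x_i| <= u_i, A x = y.
    c = np.concatenate([np.zeros(n), np.ones(n)])
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])   # encodes x - u <= 0 and -x - u <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([A, np.zeros((m, n))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    x = res.x[:n]
    # Keep only the coefficients of each class in turn; smallest residual wins.
    best, best_r = None, np.inf
    for cls in set(labels):
        xc = np.where(np.array(labels) == cls, x, 0.0)
        r = np.linalg.norm(y - A @ xc)
        if r < best_r:
            best, best_r = cls, r
    return best
```

On a toy dictionary whose columns are grouped by class, a test vector equal to one of the training columns is assigned to that column's class, with a residual of essentially zero.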
Random projection in dimensionality reduction: Applications to image and text data
 in Knowledge Discovery and Data Mining
, 2001
Abstract

Cited by 211 (0 self)
Random projections have recently emerged as a powerful method for dimensionality reduction. Theoretical results indicate that the method preserves distances quite nicely; however, empirical results are sparse. We present experimental results on using random projection as a dimensionality reduction tool in a number of cases where the high dimensionality of the data would otherwise lead to burdensome computations. Our application areas are the processing of both noisy and noiseless images, and information retrieval in text documents. We show that projecting the data onto a random lower-dimensional subspace yields results comparable to conventional dimensionality reduction methods such as principal component analysis: the similarity of data vectors is preserved well under random projection. However, using random projections is computationally significantly less expensive than using, e.g., principal component analysis. We also show experimentally that using a sparse random matrix gives additional computational savings in random projection.
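The sparse random matrix mentioned at the end of the abstract can be illustrated with Achlioptas-style entries (+√3, 0, −√3 with probabilities 1/6, 2/3, 1/6, scaled by 1/√k). The function below is a sketch of that idea under those assumptions, not code from the paper:

```python
import numpy as np

def sparse_random_projection(X, k, seed=0):
    """Project rows of X (n x d) down to k dimensions with a sparse random
    matrix: entries +sqrt(3), 0, -sqrt(3) with probabilities 1/6, 2/3, 1/6,
    scaled by 1/sqrt(k) so pairwise distances are preserved in expectation.
    Two thirds of the entries are zero, which saves multiplications."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.choice([np.sqrt(3.0), 0.0, -np.sqrt(3.0)], size=(d, k),
                   p=[1/6, 2/3, 1/6])
    return X @ R / np.sqrt(k)
```

Projecting a few 1000-dimensional vectors to, say, 200 dimensions typically changes pairwise Euclidean distances by only a few percent, which is the distance-preservation property the abstract refers to.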
Random projection for high dimensional data clustering: A cluster ensemble approach
 In Proceedings of the 20th International Conference on Machine Learning (ICML)
Abstract

Cited by 122 (4 self)
We investigate how random projection can best be used for clustering high dimensional data. Random projection has been shown to have promising theoretical properties. In practice, however, we find that it results in highly unstable clustering performance. Our solution is to use random projection in a cluster ensemble approach. Empirical results show that the proposed approach achieves better and more robust clustering performance compared to not only single runs of random projection/clustering but also clustering with PCA, a traditional data reduction method for high dimensional data. To gain insights into the performance improvement obtained by our ensemble method, we analyze and identify the influence of the quality and the diversity of the individual clustering solutions on the final ensemble performance.
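One common way to realize such an ensemble is to accumulate a co-association matrix over k-means runs on independent random projections, then cut a hierarchical clustering of that matrix. The sketch below illustrates this general idea, not the paper's exact algorithm; the run counts and projection dimension are arbitrary assumptions:

```python
import numpy as np
from scipy.cluster.vq import kmeans2
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def rp_cluster_ensemble(X, k, n_runs=10, proj_dim=5, seed=0):
    """Cluster-ensemble-over-random-projections sketch: run k-means on
    several random low-dimensional projections of X, count how often each
    pair of points lands in the same cluster (co-association), then cut a
    hierarchical clustering of that similarity matrix into k groups."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    co = np.zeros((n, n))
    for _ in range(n_runs):
        R = rng.standard_normal((d, proj_dim)) / np.sqrt(proj_dim)
        _, lab = kmeans2(X @ R, k, minit='++', seed=rng.integers(1 << 31))
        co += (lab[:, None] == lab[None, :])
    co /= n_runs                                # pairwise agreement in [0, 1]
    dist = squareform(1.0 - co, checks=False)   # condensed distance form
    return fcluster(linkage(dist, method='average'), k, criterion='maxclust')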
Self-Organization of Very Large Document Collections: State of the Art
 Proceedings of ICANN'98, the 8th International Conference on Artificial Neural Networks
, 1998
Abstract

Cited by 61 (2 self)
The Self-Organizing Map (SOM) forms a nonlinear projection from a high-dimensional data manifold onto a low-dimensional grid. A representative model of some subset of data is associated with each grid point. The SOM algorithm computes an optimal collection of models that approximates the data in the sense of some error criterion and also takes into account the similarity relations of the models. The models then become ordered on the grid according to their similarity. When the SOM is used for the exploration of statistical data, the data vectors can be approximated by models of the same dimensionality. When mapping documents, one can represent them statistically by their word frequency histograms or some reduced representations of the histograms that can be regarded as data vectors. We have made SOMs of collections of over one million documents. Each document is mapped onto some grid point, with a link from this point to the document database. The documents are ordered on the grid acco...
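The SOM training loop described above can be sketched as follows. This is a minimal illustration with assumed decay schedules for the learning rate and the neighbourhood radius, not the authors' large-scale implementation:

```python
import numpy as np

def train_som(data, grid=(8, 8), iters=500, seed=0):
    """Minimal online SOM sketch: for each sampled vector, move the
    best-matching unit and its grid neighbours toward the sample, with a
    learning rate and a neighbourhood radius that decay over time."""
    rng = np.random.default_rng(seed)
    gx, gy = grid
    # One model (codebook) vector per grid point.
    W = rng.standard_normal((gx * gy, data.shape[1])) * 0.1
    coords = np.array([(i, j) for i in range(gx) for j in range(gy)], float)
    for t in range(iters):
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))     # best-matching unit
        frac = t / iters
        lr = 0.5 * (1 - frac)                           # decaying rate
        sigma = max(gx, gy) / 2 * (1 - frac) + 0.5      # decaying radius
        h = np.exp(-((coords - coords[bmu]) ** 2).sum(axis=1)
                   / (2 * sigma ** 2))                  # grid neighbourhood
        W += lr * h[:, None] * (x - W)
    return W

def map_document(W, x):
    """Map a data vector to its grid point (index of the nearest model)."""
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))
```

For document maps, `data` would hold word-frequency histograms (or reduced representations of them), and `map_document` gives the grid point that links each document back to the database, as the abstract describes.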
Feature selection in face recognition: A sparse representation perspective
, 2007
Abstract

Cited by 28 (1 self)
In this paper, we examine the role of feature selection in face recognition from the perspective of sparse representation. We cast the recognition problem as finding a sparse representation of the test image features w.r.t. the training set. The sparse representation can be accurately and efficiently computed by ℓ1-minimization. The proposed simple algorithm generalizes conventional face recognition classifiers such as nearest neighbors and nearest subspaces. Using face recognition under varying illumination and expression as an example, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficient and whether the sparse representation is correctly found. We conduct extensive experiments to validate the significance of imposing sparsity using the Extended Yale B database and the AR database. Our thorough evaluation shows that, using conventional features such as Eigenfaces and facial parts, the proposed algorithm achieves much higher recognition accuracy on face images with variation in either illumination or expression. Furthermore, other unconventional features such as severely downsampled images and randomly projected features perform almost equally well as the feature dimension increases. The differences in performance between different features become insignificant once the feature-space dimension is sufficiently large.
Comparing and Combining Dimension Reduction Techniques for Efficient Text Clustering
 Proceedings of the Workshop on Feature Selection for Data Mining, SIAM Data Mining, 2005
Abstract

Cited by 13 (1 self)
A great challenge of text mining arises from the increasingly large text datasets and the high dimensionality associated with natural language. In this research, a systematic study is conducted of six Dimension Reduction Techniques (DRT) in the context of the text clustering problem using three standard benchmark datasets. The methods considered include three feature transformation techniques, ...
Privacy-Preserving Clustering by Object Similarity-Based Representation and Dimensionality Reduction Transformation
 In Proc. of the Workshop on Privacy and Security Aspects of Data Mining (PSADM'04), in conjunction with the Fourth IEEE International Conference on Data Mining (ICDM'04)
, 2004
Abstract

Cited by 12 (3 self)
Preserving privacy of individuals when data are shared for clustering is a challenging problem. Data owners must not only meet privacy requirements but also guarantee valid clustering results. In this paper, we show that this dual goal can be achieved by transforming a database using two simple and effective data transformations: Object Similarity-Based Representation (OSBR) and Dimensionality Reduction-Based Transformation (DRBT). The former relies on the idea behind the similarity between objects, and the latter relies on the intuition behind random projection. The major features of our data transformations are: a) they are independent of distance-based clustering algorithms; b) they have a sound mathematical foundation; and c) they do not require CPU-intensive operations.
Dimensionality reduction by random projection and latent semantic indexing
 In Proceedings of the Text Mining Workshop at the 3rd SIAM International Conference on Data Mining
Abstract

Cited by 11 (0 self)
Document categorization and classification is of seminal importance for information retrieval. During the past decade the growth of the Internet has fundamentally changed the ways that information is shared, and it has made clear that efficient methods for searching and exploring vast amounts of data should be put forward. The largest challenges associated with information retrieval are synonymy and high dimensionality. An elegant and accurate technique to solve both problems has been presented in the form of Latent Semantic Indexing (LSI). However, its high computational cost makes it infeasible for large databases. Therefore other models such as random projection have been proposed, and combining the two approaches has also been suggested. However, to the best of our knowledge, no empirical results have been presented on this “hybrid” method. In this paper we evaluate and compare these three approaches and discover that the seemingly promising combination of LSI and random projection does not always result in faster running time as expected.
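The "hybrid" scheme under discussion, a random projection followed by LSI on the reduced matrix, can be sketched as below. The function name and dimensions are assumptions of this illustration, and LSI is rendered generically as a truncated SVD:

```python
import numpy as np

def rp_then_lsi(X, rp_dim, lsi_dim, seed=0):
    """Hybrid sketch: first shrink the term space of the document-term
    matrix X (n_docs x n_terms) with a random projection, then run LSI
    (a truncated SVD) on the much smaller projected matrix."""
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], rp_dim)) / np.sqrt(rp_dim)
    Z = X @ R                                   # n_docs x rp_dim
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U[:, :lsi_dim] * s[:lsi_dim]         # documents in the LSI space
```

The SVD now runs on an n_docs × rp_dim matrix instead of the full document-term matrix, which is where the hoped-for speedup comes from; the paper's finding is that this speedup does not always materialize in practice.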
A Privacy-Preserving Clustering Approach Toward Secure and Effective Data Analysis for Business
Abstract

Cited by 4 (0 self)
The sharing of data has been proven beneficial in data mining applications. However, privacy regulations and other privacy concerns may prevent data owners from sharing information for data analysis. To resolve this challenging problem, data owners must design a solution that meets privacy requirements and guarantees valid data clustering results. To achieve this dual goal, we introduce a new method for privacy-preserving clustering, called Dimensionality Reduction-Based Transformation (DRBT). This method relies on the intuition behind random projection to protect the underlying attribute values subjected to cluster analysis. The major features of this method are: a) it is independent of distance-based clustering algorithms; b) it has a sound mathematical foundation; and c) it does not require CPU-intensive operations. We show analytically and empirically that, by transforming a dataset using DRBT, a data owner can achieve privacy preservation and accurate clustering with little communication-cost overhead.
Independent Component Analysis Using Random Projection For Data Pre-Processing
Abstract
Finding latent structures within high-dimensional data repositories is inherently difficult. The data are assumed to be generated by unknown latent variables and the relationships and interactions between them. The task is to find these latent variables, and the way they interact, given the observed data only. It is assumed that the latent variables do not depend on each other but act independently. A popular method for this problem is independent component analysis (ICA). An ICA algorithm for analyzing complex-valued signals is given, and an ICA-type algorithm is used for analyzing the topics in dynamically changing text data. Experimental results are given on all of the presented methods. Another, partially overlapping problem considered in this paper is dimensionality reduction. Empirical validation is given on a computationally simple method called random projection: it does not introduce severe distortions in the data. It is also proposed that random projection could be used as a pre-processing method prior to ICA, and experimental results are shown to support this claim.
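The proposed pipeline, random projection as pre-processing followed by ICA, can be sketched with a minimal deflation FastICA. This is a generic textbook FastICA with a tanh nonlinearity, not the paper's algorithm, and the whitening step keeps only the top variance directions so it tolerates the rank-deficient data a random projection produces:

```python
import numpy as np

def whiten(X, k):
    """Center X (samples x dims) and whiten onto its top-k variance directions."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(Xc.T @ Xc / len(Xc))
    vals, vecs = vals[-k:], vecs[:, -k:]        # eigh sorts ascending
    return Xc @ vecs / np.sqrt(vals)

def fastica(X, n_comp, iters=200, seed=0):
    """Minimal deflation FastICA with a tanh nonlinearity."""
    rng = np.random.default_rng(seed)
    Z = whiten(X, n_comp)
    W = np.zeros((n_comp, n_comp))
    for i in range(n_comp):
        w = rng.standard_normal(n_comp)
        for _ in range(iters):
            wz = Z @ w
            g, gp = np.tanh(wz), 1 - np.tanh(wz) ** 2
            w = (Z * g[:, None]).mean(axis=0) - gp.mean() * w  # fixed-point step
            w -= W[:i].T @ (W[:i] @ w)          # deflate against found components
            w /= np.linalg.norm(w)
        W[i] = w
    return Z @ W.T                              # estimated independent components
```

In a toy run, two source signals mixed into many observed dimensions and then randomly projected down are recovered (up to sign and order), which supports the claim that random projection does not severely distort the structure ICA needs.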