Results 1-10 of 55
An introduction to variable and feature selection
Journal of Machine Learning Research, 2003
Cited by 688 (14 self)
Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available.
Robust face recognition via sparse representation (preprint)
IEEE Trans. Pattern Analysis and Machine Intelligence
Cited by 321 (22 self)
We consider the problem of automatically recognizing human faces from frontal views with varying expression and illumination, as well as occlusion and disguise. We cast the recognition problem as one of classifying among multiple linear regression models, and argue that new theory from sparse signal representation offers the key to addressing this problem. Based on a sparse representation computed by ℓ1-minimization, we propose a general classification algorithm for (image-based) object recognition. This new framework provides new insights into two crucial issues in face recognition: feature extraction and robustness to occlusion. For feature extraction, we show that if sparsity in the recognition problem is properly harnessed, the choice of features is no longer critical. What is critical, however, is whether the number of features is sufficiently large and whether the sparse representation is correctly computed. Unconventional features such as downsampled images and random projections perform just as well as conventional features such as Eigenfaces and Laplacianfaces, as long as the dimension of the feature space surpasses a certain threshold, predicted by the theory of sparse representation. This framework can handle errors due to occlusion and corruption uniformly, by exploiting the fact that these errors are often sparse with respect to the standard (pixel) basis. The theory of sparse representation helps predict how much occlusion the recognition algorithm can handle and how to choose the training images to maximize robustness to occlusion. We conduct extensive experiments on publicly available databases to verify the efficacy of the proposed algorithm, and corroborate the above claims.
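The classification scheme described in this abstract can be written out compactly. The notation below is an assumption introduced for illustration and does not appear in the abstract: A stacks the training samples of all classes as columns, y is the test sample, and δ_i keeps only the coefficients belonging to class i while zeroing the rest. A sparse coefficient vector is recovered by ℓ1-minimization, and the test sample is assigned to the class whose coefficients reconstruct it best.

```latex
\hat{x} \;=\; \arg\min_{x} \; \|x\|_1 \quad \text{subject to} \quad A x = y,
\qquad
\operatorname{class}(y) \;=\; \arg\min_{i} \; \bigl\| \, y - A\,\delta_i(\hat{x}) \, \bigr\|_2
```

One plausible way to model the sparse occlusion errors mentioned above, again as an assumption, is to write y = Ax + e and minimize the sum of the ℓ1 norms of x and e.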
Use of the Zero-Norm With Linear Models and Kernel Methods
2002
Cited by 115 (4 self)
We explore the use of the so-called zero-norm of the parameters of linear models in learning.
Low-Dimensional Linear Programming with Violations
In Proc. 43rd Annu. IEEE Sympos. Found. Comput. Sci., 2002
Cited by 46 (3 self)
Two decades ago, Megiddo and Dyer showed that linear programming in 2 and 3 dimensions (and subsequently, any constant number of dimensions) can be solved in linear time. In this paper, we consider linear programming with at most k violations: finding a point inside all but at most k of n given halfspaces. We give a simple algorithm in 2-d that runs in O((n + k^2) log n) expected time; this is faster than earlier algorithms by Everett, Robert, and van Kreveld (1993) and Matousek (1994) and is probably near-optimal for all k ≪ n/2. A (theoretical) extension of our algorithm in 3-d runs in near O(n + k^(11/4) n^(1/4)) expected time. Interestingly, the idea is based on concave-chain decompositions (or covers) of the (≤ k)-level, previously used in proving combinatorial k-level bounds.
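To make the problem statement concrete, here is a minimal brute-force sketch of 2-d linear programming with at most k violations. It is not the algorithm of the paper: it simply tries every intersection of two boundary lines, so it takes cubic time, and it assumes the optimum is finite and attained at such a vertex. The function names and the (a, b, c) encoding of a halfplane a*x + b*y <= c are hypothetical choices made for this illustration.

```python
from itertools import combinations

def violations(point, halfplanes, tol=1e-9):
    """Count the halfplanes a*x + b*y <= c that the point violates."""
    x, y = point
    return sum(1 for a, b, c in halfplanes if a * x + b * y > c + tol)

def lp_with_violations_bruteforce(objective, halfplanes, k):
    """Minimize objective . (x, y) while violating at most k halfplanes.

    Illustrative only: enumerates all pairwise boundary intersections and
    assumes a finite optimum at one of them. Returns (value, point), or
    None if no candidate vertex qualifies.
    """
    cx, cy = objective
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(halfplanes, 2):
        det = a1 * b2 - b1 * a2
        if abs(det) < 1e-12:  # parallel boundaries: no single vertex
            continue
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - b1 * c2) / det
        y = (a1 * c2 - c1 * a2) / det
        if violations((x, y), halfplanes) <= k:
            value = cx * x + cy * y
            if best is None or value < best[0]:
                best = (value, (x, y))
    return best

# Hypothetical instance: x >= 0, y >= 0, plus one "outlier" constraint x + y >= 10.
halfplanes = [(-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (-1.0, -1.0, -10.0)]
strict = lp_with_violations_bruteforce((1.0, 1.0), halfplanes, k=0)
relaxed = lp_with_violations_bruteforce((1.0, 1.0), halfplanes, k=1)
```

In this instance, allowing one violation lets the solver ignore the outlier constraint, so the minimum of x + y drops from 10 to 0.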
Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization
Advances in Neural Information Processing Systems 22, 2009
Cited by 44 (3 self)
The supplementary material to the NIPS version of this paper [4] contains a critical error, which was discovered several days before the conference. Unfortunately, it was too late to withdraw the paper from the proceedings. Fortunately, since that time, a correct analysis of the proposed convex programming relaxation has been developed by Emmanuel Candes of Stanford University. That analysis is reported in a joint paper, "Robust Principal Component Analysis?" by Emmanuel Candes, Xiaodong Li, Yi Ma and John Wright,
False Data Injection Attacks against State Estimation in Electric Power Grids
2009
Cited by 38 (0 self)
A power grid is a complex system connecting electric power generators to consumers through power transmission and distribution networks across a large geographical area. System monitoring is necessary to ensure the reliable operation of power grids, and state estimation is used in system monitoring to best estimate the power grid state through analysis of meter measurements and power system models. Various techniques have been developed to detect and identify bad measurements, including the interacting bad measurements introduced by arbitrary, nonrandom causes. At first glance, it seems that these techniques can also defeat malicious measurements injected by attackers. In this paper, we present a new class of attacks, called false data injection attacks, against state estimation in electric power grids. We show that an attacker can exploit the configuration of a power system to launch such attacks to successfully introduce arbitrary errors into certain state variables while bypassing existing techniques for bad measurement detection. Moreover, we look at two realistic attack scenarios, in which the attacker is either constrained to some specific meters (due to the physical protection of the meters), or limited in the resources required to compromise meters. We show that the attacker can systematically and efficiently construct attack vectors in both scenarios, which can not only change the results of state estimation, but also modify the results in arbitrary ways. We demonstrate the success of these attacks through simulation using IEEE test systems. Our results indicate that security protection of the electric power grid must be revisited when there are potentially malicious attacks.
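The idea of bypassing bad measurement detection can be illustrated with a toy example. Everything below is an assumption made for illustration and is not stated in the abstract: a linear measurement model z ≈ h·x with a single state variable, unweighted least-squares estimation, a detector that inspects the norm of the residual z − h·x̂, and an attack vector of the form a = h·c. Under these assumptions, the injected data shifts the estimate by exactly c while leaving the residual unchanged.

```python
def estimate_state(h, z):
    """Least-squares estimate of a scalar state x from measurements z ~ h * x."""
    return sum(hi * zi for hi, zi in zip(h, z)) / sum(hi * hi for hi in h)

def residual_norm(h, z):
    """Euclidean norm of the measurement residual z - h * x_hat."""
    x_hat = estimate_state(h, z)
    return sum((zi - hi * x_hat) ** 2 for hi, zi in zip(h, z)) ** 0.5

# Hypothetical values, chosen only for illustration.
h = [1.0, 2.0, -1.0]        # measurement model for one state variable
z = [1.02, 1.97, -1.01]     # meter readings consistent with x close to 1
c = 0.5                     # error the attacker wants to add to the estimate

# Attack vector a = h * c, added to the honest measurements.
z_attacked = [zi + hi * c for hi, zi in zip(h, z)]

shift = estimate_state(h, z_attacked) - estimate_state(h, z)
residual_change = residual_norm(h, z_attacked) - residual_norm(h, z)
```

Here `shift` equals c and `residual_change` is zero up to floating-point rounding, so a residual-based check cannot distinguish the attacked readings from the honest ones in this toy setting.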
Hardness of learning halfspaces with noise
In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, 2006
Cited by 33 (3 self)
Learning an unknown halfspace (also called a perceptron) from labeled examples is one of the classic problems in machine learning. In the noise-free case, when a halfspace consistent with all the training examples exists, the problem can be solved in polynomial time using linear programming. However, under the promise that a halfspace consistent with a fraction (1 − ε) of the examples exists (for some small constant ε > 0), it was not known how to efficiently find a halfspace that is correct on even 51% of the examples. Nor was a hardness result that ruled out getting agreement on more than 99.9% of the examples known. In this work, we close this gap in our understanding, and prove that even a tiny amount of worst-case noise makes the problem of learning halfspaces intractable in a strong sense. Specifically, for arbitrary ε, δ > 0, we prove that given a set of example-label pairs from the hypercube a fraction (1 − ε) of which can be explained by a halfspace, it is NP-hard to find a halfspace that correctly labels a fraction (1/2 + δ) of the examples. The hardness result is tight since it is trivial to get agreement on 1/2 the examples. In learning theory parlance, we prove that weak proper agnostic learning of halfspaces is hard. This settles a question that was raised by Blum et al. in their work on learning halfspaces in the presence of random classification noise [10], and in some more recent works as well. Along the way, we also obtain a strong hardness result for another basic computational problem: solving a linear system over the rationals.
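The remark that agreement on half the examples is trivial can be seen with a short sketch. One way to achieve it, assumed here for illustration, is to predict the majority label everywhere, which corresponds to a degenerate halfspace with a zero weight vector whose threshold fixes the sign. The function names and sample labels are hypothetical.

```python
def majority_baseline(labels):
    """Return the constant prediction (+1 or -1) matching at least half the labels."""
    positives = sum(1 for y in labels if y == 1)
    return 1 if 2 * positives >= len(labels) else -1

def agreement(prediction, labels):
    """Fraction of labels matched by a constant prediction."""
    return sum(1 for y in labels if y == prediction) / len(labels)

labels = [1, -1, -1, 1, -1]      # hypothetical labels in {+1, -1}
best = majority_baseline(labels)
rate = agreement(best, labels)   # always at least 0.5 by construction
```

The hardness result says that, in the worst case, improving on this baseline by any constant δ is NP-hard even when almost all examples are consistent with some halfspace.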
Self Adapting Software for Numerical Linear Algebra and LAPACK for Clusters
Parallel Computing, 2003
Cited by 23 (16 self)
This article describes the context, design, and recent development of the LAPACK for Clusters (LFC) project. It has been developed in the framework of Self-Adapting Numerical Software (SANS) since we believe such an approach can deliver the convenience and ease of use of existing sequential environments bundled with the power and versatility of highly tuned parallel codes that execute on clusters. Accomplishing this task is far from trivial, as we argue in the paper by presenting pertinent case studies and possible usage scenarios.
Parsimonious Least Norm Approximation
1997
Cited by 20 (7 self)
A theoretically justifiable fast finite successive linear approximation algorithm is proposed for obtaining a parsimonious solution to a corrupted linear system Ax = b + p, where the corruption p is due to noise or error in measurement. The proposed linear-programming-based algorithm finds a solution x by parametrically minimizing the number of nonzero elements in x and the error ‖Ax − b − p‖₁. Numerical tests on a signal-processing-based example indicate that the proposed method is comparable to a method that parametrically minimizes the 1-norm of the solution x and the error ‖Ax − b − p‖₁, and that both methods are superior, by orders of magnitude, to solutions obtained by least squares as well as by combinatorially choosing an optimal solution with a specific number of nonzero elements. Keywords: minimal cardinality, least norm approximation. Introduction: A wide range of important applications can be reduced to the problem of estimating a vector x by minim...
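The two parametric objectives compared in this abstract can be sketched as follows. The convex-combination weighting by λ and the symbol ‖x‖₀ for the number of nonzero elements of x are assumptions introduced here for illustration; the abstract only states that the cardinality (or the 1-norm) of x and the error term are minimized parametrically.

```latex
% Parsimonious (minimal cardinality) objective, with trade-off parameter \lambda:
\min_{x} \; (1-\lambda)\,\| A x - b - p \|_1 \;+\; \lambda\,\| x \|_0,
\qquad \lambda \in [0, 1)

% Comparison method: the cardinality term is replaced by the 1-norm of x:
\min_{x} \; (1-\lambda)\,\| A x - b - p \|_1 \;+\; \lambda\,\| x \|_1
```

The second problem is convex and can be posed as a linear program, whereas the first is not convex, which is consistent with the abstract's use of successive linear approximation.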