Results 1 - 9 of 9
Robust PCA via outlier pursuit
, 2010
Cited by 32 (5 self)
Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
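The abstract above describes a convex decomposition of a data matrix into a low-rank part and a part supported on a few corrupted columns. A minimal sketch of that idea follows, assuming points are stored as columns of M and assuming a penalized variant, 0.5·||M − L − C||_F² + mu·||L||_* + mu·lam·Σ_j ||C_j||_2, solved by alternating proximal steps. The function names, the parameters `mu` and `lam`, and the solver itself are illustrative assumptions, not the authors' algorithm or its guarantees.

```python
import numpy as np


def svt(X, tau):
    """Singular value thresholding: proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def column_shrink(X, tau):
    """Column-wise shrinkage: proximal operator of tau * (sum of column l2 norms)."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale


def objective(M, L, C, mu, lam):
    """Penalized objective (an assumed relaxation of the constrained problem)."""
    fit = 0.5 * np.linalg.norm(M - L - C, "fro") ** 2
    return fit + mu * np.linalg.norm(L, "nuc") + mu * lam * np.linalg.norm(C, axis=0).sum()


def outlier_pursuit_sketch(M, mu=0.1, lam=0.5, n_iter=200):
    """Alternate exact minimization over L (low rank) and C (column sparse)."""
    M = np.asarray(M, dtype=float)
    L = np.zeros_like(M)
    C = np.zeros_like(M)
    history = []
    for _ in range(n_iter):
        L = svt(M - C, mu)                   # exact minimizer over L with C fixed
        C = column_shrink(M - L, mu * lam)   # exact minimizer over C with L fixed
        history.append(objective(M, L, C, mu, lam))
    return L, C, history
```

Columns of the returned C with nonzero norm are the candidate corrupted points; how well they match the true outliers depends on the choice of `mu` and `lam`, which this sketch does not tune.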
TWO PROPOSALS FOR ROBUST PCA USING SEMIDEFINITE PROGRAMMING
, 1012
Cited by 17 (1 self)
Abstract. The performance of principal component analysis (PCA) suffers badly in the presence of outliers. This paper proposes two novel approaches for robust PCA based on semidefinite programming. The first method, maximum mean absolute deviation rounding (MDR), seeks directions of large spread in the data while damping the effect of outliers. The second method produces a low-leverage decomposition (LLD) of the data that attempts to form a low-rank model for the data by separating out corrupted observations. This paper also presents efficient computational methods for solving these SDPs. Numerical experiments confirm the value of these new techniques.
ROBUST COMPUTATION OF LINEAR MODELS, OR HOW TO FIND A NEEDLE IN A HAYSTACK
Cited by 5 (1 self)
Abstract. Consider a dataset of vector-valued observations that consists of a modest number of noisy inliers, which are explained well by a low-dimensional subspace, along with a large number of outliers, which have no linear structure. This work describes a convex optimization problem, called reaper, that can reliably fit a low-dimensional model to this type of data. The paper provides an efficient algorithm for solving the reaper problem, and it documents numerical experiments which confirm that reaper can dependably find linear structure in synthetic and natural data. In addition, when the inliers are contained in a low-dimensional subspace, there is a rigorous theory that describes when reaper can recover the subspace exactly.
Robust matrix completion with corrupted columns. Arxiv preprint arXiv:1102.2254
, 2011
Cited by 4 (2 self)
This paper considers the problem of matrix completion, when some number of the columns are arbitrarily corrupted, potentially by a malicious adversary. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. What can be done if a large number, or even a constant fraction of columns are corrupted? In this paper, we study this very problem, and develop an efficient algorithm for its solution. Our results show that with a vanishing fraction of observed entries, it is nevertheless possible to succeed in performing matrix completion, even when the number of corrupted columns grows. When the number of corruptions is as high as a constant fraction of the total number of columns, we show that again exact matrix completion is possible, but in this case our algorithm requires many more – a constant fraction – of observations. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the number, locations or values of the observed entries of the manipulated columns. In particular, this means that manipulators can act in a completely adversarial manner.
Direct Robust Matrix Factorization for Anomaly Detection
Cited by 3 (0 self)
Abstract. Matrix factorization methods are extremely useful in many data mining tasks, yet their performances are often degraded by outliers. In this paper, we propose a novel robust matrix factorization algorithm that is insensitive to outliers. We directly formulate robust factorization as a matrix approximation problem with constraints on the rank of the matrix and the cardinality of the outlier set. Then, unlike existing methods that resort to convex relaxations, we solve this problem directly and efficiently. In addition, structural knowledge about the outliers can be incorporated to find outliers more effectively. We applied this method in anomaly detection tasks on various data sets. Empirical results show that this new algorithm is effective in robust modeling and anomaly detection, and our direct solution achieves superior performance over the state-of-the-art methods based on the L1-norm and the nuclear norm of matrices. Keywords: matrix factorization, robust, anomaly detection
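One way to write down the problem this abstract describes, a matrix approximation with a rank constraint and a cardinality constraint on the outlier set, is sketched below. The symbols are assumptions introduced here for illustration: X is the data matrix, L the low-rank approximation, S the outlier matrix, K the rank bound, and e the maximum number of nonzero outlier entries.

```latex
\min_{L,\,S}\ \lVert X - S - L \rVert_F
\quad \text{subject to} \quad
\operatorname{rank}(L) \le K, \qquad \lVert S \rVert_0 \le e
```

Both constraints are non-convex, which is consistent with the abstract's contrast against methods that replace them with the L1-norm and the nuclear norm.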
Robust matrix completion and corrupted columns
In International Conference on Machine Learning, 2011
Cited by 1 (0 self)
This paper considers the problem of matrix completion, when some number of the columns are arbitrarily corrupted. It is well-known that standard algorithms for matrix completion can return arbitrarily poor results, if even a single column is corrupted. What can be done if a large number, or even a constant fraction of columns are corrupted? In this paper, we study this very problem, and develop a robust and efficient algorithm for its solution. One direct application comes from robust collaborative filtering. Here, some number of users are so-called manipulators, and try to skew the predictions of the algorithm. Significantly, our results hold without any assumptions on the observed entries of the manipulated columns.
Learning a Factor Model via Regularized PCA
We consider the problem of learning a linear factor model. We propose a regularized form of principal component analysis (PCA) and demonstrate through experiments with synthetic and real data the superiority of resulting estimates to those produced by preexisting factor analysis approaches. We also establish theoretical results that explain how our algorithm corrects the biases induced by conventional approaches. An important feature of our algorithm is that its computational requirements are similar to those of PCA, which enjoys wide use in large part due to its efficiency.
Directed Principal Component Analysis YiHao
, 2012
We consider a problem involving estimation of a high-dimensional covariance matrix that is the sum of a diagonal matrix and a low-rank matrix, and making a decision based on the resulting estimate. Such problems arise, for example, in portfolio management, where a common approach employs principal component analysis (PCA) to estimate factors used in constructing the low-rank term of the covariance matrix. The decision problem is typically treated separately, with the estimated covariance matrix taken to be an input to an optimization problem. We propose directed PCA, an efficient algorithm that takes the decision objective into account when estimating the covariance matrix. Directed PCA effectively adjusts factors that would be produced by PCA so that they better guide the specific decision at hand. We demonstrate through computational studies that directed PCA yields significant benefit, and we prove theoretical results establishing that the degree of improvement over conventional PCA can be unbounded.
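The covariance structure named in this abstract, a diagonal matrix plus a low-rank matrix, can be written as follows. The symbols are assumptions made here for illustration: p is the number of variables, k the number of factors, F the factor loading matrix, and D the diagonal term.

```latex
\Sigma = F F^{\top} + D,
\qquad F \in \mathbb{R}^{p \times k},\ k \ll p,
\qquad D = \operatorname{diag}(d_1, \dots, d_p),\ d_i \ge 0
```

In this notation, the conventional approach the abstract mentions would estimate the columns of F from the leading principal components of the sample covariance.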
On Learning from Collective Data
, 2013
Keywords: Collective data; grouped data; point sets; low-rank decomposition; robust methods; anomaly detection; novelty detection; group anomaly; hierarchical probabilistic models; topic models; divergence estimation; distribution classification; efficient learning; distance completion.
In many machine learning problems and application domains, the data are naturally organized by groups. For example, a video sequence is a group of images, an image is a group of patches, a document is a group of paragraphs/words, and a community is a group of people. We call them the collective data. In this thesis, we study how and what we can learn from collective data. Usually, machine learning focuses on individual objects, each of which is described by a feature vector and studied as a point in some metric space. When approaching collective data, researchers often reduce the groups into vectors to which traditional methods can be applied. We, on the other hand, will try to develop machine learning methods that