Results 1  10
of
57
A Singular Value Thresholding Algorithm for Matrix Completion
, 2008
"... This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem, and arises in many important applications as in the task of reco ..."
Abstract

Cited by 192 (12 self)
 Add to MetaCart
This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem, and arises in many important applications as in the task of recovering a large matrix from a small subset of its entries (the famous Netflix problem). Offtheshelf algorithms such as interior point methods are not directly amenable to large problems of this kind with over a million unknown entries. This paper develops a simple firstorder and easytoimplement algorithm that is extremely efficient at addressing problems in which the optimal solution has low rank. The algorithm is iterative and produces a sequence of matrices {X k, Y k} and at each step, mainly performs a softthresholding operation on the singular values of the matrix Y k. There are two remarkable features making this attractive for lowrank matrix completion problems. The first is that the softthresholding operation is applied to a sparse matrix; the second is that the rank of the iterates {X k} is empirically nondecreasing. Both these facts allow the algorithm to make use of very minimal storage space and keep the computational cost of each iteration low. On
Projected gradient methods for Nonnegative Matrix Factorization
 Neural Computation
, 2007
"... Nonnegative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although boundconstrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two proj ..."
Abstract

Cited by 134 (2 self)
 Add to MetaCart
Nonnegative matrix factorization (NMF) can be formulated as a minimization problem with bound constraints. Although boundconstrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two projected gradient methods for NMF, both of which exhibit strong optimization properties. We discuss efficient implementations and demonstrate that one of the proposed methods converges faster than the popular multiplicative update approach. A simple MATLAB code is also provided. 1
Nonmonotone spectral projected gradient methods on convex sets
 SIAM Journal on Optimization
, 2000
"... Abstract. Nonmonotone projected gradient techniques are considered for the minimization of differentiable functions on closed convex sets. The classical projected gradient schemes are extended to include a nonmonotone steplength strategy that is based on the Grippo–Lampariello–Lucidi nonmonotone lin ..."
Abstract

Cited by 133 (25 self)
 Add to MetaCart
Abstract. Nonmonotone projected gradient techniques are considered for the minimization of differentiable functions on closed convex sets. The classical projected gradient schemes are extended to include a nonmonotone steplength strategy that is based on the Grippo–Lampariello–Lucidi nonmonotone line search. In particular, the nonmonotone strategy is combined with the spectral gradient choice of steplength to accelerate the convergence process. In addition to the classical projected gradient nonlinear path, the feasible spectral projected gradient is used as a search direction to avoid additional trial projections during the onedimensional search process. Convergence properties and extensive numerical results are presented.
Metric Learning by Collapsing Classes
"... We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in th ..."
Abstract

Cited by 130 (2 self)
 Add to MetaCart
We present an algorithm for learning a quadratic Gaussian metric (Mahalanobis distance) for use in classification tasks. Our method relies on the simple geometric intuition that a good metric is one under which points in the same class are simultaneously near each other and far from points in the other classes. We construct a convex optimization problem whose solution generates such a metric by trying to collapse all examples in the same class to a single point and push examples in other classes infinitely far away. We show that when the metric we learn is used in simple classifiers, it yields substantial improvements over standard alternatives on a variety of problems. We also discuss how the learned metric may be used to obtain a compact low dimensional feature representation of the original input space, allowing more efficient classification with very little reduction in performance.
The analysis of decomposition methods for support vector machines
 IEEE Transactions on Neural Networks
, 1999
"... Abstract. The decomposition method is currently one of the major methods for solving support vector machines. An important issue of this method is the selection of working sets. In this paper through the design of decomposition methods for boundconstrained SVM formulations we demonstrate that the w ..."
Abstract

Cited by 108 (21 self)
 Add to MetaCart
Abstract. The decomposition method is currently one of the major methods for solving support vector machines. An important issue of this method is the selection of working sets. In this paper through the design of decomposition methods for boundconstrained SVM formulations we demonstrate that the working set selection is not a trivial task. Then from the experimental analysis we propose a simple selection of the working set which leads to faster convergences for difficult cases. Numerical experiments on different types of problems are conducted to demonstrate the viability of the proposed method.
Is that you? Metric learning approaches for face identification
 In ICCV
, 2009
"... Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. In this paper we present two methods for learning robust distance measures: (a) a log ..."
Abstract

Cited by 62 (5 self)
 Add to MetaCart
Face identification is the problem of determining whether two face images depict the same person or not. This is difficult due to variations in scale, pose, lighting, background, expression, hairstyle, and glasses. In this paper we present two methods for learning robust distance measures: (a) a logistic discriminant approach which learns the metric from a set of labelled image pairs (LDML) and (b) a nearest neighbour approach which computes the probability for two images to belong to the same class (MkNN). We evaluate our approaches on the Labeled Faces in the Wild data set, a large and very challenging data set of faces from Yahoo! News. The evaluation protocol for this data set defines a restricted setting, where a fixed set of positive and negative image pairs is given, as well as an unrestricted one, where faces are labelled by their identity. We are the first to present results for the unrestricted setting, and show that our methods benefit from this richer training data, much more so than the current stateoftheart method. Our results of 79.3 % and 87.5 % correct for the restricted and unrestricted setting respectively, significantly improve over the current stateoftheart result of 78.5%. Confidence scores obtained for face identification can be used for many applications e.g. clustering or recognition from a single training example. We show that our learned metrics also improve performance for these tasks. 1.
An Implicit Filtering Algorithm For Optimization Of Functions With Many Local Minima
 SIAM J. Optim
, 1995
"... . In this paper we describe and analyze an algorithm for certain box constrained optimization problems that may have several local minima. A paradigm for these problems is one in which the function to be minimized is the sum of a simple function, such as a convex quadratic, and high frequency, low a ..."
Abstract

Cited by 53 (16 self)
 Add to MetaCart
. In this paper we describe and analyze an algorithm for certain box constrained optimization problems that may have several local minima. A paradigm for these problems is one in which the function to be minimized is the sum of a simple function, such as a convex quadratic, and high frequency, low amplitude terms which cause local minima away from the global minimum of the simple function. Our method is gradient based and therefore the performance can be improved by use of quasiNewton methods. Key words. filtering, projected gradient algorithm, quasiNewton method AMS(MOS) subject classifications. 65H10, 65K05, 65K10 1. Introduction. In this paper we describe and analyze an algorithm for bound constrained optimization problems that may have several local minima. The type of problem we have in mind is one in which the function to be minimized is the sum of a simple function, such as a convex quadratic, and high frequency, low amplitude terms which cause the local minima. Of particul...
Dimensionality reduction of multimodal labeled data by local Fisher discriminant analysis
 Journal of Machine Learning Research
, 2007
"... Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in highdimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a ..."
Abstract

Cited by 48 (11 self)
 Add to MetaCart
Reducing the dimensionality of data without losing intrinsic information is an important preprocessing step in highdimensional data analysis. Fisher discriminant analysis (FDA) is a traditional technique for supervised dimensionality reduction, but it tends to give undesired results if samples in a class are multimodal. An unsupervised dimensionality reduction method called localitypreserving projection (LPP) can work well with multimodal data due to its locality preserving property. However, since LPP does not take the label information into account, it is not necessarily useful in supervised learning scenarios. In this paper, we propose a new linear supervised dimensionality reduction method called local Fisher discriminant analysis (LFDA), which effectively combines the ideas of FDA and LPP. LFDA has an analytic form of the embedding transformation and the solution can be easily computed just by solving a generalized eigenvalue problem. We demonstrate the practical usefulness and high scalability of the LFDA method in data visualization and classification tasks through extensive simulation studies. We also show that LFDA can be extended to nonlinear dimensionality reduction scenarios by applying the kernel trick.
Euclidean embedding of cooccurrence data
 Advances in Neural Information Processing Systems 17
, 2005
"... Abstract Embedding algorithms search for low dimensional structure in complexdata, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single comm ..."
Abstract

Cited by 36 (2 self)
 Add to MetaCart
Abstract Embedding algorithms search for low dimensional structure in complexdata, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space based on their cooccurrence statistics. Thejoint distributions are modeled as exponentials of Euclidean distances in the lowdimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of our embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis. 1 Introduction Embeddings of objects in a lowdimensional space are an important tool in unsupervisedlearning and in preprocessing data for supervised learning algorithms. They are especially valuable for exploratory data analysis and visualization by providing easily interpretablerepresentations of the relationships among objects. Most current embedding techniques build low dimensional mappings that preserve certain relationships among objects and differ in the relationships they choose to preserve, which range from pairwise distances in multidimensional scaling (MDS) [4] to neighborhood structure in locally linear embedding[12]. All these methods operate on objects of a single type endowed with a measure of similarity or dissimilarity. However, realworld data often involve objects of several very different types without anatural measure of similarity. For example, typical web pages or scientific papers contain
Projected Subgradient Methods for Learning Sparse Gaussians
"... Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a highdimensional space. Our approach uses the ℓ1norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient meth ..."
Abstract

Cited by 31 (0 self)
 Add to MetaCart
Gaussian Markov random fields (GMRFs) are useful in a broad range of applications. In this paper we tackle the problem of learning a sparse GMRF in a highdimensional space. Our approach uses the ℓ1norm as a regularization on the inverse covariance matrix. We utilize a novel projected gradient method, which is faster than previous methods in practice and equal to the best performing of these in asymptotic complexity. We also extend the ℓ1regularized objective to the problem of sparsifying entire blocks within the inverse covariance matrix. Our methods generalize fairly easily to this case, while other methods do not. We demonstrate that our extensions give better generalization performance on two real domains—biological network analysis and a 2Dshape modeling image task. 1