Results 1–10 of 65
An Improved Approximation Algorithm for the Column Subset Selection Problem
"... We consider the problem of selecting the “best ” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel twostage algorithm that runs in O(min{mn 2, m 2 n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stag ..."
Abstract
Cited by 71 (13 self)
We consider the problem of selecting the “best” subset of exactly k columns from an m × n matrix A. In particular, we present and analyze a novel two-stage algorithm that runs in O(min{mn², m²n}) time and returns as output an m × k matrix C consisting of exactly k columns of A. In the first stage (the randomized stage), the algorithm randomly selects O(k log k) columns according to a judiciously-chosen probability distribution that depends on information in the top-k right singular subspace of A. In the second stage (the deterministic stage), the algorithm applies a deterministic column-selection procedure to select and return exactly k columns from the set of columns selected in the first stage. Let C be the m × k matrix containing those k columns, let P_C denote the projection matrix onto the span of those columns, and let A_k denote the “best” rank-k approximation to the matrix A as computed with the singular value decomposition. Then, we prove that ‖A − P_C A‖₂ ≤ O(k^{3/4} log^{1/2}(k) (n − k)^{1/4}) ‖A − A_k‖₂.
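The two-stage scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's exact procedure: the sampling distribution follows the abstract's description of stage one (leverage scores of the top-k right singular subspace), while column-pivoted QR stands in for the unspecified deterministic stage; the function name and the choice c ≈ 2k log k are my own.

```python
import numpy as np
from scipy.linalg import qr


def two_stage_css(A, k, seed=0):
    """Sketch of a two-stage column subset selection.

    Stage 1: sample c = O(k log k) columns with probabilities proportional
    to the leverage scores of the top-k right singular subspace of A.
    Stage 2: deterministically keep exactly k of the sampled columns via
    column-pivoted QR (a stand-in for the paper's deterministic procedure).
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Heuristic stage-1 budget; assumes the leverage scores are all nonzero.
    c = min(n, max(k, int(np.ceil(2 * k * np.log(max(k, 2))))))
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                       # top-k right singular subspace (k x n)
    p = (Vk ** 2).sum(axis=0) / k        # leverage-score distribution, sums to 1
    sampled = rng.choice(n, size=c, replace=False, p=p)
    # Stage 2: pivoted QR on the sampled columns of Vk picks k well-conditioned ones.
    _, _, piv = qr(Vk[:, sampled], pivoting=True)
    return np.sort(sampled[piv[:k]])
```

The returned indices identify the m × k submatrix C of A whose span is used in the bound above.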
A randomized algorithm for rank-revealing QR factorizations and applications
The 4th Montreal Scientific Computing Days, 2007
Improved matrix algorithms via the subsampled randomized Hadamard transform
 SIAM J. Matrix Analysis Applications
"... Abstract. Several recent randomized linear algebra algorithms rely upon fast dimension reduction methods. A popular choice is the subsampled randomized Hadamard transform (SRHT). In this article, we address the efficacy, in the Frobenius and spectral norms, of an SRHTbased lowrank matrix approxim ..."
Abstract
Cited by 16 (3 self)
Abstract. Several recent randomized linear algebra algorithms rely upon fast dimension reduction methods. A popular choice is the subsampled randomized Hadamard transform (SRHT). In this article, we address the efficacy, in the Frobenius and spectral norms, of an SRHT-based low-rank matrix approximation technique introduced by Woolfe, Liberty, Rokhlin, and Tygert. We establish a slightly better Frobenius norm error bound than is currently available, and a much sharper spectral norm error bound (in the presence of reasonable decay of the singular values). Along the way, we produce several results on matrix operations with SRHTs (such as approximate matrix multiplication) that may be of independent interest. Our approach builds upon Tropp’s in “Improved Analysis of the Subsampled Randomized Hadamard Transform.”
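A minimal sketch of the SRHT map itself, the dimension-reduction method this abstract discusses: Θ = √(n/ℓ)·R·H·D, with D a random sign diagonal, H the normalized Walsh–Hadamard matrix, and R a uniform row sampler. The explicit O(n²) Hadamard matrix is used here for clarity (fast implementations use the O(n log n) recursive transform); the function and argument names are illustrative.

```python
import numpy as np
from scipy.linalg import hadamard


def srht(A, ell, seed=0):
    """Apply a subsampled randomized Hadamard transform to the rows of A.

    A must have a power-of-two number of rows n. Returns the ell x m sketch
    sqrt(n/ell) * R @ H @ D @ A.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    assert n & (n - 1) == 0, "row count must be a power of two"
    D = rng.choice([-1.0, 1.0], size=n)       # random sign flips
    H = hadamard(n) / np.sqrt(n)              # orthonormal Walsh-Hadamard matrix
    rows = rng.choice(n, size=ell, replace=False)  # uniform row subsample
    return np.sqrt(n / ell) * (H @ (D[:, None] * A))[rows]
```

With ℓ = n the map reduces to an orthogonal transform, so norms are preserved exactly; for ℓ < n they are preserved approximately with high probability.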
Random projections for the nonnegative least-squares problem
Linear Algebra and its Applications, 2009
Unsupervised Feature Selection for Principal Components Analysis [Extended Abstract]
"... Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scientific domains. We consider, both theoretically and empirically, the topic of unsupervised feature selection for PCA, by leveraging algorithms for the ..."
Abstract
Cited by 24 (8 self)
Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique, and has been widely applied on datasets in all scientific domains. We consider, both theoretically and empirically, the topic of unsupervised feature selection for PCA, by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the “best” subset of exactly k columns from an m × n data matrix A, and has been extensively studied in the Numerical Linear Algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm significantly improves upon the best previously existing results [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selection strategy in three application domains of modern statistical data analysis: finance, document-term data, and genetics. We pay particular attention to how this algorithm may be used to select representative or landmark features from an object-feature matrix in an unsupervised manner. In all three application domains, we are able to identify k landmark features, i.e., columns of the data matrix, that capture nearly the same amount of information as does the subspace that is spanned by the top k “eigenfeatures.”
Random Projections for k-means Clustering
In Advances in Neural Information Processing Systems 23, 2010
"... This paper discusses the topic of dimensionality reduction for kmeans clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ R n×d) can be projected into t = Ω(k/ε 2) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε −2 k / log(d)⌉) time, such that with constant probability t ..."
Abstract
Cited by 17 (5 self)
This paper discusses the topic of dimensionality reduction for k-means clustering. We prove that any set of n points in d dimensions (rows in a matrix A ∈ R^{n×d}) can be projected into t = Ω(k/ε²) dimensions, for any ε ∈ (0, 1/3), in O(nd⌈ε⁻²k/log(d)⌉) time, such that with constant probability the optimal k-partition of the point set is preserved within a factor of 2 + ε. The projection is done by post-multiplying A with a d × t random matrix R having entries +1/√t or −1/√t with equal probability. A numerical implementation of our technique and experiments on a large face images dataset verify the speed and the accuracy of our theoretical results.
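The projection step described in this abstract is simple enough to sketch directly: post-multiply A by a d × t random sign matrix with entries ±1/√t. The specific target dimension t below is a heuristic ⌈k/ε²⌉ matching the stated asymptotics up to constants; the function name is illustrative.

```python
import numpy as np


def rp_for_kmeans(A, k, eps=0.3, seed=0):
    """Randomly project n points in d dimensions (rows of A) down to
    t = O(k/eps^2) dimensions with a d x t matrix of +-1/sqrt(t) entries,
    as a preprocessing step for k-means clustering.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    t = min(d, int(np.ceil(k / eps ** 2)))           # heuristic target dimension
    R = rng.choice([-1.0, 1.0], size=(d, t)) / np.sqrt(t)
    return A @ R                                      # n x t projected points
```

Any k-means solver can then be run on the projected rows in place of the original d-dimensional points.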
Faster subset selection for matrices and applications
 SIAM J. Matrix Anal. Appl
"... Abstract. We study the following problem of subset selection for matrices: given a matrix X ∈ Rn×m (m> n) and a sampling parameter k (n ≤ k ≤ m), select a subset of k columns from X such that the pseudoinverse of the sampled matrix has as small a norm as possible. In this work, we focus on the Fr ..."
Abstract
Cited by 4 (2 self)
Abstract. We study the following problem of subset selection for matrices: given a matrix X ∈ R^{n×m} (m > n) and a sampling parameter k (n ≤ k ≤ m), select a subset of k columns from X such that the pseudoinverse of the sampled matrix has as small a norm as possible. In this work, we focus on the Frobenius and the spectral matrix norms. We describe several novel (deterministic and randomized) approximation algorithms for this problem with approximation bounds that are optimal up to constant factors. Additionally, we show that the combinatorial problem of finding a low-stretch spanning tree in an undirected graph corresponds to subset selection, and discuss various implications of this reduction.
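For intuition about the objective in this abstract, here is a sketch using column-pivoted QR, a classical deterministic heuristic for keeping the sampled submatrix well-conditioned (and hence its pseudoinverse small in norm). This is a baseline stand-in, not the paper's algorithms; the function name is illustrative.

```python
import numpy as np
from scipy.linalg import qr


def subset_select(X, k):
    """Pick k columns of X (n x m, m > n, n <= k <= m) via pivoted QR and
    report the spectral norm of the pseudoinverse of the sampled submatrix,
    i.e. 1/sigma_min of the selected columns."""
    _, _, piv = qr(X, pivoting=True)          # columns ordered by pivoting
    cols = np.sort(piv[:k])
    Xs = X[:, cols]
    return cols, np.linalg.norm(np.linalg.pinv(Xs), 2)
```

Since dropping columns can only shrink the smallest of the n singular values, any k-column subset has pseudoinverse norm at least that of the full matrix X, which gives a quick sanity check on the output.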
Unsupervised Feature Selection for the k-means Clustering Problem
"... We present a novel feature selection algorithm for the kmeans clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ 2) features from a dataset of arbitrary dimensions. We prove that ..."
Abstract
Cited by 16 (9 self)
We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ϵ ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ϵ)/ϵ²) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1 + (1 + ϵ)γ)-approximate partition with high probability.
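A minimal sketch of sample-and-rescale feature selection in the spirit of this abstract. The leverage-score sampling distribution is one common instantiation and may differ from the paper's exact choice; the rescaling by 1/√(c·p_j) is the standard correction that keeps inner products unbiased. Function and variable names are my own.

```python
import numpy as np


def sample_rescale_features(A, k, eps=0.5, seed=0):
    """Randomly select and rescale c = Theta(k log(k/eps)/eps^2) features
    (columns) of the n x d data matrix A for downstream k-means.

    Columns are drawn i.i.d. with probability p_j proportional to the
    leverage scores of the top-k right singular subspace, then rescaled
    by 1/sqrt(c * p_j)."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    c = min(d, int(np.ceil(k * np.log(max(k / eps, 2.0)) / eps ** 2)))
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    p = (Vt[:k] ** 2).sum(axis=0) / k        # leverage-score distribution
    idx = rng.choice(d, size=c, replace=True, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])        # unbiasing rescale per sampled column
    return A[:, idx] * scale, idx
```

The rescaled n × c matrix is then fed to any γ-approximate k-means algorithm in place of the full dataset.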
Deterministic and randomized column selection algorithms for matrices
"... Abstract. Given a matrix A ∈ R m×n (m ≥ n) and an integer k (k ≪ n) we discuss deterministic and randomized algorithms for selecting the k “most linearly independent ” columns of A. After summarizing previous deterministic and randomized algorithms for this task, we present a hybrid approach. First, ..."
Abstract
Abstract. Given a matrix A ∈ R^{m×n} (m ≥ n) and an integer k (k ≪ n) we discuss deterministic and randomized algorithms for selecting the k “most linearly independent” columns of A. After summarizing previous deterministic and randomized algorithms for this task, we present a hybrid approach. First, we employ a randomized algorithm presented in (1) to select c = O(k log k) columns of A and then we employ the deterministic algorithm of (2) to pick exactly k columns from the c columns that were kept after the first step. We provide novel provable bounds for the singular values of the matrix containing the selected columns.