Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions
, 2008
Cited by 443 (7 self)
In this article, we give an overview of efficient algorithms for the approximate and exact nearest neighbor problem. The goal is to preprocess a dataset of objects (e.g., images) so that later, given a new query object, one can quickly return the dataset object that is most similar to the query. The problem is of significant interest in a wide variety of areas.
FINDING STRUCTURE WITH RANDOMNESS: PROBABILISTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS
Cited by 248 (6 self)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition
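The two-stage scheme described in this abstract (random sampling to find a subspace, then deterministic work on the compressed matrix) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Gaussian test matrix, the `oversample` parameter, and the function name `randomized_svd` are assumptions introduced here.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, seed=0):
    """Approximate rank-k SVD via a random range sketch (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    ell = min(n, k + oversample)            # sketch width (assumed heuristic)
    Omega = rng.standard_normal((n, ell))   # assumption: Gaussian test matrix
    Y = A @ Omega                           # stage A: sample the range of A
    Q, _ = np.linalg.qr(Y)                  # orthonormal basis for the sample
    B = Q.T @ A                             # stage B: compress A to the subspace
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)  # deterministic step
    U = Q @ Ub                              # lift left factors back
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical test matrix of exact rank 5.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))
U, s, Vt = randomized_svd(A, k=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(rel_err)
```

Only the small matrix `B` is passed to a dense SVD, which is where the savings over a full decomposition of `A` would come from.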
Improved approximation algorithms for large matrices via random projections
 In Proc. 47th Ann. IEEE Symp. Foundations of Computer Science (FOCS)
, 2006
Sensing by Random Convolution
 IEEE Int. Work. on Comp. Adv. Multi-Sensor Adaptive Proc., CAMSAP
, 2007
Cited by 114 (8 self)
This paper outlines a new framework for compressive sensing: convolution with a random waveform followed by random time domain subsampling. We show that sensing by random convolution is a universally efficient data acquisition strategy in that an n-dimensional signal which is S-sparse in any fixed representation can be recovered from m ≳ S log n measurements. We discuss two imaging scenarios, radar and Fourier optics, where convolution with a random pulse allows us to seemingly super-resolve fine-scale features, allowing us to recover high-resolution signals from low-resolution measurements. 1. Introduction. The new field of compressive sensing (CS) has given us a fresh look at data acquisition, one of the fundamental tasks in signal processing. The message of this theory can be summarized succinctly [7, 8, 10, 15, 32]: the number of measurements we need to reconstruct a signal depends on its sparsity rather than its bandwidth. These measurements, however, are different than the samples that
RELATIVE-ERROR CUR MATRIX DECOMPOSITIONS
 SIAM J. MATRIX ANAL. APPL
, 2008
Cited by 83 (18 self)
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an m × n matrix A and a rank parameter k. In our first algorithm, C is chosen, and we let A′ = CC⁺A, where C⁺ is the Moore–Penrose generalized inverse of C. In our second algorithm C, U, R are chosen, and we let A′ = CUR. (C and R are matrices that consist of actual columns and rows, respectively, of A, and U is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least 1 − δ, ‖A − A′‖_F ≤ (1 + ε)‖A − A_k‖_F, where A_k is the “best” rank-k approximation provided by truncating the SVD of A, and where ‖X‖_F is the Frobenius norm of the matrix X. The number of columns of C and rows of R is a low-degree polynomial in k, 1/ε, and log(1/δ). Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants
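The first construction in this abstract, A′ = CC⁺A with C made of actual columns of A, can be sketched as follows. The excerpt does not say how C is chosen, so the sampling rule used here (probabilities proportional to rank-k leverage scores, without replacement) is an assumption, as are the function name `column_projection` and the test matrix.

```python
import numpy as np

def column_projection(A, k, c, seed=0):
    """Return A' = C C^+ A for c sampled columns of A (illustrative only)."""
    rng = np.random.default_rng(seed)
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Assumed sampling rule: rank-k leverage scores, normalized to sum to 1.
    lev = np.sum(Vt[:k, :] ** 2, axis=0) / k
    cols = rng.choice(A.shape[1], size=c, replace=False, p=lev)
    C = A[:, cols]                              # actual columns of A
    A_approx = C @ (np.linalg.pinv(C) @ A)      # project A onto span(C)
    return A_approx, cols

# Hypothetical noisy matrix that is close to rank 4.
rng = np.random.default_rng(1)
A = (rng.standard_normal((60, 4)) @ rng.standard_normal((4, 40))
     + 0.01 * rng.standard_normal((60, 40)))
A_approx, cols = column_projection(A, k=4, c=12)

# Compare against the best rank-k error from the truncated SVD.
sv = np.linalg.svd(A, compute_uv=False)
best = np.sqrt(np.sum(sv[4:] ** 2))
print(np.linalg.norm(A - A_approx) / best)
```

Because `C` holds original columns, the approximation stays interpretable in terms of the input features, which is the motivation the abstract gives.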
Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes
Cited by 76 (10 self)
The Fast Johnson-Lindenstrauss Transform (FJLT) was recently discovered by Ailon and Chazelle as a novel technique for performing fast dimension reduction with small distortion from ℓ_2^d to ℓ_2^k in time O(max{d log d, k^3}). For k in [Ω(log d), O(d^{1/2})] this beats time O(dk) achieved by naive multiplication by random dense matrices, an approach followed by several authors as a variant of the seminal result by Johnson and Lindenstrauss (JL) from the mid-1980s. In this work we show how to significantly improve the running time to O(d log k) for k = O(d^{1/2−δ}), for any arbitrarily small fixed δ. This beats the better of FJLT and JL. Our analysis uses a powerful measure concentration bound due to Talagrand applied to Rademacher series in Banach spaces (sums of vectors in Banach spaces with random signs). The set of vectors used is a real embedding of dual BCH code vectors over GF(2). We also discuss the number of random bits used and reduction to ℓ_1 space. The connection between geometry and discrete coding theory discussed here is interesting in its own right and may be useful in other algorithmic applications as well.
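For context, the O(dk) baseline that this abstract compares against, multiplication by a dense random matrix, can be sketched as follows. This is not the dual-BCH construction of the paper; the Gaussian entries, the 1/√k scaling, and the helper names `jl_project` and `pairwise_dists` are assumptions chosen for illustration.

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Map the rows of X from d to k dimensions with a dense Gaussian matrix."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Scaling by 1/sqrt(k) keeps squared lengths unchanged in expectation.
    Phi = rng.standard_normal((d, k)) / np.sqrt(k)
    return X @ Phi                      # costs O(dk) per input vector

def pairwise_dists(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

# Hypothetical data: 20 points in d = 1024 dimensions, reduced to k = 256.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 1024))
Y = jl_project(X, k=256)

# Ratio of projected to original distance for each distinct pair.
iu = np.triu_indices(X.shape[0], k=1)
ratios = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]
print(ratios.min(), ratios.max())
```

Ratios close to 1 indicate small distortion; the fast transforms discussed in the abstract aim for the same guarantee at lower cost than the dense product `X @ Phi`.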
A fast randomized algorithm for the approximation of matrices
, 2007
Cited by 62 (7 self)
We introduce a randomized procedure that, given an m × n matrix A and a positive integer k, approximates A with a matrix Z of rank k. The algorithm relies on applying a structured l × m random matrix R to each column of A, where l is an integer near to, but greater than, k. The structure of R allows us to apply it to an arbitrary m × 1 vector at a cost proportional to m log(l); the resulting procedure can construct a rank-k approximation Z from the entries of A at a cost proportional to mn log(k) + l^2 (m + n). We prove several bounds on the accuracy of the algorithm; one such bound guarantees that the spectral norm ‖A − Z‖ of the discrepancy between A and Z is of the same order as √(max{m, n}) times the (k + 1)st greatest singular value σ_{k+1} of A, with small probability of large deviations. In contrast, the classical pivoted “QR” decomposition algorithms (such as Gram-Schmidt or Householder) require at least kmn floating-point operations in order to compute a similarly accurate rank-k approximation. In practice, the algorithm of this paper is faster than the classical algorithms, as long as k is neither very small nor very large. Furthermore, the algorithm operates reliably independently of the structure of the matrix A, can access each column of A independently and at most twice, and parallelizes naturally. The results are illustrated via several numerical examples.
Distributed sparse random projections for refinable approximation
 In IEEE/ACM Int. Symposium on Information Processing in Sensor Networks (IPSN)
, 2007
Cited by 61 (5 self)
Consider a large-scale wireless sensor network measuring compressible data, where n distributed data values can be well-approximated using only k ≪ n coefficients of some known transform. We address the problem of recovering an approximation of the n data values by querying any L sensors, so that the reconstruction error is comparable to the optimal k-term approximation. To solve this problem, we present a novel distributed algorithm based on sparse random projections, which requires no global coordination or knowledge. The key idea is that the sparsity of the random projections greatly reduces the communication cost of preprocessing the data. Our algorithm allows the collector to choose the number of sensors to query according to the desired approximation error. The reconstruction quality depends only on the number of sensors queried, enabling robust refinable approximation.
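The key idea named in this abstract, that a sparse projection matrix touches few data values per projection, can be illustrated with a small sketch. The entry distribution used here (values ±√s with probability 1/(2s) each, 0 otherwise) is a common choice for sparse random projections and is an assumption, since the excerpt does not specify it; the function name `sparse_projection_matrix` and the parameter values are likewise hypothetical.

```python
import numpy as np

def sparse_projection_matrix(k, n, s, seed=0):
    """k x n matrix with i.i.d. zero-mean, unit-variance sparse entries."""
    rng = np.random.default_rng(seed)
    values = np.array([-1.0, 0.0, 1.0])
    # Assumed distribution: nonzero with probability 1/s, random sign.
    probs = [1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]
    return np.sqrt(s) * rng.choice(values, size=(k, n), p=probs)

# Hypothetical setup: n data values, k projections, sparsity parameter s.
rng = np.random.default_rng(1)
n, k, s = 2000, 400, 10
x = rng.standard_normal(n)

Phi = sparse_projection_matrix(k, n, s)
y = Phi @ x / np.sqrt(k)                    # scaled so norms match on average

density = np.count_nonzero(Phi) / Phi.size  # fraction of values each row reads
norm_ratio = np.linalg.norm(y) / np.linalg.norm(x)
print(density, norm_ratio)
```

Each projection reads only about n/s of the data values, which is the source of the reduced communication cost, while the norm of `x` is still roughly preserved.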
FINDING STRUCTURE WITH RANDOMNESS: STOCHASTIC ALGORITHMS FOR CONSTRUCTING APPROXIMATE MATRIX DECOMPOSITIONS
, 2009
Cited by 61 (4 self)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. In particular, these techniques offer a route toward principal component analysis (PCA) for petascale data. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed, either explicitly or implicitly, to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider