## Compressed Matrix Multiplication

Citations: 6 (2 self)

### BibTeX

```bibtex
@MISC{Pagh_compressedmatrix,
  author = {Rasmus Pagh},
  title  = {Compressed Matrix Multiplication},
  year   = {}
}
```

### Abstract

Motivated by the problems of computing sample covariance matrices, and of transforming a collection of vectors to a basis where they are sparse, we present a simple algorithm that computes an approximation of the product of two n-by-n real matrices A and B. Let ||AB||_F denote the Frobenius norm of AB, and let b be a parameter determining the time/accuracy trade-off. Given 2-wise independent hash functions h1, h2: [n] → [b] and sign functions s1, s2: [n] → {−1, +1}, the algorithm works by first “compressing” the matrix product into the polynomial

p(x) = Σ_{k=1}^{n} ( Σ_{i=1}^{n} A_ik s1(i) x^{h1(i)} ) ( Σ_{j=1}^{n} B_kj s2(j) x^{h2(j)} ).

Using FFT for polynomial multiplication, we can compute c_0, ..., c_{b−1} such that Σ_i c_i x^i = (p(x) mod x^b) + (p(x) div x^b) in time Õ(n^2 + nb). An unbiased estimator of (AB)_ij with variance at most ||AB||_F^2 / b can then be computed as C_ij = s1(i) s2(j) c_{(h1(i)+h2(j)) mod b}. Our approach also leads to an algorithm for computing AB exactly, with high probability, in time Õ(N + nb) in the case where A and B have at most N nonzero entries and AB has at most b nonzero entries. We also use error-correcting codes in a novel way to recover the significant entries of AB in near-linear time.
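The pipeline in the abstract (hash each column of A and each row of B into a polynomial of degree below b, multiply the polynomials, fold the coefficients modulo x^b, and read off entry estimates) can be sketched in a few lines of Python. This is an illustrative reimplementation, not the paper's code: the hash and sign functions are passed in explicitly, and `np.convolve` stands in for the FFT-based polynomial multiplication the paper uses.

```python
import numpy as np

def compressed_product_estimate(A, B, b, h1, h2, s1, s2):
    """Estimate AB by compressing the product into a folded polynomial.

    h1, h2: integer arrays of hash values in [0, b); s1, s2: arrays of
    +/-1 signs. They are passed in explicitly so the example stays
    deterministic; the paper draws them from 2-wise independent families.
    """
    n = A.shape[0]
    c = np.zeros(2 * b - 1)
    for k in range(n):
        # Degree-below-b polynomials for column k of A and row k of B.
        pa = np.zeros(b)
        np.add.at(pa, h1, s1 * A[:, k])
        pb = np.zeros(b)
        np.add.at(pb, h2, s2 * B[k, :])
        # Polynomial multiplication; the paper uses FFT for O(b lg b) time.
        c += np.convolve(pa, pb)
    # Fold: coefficients of (p(x) mod x^b) + (p(x) div x^b).
    folded = c[:b].copy()
    folded[: b - 1] += c[b:]
    # Unbiased estimator: C_ij = s1(i) s2(j) c_{(h1(i)+h2(j)) mod b}.
    idx = (h1[:, None] + h2[None, :]) % b
    return (s1[:, None] * s2[None, :]) * folded[idx]
```

When b is large enough that the index map (i, j) ↦ (h1(i) + h2(j)) mod b happens to be collision-free, the estimate is exact; in general each entry is unbiased with variance at most ||AB||_F^2 / b.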

### Citations

801 | Matrix multiplications via arithmetic progressions
- Coppersmith, Winograd
- 1990
Citation Context ...duction to fast rectangular matrix multiplication, that this is possible in time O(n^2 b̄^0.188). Observe that for b̄ = n^2 this becomes identical to the O(n^2.376) bound by Coppersmith and Winograd [12]. Yuster and Zwick [25] devised asymptotically fast algorithms for the case of sparse input matrices, using a matrix partitioning idea. Amossen and Pagh [4] extended this result to be more efficient i...

701 | The space complexity of approximating the frequency moments
- Alon, Matias, et al.
- 1996
Citation Context ...This is a rather weak bound in general, since the largest possible magnitude of an entry in AB is M^2 n. Sarlós [23] showed how to achieve the same Frobenius norm error guarantee using c AMS sketches [2] on rows of A and columns of B. Again, if the classical matrix multiplication algorithm is used to combine the sketches, the time complexity is O(n^2 c). This method gives a stronger error bound for e...

668 | Universal classes of hash functions
- Carter, Wegman
- 1977
Citation Context ...e function h as follows: h(i, j) = (h1(i) + h2(j)) mod b, where h1 and h2 are chosen independently at random from a 3-wise independent family. It is well-known that this also makes h 3-wise independent [6, 22]. Given a vector u ∈ R^n and functions ht: [n] → {0, ..., b − 1}, st: [n] → {−1, +1} we define the following polynomial: p_u^{ht,st}(x) = Σ_{i=1}^{n} st(i) u_i x^{ht(i)}. The polynomial can be represente...
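The polynomial defined in the context above is just a length-b coefficient array: u_i is added at position ht(i) with sign st(i). A minimal sketch (the helper name `poly_coeffs` is ours, for illustration):

```python
import numpy as np

def poly_coeffs(u, h, s, b):
    """Coefficient vector of p(x) = sum_i s(i) * u_i * x^(h(i)).

    np.add.at accumulates correctly even when several indices i hash
    to the same position (a plain p[h] += s * u would drop collisions).
    """
    p = np.zeros(b)
    np.add.at(p, h, s * u)
    return p
```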

662 | An algorithm for machine calculation of complex Fourier series
- Cooley, Tukey
- 1965
Citation Context ...an be seen as a compressed sensing method for the matrix product, with the nonstandard idea that the sketch of AB is computed without explicitly constructing AB. The main technical idea is to use FFT [11] to efficiently compute a linear sketch of an outer product of two vectors. We also make use of error-correcting codes in a novel way to achieve recovery of the entries of AB having highest magnitude ...

277 | Expander codes
- Sipser, Spielman
- 1996

260 | Finding frequent items in data streams
- Charikar, Chen, et al.
Citation Context ...erent sketch we can avoid this problem, and additionally get better estimates for compressible matrices. 2.2 Count sketches Our algorithm will use the Count Sketch of Charikar, Chen and Farach-Colton [7], which has precision at least as good as the estimator obtained by taking the average of b AMS sketches, but is much better for skewed distributions. The method maintains a sketch of any desired size...
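The Count Sketch referenced here is small enough to state in full. The sketch below is a single-row variant with fully random hash and sign functions for simplicity (the analysis needs only pairwise independence); the class and method names are ours, for illustration:

```python
import numpy as np

class CountSketch:
    """Single-row Count Sketch over the universe [n]."""

    def __init__(self, n, b, seed=0):
        rng = np.random.default_rng(seed)
        self.h = rng.integers(0, b, n)   # bucket for each item
        self.s = rng.choice([-1, 1], n)  # sign for each item
        self.c = np.zeros(b)             # the b counters

    def update(self, i, delta):
        """Process a stream update: weight of item i changes by delta."""
        self.c[self.h[i]] += self.s[i] * delta

    def estimate(self, i):
        """Unbiased estimate of item i's total weight."""
        return self.s[i] * self.c[self.h[i]]
```

Items landing in the same bucket perturb each other's estimates, but the random signs make each estimate unbiased, and for skewed distributions most of the mass is recovered accurately.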

107 | Tracking join and self-join sizes in limited storage
- Alon, Gibbons, et al.
- 1999
Citation Context ...ividual entry of the approximation matrix. If we write an entry of AB as a dot product, (AB)_ij = ã_i · b̃_j, the magnitude of the additive error is O(||ã_i||_2 ||b̃_j||_2 / √c) with high probability (see [23, 1]). In contrast to the previous results, this approximation can be computed in a single pass over the input matrices. Clarkson and Woodruff [8] further refine the results of Sarlós, and show that the s...

92 | Improved approximation algorithms for large matrices via random projections
- Sarlós
- 2006
Citation Context ...ned is of size Ω(M^2 n / √c), where M is the magnitude of the largest entry in A and B. This is a rather weak bound in general, since the largest possible magnitude of an entry in AB is M^2 n. Sarlós [23] showed how to achieve the same Frobenius norm error guarantee using c AMS sketches [2] on rows of A and columns of B. Again, if the classical matrix multiplication algorithm is used to combine the sk...

61 | Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication
- Drineas, Kannan, et al.
Citation Context ...enius norm of AB, namely ||AB − C||_F = O(||AB||_F / √c). (This is not shown in [10], but follows from the fact that each estimator has a scaled binomial distribution.) Drineas, Kannan, and Mahoney [14] showed how a simpler sampling strategy can lead to a good approximation of the form CR, where matrices C and R consist of c columns and c rows of A and B, respectively. Their main error bound is i...

57 | On the exact variance of products
- Goodman
- 1960
Citation Context ...m variable that has expectation 0 if X_i and X_j are independent. It is computed as 1/(m−1) times the sum over m observations of (X_i − E[X_i])(X_j − E[X_j]). Assuming independence and using the formula from [17], this means that its variance is m/(m−1)^2 · Var(X_i)Var(X_j) ≤ (4/m) Var(X_i)Var(X_j), for m ≥ 2. If cov(X) is a diagonal matrix (i.e., every pair of variables is independent), the expected squared Frobeniu...
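Spelling out Goodman's variance bound [17] as it is used here, with q_ij denoting the sample covariance entry computed from m observations:

```latex
\operatorname{Var}(q_{ij})
  \;=\; \frac{m}{(m-1)^2}\,\operatorname{Var}(X_i)\operatorname{Var}(X_j)
  \;\le\; \frac{4}{m}\,\operatorname{Var}(X_i)\operatorname{Var}(X_j),
  \qquad m \ge 2,
```

since m/(m−1)^2 ≤ 4/m is equivalent to m^2 ≤ 4(m−1)^2, which holds exactly when m ≥ 2 (with equality at m = 2).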

45 | Numerical linear algebra in the streaming model
- Clarkson, Woodruff
Citation Context ... O(||ã_i||_2 ||b̃_j||_2 / √c) with high probability (see [23, 1]). In contrast to the previous results, this approximation can be computed in a single pass over the input matrices. Clarkson and Woodruff [8] further refine the results of Sarlós, and show that the space usage is nearly optimal in a streaming setting. 1.2 New results In this paper we improve existing results in cases where the matrix produ...

36 | Fast sparse matrix multiplication
- Yuster, Zwick
Citation Context ...ular matrix multiplication, that this is possible in time O(n^2 b̄^0.188). Observe that for b̄ = n^2 this becomes identical to the O(n^2.376) bound by Coppersmith and Winograd [12]. Yuster and Zwick [25] devised asymptotically fast algorithms for the case of sparse input matrices, using a matrix partitioning idea. Amossen and Pagh [4] extended this result to be more efficient in the case where also t...

33 | Approximating matrix multiplication for pattern recognition tasks (special issue of selected papers from SODA’97)
- Cohen, Lewis
- 1999
Citation Context ...in each column, and is almost as good as the best approximation of this form. However, if some column of AB is dense, the approximation may differ significantly from AB. Historically, Cohen and Lewis [10] were the first to consider randomized algorithms for approximate matrix multiplication, with theoretical results restricted to the case where input matrices do not have negative entries. Suppose A ha...

28 | Sparse recovery using sparse matrices
- Gilbert, Indyk
Citation Context ...Also, replacing the fast rectangular matrix multiplication in the result of Iwen and Spencer [19] by a naïve matrix multiplication algorithm, and making use of randomized sparse recovery methods (see [15]), leads to a combinatorial algorithm running in time Õ(n^2 + nb) when each column of AB has O(b/n) nonzero values. Approximate matrix multiplication. The result of [19] is not restricted to sparse matr...

27 | Polynomial hash functions are reliable (extended abstract)
- Dietzfelbinger, Gil, et al.
Citation Context ...ons involved are fully random. A practical implementation of the involved hash functions is character-based tabulation [22], but for the best theoretical space bounds we use polynomial hash functions [13]. Time and space analysis. We analyze each iteration of the outer loop. Computing p_{uv}(x) takes time O(n + b lg b), where the first term is the time to construct the polynomials, and the last term is t...

20 | Declaring independence via the sketching of sketches
- Indyk, McGregor
- 2008
Citation Context ...m X = Σ_i s(z_i) w_i (which we refer to as the AMS sketch). We will use a sign function on pairs (i, j) that is a product of sign functions on the coordinates: s(i, j) = s1(i) s2(j). Indyk and McGregor [18], and Braverman et al. [5] have previously analyzed moments of AMS sketches with hash functions of this form. However, for our purposes it suffices to observe that s(i, j) is 2-wise independent if s1 ...

17 | Approximate sparse recovery: Optimizing time and measurements
- Gilbert, Li, et al.
- 2012
Citation Context ...tor O(lg n) more time and space, the time complexity for decompression will be O(b lg^2 n). Our main tool is error-correcting codes, previously applied to the sparse recovery problem by Gilbert et al. [16]. However, compared to [16] we are able to proceed in a more direct way that avoids iterative decoding. We note that a similar result could be obtained by a 2-dimensional dyadic decomposition of [n] ×...

12 | The power of simple tabulation hashing
- Pătraşcu, Thorup
- 2012
Citation Context ...e function h as follows: h(i, j) = (h1(i) + h2(j)) mod b, where h1 and h2 are chosen independently at random from a 3-wise independent family. It is well-known that this also makes h 3-wise independent [6, 22]. Given a vector u ∈ R^n and functions ht: [n] → {0, ..., b − 1}, st: [n] → {−1, +1} we define the following polynomial: p_u^{ht,st}(x) = Σ_{i=1}^{n} st(i) u_i x^{ht(i)}. The polynomial can be represente...

9 | Structure prediction and computation of sparse matrix products
- Cohen
- 1998
Citation Context ...tively, estimation of the number of nonzero entries, and of the Err_F value. 5.1 Number of nonzero entries A constant-factor estimate of b̄ ≥ b can be computed in time O(N lg N) using Cohen’s method [9] or its refinement for matrix products [3]. Recall that b̄ is an upper bound on the number of nonzeros, when not taking into account that there may be zeros in AB that are due to cancellation of term...

6 | Better size estimation for sparse matrix products
- Amossen, Campagna, et al.
Citation Context ...o entries, and of the Err_F value. 5.1 Number of nonzero entries A constant-factor estimate of b̄ ≥ b can be computed in time O(N lg N) using Cohen’s method [9] or its refinement for matrix products [3]. Recall that b̄ is an upper bound on the number of nonzeros, when not taking into account that there may be zeros in AB that are due to cancellation of terms. We next show how to take cancellation o...

6 | AMS without 4-wise independence on product domains
- Braverman, Ostrovsky
Citation Context ...e refer to as the AMS sketch). We will use a sign function on pairs (i, j) that is a product of sign functions on the coordinates: s(i, j) = s1(i) s2(j). Indyk and McGregor [18], and Braverman et al. [5] have previously analyzed moments of AMS sketches with hash functions of this form. However, for our purposes it suffices to observe that s(i, j) is 2-wise independent if s1 and s2 are 2-wise independ...

6 | A fast output-sensitive algorithm for Boolean matrix multiplication
- Lingas
- 2009
Citation Context ...ence and statistics is an area where there is high potential for cross-fertilization (see also [21] for arguments in this direction). 1.1 Related work Matrix multiplication with sparse output. Lingas [20] considered the problem of computing a matrix product AB with at most b̄ entries that are not trivially zero. A matrix entry is said to be trivially zero if every term in the corresponding dot product...

3 | Algorithmic and statistical perspectives on large-scale data analysis
- Mahoney
- 2012
Citation Context ...vectors. We outline some such targets in the conclusion. The interface between theoretical computer science and statistics is an area where there is high potential for cross-fertilization (see also [21] for arguments in this direction). 1.1 Related work Matrix multiplication with sparse output. Lingas [20] considered the problem of computing a matrix product AB with at most b̄ entries that are not t...

2 | A note on compressed sensing and the complexity of matrix multiplication
- Iwen, Spencer
- 2009
Citation Context ...ient in the case where also the output matrix is sparse. In the dense input setting of Lingas, this leads to an improved time complexity of O(n^1.724 b̄^0.408) for n ≤ b̄ ≤ n^1.25. Iwen and Spencer [19] showed how to use compressed sensing to compute a matrix product AB in time O(n^(2+ε)), for any given constant ε > 0, in the special case where each column of AB contains at most n^0.29462 nonzero val...

1 | Faster join-projects and sparse matrix multiplications
- Amossen, Pagh
- 2009
Citation Context ...O(n^2.376) bound by Coppersmith and Winograd [12]. Yuster and Zwick [25] devised asymptotically fast algorithms for the case of sparse input matrices, using a matrix partitioning idea. Amossen and Pagh [4] extended this result to be more efficient in the case where also the output matrix is sparse. In the dense input setting of Lingas, this leads to an improved time complexity of O(n^1.724 b̄^0.408) f...