Results 1–6 of 6
On Parallelizing Matrix Multiplication by the Column-Row Method
Abstract

Cited by 3 (1 self)
We consider the problem of sparse matrix multiplication by the column-row method in a distributed setting where the matrix product is not necessarily sparse. We present a surprisingly simple method for “consistent” parallel processing of sparse outer products (column-row vector products) over several processors, in a communication-avoiding setting where each processor has a copy of the input. The method is consistent in the sense that a given output entry is always assigned to the same processor independently of the specific structure of the outer product. We show guarantees on the work done by each processor, and achieve linear speedup down to the point where the cost is dominated by reading the input. Our method gives a way of distributing (or parallelizing) matrix product computations in settings where the main bottlenecks are storing the result matrix and interprocessor communication. Motivated by observations on real data that the absolute values of the entries in the product often adhere to a power law, we combine our approach with frequent-items mining algorithms and show how to obtain a tight approximation of the weight of the heaviest entries in the product matrix. As a case study we present the application of our approach to frequent pair mining in transactional data streams, a problem that can be phrased in terms of sparse {0,1}-integer matrix multiplication by the column-row method. Experimental evaluation of the proposed method on real-life data supports the theoretical findings.
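The abstract does not spell out the assignment rule, but one minimal way to get the consistency it describes is to hash the output coordinate (i, j) itself, so ownership never depends on which outer product produced the entry. The names `owner` and `local_outer_product` below are illustrative, not from the paper:

```python
import hashlib

def owner(i, j, num_procs):
    """Deterministically assign output entry (i, j) to a processor.

    The assignment depends only on the coordinate, never on which
    outer product produced it, so every processor agrees on it
    without any communication.
    """
    digest = hashlib.blake2b(f"{i},{j}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_procs

def local_outer_product(col, row, pid, num_procs):
    """Entries of the sparse outer product col * row^T owned by `pid`.

    `col` and `row` are dicts mapping index -> nonzero value.
    """
    out = {}
    for i, a in col.items():
        for j, b in row.items():
            if owner(i, j, num_procs) == pid:
                out[(i, j)] = out.get((i, j), 0) + a * b
    return out

# Two processors jointly cover the outer product without overlap.
col = {0: 2.0, 3: 1.0}
row = {1: 5.0, 4: -1.0}
parts = [local_outer_product(col, row, p, 2) for p in range(2)]
merged = {k: v for part in parts for k, v in part.items()}
assert merged == {(i, j): a * b for i, a in col.items() for j, b in row.items()}
```

Because `owner` is a pure function of (i, j), the processors' output sets are disjoint and together cover every nonzero of the outer product, which is the consistency property the abstract claims.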
Rectangle-Efficient Aggregation in Spatial Data Streams
Abstract

Cited by 1 (0 self)
We consider the estimation of aggregates over a data stream of multidimensional axis-aligned rectangles. Rectangles are a basic primitive object in spatial databases, and efficient aggregation of rectangles is a fundamental task. The data stream model has emerged as a de facto model for processing massive databases in which the data resides in external memory or the cloud and is streamed through main memory. For a point p, let n(p) denote the sum of the weights of all rectangles in the stream that contain p. We give near-optimal solutions for basic problems, including (1) the k-th frequency moment Fk = ∑_p n(p)^k, (2) the counting version of stabbing queries, which seeks an estimate of n(p) given p, and (3) identification of heavy hitters, i.e., points p for which n(p) is large. An important special case of Fk is F0, which corresponds to the volume of the union of the rectangles. This is a celebrated problem in computational geometry known as “Klee’s measure problem”, and our work yields the first solution in the streaming model for dimensions greater than one.
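To fix the semantics of n(p) and Fk concretely, here is a brute-force reference on a tiny 2-d stream; the paper's contribution is streaming sketches for these quantities, which this toy does not attempt:

```python
# Brute-force reference for the aggregates defined above, on a tiny
# 2-d stream of weighted, inclusive integer boxes.
rects = [  # (x_lo, x_hi, y_lo, y_hi, weight)
    (0, 2, 0, 2, 1),
    (1, 3, 1, 3, 2),
]

def n_of(p):
    """n(p): total weight of rectangles containing point p (stabbing count)."""
    x, y = p
    return sum(w for (x0, x1, y0, y1, w) in rects
               if x0 <= x <= x1 and y0 <= y <= y1)

def Fk(k):
    """k-th frequency moment, summed over the integer points covered
    by at least one rectangle."""
    pts = {(x, y) for (x0, x1, y0, y1, _) in rects
           for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}
    return sum(n_of(p) ** k for p in pts)

stab = n_of((1, 1))  # (1,1) lies in both rectangles -> weight 1 + 2 = 3
union_volume = Fk(0)  # F0 counts covered points: Klee's measure problem
```

F0 degenerates to the size of the union because n(p)^0 = 1 for every covered point, which is exactly why the abstract identifies F0 with Klee's measure problem.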
Compressed Matrix Multiplication
Abstract
We present a simple algorithm that approximates the product of n-by-n real matrices A and B. Let ‖AB‖_F denote the Frobenius norm of AB, and b be a parameter determining the time/accuracy tradeoff. Given 2-wise independent hash functions h1, h2: [n] → [b], and s1, s2: [n] → {−1, +1}, the algorithm works by first “compressing” the matrix product into the polynomial

p(x) = ∑_{k=1}^n ( ∑_{i=1}^n A_{ik} s1(i) x^{h1(i)} ) ( ∑_{j=1}^n B_{kj} s2(j) x^{h2(j)} ).

Using the fast Fourier transform to compute polynomial multiplication, we can compute c0, ..., c_{b−1} such that ∑_i c_i x^i = (p(x) mod x^b) + (p(x) div x^b) in time Õ(n² + nb). An unbiased estimator of (AB)_{ij} with variance at most ‖AB‖²_F / b can then be computed as:

C_{ij} = s1(i) · s2(j) · c_{(h1(i)+h2(j)) mod b}.

Our approach also leads to an algorithm for computing AB exactly, with high probability, in time Õ(N + nb) in the case where A and B have at most N nonzero entries, and AB has at most b nonzero entries.
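A direct, FFT-free transcription of this construction: folding the degree-(h1(i)+h2(j)) term back modulo b implements (p mod x^b) + (p div x^b), since every exponent is below 2b. The plain loops below stand in for the FFT-based polynomial multiplication that gives the Õ(n² + nb) bound:

```python
import random

def compressed_product_sketch(A, B, b, h1, h2, s1, s2):
    """Compress the product AB into b coefficients c[0..b-1].

    Each term A[i][k] s1(i) x^{h1(i)} * B[k][j] s2(j) x^{h2(j)} lands at
    exponent h1(i) + h2(j) < 2b, so (p mod x^b) + (p div x^b) is just
    index arithmetic mod b.
    """
    n = len(A)
    c = [0.0] * b
    for k in range(n):
        for i in range(n):
            if A[i][k] == 0:
                continue
            for j in range(n):
                if B[k][j] == 0:
                    continue
                c[(h1[i] + h2[j]) % b] += A[i][k] * s1[i] * B[k][j] * s2[j]
    return c

def estimate(c, i, j, b, h1, h2, s1, s2):
    """Unbiased estimator C_ij = s1(i) s2(j) c_{(h1(i)+h2(j)) mod b}."""
    return s1[i] * s2[j] * c[(h1[i] + h2[j]) % b]

random.seed(0)
n, b = 3, 8
A = [[1, 2, 0], [0, 1, 0], [3, 0, 1]]
B = [[1, 0, 1], [0, 1, 0], [2, 0, 0]]
h1 = [random.randrange(b) for _ in range(n)]
h2 = [random.randrange(b) for _ in range(n)]
s1 = [random.choice([-1, 1]) for _ in range(n)]
s2 = [random.choice([-1, 1]) for _ in range(n)]
c = compressed_product_sketch(A, B, b, h1, h2, s1, s2)
est = estimate(c, 0, 0, b, h1, h2, s1, s2)  # unbiased for (AB)[0][0]
```

A single estimate may be off by hash collisions; the variance bound ‖AB‖²_F / b is over the random choice of hash and sign functions, so averaging over fresh draws converges to the true entry.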
Beating the Direct Sum Theorem in Communication Complexity with Implications for Sketching
Abstract
A direct sum theorem for two parties and a function f states that the communication cost of solving k copies of f simultaneously with error probability 1/3 is at least k · R1/3(f), where R1/3(f) is the communication required to solve a single copy of f with error probability 1/3. We improve this for a natural family of functions f, showing that the one-way communication required to solve k copies of f simultaneously with probability 2/3 is Ω(k · R1/k(f)). Since R1/k(f) may be as large as Ω(R1/3(f) · log k), we asymptotically beat the direct sum bound for such functions, showing that the trivial upper bound of solving each of the k copies of f with probability 1 − O(1/k) and taking a union bound is optimal! In order to achieve this, our direct sum involves a novel measure of information cost which allows a protocol to abort with constant probability, and otherwise must be correct with very high probability. Moreover, for the functions considered, we show strong lower bounds on the communication cost of protocols with these relaxed guarantees; indeed, our lower bounds match those for protocols that are not allowed to abort. In the distributed and streaming models, where one wants to be correct not only on a single query, but simultaneously on a sequence of n queries, we obtain optimal lower bounds on the communication or space complexity. Lower bounds obtained from our direct sum result show that a number of techniques in the sketching literature are optimal, including the following:
• (JL transform) Lower bound of Ω(ɛ⁻² log(n/δ)) on the dimension of (oblivious) Johnson–Lindenstrauss transforms.
• (ℓp-estimation) Lower bound of Ω(nɛ⁻² log(n/δ) (log d + log M)) for the size of encodings of n vectors in [±M]^d that allow ℓ1- or ℓ2-estimation.
• (Matrix sketching) Lower bound of Ω(ɛ⁻² log(n/δ)) on the dimension of a matrix sketch S satisfying the entrywise guarantee |(ASSᵀB)ij − (AB)ij| ≤ ɛ‖Ai‖₂‖Bʲ‖₂.
• (Database joins) Lower bound of Ω(nɛ⁻² log(n/δ) log M) for sketching frequency vectors of n tables in a database, each with M records, in order to allow join size estimation.
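The "trivial upper bound" the abstract calls optimal works by amplification: run a 1/3-error protocol O(log k) times per copy and take the majority, driving the per-copy error below 1/k, so a union bound covers all k copies at once. A toy simulation, where `noisy_answer` is a synthetic stand-in for one run of a protocol (not anything from the paper):

```python
import math
import random

def noisy_answer(truth, err):
    """Stand-in for one run of a protocol that errs with probability err."""
    return truth if random.random() > err else 1 - truth

def amplified(truth, err, reps):
    """Majority vote over reps independent runs; the error probability
    drops exponentially in reps (Chernoff bound)."""
    ones = sum(noisy_answer(truth, err) for _ in range(reps))
    return 1 if 2 * ones > reps else 0

random.seed(0)
k = 64
# O(log k) repetitions per copy; the constant is chosen large enough
# that the per-copy error falls below ~1/k, so a union bound over all
# k copies leaves constant total error.
reps = 10 * math.ceil(math.log2(k))
base_err = sum(noisy_answer(1, 1 / 3) != 1 for _ in range(1000)) / 1000
amp_err = sum(amplified(1, 1 / 3, reps) != 1 for _ in range(1000)) / 1000
```

This costs k · R1/3(f) · O(log k) communication in total; the paper's surprise is a matching Ω(k · R1/k(f)) lower bound showing this log k overhead cannot be avoided for the functions considered.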
Fast and Scalable Polynomial Kernels via Explicit Feature Maps
Abstract
Approximation of nonlinear kernels using random feature mapping has been successfully employed in large-scale data analysis applications, accelerating the training of kernel machines. While previous random feature mappings run in O(ndD) time for n training samples in d-dimensional space and D random feature maps, we propose a novel randomized tensor product technique, called Tensor Sketching, for approximating any polynomial kernel in O(n(d + D log D)) time. Also, we introduce both absolute and relative error bounds for our approximation to guarantee the reliability of our estimation algorithm. Empirically, Tensor Sketching achieves higher accuracy and often runs orders of magnitude faster than the state-of-the-art approach for large-scale real-world datasets.
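A minimal sketch of the degree-2 case, assuming the standard Tensor Sketch construction: count-sketch the input with two independent hash/sign pairs, then circularly convolve the sketches, which yields a count sketch of x ⊗ x. The paper does the convolution with the FFT (hence the D log D term); the naive O(D²) loop below trades speed for brevity:

```python
import random

def count_sketch(x, h, s, b):
    """Count sketch of vector x into b buckets: c[h[i]] += s[i] * x[i]."""
    c = [0.0] * b
    for i, xi in enumerate(x):
        c[h[i]] += s[i] * xi
    return c

def circular_convolve(u, v):
    """Naive circular convolution; Tensor Sketching uses the FFT here."""
    b = len(u)
    return [sum(u[t] * v[(m - t) % b] for t in range(b)) for m in range(b)]

def tensor_sketch(x, hashes, signs, b):
    """Sketch of x (x) x: convolving two count sketches of x combines
    their bucket indices additively mod b, sketching the tensor product."""
    c1 = count_sketch(x, hashes[0], signs[0], b)
    c2 = count_sketch(x, hashes[1], signs[1], b)
    return circular_convolve(c1, c2)

def kernel_estimate(x, y, hashes, signs, b):
    """<TS(x), TS(y)> is an unbiased estimate of the degree-2
    polynomial kernel (x . y)^2, since E<TS(x), TS(y)> = <x(x)x, y(x)y>."""
    sx = tensor_sketch(x, hashes, signs, b)
    sy = tensor_sketch(y, hashes, signs, b)
    return sum(a * c for a, c in zip(sx, sy))

random.seed(0)
b, d = 16, 2
hashes = [[random.randrange(b) for _ in range(d)] for _ in range(2)]
signs = [[random.choice([-1, 1]) for _ in range(d)] for _ in range(2)]
est = kernel_estimate([1.0, 2.0], [3.0, 1.0], hashes, signs, b)
```

For x = (1, 2) and y = (3, 1) the estimate targets (x · y)² = 25 in expectation over the hash and sign draws; a higher-degree kernel (x · y)^q convolves q independent count sketches instead of two.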