Results 1–10 of 2,025
Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
In Proceedings of Supercomputing, 2002
Cited by 57 (10 self)
"... We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpMV), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how ..."
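The entries in this listing all revolve around the SpMV kernel in compressed sparse row (CSR) form. As an illustrative sketch (not taken from any of the papers listed), the baseline kernel these works tune looks like:

```python
# Minimal CSR sparse matrix-vector multiply (y = A @ x) for reference.
# CSR stores nonzeros row by row: vals[k] is a value, cols[k] its
# column index, and ptr[i]:ptr[i+1] delimits row i's nonzeros.

def spmv_csr(ptr, cols, vals, x):
    n = len(ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(ptr[i], ptr[i + 1]):
            y[i] += vals[k] * x[cols[k]]
    return y

# 3x3 example: [[2, 0, 1], [0, 3, 0], [4, 0, 5]]
ptr  = [0, 2, 3, 5]
cols = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
print(spmv_csr(ptr, cols, vals, [1.0, 1.0, 1.0]))  # [3.0, 3.0, 9.0]
```

The indirect access `x[cols[k]]` is what makes SpMV memory-bound and motivates the blocking and reordering optimizations surveyed below.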
Benchmarking Sparse Matrix-Vector Multiply
2006
Cited by 8 (2 self)
"... We present a benchmark for evaluating the performance of sparse matrix-dense vector multiply (abbreviated as SpMV) on scalar uniprocessor machines. Though SpMV is an important kernel in scientific computation, there are currently no adequate benchmarks for measuring its performance across ..."
A library for parallel sparse matrix-vector multiplies
2005
Cited by 7 (6 self)
"... We provide parallel matrix-vector multiply routines for 1D and 2D partitioned sparse square and rectangular matrices. We clearly give pseudocodes that perform necessary initializations for parallel execution. We show how to maximize overlapping between communication and computation through the pro ..."
A Benchmark for Register-Blocked Sparse Matrix-Vector Multiply
"... We develop a sparse matrix-vector multiply (SMVM) benchmark for block compressed sparse row (BSR) matrices. These occur frequently in linear systems generated by the finite element method (FEM), for example, and are naturally suited for register blocking optimizations. Unlike current SMVM ..."
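The register blocking this benchmark targets stores small dense blocks so the innermost multiply can be fully unrolled into registers. A minimal BSR-style sketch with fixed 2×2 blocks (illustrative only; block sizes and storage details vary across the papers listed):

```python
# Sketch of register-blocked SpMV with fixed 2x2 blocks (BSR-style).
# Block row I covers matrix rows 2I and 2I+1; each stored block holds
# 4 values in row-major order [b00, b01, b10, b11], so the inner 2x2
# multiply is fully unrolled -- the point of register blocking.

def spmv_bsr2x2(bptr, bcols, bvals, x):
    nblockrows = len(bptr) - 1
    y = [0.0] * (2 * nblockrows)
    for I in range(nblockrows):
        y0 = y1 = 0.0                    # accumulators kept "in registers"
        for k in range(bptr[I], bptr[I + 1]):
            j = 2 * bcols[k]             # first column of this block
            b = bvals[k]
            y0 += b[0] * x[j] + b[1] * x[j + 1]
            y1 += b[2] * x[j] + b[3] * x[j + 1]
        y[2 * I], y[2 * I + 1] = y0, y1
    return y

# 4x4 example with one 2x2 block per block row:
bptr, bcols = [0, 1, 2], [0, 1]
bvals = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(spmv_bsr2x2(bptr, bcols, bvals, [1.0, 1.0, 1.0, 1.0]))  # [3.0, 7.0, 11.0, 15.0]
```

Compared with plain CSR, one column index is stored per block instead of per nonzero, at the cost of explicit zeros when the matrix's natural block structure is imperfect.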
FPGA vs. GPU for Sparse Matrix-Vector Multiply
Cited by 5 (0 self)
"... Sparse matrix-vector multiplication (SpMV) is a common operation in numerical linear algebra and is the computational kernel of many scientific applications. It is one of the original and perhaps most studied targets for FPGA acceleration. Despite this, GPUs, which have only recently gained ..."
An Improved Sparse Matrix-Vector Multiply Based on Recursive Sparse Blocks Layout
Cited by 1 (0 self)
"... The Recursive Sparse Blocks (RSB) is a sparse matrix layout designed for coarse-grained parallelism and reduced cache misses when operating with matrices that are larger than a computer's cache. By laying out the matrix in sparse, non-overlapping blocks, we allow for the shared memory parallel execution of transposed Sparse Matrix-Vector multiply (SpMV), with higher efficiency than the traditional Compressed Sparse Rows (CSR) format. In this note we cover two issues. First, we propose two improvements to our original approach. Second, we look at the performance of standard ..."
Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply
In Proceedings of the International Conference on Parallel Processing, 2004
Cited by 23 (4 self)
"... We present optimizations for sparse matrix-vector multiply (SpMV) and its generalization to multiple vectors, SpMM, when the matrix is symmetric: (1) symmetric storage, (2) register blocking, and (3) vector blocking. Combined with register blocking, symmetry saves more than 50% in matrix storage. We ..."
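The symmetric-storage idea in this entry can be sketched as follows (an illustrative reconstruction, not the paper's implementation): only the lower triangle is stored, and each off-diagonal entry is applied twice per visit.

```python
# Sketch of symmetric SpMV: only the lower triangle (including the
# diagonal) of a symmetric matrix A is stored in CSR. Each stored
# off-diagonal a_ij contributes a_ij * x[j] to y[i] AND a_ij * x[i]
# to y[j], so matrix storage is roughly halved.

def spmv_sym_csr(ptr, cols, vals, x):
    n = len(ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(ptr[i], ptr[i + 1]):
            j, a = cols[k], vals[k]      # j <= i by construction
            y[i] += a * x[j]
            if j != i:                   # mirror the off-diagonal entry
                y[j] += a * x[i]
    return y

# Symmetric 2x2 example A = [[2, 1], [1, 3]], storing only [[2], [1, 3]]:
print(spmv_sym_csr([0, 1, 3], [0, 0, 1], [2.0, 1.0, 3.0], [1.0, 2.0]))  # [4.0, 7.0]
```

The scattered update `y[j] += a * x[i]` is also why parallelizing symmetric SpMV is harder than the unsymmetric kernel: different rows may write the same output entry.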
When cache blocking sparse matrix-vector multiply works and why
In Proceedings of the PARA'04 Workshop on the State-of-the-Art in Scientific Computing, 2004
Cited by 28 (5 self)
"... We present new performance models and more compact data structures for cache blocking when applied to sparse matrix-vector multiply (SpM×V). We extend our prior models by relaxing the assumption that the vectors fit in cache and find that the new models are accurate enough to predict optimu ..."
Model-driven autotuning of sparse matrix-vector multiply on GPUs
In PPoPP, 2010
Cited by 65 (4 self)
"... We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPUs). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for G ..."