Results 1-10 of 261
A Benchmark for Register-Blocked Sparse Matrix-Vector Multiply
"... Abstract. We develop a sparse matrix-vector multiply (SMVM) benchmark for block compressed sparse row (BSR) matrices. These occur frequently in linear systems generated by the finite element method (FEM), for example, and are naturally suited for register blocking optimizations. Unlike current SMVM ..."
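The BSR layout named in this abstract stores each nonzero as a small dense r-by-c block, which is what makes register blocking possible: a tuned kernel can keep the whole tile in registers. The following is a minimal illustrative sketch of a BSR traversal, not code from the paper; all names and the example matrix are our own.

```python
import numpy as np

def spmv_bsr(indptr, indices, blocks, x, r, c):
    """y = A @ x for block CSR (BSR) with dense r-by-c blocks.

    indptr[ib]:indptr[ib+1] delimits the nonzero blocks of block-row ib;
    indices[k] is the block-column of the k-th block.  Fixed small blocks
    are what let an optimized kernel hold the tile in registers; here we
    only show the traversal order.
    """
    nblockrows = len(indptr) - 1
    y = np.zeros(nblockrows * r)
    for ib in range(nblockrows):
        for k in range(indptr[ib], indptr[ib + 1]):
            jb = indices[k]
            y[ib * r:(ib + 1) * r] += blocks[k] @ x[jb * c:(jb + 1) * c]
    return y

# 4x4 matrix in 2x2 blocks: block (0,0) = [[1,2],[3,4]], block (1,1) = [[5,0],[0,6]]
indptr  = [0, 1, 2]
indices = [0, 1]
blocks  = [np.array([[1., 2.], [3., 4.]]), np.array([[5., 0.], [0., 6.]])]
x = np.ones(4)
print(spmv_bsr(indptr, indices, blocks, x, 2, 2))  # [3. 7. 5. 6.]
```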
Benchmarking Sparse Matrix-Vector Multiply
, 2006
"... Abstract — We present a benchmark for evaluating the performance of sparse matrix-dense vector multiply (abbreviated as SpMV) on scalar uniprocessor machines. Though SpMV is an important kernel in scientific computation, there are currently no adequate benchmarks for measuring its performance across ..."
Cited by 8 (2 self)
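For reference, the scalar CSR kernel that such SpMV benchmarks typically time can be sketched as follows. This is an illustrative Python version under standard 0-based CSR conventions, not the benchmark's own code.

```python
import numpy as np

def spmv_csr(indptr, indices, data, x):
    """y = A @ x for a matrix stored in compressed sparse row (CSR) form.

    indptr[i]:indptr[i+1] delimits the nonzeros of row i; indices holds
    their column positions and data their values.
    """
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y[i] = acc
    return y

# 3x3 example:  [[2, 0, 1],
#                [0, 3, 0],
#                [4, 0, 5]]
indptr  = np.array([0, 2, 3, 5])
indices = np.array([0, 2, 1, 0, 2])
data    = np.array([2.0, 1.0, 3.0, 4.0, 5.0])
x       = np.array([1.0, 1.0, 1.0])
print(spmv_csr(indptr, indices, data, x))  # [3. 3. 9.]
```

The irregular, input-dependent access to `x[indices[k]]` is the reason SpMV performance varies so much across matrices and machines, and hence why a dedicated benchmark is needed.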
Performance models for evaluation and automatic tuning of symmetric sparse matrix-vector multiply
 In Proceedings of the International Conference on Parallel Processing
, 2004
"... We present optimizations for sparse matrix-vector multiply (SpMV) and its generalization to multiple vectors, SpMM, when the matrix is symmetric: (1) symmetric storage, (2) register blocking, and (3) vector blocking. Combined with register blocking, symmetry saves more than 50% in matrix storage. We ..."
Cited by 23 (4 self)
Performance Optimizations and Bounds for Sparse Matrix-Vector Multiply
 In Proceedings of Supercomputing
, 2002
"... We consider performance tuning, by code and data structure reorganization, of sparse matrix-vector multiply (SpMV), one of the most important computational kernels in scientific applications. This paper addresses the fundamental questions of what limits exist on such performance tuning, and how ..."
Cited by 57 (10 self)
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
 In Proc. SC2007: High performance computing, networking, and storage conference
, 2007
"... We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV), one of the most heavily used kernels in scientific computing, across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD ..."
Cited by 153 (20 self)
Reduced-bandwidth multithreaded algorithms for sparse matrix-vector multiplication
 In Proc. IPDPS
, 2011
"... Abstract—On multicore architectures, the ratio of peak memory bandwidth to peak floating-point performance (byte:flop ratio) is decreasing as core counts increase, further limiting the performance of bandwidth-limited applications. Multiplying a sparse matrix (as well as its transpose in the unsymme ..."
Cited by 22 (0 self)
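The byte:flop argument in this abstract can be made concrete with a back-of-the-envelope calculation. The machine numbers below are hypothetical, chosen only to illustrate why CSR SpMV is bandwidth-bound; they are not taken from the paper.

```python
# Per CSR nonzero, the kernel streams one 8-byte value and one 4-byte
# column index from memory and performs 2 flops (multiply + add).
bytes_per_nnz = 8 + 4
flops_per_nnz = 2
kernel_ratio = bytes_per_nnz / flops_per_nnz      # 6 bytes of traffic per flop

# Hypothetical machine: 50 GB/s memory bandwidth, 200 Gflop/s peak.
machine_ratio = 50 / 200                          # 0.25 bytes per flop supplied

# Since kernel_ratio >> machine_ratio, bandwidth caps the achieved rate:
peak_spmv_gflops = 50 / kernel_ratio
print(peak_spmv_gflops)  # ~8.3 Gflop/s out of a 200 Gflop/s peak
```

This is the gap that reduced-bandwidth formats and algorithms attack: shrinking bytes_per_nnz (e.g. smaller indices, symmetry, blocking) raises the bandwidth-bound ceiling directly.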
A library for parallel sparse matrix-vector multiplies
, 2005
"... We provide parallel matrix-vector multiply routines for 1D and 2D partitioned sparse square and rectangular matrices. We clearly give pseudocodes that perform necessary initializations for parallel execution. We show how to maximize overlapping between communication and computation through the pro ..."
Cited by 7 (6 self)
Reconfigurable Sparse/Dense Matrix-Vector Multiplier
"... We propose an ANSI/IEEE 754 double-precision floating-point matrix-vector multiplier. Its main feature is the capability to process efficiently both Dense Matrix-Vector Multiplications (DMVM) and Sparse Matrix-Vector Multiplications (SMVM). The design is composed of multiple processing elements (PE ..."
Cited by 2 (0 self)
Model-driven autotuning of sparse matrix-vector multiply on GPUs
 In PPoPP
, 2010
"... We present a performance model-driven framework for automated performance tuning (autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics processing units (GPUs). Our study consists of two parts. First, we describe several carefully hand-tuned SpMV implementations for G ..."
Cited by 65 (4 self)
BBCS-Based Sparse Matrix-Vector Multiplication: Initial Evaluation
 16th IMACS World Congress on Scientific Computation, Applied Mathematics and Simulation
, 2000
"... This paper presents an evaluation of the BBCS scheme meant to alleviate the performance degradation experienced by Vector Processors (VPs) when manipulating sparse matrices. In particular we address the execution of Sparse Matrix-Vector Multiplication (SMVM) algorithms on VPs. First we introduce Block Based Compressed Storage (BBCS) sparse matrix representation format variants, and a BBCS-based SMVM algorithm. Subsequently, we consider a set of benchmark matrices, report some preliminary performance evaluations, and compare our scheme with the Jagged Diagonal (JD) scheme. Our experiments ..."
Cited by 5 (3 self)