Results 1 - 10 of 128,273
Matrix-Matrix Multiplication on Heterogeneous Platforms
, 2000
"... In this paper, we address the issue of implementing matrix-matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations, and collections of heterogeneous clusters. Intuitively, the problem is to load ba ..."
Cited by 1 (1 self)
Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication
, 2004
"... Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model's constraint on input reuse and perform an in-depth analysis of dense matrix-matrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrix-matrix multiplication as an obvious candidate ..."
Cited by 95 (1 self)
Challenges and advances in parallel sparse matrix-matrix multiplication
 In The 37th International Conference on Parallel Processing (ICPP’08)
, 2008
"... We identify the challenges that are special to parallel sparse matrix-matrix multiplication (PSpGEMM). We show that sparse algorithms are not as scalable as their dense counterparts, because in general, there are not enough nontrivial arithmetic operations to hide the communication costs as well as ..."
Cited by 24 (6 self)
Highly Parallel Sparse Matrix-Matrix Multiplication
, 2010
"... Generalized sparse matrix-matrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on ..."
Cited by 16 (4 self)
Hierarchical Matrix-Matrix Multiplication based on Multiprocessor Tasks
 In Proc. of the International Conference on Computational Science – ICCS 2004, LNCS
, 2004
"... We consider the realization of matrix-matrix multiplication and propose a hierarchical algorithm implemented in a task-parallel way using multiprocessor tasks on distributed memory. The algorithm has been designed to minimize the communication overhead while showing large locality of memor ..."
Cited by 2 (2 self)
GENERATING VECTOR CODE FOR MATRIX-MATRIX MULTIPLICATION
"... The current state-of-the-art Matrix-Matrix Multiplication (MMM) kernel is known as ATLAS, which generates the best performing MMM code by search. However, today’s computer architecture changes rapidly and it is hard to generate a high performance code without knowing how to use the new instruction s ..."
OPTIMIZING MATRIX-MATRIX MULTIPLICATION FOR AN EMBEDDED VLIW PROCESSOR
"... The optimization of matrix-matrix multiplication (MMM) performance has been well studied on conventional general-purpose processors like the Intel Pentium 4. Fast algorithms, such as those in the Goto and ATLAS BLAS libraries, exploit common microarchitectural features including superscalar executio ..."
Variants of matrix-matrix multiplication for Fortran90
"... The Fortran90 standard requires an intrinsic function matmul which multiplies two matrices together to produce a third as the result. However, the standard does not specify which algorithm to use. We consider an extension to the matmul syntax which allows a Winograd variant of Strassen's algorithm to be added. We discuss an implementation that is in a commercial Fortran90 offering. Key words. BLAS, matrix multiplication, Winograd's variant of Strassen's algorithm, multilevel algorithms. AMS(MOS) subject classification. Numerical Analysis: Numerical Linear Algebra ..."
General Matrix-Matrix Multiplication Using SIMD Features of the PIII
 In European Conference on Parallel Processing
, 2000
"... Generalised matrix-matrix multiplication forms the kernel of many mathematical algorithms. A faster matrix-matrix multiply immediately benefits these algorithms. In this paper we implement efficient matrix multiplication for large matrices using the floating point Intel SIMD (Single Instruction Multipl ..."
Cited by 2 (0 self)
EXPLOITING MULTIPLE LEVELS OF PARALLELISM IN SPARSE MATRIX-MATRIX MULTIPLICATION
"... Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2. ..."