Results 1  10
of
162,786
Highly Parallel Sparse MatrixMatrix Multiplication
, 2010
"... Generalized sparse matrixmatrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on ..."
Abstract

Cited by 16 (4 self)
 Add to MetaCart
Generalized sparse matrixmatrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based
An Efficient GPU General Sparse MatrixMatrix Multiplication for Irregular Data
"... Abstract—General sparse matrixmatrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method, breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM algorithm has to handle ext ..."
Abstract
 Add to MetaCart
Abstract—General sparse matrixmatrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method, breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM algorithm has to handle
Parallel Numerical Linear Algebra
, 1993
"... We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We illust ..."
Abstract

Cited by 766 (23 self)
 Add to MetaCart
We survey general techniques and open problems in numerical linear algebra on parallel architectures. We first discuss basic principles of parallel processing, describing the costs of basic operations on parallel machines, including general principles for constructing efficient algorithms. We
LSQR: An Algorithm for Sparse Linear Equations and Sparse Least Squares
 ACM Trans. Math. Software
, 1982
"... An iterative method is given for solving Ax ~ffi b and minU Ax b 112, where the matrix A is large and sparse. The method is based on the bidiagonalization procedure of Golub and Kahan. It is analytically equivalent to the standard method of conjugate gradients, but possesses more favorable numerica ..."
Abstract

Cited by 649 (21 self)
 Add to MetaCart
An iterative method is given for solving Ax ~ffi b and minU Ax b 112, where the matrix A is large and sparse. The method is based on the bidiagonalization procedure of Golub and Kahan. It is analytically equivalent to the standard method of conjugate gradients, but possesses more favorable
EXPLOITING MULTIPLE LEVELS OF PARALLELISM IN SPARSE MATRIXMATRIX MULTIPLICATION
"... Abstract. Sparse matrixmatrix multiplication (or SpGEMM) is a key primitive for many highperformance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2. ..."
Abstract
 Add to MetaCart
Abstract. Sparse matrixmatrix multiplication (or SpGEMM) is a key primitive for many highperformance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2
Dryad: Distributed DataParallel Programs from Sequential Building Blocks
 In EuroSys
, 2007
"... Dryad is a generalpurpose distributed execution engine for coarsegrain dataparallel applications. A Dryad application combines computational “vertices ” with communication “channels ” to form a dataflow graph. Dryad runs the application by executing the vertices of this graph on a set of availa ..."
Abstract

Cited by 730 (27 self)
 Add to MetaCart
simultaneously on multiple computers, or on multiple CPU cores within a computer. The application can discover the size and placement of data at run time, and modify the graph as the computation progresses to make efficient use of the available resources. Dryad is designed to scale from powerful multicore sin
Optimization of Sparse Matrixvector Multiplication on Emerging Multicore Platforms
 In Proc. SC2007: High performance computing, networking, and storage conference
, 2007
"... We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore spec ..."
Abstract

Cited by 153 (22 self)
 Add to MetaCart
specific optimization methodologies for important scientific computations. In this work, we examine sparse matrixvector multiply (SpMV) – one of the most heavily used kernels in scientific computing – across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD
KSVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation
, 2006
"... In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signalatoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and inc ..."
Abstract

Cited by 930 (41 self)
 Add to MetaCart
In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signalatoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many
Understanding the Efficiency of GPU Algorithms for MatrixMatrix Multiplication
, 2004
"... Utilizing graphics hardware for general purpose numerical computations has become a topic of considerable interest. The implementation of streaming algorithms, typified by highly parallel computations with little reuse of input data, has been widely explored on GPUs. We relax the streaming model&a ..."
Abstract

Cited by 95 (1 self)
 Add to MetaCart
's constraint on input reuse and perform an indepth analysis of dense matrixmatrix multiplication, which reuses each element of input matrices O(n) times. Its regular data access pattern and highly parallel computational requirements suggest matrixmatrix multiplication as an obvious candidate
Results 1  10
of
162,786