Results 1–10 of 62
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
 In Proc. SC2007: High Performance Computing, Networking, and Storage Conference
, 2007
"... We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore spec ..."
Abstract

Cited by 89 (21 self)
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) – one of the most heavily used kernels in scientific computing – across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
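Since the kernel is short, its memory-bound nature is visible in even a reference implementation. Below is a minimal serial CSR SpMV sketch in Python (our illustration, not the paper's optimized code); the indexed load of x inside the inner loop is the access pattern that optimizations like the paper's must contend with.

    def spmv_csr(row_ptr, col_idx, vals, x):
        """Compute y = A*x with A stored in compressed sparse row (CSR)
        form. A minimal serial reference kernel (our sketch, not the
        paper's tuned code). Each nonzero costs one multiply-add plus an
        indirect load x[col_idx[k]], which is why SpMV is memory-bound.
        """
        n = len(row_ptr) - 1
        y = [0.0] * n
        for i in range(n):
            acc = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                acc += vals[k] * x[col_idx[k]]
            y[i] = acc
        return y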
Auto-tuning Performance on Multicore Computers
, 2008
"... personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires pri ..."
Abstract

Cited by 24 (8 self)
On two-dimensional sparse matrix partitioning: Models, methods, and a recipe
 SIAM Journal on Scientific Computing
, 2010
"... Abstract. We consider twodimensional partitioning of general sparse matrices for parallel sparse matrixvector multiply operation. We present three hypergraphpartitioningbased methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces f ..."
Abstract

Cited by 21 (15 self)
Abstract. We consider two-dimensional partitioning of general sparse matrices for the parallel sparse matrix-vector multiply operation. We present three hypergraph-partitioning-based methods, each having unique advantages. The first one treats the nonzeros of the matrix individually and hence produces fine-grain partitions. The other two produce coarser partitions, where one of them imposes a limit on the number of messages sent and received by a single processor, and the other trades that limit for a lower communication volume. We also present a thorough experimental evaluation of the proposed two-dimensional partitioning methods together with the hypergraph-based one-dimensional partitioning methods, using an extensive set of public domain matrices. Furthermore, for the users of these partitioning methods, we present a partitioning recipe that chooses one of the partitioning methods according to some matrix characteristics.
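To make the fine-grain model concrete, the sketch below (our Python illustration; the function name is ours) scores a nonzero-to-processor assignment with the standard connectivity-minus-one metric: a row or column touched by k distinct parts contributes k - 1 to the communication volume. Vector-entry placement is ignored for simplicity.

    from collections import defaultdict

    def comm_volume(nonzeros, part):
        """Communication volume of a fine-grain 2D partition.
        nonzeros: iterable of (i, j) coordinates of the nonzeros of A
        part:     maps each (i, j) to a processor id
        A row split over k parts needs k - 1 partial-sum transfers for
        y_i; a column split over k parts needs k - 1 copies of x_j.
        """
        row_parts = defaultdict(set)
        col_parts = defaultdict(set)
        for (i, j) in nonzeros:
            p = part[(i, j)]
            row_parts[i].add(p)
            col_parts[j].add(p)
        return (sum(len(s) - 1 for s in row_parts.values()) +
                sum(len(s) - 1 for s in col_parts.values()))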
Revisiting hypergraph models for sparse matrix partitioning
 SIAM Review
, 2007
"... Abstract. We provide an exposition of hypergraph models for parallelizing sparse matrixvector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for the parallel matrixvector multiply based on onedimensional (1D) matri ..."
Abstract

Cited by 20 (13 self)
Abstract. We provide an exposition of hypergraph models for parallelizing sparse matrix-vector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for the parallel matrix-vector multiply based on one-dimensional (1D) matrix partitioning. In the elementary model, the vertices represent the data of a matrix-vector multiply, and the nets encode dependencies among the data. We then apply a recently proposed hypergraph transformation operation to devise models for 1D sparse matrix partitioning. The resulting 1D partitioning models are equivalent to the previously proposed computational hypergraph models and are not meant to be replacements for them. Nevertheless, the new models give us insights into the previous ones and help us explain a subtle requirement, known as the consistency condition, of hypergraph partitioning models. Later, we demonstrate the flexibility of the elementary model on a few 1D partitioning problems that are hard to solve using the previously proposed models. We also discuss extensions of the proposed elementary model to two-dimensional matrix partitioning.
Key words: parallel computing, sparse matrix-vector multiply, hypergraph models
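As a concrete instance of "nets encode dependencies", the column-net hypergraph used for 1D rowwise partitioning can be read directly off the nonzero pattern. A minimal sketch (our Python construction of the standard model; names are ours):

    from collections import defaultdict

    def column_net_hypergraph(nonzeros):
        """Column-net hypergraph for 1D rowwise partitioning.
        Vertex i = row i of A (the work of computing y_i).
        Net j    = the set of rows with a nonzero in column j, i.e. the
                   rows that depend on the input entry x_j.
        """
        nets = defaultdict(set)
        for (i, j) in nonzeros:
            nets[j].add(i)
        return nets

Partitioning the vertices while minimizing the connectivity-minus-one cost of the cut nets then minimizes the number of x entries that must be communicated.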
New Challenges in Dynamic Load Balancing
 Appl. Numer. Math.
, 2004
"... Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the loadbalancing problem is not yet solved completely; new applications and architectures requi ..."
Abstract

Cited by 19 (4 self)
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for nonsquare, nonsymmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for nonuniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy to use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.
Multilevel direct K-way hypergraph partitioning with multiple constraints and fixed vertices
, 2008
"... ..."
Partitioning sparse matrices for parallel preconditioned iterative methods
 SIAM Journal on Scientific Computing
, 2004
"... Abstract. This paper addresses the parallelization of the preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that differ ..."
Abstract

Cited by 15 (9 self)
Abstract. This paper addresses the parallelization of preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that different methods impose different partitioning requirements for the matrices. Then we develop hypergraph models to meet those requirements. In particular, we develop models that enable us to obtain partitionings on the coefficient and preconditioner matrices simultaneously. Experiments on a set of unsymmetric sparse matrices show that the proposed models yield effective partitioning results. A parallel implementation of the right preconditioned BiCGStab method on a PC cluster verifies that the theoretical gains obtained by the models hold in practice.
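The requirement can be stated compactly: with an explicit approximate inverse M, a right-preconditioned step computes z = Mx and then w = Az, so the processor producing z_i should also be the one that consumes it as input entry i of A. A toy conformity check (our Python sketch with hypothetical names, not the paper's model):

    def partitions_conform(row_part_M, in_part_A):
        """True if z = M x can feed w = A z with no redistribution:
        the owner of row i of M (which produces z_i) must equal the
        designated owner of input-vector entry i of A (which consumes
        z_i). Simultaneous partitioning enforces this by construction.
        """
        return all(row_part_M[i] == in_part_A[i]
                   for i in range(len(row_part_M)))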
Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods
 SIAM Journal on Scientific Computing
, 2009
"... Abstract. In this article, we introduce a cacheoblivious method for sparse matrix–vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraphbased sparse matrix partitioning scheme so that the resulting matrix induces cachefriendly b ..."
Abstract

Cited by 13 (4 self)
Abstract. In this article, we introduce a cache-oblivious method for sparse matrix-vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix-vector multiplication. Matrices are assumed to be stored in row-major format, by means of the compressed row storage (CRS) or its variants incremental CRS and zigzag CRS. The zigzag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated block-diagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a run-time cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix-vector multiplication within a k-way set-associative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in one-dimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.
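The zigzag idea itself is easy to sketch: alternate the traversal direction of consecutive rows so the x entries touched at the end of one row are touched again at the start of the next. A serial Python illustration (ours; the paper's zigzag CRS stores the reversed rows explicitly rather than reversing on the fly):

    def spmv_zigzag(row_ptr, col_idx, vals, x):
        """y = A*x with a zigzag row traversal: even rows are scanned
        left-to-right, odd rows right-to-left, improving temporal
        locality on x at row boundaries.
        """
        n = len(row_ptr) - 1
        y = [0.0] * n
        for i in range(n):
            ks = range(row_ptr[i], row_ptr[i + 1])
            if i % 2:
                ks = reversed(ks)
            acc = 0.0
            for k in ks:
                acc += vals[k] * x[col_idx[k]]
            y[i] = acc
        return y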
Parallel sparse matrix-vector and matrix-transpose-vector multiplication using compressed sparse blocks
 In SPAA
, 2009
"... This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and A T x to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense nvector. Our algorithms use Θ(nnz) work (serial running ..."
Abstract

Cited by 13 (1 self)
This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and A^T x to be computed efficiently in parallel, where A is an n × n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. Our algorithms use Θ(nnz) work (serial running time) and Θ(√n lg n) span (critical-path length), yielding a parallelism of Θ(nnz / (√n lg n)), which is amply high for virtually any large matrix. The storage requirement for CSB is essentially the same as that for the more standard compressed sparse rows (CSR) format, for which computing Ax in parallel is easy but A^T x is difficult. Benchmark results indicate that on one processor, the CSB algorithms for Ax and A^T x run just as fast as the CSR algorithm for Ax, but the CSB algorithms also scale up linearly with processors until limited by off-chip memory bandwidth.
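For intuition on the numbers: with n = 10^6 and nnz = 10^8, the parallelism Θ(nnz / (√n lg n)) works out to roughly 10^8 / (10^3 · 20) ≈ 5000. The layout itself can be sketched as a dictionary of β × β blocks (our simplified Python stand-in for CSB, not the paper's packed index format):

    from collections import defaultdict
    import math

    def to_csb_blocks(nonzeros, n, beta=None):
        """Group the nonzeros of an n x n matrix into beta x beta
        blocks, with coordinates stored relative to each block's
        origin. With beta ~ sqrt(n), row sweeps (Ax) and column sweeps
        (A^T x) both visit whole blocks, which is what makes the two
        products symmetric to parallelize.
        """
        beta = beta or max(1, math.isqrt(n))
        blocks = defaultdict(list)
        for (i, j, v) in nonzeros:
            blocks[(i // beta, j // beta)].append((i % beta, j % beta, v))
        return beta, blocks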