A two-dimensional data distribution method for parallel sparse matrix-vector multiplication (2005)

by Brendan Vastenhouw, Rob H. Bisseling
Venue: SIAM Review

Results 1 - 10 of 33

Optimization of Sparse Matrix-vector Multiplication on Emerging Multicore Platforms

by Samuel Williams, Leonid Oliker, Richard Vuduc, John Shalf, Katherine Yelick, James Demmel - In Proc. SC2007: High performance computing, networking, and storage conference, 2007
Cited by 54 (15 self)
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) – one of the most heavily used kernels in scientific computing – across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.

New Challenges in Dynamic Load Balancing

by Karen D. Devine, Erik G. Boman, Robert T. Heaphy, Bruce A. Hendrickson, James D. Teresco, Jamal Faik, Joseph E. Flaherty, Luis G. Gervasio - Appl. Numer. Math., 2004
Cited by 15 (4 self)
Data partitioning and load balancing are important components of parallel computations. Many different partitioning strategies have been developed, with great effectiveness in parallel applications. But the load-balancing problem is not yet solved completely; new applications and architectures require new partitioning features. Existing algorithms must be enhanced to support more complex applications. New models are needed for non-square, non-symmetric, and highly connected systems arising from applications in biology, circuits, and materials simulations. Increased use of heterogeneous computing architectures requires partitioners that account for non-uniform computing, network, and memory resources. And, for greatest impact, these new capabilities must be delivered in toolkits that are robust, easy-to-use, and applicable to a wide range of applications. In this paper, we discuss our approaches to addressing these issues within the Zoltan Parallel Data Services toolkit.

Revisiting hypergraph models for sparse matrix partitioning

by Bora Uçar, Cevdet Aykanat, 2006
Cited by 10 (7 self)
Abstract. We provide an exposition of hypergraph models for parallelizing sparse matrix-vector multiplies. Our aim is to emphasize the expressive power of hypergraph models. First, we set forth an elementary hypergraph model for parallel matrix-vector multiply based on one-dimensional (1D) matrix partitioning. In the elementary model, the vertices represent the data of a matrix-vector multiply, and the nets encode dependencies among the data. We then apply a recently proposed hypergraph transformation operation to devise models for 1D sparse matrix partitioning. The resulting 1D partitioning models are equivalent to the previously proposed computational hypergraph models and are not meant to be replacements for them. Nevertheless, the new models give us insights into the previous ones and help us explain a subtle requirement, known as the consistency condition, of the hypergraph partitioning models. Later, we demonstrate the flexibility of the elementary model on a few 1D partitioning problems that are hard to solve using the previously proposed models. We also discuss extensions of the proposed elementary model to two-dimensional matrix partitioning. Key words. parallel computing, sparse matrix-vector multiply, hypergraph models AMS subject classifications. 05C50, 05C65, 65F10, 65F50, 65Y05
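
To make the idea of nets encoding data dependencies concrete, here is a minimal sketch of the widely used column-net model for row-wise 1D partitioning, which is one of the previously proposed models and not the elementary model of this paper. Rows are vertices, each column is a net, and the communication volume is estimated by the connectivity-minus-one metric. The function name `column_net_volume` and the toy matrix are illustrative assumptions, and the sketch assumes each vector entry x_j is owned by one of the parts that touch column j.

```python
from collections import defaultdict

def column_net_volume(entries, row_part):
    """Connectivity-1 volume of a row-wise partition (column-net model).

    entries:  iterable of (i, j) positions of nonzeros
    row_part: sequence mapping row index -> part number
    Assumption: x_j lives on one of the parts touching column j.
    """
    parts_in_col = defaultdict(set)
    for i, j in entries:
        # Net j (column j) connects every part that owns a row with a nonzero in it.
        parts_in_col[j].add(row_part[i])
    # A net spanning k parts costs k - 1 words (x_j is sent to k - 1 other parts).
    return sum(len(parts) - 1 for parts in parts_in_col.values())

# Hypothetical 4x4 sparsity pattern, given as (row, column) pairs.
entries = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]

print(column_net_volume(entries, [0, 0, 1, 1]))  # contiguous row blocks
print(column_net_volume(entries, [0, 1, 0, 1]))  # interleaved rows
```

On this toy pattern the contiguous split cuts only column 1, while the interleaved split also cuts column 2, so a partitioner minimizing this metric would prefer the former.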

Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices

by Cevdet Aykanat, B. Barla Cambazoglu, Bora Uçar, 2008
Cited by 7 (3 self)
Abstract not found

Partitioning sparse matrices for parallel preconditioned iterative methods

by Bora Uçar, Cevdet Aykanat - SIAM Journal on Scientific Computing, 2004
Cited by 5 (4 self)
Abstract. This paper addresses the parallelization of the preconditioned iterative methods that use explicit preconditioners such as approximate inverses. Parallelizing a full step of these methods requires the coefficient and preconditioner matrices to be well partitioned. We first show that different methods impose different partitioning requirements for the matrices. Then we develop hypergraph models to meet those requirements. In particular, we develop models that enable us to obtain partitionings on the coefficient and preconditioner matrices simultaneously. Experiments on a set of unsymmetric sparse matrices show that the proposed models yield effective partitioning results. A parallel implementation of the right preconditioned BiCGStab method on a PC cluster verifies that the theoretical gains obtained by the models hold in practice.

Hypergraph partitioning for faster parallel PageRank computation

by Jeremy T. Bradley, Douglas V. Jager, William J. Knottenbelt, Aleksandar Trifunović - Lecture Notes in Computer Science 3670, 2005
Cited by 5 (1 self)
Abstract. The PageRank algorithm is used by search engines such as Google to order web pages. It uses an iterative numerical method to compute the maximal eigenvector of a transition matrix derived from the web’s hyperlink structure and a user-centred model of web-surfing behaviour. As the web has expanded and as demand for user-tailored web page ordering metrics has grown, scalable parallel computation of PageRank has become a focus of considerable research effort. In this paper, we seek a scalable problem decomposition for parallel PageRank computation, through the use of state-of-the-art hypergraph-based partitioning schemes. These have not been previously applied in this context. We consider both one and two-dimensional hypergraph decomposition models. Exploiting the recent availability of the Parkway 2.1 parallel hypergraph partitioner, we present empirical results on a gigabit PC cluster for three publicly available web graphs. Our results show that hypergraph-based partitioning substantially reduces communication volume over conventional partitioning schemes (by up to three orders of magnitude), while still maintaining computational load balance. They also show a halving of the per-iteration runtime cost when compared to the most effective alternative approach used to date.
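
The abstract does not say which iterative method is used. As an illustration only, the sketch below uses the power method, a common choice for PageRank, in serial form. The function name `pagerank`, the damping factor 0.85, the uniform treatment of dangling pages, and the toy graph are all assumptions made for this example. Each sweep is in effect one sparse matrix-vector multiply, which is the step a parallel decomposition would distribute.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration for PageRank on an adjacency list (serial sketch).

    out_links[u] is the list of pages that page u links to.
    Assumption: rank held by dangling pages (no out-links) is spread uniformly.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in range(n) if not out_links[u])
        # Teleportation term plus the uniformly redistributed dangling mass.
        new = [(1.0 - damping) / n + damping * dangling / n] * n
        for u, targets in enumerate(out_links):
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share  # one nonzero of the sparse multiply
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank

# Hypothetical four-page web: page 3 has no out-links and no in-links.
ranks = pagerank([[1, 2], [2], [0], []])
print(ranks)
```

The returned values sum to one, and the isolated page 3 ends up with the smallest rank because it only receives the teleportation and dangling contributions.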

HYPERGRAPH-BASED UNSYMMETRIC NESTED DISSECTION ORDERING FOR SPARSE LU FACTORIZATION

by Laura Grigori, Erik G. Boman, Simplice Donfack, Timothy A. Davis
Cited by 5 (2 self)
Abstract. In this paper we present HUND, a hypergraph-based unsymmetric nested dissection ordering algorithm for reducing the fill-in incurred during Gaussian elimination. HUND has several important properties. It takes a global perspective of the entire matrix, as opposed to local heuristics. It takes into account the asymmetry of the input matrix by using a hypergraph to represent its structure. It is suitable for performing Gaussian elimination in parallel, with partial pivoting. This is possible because the row permutations performed due to partial pivoting do not destroy the column separators identified by the nested dissection approach. Experimental results on 27 medium and large size highly unsymmetric matrices compare HUND to four other well-known reordering algorithms. The results show that HUND provides a robust reordering algorithm, in the sense that it is the best or close to the best (often within 10%) of all the other methods.

Cache-oblivious sparse matrix-vector multiplication by using sparse matrix partitioning methods

by A. N. Yzelman, Rob H. Bisseling - SIAM Journal on Scientific Computing, 2009
Cited by 5 (0 self)
Abstract. In this article, we introduce a cache-oblivious method for sparse matrix–vector multiplication. Our method attempts to permute the rows and columns of the input matrix using a recursive hypergraph-based sparse matrix partitioning scheme so that the resulting matrix induces cache-friendly behavior during sparse matrix–vector multiplication. Matrices are assumed to be stored in row-major format, by means of the compressed row storage (CRS) or its variants incremental CRS and zig-zag CRS. The zig-zag CRS data structure is shown to fit well with the hypergraph metric used in partitioning sparse matrices for the purpose of parallel computation. The separated block-diagonal (SBD) form is shown to be the appropriate matrix structure for cache enhancement. We have implemented a run-time cache simulation library enabling us to analyze cache behavior for arbitrary matrices and arbitrary cache properties during matrix–vector multiplication within a k-way set-associative idealized cache model. The results of these simulations are then verified by actual experiments run on various cache architectures. In all these experiments, we use the Mondriaan sparse matrix partitioner in one-dimensional mode. The savings in computation time achieved by our matrix reorderings reach up to 50 percent, in the case of a large link matrix.
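
For readers unfamiliar with the storage scheme named above, the following is a minimal sketch of plain compressed row storage (CRS) and the matching multiply. It does not cover the incremental or zig-zag variants, and the helper names `to_crs` and `crs_spmv` and the small test matrix are assumptions for illustration. The indirect access `x[col_idx[k]]` is the source of the irregular memory traffic that a row and column reordering tries to make more cache-friendly.

```python
def to_crs(dense):
    """Convert a dense list-of-lists matrix to CRS arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values / col_idx.
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def crs_spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for A stored in CRS."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Values stream contiguously; x is read at scattered positions.
            y[i] += values[k] * x[col_idx[k]]
    return y

# Hypothetical 3x3 example.
A = [[4, 0, 1],
     [0, 0, 2],
     [3, 5, 0]]
values, col_idx, row_ptr = to_crs(A)
y = crs_spmv(values, col_idx, row_ptr, [1, 2, 3])
print(values, col_idx, row_ptr, y)
```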

PaToH: partitioning tool for hypergraphs

by Ümit V. Çatalyürek, Cevdet Aykanat, 1999
Cited by 4 (3 self)
Abstract not found

A Parallel Matrix Scaling Algorithm

by Patrick R. Amestoy, Iain S. Duff, Daniel Ruiz, Bora Uçar
Cited by 4 (1 self)
Abstract. We recently proposed an iterative procedure which asymptotically scales the rows and columns of a given matrix to one in a given norm. In this work, we briefly mention some of the properties of that algorithm and discuss its efficient parallelization. We report on a parallel performance study of our implementation on a few computing environments. Key words: sparse matrices; matrix scaling; equilibration; parallel computing
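
The abstract does not spell out the update rule. As a hedged illustration of what such an iterative scaling can look like, the sketch below repeatedly divides every row and column by the square root of its current infinity norm, applying both scalings at once. The function name `scale_inf_norm`, the choice of the infinity norm, the dense list-of-lists storage, and the stopping test are assumptions for this example, and the matrix is assumed to have no zero row or column.

```python
import math

def scale_inf_norm(a, tol=1e-8, max_iter=100):
    """Scale rows and columns of a toward unit infinity norm (serial sketch).

    Returns (scaled, dr, dc) with scaled[i][j] == dr[i] * a[i][j] * dc[j].
    Assumption: a has no all-zero row or column.
    """
    m, n = len(a), len(a[0])
    a = [row[:] for row in a]  # work on a copy
    dr, dc = [1.0] * m, [1.0] * n
    for _ in range(max_iter):
        row_norm = [max(abs(v) for v in row) for row in a]
        col_norm = [max(abs(a[i][j]) for i in range(m)) for j in range(n)]
        if (max(abs(1.0 - r) for r in row_norm) <= tol
                and max(abs(1.0 - c) for c in col_norm) <= tol):
            break
        # Row and column factors are computed from the same iterate,
        # then applied simultaneously.
        r = [1.0 / math.sqrt(v) for v in row_norm]
        c = [1.0 / math.sqrt(v) for v in col_norm]
        for i in range(m):
            for j in range(n):
                a[i][j] *= r[i] * c[j]
        dr = [d * s for d, s in zip(dr, r)]
        dc = [d * s for d, s in zip(dc, c)]
    return a, dr, dc

# Hypothetical 2x2 example.
A = [[4.0, 1.0], [2.0, 8.0]]
scaled, dr, dc = scale_inf_norm(A)
print(scaled)
```

In a parallel setting the row and column norms would be the quantities that need communication, since each is a reduction over entries that may live on different processors.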