Results 1–10 of 39
A column approximate minimum degree ordering algorithm
, 2000
"... Sparse Gaussian elimination with partial pivoting computes the factorization PAQ = LU of a sparse matrix A, where the row ordering P is selected during factorization using standard partial pivoting with row interchanges. The goal is to select a column preordering, Q, based solely on the nonzero patt ..."
Abstract

Cited by 307 (53 self)
Sparse Gaussian elimination with partial pivoting computes the factorization PAQ = LU of a sparse matrix A, where the row ordering P is selected during factorization using standard partial pivoting with row interchanges. The goal is to select a column preordering, Q, based solely on the nonzero pattern of A such that the factorization remains as sparse as possible, regardless of the subsequent choice of P. The choice of Q can have a dramatic impact on the number of nonzeros in L and U. One scheme for determining a good column ordering for A is to compute a symmetric ordering that reduces fill-in in the Cholesky factorization of A^T A. This approach, which requires the sparsity structure of A^T A to be computed, can be expensive both in ...
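The column-ordering scheme described above hinges on the pattern of A^T A: entry (i, j) of A^T A is nonzero exactly when columns i and j of A share a nonzero row. A minimal, set-based sketch of that pattern computation (illustrative only; the paper's point is precisely to avoid forming this product, and the function name and quadratic pair loop here are our own):

```python
def ata_pattern(cols):
    """Nonzero pattern of A^T A from the column patterns of A.

    `cols[j]` is the set of row indices with a nonzero in column j.
    Entry (i, j) of A^T A is nonzero iff columns i and j share a row.
    """
    n = len(cols)
    pattern = set()
    for i in range(n):
        for j in range(n):
            if cols[i] & cols[j]:       # columns intersect in some row
                pattern.add((i, j))
    return pattern

# Columns 0 and 2 share row 1, so (0, 2) appears in A^T A;
# columns 0 and 1 share no row, so (0, 1) does not.
cols = [{0, 1}, {2}, {1}]
pat = ata_pattern(cols)
```

Note that A^T A can be far denser than A, which is why computing even its pattern can dominate the cost of the ordering.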
Fast Sparse Matrix Multiplication
, 2004
"... Let A and B two n n matrices over a ring R (e.g., the reals or the integers) each containing at most m nonzero elements. We present a new algorithm that multiplies A and B using O(m ) algebraic operations (i.e., multiplications, additions and subtractions) over R. The naive matrix multi ..."
Abstract

Cited by 52 (3 self)
Let A and B be two n × n matrices over a ring R (e.g., the reals or the integers), each containing at most m nonzero elements. We present a new algorithm that multiplies A and B using O(m^0.7 n^1.2 + n^(2+o(1))) algebraic operations (i.e., multiplications, additions and subtractions) over R. The naive matrix multiplication algorithm, on the other hand, may need to perform Ω(mn) operations to accomplish the same task. For m ≤ n^1.14, the new algorithm performs an almost optimal number of only n^(2+o(1)) operations. For m ≤ n^1.68, the new algorithm is also faster than the best known matrix multiplication algorithm for dense matrices, which uses O(n^2.38) algebraic operations. The new algorithm is obtained using a surprisingly straightforward combination of a simple combinatorial idea and existing fast rectangular matrix multiplication algorithms. We also obtain improved algorithms for the multiplication of more than two sparse matrices.
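For contrast with the naive bound, a standard sparse-sparse product (in the spirit of Gustavson's row-oriented algorithm, not the rectangular-multiplication method of this paper) performs one multiplication per matching pair of nonzeros A[i, k] and B[k, j]; the dict-of-keys representation below is a hypothetical simplification:

```python
from collections import defaultdict

def spgemm(a, b):
    """Sparse product C = A*B with matrices given as {(i, k): value} dicts.

    Each multiplication pairs a nonzero A[i, k] with a nonzero B[k, j],
    so the work is proportional to the number of such matching pairs
    rather than to n^3.
    """
    b_rows = defaultdict(list)          # group B's nonzeros by row index k
    for (k, j), v in b.items():
        b_rows[k].append((j, v))
    c = defaultdict(float)
    for (i, k), u in a.items():
        for j, v in b_rows[k]:
            c[i, j] += u * v
    return dict(c)

a = {(0, 0): 2.0, (1, 2): 3.0}
b = {(0, 1): 4.0, (2, 1): 5.0}
c = spgemm(a, b)    # {(0, 1): 8.0, (1, 1): 15.0}
```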
A Revised Proposal for a Sparse BLAS Toolkit
, 1996
"... This paper describes a proposal for a "toolkit" of kernel routines for some of the basic operations in (iterative) sparse numerical methods. In particular, we describe an interface for routines which perform (i) sparse matrix times dense matrix product, (ii) the solution of a sparse triang ..."
Abstract

Cited by 20 (8 self)
This paper describes a proposal for a "toolkit" of kernel routines for some of the basic operations in (iterative) sparse numerical methods. In particular, we describe an interface for routines which perform (i) sparse matrix times dense matrix product, (ii) the solution of a sparse triangular system with multiple right-hand sides, (iii) the right permutation of a sparse matrix and (iv) a check for the integrity of a sparse matrix representation. The interfaces for these four operations are defined for a variety of common data structures and a set of guidelines is given to define interfaces for new data structures. The primary purpose of this toolkit is to provide a set of basic routines upon which the "User Level Sparse BLAS," as described in [9], can be built. This paper is a revision of the original proposal found in [14].
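Operation (i) of the toolkit, sparse matrix times dense matrix, can be sketched for one common data structure, compressed sparse row (CSR); this is an illustrative pure-Python loop, not the interface the proposal defines:

```python
def csr_mm(indptr, indices, data, B):
    """Sparse (CSR) matrix times dense matrix: C = A @ B.

    indptr, indices, data are the usual CSR arrays;
    B is a dense matrix given as a list of rows.
    """
    n_rows = len(indptr) - 1
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        # Iterate only over the stored nonzeros of row i of A.
        for p in range(indptr[i], indptr[i + 1]):
            k, v = indices[p], data[p]
            for j in range(n_cols):
                C[i][j] += v * B[k][j]
    return C

# A = [[1, 0], [0, 2]] in CSR form, multiplied by a dense 2x2 B.
C = csr_mm([0, 1, 2], [0, 1], [1.0, 2.0], [[1.0, 2.0], [3.0, 4.0]])
```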
On the Representation and Multiplication of Hypersparse Matrices
, 2008
"... Multicore processors are marking the beginning of a new era of computing where massive parallelism is available and necessary. Slightly slower but easy to parallelize kernels are becoming more valuable than sequentially faster kernels that are unscalable when parallelized. In this paper, we focus on ..."
Abstract

Cited by 19 (9 self)
Multicore processors are marking the beginning of a new era of computing where massive parallelism is available and necessary. Slightly slower but easy to parallelize kernels are becoming more valuable than sequentially faster kernels that are unscalable when parallelized. In this paper, we focus on the multiplication of sparse matrices (SpGEMM). We first present the issues with existing sparse matrix representations and multiplication algorithms that make them unscalable to thousands of processors. Then, we develop and analyze two new algorithms that overcome these limitations. We consider our algorithms first as the sequential kernel of a scalable parallel sparse matrix multiplication algorithm and second as part of a polyalgorithm for SpGEMM that would execute different kernels depending on the sparsity of the input matrices. Such a sequential kernel requires a new data structure that exploits the hypersparsity of the individual submatrices owned by a single processor after the 2D partitioning. We experimentally evaluate the performance and characteristics of our algorithms and show that they scale significantly better than existing kernels.
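The hypersparse idea is that a submatrix left on one processor after 2D partitioning may have far more columns than nonzeros, so formats that spend O(n) space on column pointers (like CSC) waste memory; storing only the nonempty columns keeps storage at O(nnz), independent of the matrix dimension. A toy sketch of this compression, loosely modeled on DCSC-style JC/CP/IR/NUM arrays (the function and its sorting strategy are our own simplification):

```python
def dcsc_like(entries):
    """Compress a list of (col, row, val) nonzeros so that only nonempty
    columns are stored: jc holds the distinct nonempty column indices,
    cp[k]:cp[k+1] delimits that column's entries in ir (rows) and num
    (values). Storage is O(nnz), independent of the matrix dimension.
    """
    entries = sorted(entries)           # column-major order
    jc, cp, ir, num = [], [0], [], []
    for col, row, val in entries:
        if not jc or jc[-1] != col:     # first nonzero of a new column
            if jc:
                cp.append(len(ir))
            jc.append(col)
        ir.append(row)
        num.append(val)
    cp.append(len(ir))
    return jc, cp, ir, num

# A matrix with a million columns but 3 nonzeros stores only
# the 2 column indices that are actually occupied.
jc, cp, ir, num = dcsc_like([(5, 0, 1.0), (5, 2, 2.0), (999999, 7, 3.0)])
```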
Challenges and advances in parallel sparse matrix-matrix multiplication
 In The 37th International Conference on Parallel Processing (ICPP’08)
, 2008
"... We identify the challenges that are special to parallel sparse matrixmatrix multiplication (PSpGEMM). We show that sparse algorithms are not as scalable as their dense counterparts, because in general, there are not enough nontrivial arithmetic operations to hide the communication costs as well as ..."
Abstract

Cited by 19 (5 self)
We identify the challenges that are special to parallel sparse matrix-matrix multiplication (PSpGEMM). We show that sparse algorithms are not as scalable as their dense counterparts, because in general there are not enough nontrivial arithmetic operations to hide the communication costs as well as the sparsity overheads. We analyze the scalability of 1D and 2D algorithms for PSpGEMM. While the 1D algorithm is a variant of existing implementations, the 2D algorithms presented are completely novel. Most of these algorithms are based on previous research on parallel dense matrix multiplication. We also provide results from preliminary experiments with 2D algorithms.
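In a 2D algorithm the matrix is tiled over a pr × pc processor grid, so every entry has a well-defined owner. A toy owner map under that assumption (the function name and ceiling-division blocking are ours, not the paper's):

```python
def owner(i, j, n, pr, pc):
    """Grid coordinates of the processor owning entry (i, j) of an
    n x n matrix distributed in 2D blocks over a pr x pc processor
    grid, with block sizes rounded up so the grid covers the matrix.
    """
    br = -(-n // pr)                    # ceil(n / pr): rows per block
    bc = -(-n // pc)                    # ceil(n / pc): cols per block
    return (i // br, j // bc)

# On a 2x2 grid over a 6x6 matrix, entry (4, 1) lives on processor (1, 0)
# and entry (0, 5) on processor (0, 1).
```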
A Proposal for a Sparse BLAS Toolkit
 In preparation
, 1992
"... This paper describes a proposal for a "toolkit" of kernel routines for some of the basic operations in (iterative) sparse numerical methods. In particular, we describe an interface for routines which perform (i) sparse matrix times dense matrix product, (ii) the solution of a sparse triang ..."
Abstract

Cited by 14 (2 self)
This paper describes a proposal for a "toolkit" of kernel routines for some of the basic operations in (iterative) sparse numerical methods. In particular, we describe an interface for routines which perform (i) sparse matrix times dense matrix product, (ii) the solution of a sparse triangular system with multiple right-hand sides, (iii) the right permutation of a sparse matrix and (iv) a check for the integrity of a sparse matrix representation. The interfaces for these four operations are defined for a variety of common data structures and a set of guidelines is given to define interfaces for new data structures. The primary purpose of this toolkit is to provide a set of basic routines upon which the "User Level Sparse BLAS," as described in [6], can be built.
Keywords: Sparse matrices, sparse data structures, programming standards, sparse BLAS.
Highly Parallel Sparse Matrix-Matrix Multiplication
, 2010
"... Generalized sparse matrixmatrix multiplication is a key primitive for many high performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on ..."
Abstract

Cited by 13 (3 self)
Generalized sparse matrix-matrix multiplication is a key primitive for many high-performance graph algorithms as well as some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on two-dimensional block distribution of sparse matrices, where serial sections use a novel hypersparse kernel for scalability. We give a state-of-the-art MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
Multi-Robot Adversarial Patrolling: Facing a Full-Knowledge Opponent
"... The problem of adversarial multirobot patrol has gained interest in recent years, mainly due to its immediate relevance to various security applications. In this problem, robots are required to repeatedly visit a target area in a way that maximizes their chances of detecting an adversary trying to ..."
Abstract

Cited by 11 (2 self)
The problem of adversarial multi-robot patrol has gained interest in recent years, mainly due to its immediate relevance to various security applications. In this problem, robots are required to repeatedly visit a target area in a way that maximizes their chances of detecting an adversary trying to penetrate through the patrol path. When facing a strong adversary that knows the patrol strategy of the robots, if the robots use a deterministic patrol algorithm, then in many cases it is easy for the adversary to penetrate undetected (in fact, in some of those cases the adversary can guarantee penetration). This paper therefore presents a nondeterministic patrol framework for the robots. Assuming that the strong adversary will take advantage of its knowledge and try to penetrate through the patrol’s weakest spot, an optimal algorithm is one that maximizes the chances of detection at that point. We therefore present a polynomial-time algorithm for determining an optimal patrol under the Markovian strategy assumption for the robots, such that the probability of detecting the adversary at the patrol’s weakest spot is maximized. We build upon this framework and describe an optimal patrol strategy for several robotic models based on their movement abilities (directed or undirected) and sensing abilities (perfect or imperfect), and in different environment models: patrol around a perimeter (closed polygon) or along an open fence (open polyline).
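Under the Markovian strategy assumption, a patrol is a transition matrix over segments, and the detection probability at a candidate penetration point can be evaluated by deleting that state and iterating the resulting sub-stochastic matrix: probability mass that survives t steps corresponds to an undetected adversary. A sketch of this evaluation step only (the paper's contribution is optimizing the strategy; the 3-cycle example and function name are ours):

```python
import numpy as np

def detection_prob(P, start, target, t):
    """Probability that a robot moving with transition matrix P, started
    at `start`, visits `target` within t steps. Deleting the target's
    row and column yields a sub-stochastic matrix Q; mass surviving t
    applications of Q never hit the target.
    """
    keep = [i for i in range(len(P)) if i != target]
    Q = P[np.ix_(keep, keep)]
    mu = np.zeros(len(keep))
    mu[keep.index(start)] = 1.0
    survive = mu @ np.linalg.matrix_power(Q, t)
    return 1.0 - survive.sum()

# Uniform random walk on a 3-cycle: from segment 0 the robot moves to
# segment 1 or 2 with probability 1/2 each.
P = np.zeros((3, 3))
P[0, 1] = P[0, 2] = 0.5
P[1, 0] = P[1, 2] = 0.5
P[2, 0] = P[2, 1] = 0.5
p1 = detection_prob(P, 0, 2, 1)     # one step: detected w.p. 0.5
```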
On Automatic Data Structure Selection and Code Generation for Sparse Computations
 Lecture Notes in Computer Science
, 1993
"... Traditionally restructuring compilers were only able to apply program transformations in order to exploit certain characteristics of the target architecture. Adaptation of data structures was limited to e.g. linearization or transposing of arrays. However, as more complex data structures are require ..."
Abstract

Cited by 11 (5 self)
Traditionally, restructuring compilers were only able to apply program transformations in order to exploit certain characteristics of the target architecture. Adaptation of data structures was limited to, e.g., linearization or transposing of arrays. However, as more complex data structures are required to exploit characteristics of the data operated on, current compiler support appears to be inappropriate. In this paper we present the implementation issues of a restructuring compiler that automatically converts programs operating on dense matrices into sparse code, i.e. after a suitable data structure has been selected for every dense matrix that is in fact sparse, the original code is adapted to operate on these data structures. This simplifies the task of the programmer and, in general, enables the compiler to apply more optimizations.
Index Terms: Restructuring Compilers, Sparse Computations, Sparse Matrices.
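The transformation such a compiler performs can be illustrated on matrix-vector multiplication: the programmer writes the dense loop nest, and the compiler, after selecting (say) compressed sparse row storage for the sparse matrix, emits a loop over stored nonzeros only. Both versions below are hypothetical hand-written stand-ins for the compiler's input and output:

```python
# Dense form a programmer might write:
def matvec_dense(A, x):
    n = len(A)
    y = [0.0] * n
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]      # most products are 0.0 * x[j]
    return y

# Sparse form such a compiler could generate, iterating only over the
# stored nonzeros of the CSR data structure it selected for A:
def matvec_csr(indptr, indices, data, x, n):
    y = [0.0] * n
    for i in range(n):
        for p in range(indptr[i], indptr[i + 1]):
            y[i] += data[p] * x[indices[p]]
    return y

A = [[3.0, 0.0], [0.0, 4.0]]
x = [1.0, 2.0]
# CSR of A: indptr=[0, 1, 2], indices=[0, 1], data=[3.0, 4.0]
# Both forms compute the same y = [3.0, 8.0].
```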
Exposing fine-grained parallelism in algebraic multigrid methods
, 2011
"... Abstract. Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarsegrained tasks suitable for distributed computers with traditional processing cores. However, acceler ..."
Abstract

Cited by 8 (0 self)
Abstract. Algebraic multigrid methods for large, sparse linear systems are a necessity in many computational simulations, yet parallel algorithms for such solvers are generally decomposed into coarse-grained tasks suitable for distributed computers with traditional processing cores. However, accelerating multigrid on massively parallel, throughput-oriented processors, such as the GPU, demands algorithms with abundant fine-grained parallelism. In this paper, we develop a parallel algebraic multigrid method which exposes substantial fine-grained parallelism in both the construction of the multigrid hierarchy and the cycling or solve stage. Our algorithms are expressed in terms of scalable parallel primitives that are efficiently implemented on the GPU. The resulting solver achieves an average speedup of 1.8× in the setup phase and 5.7× in the cycling phase when compared to a representative CPU implementation.
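The flavor of "scalable parallel primitives" can be illustrated with a weighted Jacobi smoother, a typical relaxation step in the cycling phase: it reduces to elementwise array operations plus a matrix product, all of which map naturally to throughput-oriented hardware. A NumPy sketch (the specific smoother and the ω = 2/3 default are our illustration, not the paper's implementation):

```python
import numpy as np

def jacobi_smooth(A, b, x, omega=2.0 / 3.0, iters=1):
    """Weighted Jacobi relaxation, x <- x + omega * D^{-1} (b - A x),
    written purely in data-parallel primitives: elementwise arithmetic
    on vectors plus one matrix-vector product per sweep.
    """
    d = np.diag(A)                      # D: the diagonal of A
    for _ in range(iters):
        x = x + omega * (b - A @ x) / d
    return x

# On a small diagonally dominant system, repeated sweeps converge
# to the solution of A x = b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = jacobi_smooth(A, b, np.zeros(2), iters=50)
```

In practice a multigrid cycle uses only a few sweeps per level; the long iteration here is just to show convergence of the primitive in isolation.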