Results 1–10 of 13
Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems, 1994
Abstract

Cited by 117 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
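For orientation, the kernel being parallelized above is ordinary Cholesky factorization. The following is a textbook serial, dense sketch only, not the paper's sparse parallel formulation:

```python
import math

def cholesky(A):
    """Dense Cholesky factorization: return lower-triangular L with A = L L^T.

    A must be symmetric positive definite. This is the serial textbook
    algorithm; the paper's contribution is a scalable *sparse* parallel
    formulation of this kernel, which this sketch does not attempt.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract the squared entries already computed in row j.
        L[j][j] = math.sqrt(A[j][j] - sum(L[j][k] ** 2 for k in range(j)))
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            L[i][j] = (A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))) / L[j][j]
    return L
```

For example, `cholesky([[4, 2], [2, 3]])` yields L with L[0][0] = 2, L[1][0] = 1, and L[1][1] = √2.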
On rectangular partitionings in two dimensions: Algorithms, complexity, and applications
 7th International Conference on Database Theory, 1999
On Approximating Rectangle Tiling and Packing
 Proc. Symp. on Discrete Algorithms (SODA)
Abstract

Cited by 38 (6 self)
Our study of tiling and packing with rectangles in two-dimensional regions is strongly motivated by applications in database mining, histogram-based estimation of query sizes, data partitioning, and motion estimation in video compression by block matching, among others. An example of the problems that we tackle is the following: given an n × n array A of positive numbers, find a tiling using at most p rectangles (that is, no two rectangles may overlap, and each array element must fall within some rectangle) that minimizes the maximum weight of any rectangle; here the weight of a rectangle is the sum of the array elements that fall within it. If the array A were one-dimensional, this problem could be easily solved by dynamic programming. We prove that in the two-dimensional case it is NP-hard to approximate this problem to within a factor of 1.25. On the other hand, we provide a near-linear time algorithm that returns a solution at most 2.5 times the optimal. Other rectangle tiling...
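The one-dimensional dynamic program the abstract alludes to can be sketched as follows; the O(n²p) formulation below is illustrative, not taken from the paper:

```python
def min_max_partition(a, p):
    """Split array a into at most p contiguous intervals, minimizing the
    maximum interval sum (the 1-D analogue of the tiling problem).

    Straightforward O(n^2 * p) dynamic program, for illustration only.
    """
    n = len(a)
    prefix = [0]
    for x in a:
        prefix.append(prefix[-1] + x)

    INF = float("inf")
    # dp[k][i] = best achievable bottleneck for a[:i] using at most k intervals.
    dp = [[INF] * (n + 1) for _ in range(p + 1)]
    for k in range(p + 1):
        dp[k][0] = 0  # empty prefix costs nothing
    for k in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(i):
                # Last interval is a[j:i]; earlier items use k-1 intervals.
                cost = max(dp[k - 1][j], prefix[i] - prefix[j])
                dp[k][i] = min(dp[k][i], cost)
    return dp[p][n]
```

For instance, `min_max_partition([1, 2, 3, 4, 5], 2)` returns 9, realized by the split [1, 2, 3] | [4, 5].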
Efficient Array Partitioning, 1997
Abstract

Cited by 24 (3 self)
We consider the problem of partitioning an array of n items into p intervals so that the maximum weight of the intervals is minimized. The currently best known bound for this problem is O(np) [MS95]. In this paper, we present two improved algorithms for this problem: one runs in time O(n + p²(log n)²) and the other runs in time O(n log n). The former is optimal whenever p ≤ √n / log n, and the latter is near-optimal for arbitrary p. We consider the natural generalization of this partitioning to two dimensions, where an n × n array of items is to be partitioned into p² blocks by partitioning the rows and columns into p intervals each and considering the blocks induced by this partition. The problem is to find the partition which minimizes the maximum weight among the resulting blocks. This problem is known to be NP-hard [GM96]. Independently, Charikar et al. have given a simple proof that shows that the problem is in fact NP-hard to approximate within a factor of t...
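A standard way to attack this interval-partitioning problem, shown here purely for illustration (it is not the paper's O(n + p²(log n)²) or O(n log n) algorithm), is to binary-search the bottleneck value and test each candidate with a greedy scan:

```python
def partition_bottleneck(a, p):
    """Smallest bound B such that a greedy left-to-right scan packs the
    integer array a into at most p intervals, each of weight <= B.

    Binary search over B in [max(a), sum(a)]; O(n log(sum(a))).
    Illustrative sketch only.
    """
    def feasible(bound):
        intervals, running = 1, 0
        for x in a:
            if x > bound:
                return False  # a single item already exceeds the bound
            if running + x > bound:
                intervals += 1  # start a new interval at x
                running = x
            else:
                running += x
        return intervals <= p

    lo, hi = max(a), sum(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The greedy check is correct because extending the current interval never helps once its weight would exceed the bound, so the scan uses the minimum possible number of intervals for each candidate B.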
Partitioning an Array onto a Mesh of Processors
 In Proc. of the Workshop on Applied Parallel Computing in Industrial Problems, 1996
Abstract

Cited by 16 (1 self)
Achieving an even load balance with a low communication overhead is a fundamental task in parallel computing. In this paper we consider the problem of partitioning an array into a number of blocks such that the maximum amount of work in any block is as low as possible. We review different proposed schemes for this problem and the complexity of their communication patterns. We present new approximation algorithms for computing a well-balanced generalized block distribution as well as an algorithm for computing an optimal semi-generalized block distribution. The various algorithms are tested and compared on a number of different matrices.

1 Introduction

A basic task in parallel computing is the partitioning and subsequent distribution of data to processors. The problem one faces in this operation is how to balance two often contradictory aims: finding an equal distribution of the computational work while at the same time minimizing the imposed communication. In the data parallel model th...
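To make the objective concrete: a generalized block distribution cuts the rows and the columns each into intervals, and the cost of the distribution is the heaviest induced block. A minimal evaluator, with the names and the exclusive-end-index cut convention being my own assumptions:

```python
def max_block_load(A, row_cuts, col_cuts):
    """Heaviest block induced by a generalized block distribution.

    row_cuts / col_cuts are sorted lists of interval end indices (exclusive),
    the last being the array dimension; e.g. row_cuts = [2, 5] splits 5 rows
    into rows 0-1 and rows 2-4. Boundary convention is an illustrative choice.
    """
    heaviest = 0
    r0 = 0
    for r1 in row_cuts:
        c0 = 0
        for c1 in col_cuts:
            # Weight of the block spanning rows [r0, r1) and columns [c0, c1).
            load = sum(A[i][j] for i in range(r0, r1) for j in range(c0, c1))
            heaviest = max(heaviest, load)
            c0 = c1
        r0 = r1
    return heaviest
```

For example, on A = [[1, 2], [3, 4]] the fully split distribution (row_cuts = col_cuts = [1, 2]) has cost 4, while the single-block distribution (row_cuts = col_cuts = [2]) has cost 10.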
On the Complexity of the Generalized Block Distribution, 1996
Abstract

Cited by 14 (2 self)
We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way, the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consider the problem of computing an optimal generalized block distribution. We show that this problem is NP-complete even for very simple cost functions. We also classify a number of variants of the general problem.
Approximations for the General Block Distribution of a Matrix, 1997
Abstract

Cited by 7 (0 self)
The general block distribution of a matrix is a rectilinear partition of the matrix into orthogonal blocks such that the maximum sum of the elements within a single block is minimized. This corresponds to partitioning the matrix onto parallel processors so as to minimize processor load while maintaining regular communication patterns. Applications of the problem include various parallel sparse matrix computations, compilers for high-performance languages, particle-in-cell computations, video and image compression, and simulations associated with a communication network. We analyze the performance guarantee of a natural and practical heuristic based on iterative refinement, which has previously been shown to give good empirical results. When p² is the number of blocks, we show that the tight performance ratio is Θ(√p). When the matrix has rows of large cost, the details of the objective function of the algorithm are shown to be important, since a naive implementation can lead to...
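One common reading of such an iterative refinement heuristic is to alternate sides: hold the column cuts fixed and recompute an optimal row partition, then swap roles. With columns fixed, checking whether a bound B is achievable reduces to a greedy scan over rows, so the row step can binary-search B. This is my own sketch of one such step under those assumptions, not the paper's exact algorithm:

```python
def refine_rows(A, col_cuts, p):
    """One refinement step: with the column partition fixed (col_cuts are
    sorted exclusive end indices), find the best bottleneck achievable by
    re-cutting the rows into at most p intervals.

    Illustrative sketch; helper names and conventions are assumptions.
    """
    def group_sums(row):
        # Sum of this row within each fixed column group.
        sums, c0 = [], 0
        for c1 in col_cuts:
            sums.append(sum(row[c0:c1]))
            c0 = c1
        return sums

    rows = [group_sums(r) for r in A]

    def feasible(bound):
        # Greedily grow each row interval while every induced block stays <= bound.
        intervals, acc = 1, [0] * len(col_cuts)
        for r in rows:
            if any(x > bound for x in r):
                return False  # a single row already violates the bound
            if any(a + x > bound for a, x in zip(acc, r)):
                intervals += 1
                acc = list(r)
            else:
                acc = [a + x for a, x in zip(acc, r)]
        return intervals <= p

    lo = max(max(r) for r in rows)
    hi = sum(sum(r) for r in rows)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The abstract's warning about rows of large cost is visible here: a row whose per-group sum exceeds any useful bound forces the objective details to matter, since no row cut can split it.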
Efficient Partitioning of Sequences
"... checkers for optimal t-unidirectional error detecting codes," ..."
unknown title
Abstract
www.elsevier.com/locate/tcs — Approximations for the general block distribution of a matrix
Relations Between Two Common Types of Rectangular Tilings
Abstract
Abstract. Partitioning a multidimensional data set (array) into rectangular regions subject to some constraints (error measures) is an important problem arising from applications in parallel computing, databases, VLSI design, and so on. In this paper, we consider the two most common types of partitioning used in practice, arbitrary partitioning and (p × p) partitioning, and study their relationships under three widely used error metrics: MaxSum, SumSVar, and SumSLift.