Results 1–10 of 13
Highly scalable parallel algorithms for sparse matrix factorization
 IEEE Transactions on Parallel and Distributed Systems
, 1994
Abstract

Cited by 116 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
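The dense Cholesky factorization that the abstract uses as its scalability baseline can be sketched as follows. This is an illustrative serial Python version of the textbook algorithm, not the paper's parallel implementation; the function name is ours.

```python
def cholesky(a):
    """Factor a symmetric positive definite matrix A (list of lists)
    as A = L * L^T; returns the lower-triangular factor L."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: l[j][j] = sqrt(a[j][j] - sum_k l[j][k]^2)
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        l[j][j] = s ** 0.5
        # Entries below the diagonal in column j
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l
```

A sparse factorization restricts this computation to the nonzero structure (plus fill-in), which is what makes load balancing and communication the central difficulties the paper addresses.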
On Rectangular Partitionings in Two Dimensions: Algorithms, Complexity, and Applications
 In Proceedings of the 7th International Conference on Database Theory
, 1999
Abstract

Cited by 44 (7 self)
Partitioning a multi-dimensional data set into rectangular partitions subject to certain constraints is an important problem that arises in many database applications, including histogram-based selectivity estimation, load-balancing, and construction of index structures. While provably optimal and efficient algorithms exist for partitioning one-dimensional data, the multi-dimensional problem has received less attention, except for a few special cases. As a result, the heuristic partitioning techniques that are used in practice are not well understood and come with no guarantees on the quality of the solution. In this paper, we present algorithmic and complexity-theoretic results for the fundamental problem of partitioning a two-dimensional array into rectangular tiles of arbitrary size in a way that minimizes the number of tiles required to satisfy a given constraint. Our main results are approximation algorithms for several partitioning problems that provably approxima...
On Approximating Rectangle Tiling and Packing
 In Proc. Symp. on Discrete Algorithms (SODA)
Abstract

Cited by 37 (6 self)
Our study of tiling and packing with rectangles in two-dimensional regions is strongly motivated by applications in database mining, histogram-based estimation of query sizes, data partitioning, and motion estimation in video compression by block matching, among others. An example of the problems that we tackle is the following: given an n × n array A of positive numbers, find a tiling using at most p rectangles (that is, no two rectangles may overlap, and each array element must fall within some rectangle) that minimizes the maximum weight of any rectangle; here the weight of a rectangle is the sum of the array elements that fall within it. If the array A were one-dimensional, this problem could be easily solved by dynamic programming. We prove that in the two-dimensional case it is NP-hard to approximate this problem to within a factor of 1.25. On the other hand, we provide a near-linear time algorithm that returns a solution at most 2.5 times the optimal. Other rectangle tiling...
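The one-dimensional special case that the abstract says is "easily solved by dynamic programming" can be sketched as follows. This is an illustrative O(n²p) Python formulation of the straightforward recurrence, not necessarily the formulation the authors have in mind.

```python
def min_max_partition_dp(a, p):
    """Partition list a into at most p contiguous intervals so that
    the maximum interval sum is minimized; returns that optimum."""
    n = len(a)
    prefix = [0.0] * (n + 1)            # prefix[i] = sum of a[:i]
    for i, x in enumerate(a):
        prefix[i + 1] = prefix[i] + x
    INF = float("inf")
    # dp[k][i] = best achievable max interval sum for a[:i] with k intervals
    dp = [[INF] * (n + 1) for _ in range(p + 1)]
    dp[0][0] = 0.0
    for k in range(1, p + 1):
        for i in range(1, n + 1):
            for j in range(i):
                # Last interval is a[j:i]; earlier part uses k-1 intervals
                cost = max(dp[k - 1][j], prefix[i] - prefix[j])
                if cost < dp[k][i]:
                    dp[k][i] = cost
    return min(dp[k][n] for k in range(1, p + 1))
```

For example, `min_max_partition_dp([1, 2, 3, 4, 5], 2)` splits the array as [1, 2, 3] | [4, 5] with maximum weight 9.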
Efficient Array Partitioning
, 1997
Abstract

Cited by 23 (3 self)
We consider the problem of partitioning an array of n items into p intervals so that the maximum weight of the intervals is minimized. The currently best known bound for this problem is O(np) [MS95]. In this paper, we present two improved algorithms for this problem: one runs in time O(n + p²(log n)²) and the other runs in time O(n log n). The former is optimal whenever p ≤ √n / log n, and the latter is near-optimal for arbitrary p. We consider the natural generalization of this partitioning to two dimensions, where an n × n array of items is to be partitioned into p² blocks by partitioning the rows and columns into p intervals each and considering the blocks induced by this partition. The problem is to find the partition which minimizes the maximum weight among the resulting blocks. This problem is known to be NP-hard [GM96]. Independently, Charikar et al. have given a simple proof that shows that the problem is in fact NP-hard to approximate within a factor of t...
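A classic way to beat the O(np) dynamic program on the one-dimensional problem is to binary-search the answer and test each candidate with a greedy scan. The sketch below uses that parametric-search idea for integer weights; it is illustrative and is not claimed to be either of the paper's two algorithms.

```python
def feasible(a, p, cap):
    """Greedy test: can a be split into at most p intervals, each of sum <= cap?"""
    count, cur = 1, 0
    for x in a:
        if x > cap:
            return False          # a single item already exceeds cap
        if cur + x > cap:
            count += 1            # close the current interval, start a new one
            cur = x
        else:
            cur += x
    return count <= p

def min_max_partition_search(a, p):
    """Binary search over the optimum value for integer weights:
    the smallest cap for which the greedy test succeeds is optimal."""
    lo, hi = max(a), sum(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(a, p, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The greedy test runs in O(n), so the total cost is O(n log W) where W is the total weight.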
Partitioning an Array onto a Mesh of Processors
 In Proc. of the Workshop on Applied Parallel Computing in Industrial Problems
, 1996
Abstract

Cited by 16 (1 self)
Achieving an even load balance with a low communication overhead is a fundamental task in parallel computing. In this paper we consider the problem of partitioning an array into a number of blocks such that the maximum amount of work in any block is as low as possible. We review different proposed schemes for this problem and the complexity of their communication patterns. We present new approximation algorithms for computing a well-balanced generalized block distribution, as well as an algorithm for computing an optimal semi-generalized block distribution. The various algorithms are tested and compared on a number of different matrices.

1 Introduction

A basic task in parallel computing is the partitioning and subsequent distribution of data to processors. The problem one faces in this operation is how to balance two often contradictory aims: finding an equal distribution of the computational work while at the same time minimizing the imposed communication. In the data parallel model th...
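The quantity all of these schemes try to minimize is the weight of the heaviest block induced by a candidate distribution. As a sketch, assuming the generalized block distribution is specified by one set of row cuts and one set of column cuts, that objective can be evaluated in O(1) per block with a 2D prefix-sum table (illustrative Python, names ours):

```python
def block_max_weight(m, row_cuts, col_cuts):
    """Return the weight of the heaviest block of matrix m under the
    rectilinear partition given by sorted interior row/column cut positions."""
    n_r, n_c = len(m), len(m[0])
    # ps[i][j] = sum of the submatrix m[:i][:j]
    ps = [[0.0] * (n_c + 1) for _ in range(n_r + 1)]
    for i in range(n_r):
        for j in range(n_c):
            ps[i + 1][j + 1] = m[i][j] + ps[i][j + 1] + ps[i + 1][j] - ps[i][j]
    rb = [0] + list(row_cuts) + [n_r]   # row block boundaries
    cb = [0] + list(col_cuts) + [n_c]   # column block boundaries
    best = 0.0
    for r0, r1 in zip(rb, rb[1:]):
        for c0, c1 in zip(cb, cb[1:]):
            # Inclusion-exclusion gives the block sum in O(1)
            w = ps[r1][c1] - ps[r0][c1] - ps[r1][c0] + ps[r0][c0]
            best = max(best, w)
    return best
```

A search or refinement procedure over cut positions can then score each candidate distribution cheaply with this evaluator.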
On the Complexity of the Generalized Block Distribution
, 1996
Abstract

Cited by 14 (2 self)
We consider the problem of mapping an array onto a mesh of processors in such a way that locality is preserved. When the computational work associated with the array is distributed in an unstructured way, the generalized block distribution has been recognized as an efficient way of achieving an even load balance while at the same time imposing a simple communication pattern. In this paper we consider the problem of computing an optimal generalized block distribution. We show that this problem is NP-complete even for very simple cost functions. We also classify a number of variants of the general problem.
Approximations for the General Block Distribution of a Matrix
, 1997
Abstract

Cited by 7 (0 self)
The general block distribution of a matrix is a rectilinear partition of the matrix into orthogonal blocks such that the maximum sum of the elements within a single block is minimized. This corresponds to partitioning the matrix onto parallel processors so as to minimize processor load while maintaining regular communication patterns. Applications of the problem include various parallel sparse matrix computations, compilers for high-performance languages, particle-in-cell computations, video and image compression, and simulations associated with a communication network. We analyze the performance guarantee of a natural and practical heuristic based on iterative refinement, which has previously been shown to give good empirical results. When p² is the number of blocks, we show that the tight performance ratio is Θ(√p). When the matrix has rows of large cost, the details of the objective function of the algorithm are shown to be important, since a naive implementation can lead to...
COMPUTATION-FREE PRECONDITIONERS FOR THE PARALLEL SOLUTION OF POWER SYSTEM PROBLEMS
Abstract
Solution of a set of linear equations Ax = b is a recurrent problem in power system analysis. Because of computational dependencies, direct methods have proven of limited value in both parallel and highly vectorized computing environments. The preconditioned conjugate gradient method has been suggested as a better alternative to direct methods. The preconditioning step itself is not particularly well suited to parallel processing. Partitioned inverse representations of A are better suited to high-performance computation. However, obtaining the partitioned inverse matrices can be expensive. This paper describes two techniques for preconditioning based on the partitioned inverses where the preconditioner matrix is obtained directly from an incomplete factorization without the need for additional numerical computation. Experiments indicate a 50% reduction in solution time in a parallel environment.
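The preconditioned conjugate gradient iteration the abstract refers to can be sketched as follows. This illustrative Python version uses a simple diagonal (Jacobi) preconditioner as a stand-in for the paper's partitioned-inverse preconditioners; the function name and parameters are ours.

```python
def pcg(a, b, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient for SPD Ax = b,
    with M = diag(A) as the (placeholder) preconditioner."""
    n = len(b)
    x = [0.0] * n
    minv = [1.0 / a[i][i] for i in range(n)]   # apply M^{-1} elementwise
    r = b[:]                                   # residual r = b - A x, with x = 0
    z = [minv[i] * r[i] for i in range(n)]     # preconditioned residual
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        ap = [sum(a[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        z = [minv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        rz = rz_new
        p = [z[i] + beta * p[i] for i in range(n)]
    return x
```

The point of the paper is that a better M (here the diagonal) can be read off an incomplete factorization at no extra numerical cost; the iteration itself is unchanged.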
Tiling Multi-Dimensional Arrays
 In Proceedings of the 12th International Symposium on Fundamentals of Computation Theory
, 1999
Abstract
We continue the study of the tiling problems introduced in [KMP98]. The first problem we consider is: given a d-dimensional array of non-negative numbers and a tile limit p, partition the array into at most p rectangular, non-overlapping subarrays, referred to as tiles, in such a way as to minimise the weight of the heaviest tile, where the weight of a tile is the sum of the elements that fall within it. For one-dimensional arrays the problem can be solved optimally in polynomial time, whereas for two-dimensional arrays it is shown in [KMP98] that the problem is NP-hard, and an approximation algorithm is given. This paper offers a new (d² + 2d - 1)/(2d - 1) approximation algorithm for the d-dimensional problem (d ≥ 2), which improves on the (d + 3)/2 approximation algorithm given in [SS99]. In particular, for two-dimensional arrays, our approximation ratio is 7/3, improving on the ratio of 5/2 in [KMP98] and [SS99]. We briefly consider the dual tiling problem where, rather than ha...
Approximation Algorithms for Min-Max Generalization Problems
Abstract
Abstract. We provide improved approximation algorithms for the min-max generalization problems considered by Du, Eppstein, Goodrich, and Lueker [1]. In min-max generalization problems, the input consists of data items with weights and a lower bound wlb, and the goal is to partition individual items into groups of weight at least wlb, while minimizing the maximum weight of a group. The rules of legal partitioning are specific to a problem. Du et al. consider several problems in this vein: (1) partitioning a graph into connected subgraphs, (2) partitioning unstructured data into arbitrary classes, and (3) partitioning a 2-dimensional array into non-overlapping contiguous rectangles (subarrays) that satisfy the above size requirements. We significantly improve the approximation ratios for all the problems considered by Du et al., and provide additional motivation for these problems. Moreover, for the first problem, while Du et al. give approximation algorithms for specific graph families, namely 3-connected and 4-connected planar graphs, no approximation algorithm that works for all graphs was known prior to this work.
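For variant (2), unstructured data with no partitioning rules, a natural baseline is a one-pass greedy sketch: fill each group until it reaches wlb, then start a new one, folding a final underweight group into its predecessor. This is an illustrative Python sketch assuming a fixed item order, not the algorithm from the paper; every group then weighs at least wlb, and before the final merge no group exceeds wlb plus the largest single item.

```python
def greedy_groups(weights, wlb):
    """Greedily pack items (in the given order) into groups of weight >= wlb;
    a trailing group below wlb is merged into the previous group."""
    groups, cur, cur_w = [], [], 0.0
    for w in weights:
        cur.append(w)
        cur_w += w
        if cur_w >= wlb:          # group has reached the lower bound
            groups.append(cur)
            cur, cur_w = [], 0.0
    if cur:                       # leftover items below wlb
        if groups:
            groups[-1].extend(cur)
        else:
            groups.append(cur)    # total weight < wlb: one group is all we can do
    return groups
```

For example, `greedy_groups([3, 1, 2, 2, 4], 4)` yields the groups [3, 1], [2, 2], [4], each of weight at least 4.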