Highly scalable parallel algorithms for sparse matrix factorization
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 100 (29 self)
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although, in this paper, we discuss Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge,
Analyzing Scalability of Parallel Algorithms and Architectures
Journal of Parallel and Distributed Computing, 1994
Cited by 84 (17 self)
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms t...
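As a worked illustration of the fixed-problem-size question raised in this abstract, the derivation below finds the processor count that minimizes parallel run time. The cost model (adding n numbers on p processors with T_P = n/p + 2 log2 p) is an assumed textbook-style example, not a model taken from the paper, and the symbols T_P, p* and S_max are introduced here only for illustration.

```latex
% Assumed cost model (illustrative, not from the paper):
%   serial time   T_1 = n
%   parallel time T_P(p) = n/p + 2\log_2 p
\begin{align*}
  \frac{dT_P}{dp} &= -\frac{n}{p^{2}} + \frac{2}{p\ln 2} = 0
  \quad\Longrightarrow\quad p^{*} = \frac{n\ln 2}{2} \approx 0.35\,n, \\
  T_P(p^{*}) &= \frac{2}{\ln 2} + 2\log_2\!\left(\frac{n\ln 2}{2}\right), \\
  S_{\max} &= \frac{T_1}{T_P(p^{*})}
            = \frac{n}{\dfrac{2}{\ln 2} + 2\log_2\!\left(\dfrac{n\ln 2}{2}\right)}.
\end{align*}
```

Under this assumed model, using more than about 0.35 n processors increases run time, so the maximum speedup grows only as n / log n.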
Isoefficiency Function: A Scalability Metric for Parallel Algorithms and Architectures
1993
Cited by 13 (3 self)
This paper provides a tutorial introduction to a performance evaluation metric called the isoefficiency function. Traditional methods for evaluating serial algorithms are inadequate for analyzing the performance of parallel algorithm-architecture combinations. The isoefficiency function has proven useful for evaluating the performance of a wide variety of such combinations. On a sequential computer, the fastest algorithm for solving a given problem is the best algorithm. However, the performance of a parallel algorithm for a specific problem instance on a given number of processors provides only limited information. The time taken by a parallel algorithm to solve a problem instance depends on the problem size, the number of processors used to solve the problem, and machine characteristics such as processor speed, speed of communication channels, type of interconnection network, and routing techniques. An algorithm that yields good performance for a selected problem on a fixed number of processors on a given machine may perform poorly if any of these parameters are changed. Hence, the evaluation of a parallel algorithm on a parallel computer requires a more comprehensive analysis, and the study of scalability aids us in this analysis. The
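A minimal sketch of how an isoefficiency relation can be evaluated numerically. It assumes an illustrative cost model that is not taken from the paper: serial work W = n and parallel time T_P = n/p + 2 log2(p), which gives a total overhead T_o = p*T_P - W = 2 p log2(p). The function names are hypothetical.

```python
import math


def overhead(p: int) -> float:
    """Total overhead T_o = p*T_P - W under the assumed model T_P = n/p + 2*log2(p)."""
    return 2.0 * p * math.log2(p)


def efficiency(n: float, p: int) -> float:
    """Efficiency E = W / (W + T_o), with serial work W = n (assumed model)."""
    return n / (n + overhead(p))


def isoefficiency_size(p: int, target: float) -> float:
    """Problem size needed to hold efficiency at `target` on p processors.

    Solving E = W / (W + T_o) for W gives W = K * T_o with K = E / (1 - E).
    """
    k = target / (1.0 - target)
    return k * overhead(p)


if __name__ == "__main__":
    # The required problem size grows as p*log2(p) to keep efficiency fixed.
    for p in (4, 16, 64, 256):
        n = isoefficiency_size(p, 0.8)
        print(f"p={p:4d}  n={n:10.1f}  E={efficiency(n, p):.3f}")
```

In this assumed model the problem size must grow in proportion to p log2(p) to hold efficiency constant; a slower-growing requirement would indicate a more scalable algorithm-architecture combination.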
Scalability of Massively Parallel Depth-First Search
In DIMACS Workshop, 1994
Cited by 8 (0 self)
We analyze and compare the scalability of two generic schemes for heuristic depth-first search on highly parallel MIMD systems. The first one employs a task attraction mechanism where the work packets are generated on demand by splitting the donor's stack. Analytical and empirical analyses show that this stack-splitting scheme works efficiently on parallel systems with a small communication diameter and a moderate number of processing elements. The second scheme, search-frontier splitting, also employs a task attraction mechanism, but uses pre-computed work packets taken from a search-frontier level of the tree. At the beginning, a search-frontier is generated and stored in the local memories. Then, the processors expand the subtrees of their frontier nodes, communicating only when they run out of work or a solution has been found. Empirical results obtained on a 32 × 32 = 1024 node MIMD system indicate that the search-frontier splitting scheme incurs fewer overheads and scale...
Performance Evaluation for Parallel Systems: A Survey
1997
Cited by 7 (0 self)
Performance is often a key factor in determining the success of a parallel software system. Performance evaluation...
Analysis and Design of Scalable Parallel Algorithms for Scientific Computing
1995
Cited by 7 (4 self)
This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers and the scalability analysis of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better parallel algorithms. We present some new highly scalable parallel algorithms for sparse matrix computations that were widely considered to be poorly suited to large scale parallel computers. We present some laws governing the performance and scalability properties that apply to all parallel systems. We show that our results generalize or extend a range of earlier research results concerning the performance of parallel systems. Our scalability analysis of algorithms such as fast Fourier transform (FFT), dense matrix multiplication, sparse matrix-vector multiplication, and the preconditioned conjugate gradient (PCG) provides many interesting insights into their behavior on parallel computer...
The Optimal Effectiveness Metric for Parallel Application Analysis
In Special Issue on Parallel Models, 1998
Cited by 2 (1 self)
This paper discusses a scalability metric based on the cost effectiveness of parallel algorithms. Unlike other scalability measures, this metric can be used to compare different parallel algorithms and identify specific conditions of problem size and processor allocation that characterize "crossover" points and intervals where one algorithm becomes more cost effective than another. Finally, this paper presents a series of examples to illustrate the measurement methodology in practice.
1 Introduction
The measurement of parallel applications is of significant interest to the evaluation and categorization of various parallel algorithms. This paper argues that a useful metric for parallel algorithm analysis should be consistent, quantitative, predictive, and relevant. A metric is consistent if independent researchers analyzing the same algorithm on the same architecture will arrive at similar conclusions. A metric is quantitative if it can be used to quantify the benefit of disparate algo...
A Metric for Parallel Poly-Algorithm Design
1997
Cited by 1 (0 self)
This paper discusses a scalability metric based on the cost effectiveness of parallel algorithms. Unlike other scalability measures, this metric can be used to compare different parallel algorithms and identify specific conditions of problem size and processor allocation that characterize "crossover" points and intervals where one algorithm becomes more cost effective than another. Finally, this paper presents a series of examples to illustrate the measurement methodology in practice.
1 Introduction
Consider the development of an algorithm that multiplies matrices. Of the many algorithms that might be employed, two of the most popular methods are the naive algorithm and the Strassen algorithm. Asymptotically, the naive algorithm is $O(n^3)$ while the Strassen algorithm is $O(n^{2.81})$. Although the Strassen algorithm is asymptotically better than the naive algorithm, the setup cost of the Strassen algorithm makes it inefficient for small matrices. An optimal algorithm might employ bo...
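The idea of combining both methods can be sketched as a simple sequential poly-algorithm: recurse with Strassen's scheme on large blocks and fall back to the naive method below a crossover size. This is an illustration only, not the paper's method; the function names and the default `cutoff` of 64 are hypothetical, and a real crossover point would have to be measured on the target machine. The sketch assumes square matrices of equal size stored as lists of lists.

```python
def naive_matmul(a, b):
    """Classic triple-loop product, O(n^3) scalar multiplications."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def _add(x, y):
    return [[u + v for u, v in zip(r, s)] for r, s in zip(x, y)]


def _sub(x, y):
    return [[u - v for u, v in zip(r, s)] for r, s in zip(x, y)]


def _split(m):
    """Split an even-sized square matrix into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def poly_matmul(a, b, cutoff=64):
    """Strassen above `cutoff` (hypothetical crossover), naive at or below it.

    Odd-sized blocks also fall back to the naive method to keep the sketch short.
    """
    n = len(a)
    if n <= cutoff or n % 2:
        return naive_matmul(a, b)
    a11, a12, a21, a22 = _split(a)
    b11, b12, b21, b22 = _split(b)
    # Strassen's seven recursive products.
    m1 = poly_matmul(_add(a11, a22), _add(b11, b22), cutoff)
    m2 = poly_matmul(_add(a21, a22), b11, cutoff)
    m3 = poly_matmul(a11, _sub(b12, b22), cutoff)
    m4 = poly_matmul(a22, _sub(b21, b11), cutoff)
    m5 = poly_matmul(_add(a11, a12), b22, cutoff)
    m6 = poly_matmul(_sub(a21, a11), _add(b11, b12), cutoff)
    m7 = poly_matmul(_sub(a12, a22), _add(b21, b22), cutoff)
    c11 = _add(_sub(_add(m1, m4), m5), m7)
    c12 = _add(m3, m5)
    c21 = _add(m2, m4)
    c22 = _add(_add(_sub(m1, m2), m3), m6)
    # Stitch the quadrants back together row by row.
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


if __name__ == "__main__":
    a = [[i * 8 + j for j in range(8)] for i in range(8)]
    b = [[i - j for j in range(8)] for i in range(8)]
    print(poly_matmul(a, b, cutoff=2) == naive_matmul(a, b))
```

Varying `cutoff` and timing the result is one way to locate the crossover region empirically on a given machine.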
Parallel Algorithm Scalability Issues in Petaflops Architectures
2000
Cited by 1 (0 self)
The projected design space of petaFLOPS architectures entails exploitation of very large degrees of concurrency, locality of data access, and tolerance to latency. This puts considerable pressure on the design of parallel algorithms capable of effectively utilizing increasing amounts of processing resources in a memory and bandwidth constrained environment. This aspect of algorithm design, also referred to as scalability analysis, is a key component for guiding algorithm designers as well as hardware architects.
A Data Parallel Augmenting Path Algorithm for the Dense Linear Many-To-One Assignment Problem
1995
The purpose of this study is to describe a data parallel primal-dual augmenting path algorithm for the dense linear many-to-one assignment problem, also known as semi-assignment. This problem could for instance be described as assigning n persons to m( n) job groups. The algorithm is tailored specifically for massive SIMD parallelism and employs, in this context, a new efficient breadth-first-search augmenting path technique which is shown to be faster than the shortest augmenting path search normally used in sequential algorithms for this problem. We show that the best known sequential computational complexity of $O(mn^2)$ for dense problems is reduced to a parallel complexity of $O(mn)$ on a machine with n processors supporting reductions in $O(1)$ time. The algorithm is easy to implement efficiently on commercially available massively parallel computers. A range of numerical experiments are performed on a Connection Machine CM200 and a MasPar MP-2. The tests show the good performa...

