Highly scalable parallel algorithms for sparse matrix factorization
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 116 (29 self)
Abstract
In this paper, we describe a scalable parallel algorithm for sparse matrix factorization, analyze its performance and scalability, and present experimental results for up to 1024 processors on a Cray T3D parallel computer. Through our analysis and experimental results, we demonstrate that our algorithm substantially improves the state of the art in parallel direct solution of sparse linear systems, both in terms of scalability and overall performance. It is a well-known fact that dense matrix factorization scales well and can be implemented efficiently on parallel computers. In this paper, we present the first algorithm to factor a wide class of sparse matrices (including those arising from two- and three-dimensional finite element problems) that is asymptotically as scalable as dense matrix factorization algorithms on a variety of parallel architectures. Our algorithm incurs less communication overhead and is more scalable than any previously known parallel formulation of sparse matrix factorization. Although this paper discusses Cholesky factorization of symmetric positive definite matrices, the algorithms can be adapted for solving sparse linear least squares problems and for Gaussian elimination of diagonally dominant matrices that are almost symmetric in structure. An implementation of our sparse Cholesky factorization algorithm delivers up to 20 GFlops on a Cray T3D for medium-size structural engineering and linear programming problems. To the best of our knowledge, ...
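For readers less familiar with the kernel being parallelized, the following is a minimal serial sketch of dense Cholesky factorization (A = L L^T for a symmetric positive definite A). It ignores sparsity and parallelism entirely, so it only illustrates the arithmetic, not the paper's parallel sparse algorithm; the function name `cholesky` is illustrative.

```python
import math


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Dense, column-by-column Cholesky: return lower-triangular L with A = L L^T.

    A is assumed symmetric positive definite; only its lower triangle is read.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract contributions of the already computed columns.
        s = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if s <= 0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Entries below the diagonal in column j.
        for i in range(j + 1, n):
            l[i][j] = (a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))) / l[j][j]
    return l


if __name__ == "__main__":
    a = [[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]]
    print(cholesky(a))
```

In a sparse factorization many of the entries of L stay zero, which is what makes the ordering, elimination tree, and data distribution questions in the paper matter.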
Analyzing Scalability of Parallel Algorithms and Architectures
Journal of Parallel and Distributed Computing, 1994
Cited by 90 (18 self)
Abstract
The scalability of a parallel algorithm on a parallel architecture is a measure of its capacity to effectively utilize an increasing number of processors. Scalability analysis may be used to select the best algorithm-architecture combination for a problem under different constraints on the growth of the problem size and the number of processors. It may be used to predict the performance of a parallel algorithm and a parallel architecture for a large number of processors from the known performance on fewer processors. For a fixed problem size, it may be used to determine the optimal number of processors to be used and the maximum possible speedup that can be obtained. The objective of this paper is to critically assess the state of the art in the theory of scalability analysis, and motivate further research on the development of new and more comprehensive analytical tools to study the scalability of parallel algorithms and architectures. We survey a number of techniques and formalisms t...
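As a rough illustration of the fixed-problem-size use case mentioned above, the sketch below picks the processor count that maximizes speedup under a hypothetical cost model, T_p = W/p + 2 log2(p): W unit-time operations split evenly, plus a tree-reduction overhead (roughly what adding W numbers on a hypercube costs). The model and the names `parallel_time`, `speedup`, and `optimal_processors` are assumptions for illustration, not taken from the paper.

```python
import math


def parallel_time(w: float, p: int) -> float:
    """Hypothetical model: W/p computation plus a 2*log2(p) reduction overhead."""
    return w / p + 2 * math.log2(p)


def speedup(w: float, p: int) -> float:
    # Serial time is assumed to be W unit operations.
    return w / parallel_time(w, p)


def optimal_processors(w: float, p_max: int) -> tuple[int, float]:
    """Scan p = 1..p_max and return the p with the largest speedup, and that speedup."""
    best_p = max(range(1, p_max + 1), key=lambda p: speedup(w, p))
    return best_p, speedup(w, best_p)


if __name__ == "__main__":
    w = 1024
    p_opt, s_max = optimal_processors(w, p_max=1024)
    print(f"optimal p = {p_opt}, max speedup = {s_max:.1f}")
    # Setting d(T_p)/dp = 0 for this model gives p* = W * ln(2) / 2.
    print(f"analytic p* = {w * math.log(2) / 2:.1f}")
```

Under this model, adding processors beyond roughly W ln(2)/2 makes the run slower, because the logarithmic overhead outgrows the shrinking W/p term.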
Isoefficiency Function: A Scalability Metric for Parallel Algorithms and Architectures
1993
Cited by 14 (3 self)
Abstract
This paper provides a tutorial introduction to a performance evaluation metric called the isoefficiency function. Traditional methods for evaluating serial algorithms are inadequate for analyzing the performance of parallel algorithm-architecture combinations. The isoefficiency function has proven useful for evaluating the performance of a wide variety of such combinations. On a sequential computer, the fastest algorithm for solving a given problem is the best algorithm. However, the performance of a parallel algorithm for a specific problem instance on a given number of processors provides only limited information. The time taken by a parallel algorithm to solve a problem instance depends on the problem size, the number of processors used to solve the problem, and machine characteristics such as processor speed, speed of communication channels, type of interconnection network, and routing techniques. An algorithm that yields good performance for a selected problem on a fixed number of processors on a given machine may perform poorly if any of these parameters are changed. Hence, the evaluation of a parallel algorithm on a parallel computer requires a more comprehensive analysis, and the study of scalability aids us in this analysis. The ...
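A minimal sketch of the isoefficiency idea, assuming the common textbook overhead model for adding W numbers on p processors, with total overhead T_o = p T_p - W = 2 p log2(p). Efficiency is then E = W / (W + T_o), so holding E fixed requires W = K T_o with K = E / (1 - E); for this model the required problem size grows as Theta(p log p). The function names are hypothetical.

```python
import math


def overhead(p: int) -> float:
    """Hypothetical total overhead T_o = p*T_p - W for adding W numbers on p
    processors with a log2(p)-step tree reduction (2 time units per step)."""
    return 2 * p * math.log2(p)


def efficiency(w: float, p: int) -> float:
    # E = W / (p * T_p) = W / (W + T_o)
    return w / (w + overhead(p))


def isoefficiency_size(e: float, p: int) -> float:
    """Problem size W that keeps efficiency at e: W = K * T_o with K = e / (1 - e)."""
    k = e / (1 - e)
    return k * overhead(p)


if __name__ == "__main__":
    target = 0.8
    for p in (4, 16, 64, 256, 1024):
        w = isoefficiency_size(target, p)
        print(f"p={p:5d}  W={w:10.0f}  E={efficiency(w, p):.2f}")
```

Because T_o here does not depend on W, the isoefficiency equation can be solved directly; for overheads that do depend on W, it would have to be solved numerically.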
Scalability of Massively Parallel Depth-First Search
In DIMACS Workshop, 1994
Cited by 8 (0 self)
Abstract
We analyze and compare the scalability of two generic schemes for heuristic depth-first search on highly parallel MIMD systems. The first one employs a task attraction mechanism where the work packets are generated on demand by splitting the donor's stack. Analytical and empirical analyses show that this stack-splitting scheme works efficiently on parallel systems with a small communication diameter and a moderate number of processing elements. The second scheme, search-frontier splitting, also employs a task attraction mechanism, but uses precomputed work packets taken from a search-frontier level of the tree. At the beginning, a search frontier is generated and stored in the local memories. Then, the processors expand the subtrees of their frontier nodes, communicating only when they run out of work or a solution has been found. Empirical results obtained on a 32 × 32 = 1024-node MIMD system indicate that the search-frontier splitting scheme incurs fewer overheads and scale...
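The two work-distribution schemes can be sketched on a hypothetical implicit tree in which each node is a tuple of child indices, with a fixed branching factor and depth. The donation policy in `split_stack` (give away every other untried alternative at each stack level) is one assumed policy, not necessarily the one evaluated in the paper; `frontier` and `distribute` illustrate search-frontier splitting with an assumed round-robin assignment. All names are illustrative.

```python
# A node is the path of child indices from the root.
Node = tuple[int, ...]


def children(node: Node, branching: int, depth: int) -> list[Node]:
    """Hypothetical implicit tree: uniform branching; nodes at `depth` are leaves."""
    if len(node) >= depth:
        return []
    return [node + (i,) for i in range(branching)]


def count_leaves(node: Node, branching: int, depth: int) -> int:
    """Work under a node, measured as the number of leaves in its subtree."""
    kids = children(node, branching, depth)
    return 1 if not kids else sum(count_leaves(k, branching, depth) for k in kids)


def work(stack: list[list[Node]], branching: int, depth: int) -> int:
    """Total leaves reachable from the untried alternatives on a DFS stack."""
    return sum(count_leaves(n, branching, depth) for level in stack for n in level)


def split_stack(stack: list[list[Node]]) -> list[list[Node]]:
    """Stack splitting (assumed policy): donate every other untried
    alternative at each level; the donor keeps the rest in place."""
    donated = []
    for level in stack:
        donated.append(level[1::2])
        del level[1::2]
    return donated


def frontier(root: Node, level: int, branching: int, depth: int) -> list[Node]:
    """Search-frontier splitting, step 1: expand breadth-first down to `level`."""
    nodes = [root]
    for _ in range(level):
        nodes = [c for n in nodes for c in children(n, branching, depth)]
    return nodes


def distribute(nodes: list[Node], num_procs: int) -> list[list[Node]]:
    """Step 2: deal the precomputed frontier nodes round-robin to processors."""
    return [nodes[i::num_procs] for i in range(num_procs)]


if __name__ == "__main__":
    b, d = 3, 4
    # The donor is expanding (0, 0); its stack holds untried siblings at each level.
    donor = [[(1,), (2,)], [(0, 1), (0, 2)]]
    given = split_stack(donor)
    print(work(donor, b, d), work(given, b, d))
    print([len(part) for part in distribute(frontier((), 2, b, d), 4)])
```

Stack splitting creates packets on demand from whatever the donor holds, whereas frontier splitting fixes the packets once, up front, which is why it needs communication only when a processor runs dry or a solution is found.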
Analysis and Design of Scalable Parallel Algorithms for Scientific Computing
1995
Cited by 8 (5 self)
Abstract
This dissertation presents a methodology for understanding the performance and scalability of algorithms on parallel computers, together with the scalability analysis of a variety of numerical algorithms. We demonstrate the analytical power of this technique and show how it can guide the development of better parallel algorithms. We present some new highly scalable parallel algorithms for sparse matrix computations, which were widely considered to be poorly suited to large-scale parallel computers. We present some laws governing performance and scalability that apply to all parallel systems. We show that our results generalize or extend a range of earlier research results concerning the performance of parallel systems. Our scalability analysis of algorithms such as the fast Fourier transform (FFT), dense matrix multiplication, sparse matrix-vector multiplication, and the preconditioned conjugate gradient (PCG) method provides many interesting insights into their behavior on parallel computer...
Performance Evaluation for Parallel Systems: A Survey
1997
Cited by 8 (0 self)
Abstract
Performance is often a key factor in determining the success of a parallel software system. Performance evaluation...
Scalable Parallel Genetic Algorithms
Artificial Intelligence Review, 2001
Cited by 5 (0 self)
Abstract
Genetic algorithms, search algorithms based on the genetic processes observed in natural evolution, have been used to solve difficult problems in many different disciplines. When applied to very large-scale problems, genetic algorithms exhibit high computational cost and a degradation in solution quality because of the increased complexity. One of the most relevant research trends in genetic algorithms is the implementation of parallel genetic algorithms with the goal of obtaining high-quality solutions efficiently. This paper first reviews the state of the art in parallel genetic algorithms: parallelization strategies and emerging implementations are reviewed, and relevant results are discussed. Second, this paper discusses important issues regarding the scalability of parallel genetic algorithms.
The Optimal Effectiveness Metric for Parallel Application Analysis
In Special Issue on Parallel Models, 1998
Cited by 2 (1 self)
Abstract
This paper discusses a scalability metric based on the cost effectiveness of parallel algorithms. Unlike other scalability measures, this metric can be used to compare different parallel algorithms and to identify specific conditions of problem size and processor allocation that characterize "crossover" points and intervals where one algorithm becomes more cost-effective than another. Finally, this paper presents a series of examples to illustrate the measurement methodology in practice.
1 Introduction
The measurement of parallel applications is of significant interest for the evaluation and categorization of various parallel algorithms. This paper argues that a useful metric for parallel algorithm analysis should be consistent, quantitative, predictive, and relevant. A metric is consistent if independent researchers analyzing the same algorithm on the same architecture arrive at similar conclusions. A metric is quantitative if it can be used to quantify the benefit of disparate algo...
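A sketch of locating crossover points between two algorithms at a fixed processor count. It assumes cost is measured as the processor-time product p * T_p and uses two made-up cost models (`cost_a`, `cost_b`); the paper's actual cost-effectiveness metric may be defined differently, so this shows only the mechanics of finding where the cheaper algorithm switches.

```python
import math
from typing import Callable

# A cost model maps (problem size n, processors p) to a cost.
CostModel = Callable[[int, int], float]


def cost_a(n: int, p: int) -> float:
    """Hypothetical algorithm A: p * T_p with T_p = n**2 / p + 50 * p."""
    return n**2 + 50 * p**2


def cost_b(n: int, p: int) -> float:
    """Hypothetical algorithm B: p * T_p with T_p = 1.5 * n**2 / p + 200 * log2(p)."""
    return 1.5 * n**2 + 200 * p * math.log2(p)


def crossovers(f: CostModel, g: CostModel, p: int, sizes: range) -> list[int]:
    """Return each problem size n at which the cheaper of f and g switches."""
    points: list[int] = []
    prev_winner = None
    for n in sizes:
        winner = "f" if f(n, p) <= g(n, p) else "g"
        if prev_winner is not None and winner != prev_winner:
            points.append(n)
        prev_winner = winner
    return points


if __name__ == "__main__":
    print(crossovers(cost_a, cost_b, 64, range(1, 2001)))  # prints [506]
```

For p = 64 these made-up models cross at n = 506: below it, B's smaller processor overhead wins; above it, A's cheaper per-element work dominates. Repeating the scan over several values of p would trace out the crossover intervals the abstract refers to.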
A Metric for Parallel PolyAlgorithm Design
1997
Cited by 1 (0 self)
Abstract
This paper discusses a scalability metric based on the cost effectiveness of parallel algorithms. Unlike other scalability measures, this metric can be used to compare different parallel algorithms and to identify specific conditions of problem size and processor allocation that characterize "crossover" points and intervals where one algorithm becomes more cost-effective than another. Finally, this paper presents a series of examples to illustrate the measurement methodology in practice.
1 Introduction
Consider the development of an algorithm that multiplies matrices. Of the many algorithms that might be employed, two of the most popular are the naive algorithm and the Strassen algorithm. Asymptotically, the naive algorithm is $O(n^3)$ while the Strassen algorithm is $O(n^{2.81})$. Although the Strassen algorithm is asymptotically better than the naive algorithm, its setup cost makes it inefficient for small matrices. An optimal algorithm might employ bo...
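The polyalgorithm idea described here can be sketched as a Strassen recursion that falls back to the naive method below a cutoff size. The default cutoff of 64 is an arbitrary assumption; choosing it well is exactly the kind of crossover question the paper's metric is meant to answer. The sketch also falls back to the naive method for odd-sized blocks rather than padding them, and all function names are illustrative.

```python
Matrix = list[list[float]]


def naive(a: Matrix, b: Matrix) -> Matrix:
    """Textbook O(n^3) triple loop for square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def quadrants(m: Matrix) -> tuple[Matrix, Matrix, Matrix, Matrix]:
    """Split an even-sized square matrix into its four half-size blocks."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a: Matrix, b: Matrix, cutoff: int = 64) -> Matrix:
    """Polyalgorithm: Strassen recursion above `cutoff`, naive method at or below it."""
    n = len(a)
    if n <= cutoff or n % 2:  # small or odd-sized blocks: fall back to naive
        return naive(a, b)
    a11, a12, a21, a22 = quadrants(a)
    b11, b12, b21, b22 = quadrants(b)
    # Strassen's seven half-size products (instead of the naive eight).
    m1 = strassen(add(a11, a22), add(b11, b22), cutoff)
    m2 = strassen(add(a21, a22), b11, cutoff)
    m3 = strassen(a11, sub(b12, b22), cutoff)
    m4 = strassen(a22, sub(b21, b11), cutoff)
    m5 = strassen(add(a11, a12), b22, cutoff)
    m6 = strassen(sub(a21, a11), add(b11, b12), cutoff)
    m7 = strassen(sub(a12, a22), add(b21, b22), cutoff)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    # Stitch the four result quadrants back into one matrix.
    return [l + r for l, r in zip(c11, c12)] + [l + r for l, r in zip(c21, c22)]


if __name__ == "__main__":
    print(strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]], cutoff=1))  # [[19, 22], [43, 50]]
```

With cutoff=1 the recursion runs all the way down to 1x1 blocks; larger cutoffs trade Strassen's seven-product recursion and its extra additions for the lower overhead of the triple loop on small blocks.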
Parallel Algorithm Scalability Issues in Petaflops Architectures
2000
Cited by 1 (0 self)
Abstract
The projected design space of petaFLOPS architectures entails exploitation of very large degrees of concurrency, locality of data access, and tolerance to latency. This puts considerable pressure on the design of parallel algorithms capable of effectively utilizing increasing amounts of processing resources in a memory- and bandwidth-constrained environment. This aspect of algorithm design, also referred to as scalability analysis, is a key component for guiding algorithm designers as well as hardware architects.