## Parallelization and Performance of Conjugate Gradient Algorithms on the Cedar hierarchical-memory Multiprocessor (1991)

Venue: | In 3rd Symp. Principles & Practice of Parallel Programming |

Citations: | 6 - 1 self |

### BibTeX

@INPROCEEDINGS{Meier91parallelizationand,

author = {Ulrike Meier and Rudolf Eigenmann},

title = {Parallelization and Performance of Conjugate Gradient Algorithms on the Cedar hierarchical-memory Multiprocessor},

booktitle = {In 3rd Symp. Principles & Practice of Parallel Programming},

year = {1991},

pages = {178--188}

}

### Abstract

The conjugate gradient method is a powerful algorithm for solving well-structured sparse linear systems that arise from partial differential equations. Its broad application range makes it an interesting object for investigating novel architectures and programming systems. In this paper we analyze the computational structure of three different conjugate gradient schemes for solving elliptic partial differential equations. We describe their parallel implementation on the Cedar hierarchical-memory multiprocessor from two angles: explicit manual parallelization and automatic compilation. We report performance measurements taken on Cedar, which allow us to draw a number of conclusions about the Cedar architecture, the programming methodology for hierarchical computer structures, and the contrast between manual and automatic parallelization.

1 Introduction

The preconditioned Conjugate Gradient Method is a powerful tool for solving sparse, well-structured, symmetric positive definite linear systems that arise...
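For orientation, the basic (unpreconditioned) conjugate gradient iteration can be sketched as follows. This is a minimal textbook version in plain Python, not one of the three schemes analyzed in the paper and not Cedar Fortran; the function names `dot`, `matvec`, and `conjugate_gradient`, the dense list-of-lists matrix, and the tolerance are assumptions made for illustration. It shows why the method consists mainly of vector operations plus two dot products per iteration.

```python
def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    """Dense matrix-vector product; A is a list of rows (illustrative only)."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - A x for the initial guess x = 0
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break          # residual norm is small enough
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive definite test system (hypothetical data).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

The vector updates parallelize trivially, whereas each `dot` call is a global reduction, which is the step that needs coordination on a multiprocessor.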

### Citations

88 |
Block Preconditioning for the Conjugate Gradient Method
- Concus, Golub, et al.
- 1985
Citation Context ...y system. Among the preconditioners considered were many variants of Incomplete Cholesky factorization preconditioners which improve the condition number of the preconditioned system very efficiently [CGM85], are however highly recursive and not suited for parallel computation. Vectorization efforts improved the performance, worsened however the convergence of the method [Meu84]. Polynomial preconditione... |

69 |
Practical use of polynomial preconditioning for the conjugate gradient method
- Saad
- 1985
Citation Context ...re however highly recursive and not suited for parallel computation. Vectorization efforts improved the performance, worsened however the convergence of the method [Meu84]. Polynomial preconditioners [Saa85] were another attempt to combine higher convergence rates and a higher degree of parallelism, turned however out to be not as efficient as a high convergence rate requires a high degree polynomial whi... |

49 |
On the Problem of Optimizing Data Transfers for Complex Memory Systems
- Gallivan, Jalby, et al.
- 1988
Citation Context ...sor clusters. There are only few known approaches, most of which include user assistance for the partitioning process [], or tackle the problem at a theoretical end, not yet proven useful in practice [GJG88]. Our approach here shall be to find heuristics that deal with significant program patterns. The regular computational structure of the CG allows us to divide the loop index spaces into 4 chunks and t... |

28 |
Cedar Fortran and Other Vector and Parallel Fortran Dialects
- Guzzi, Padua, et al.
- 1990
Citation Context ...ailable to the user through Cedar Fortran, the main application programming language. Cedar Fortran is basically Fortran77 with a few additional constructs for exploiting Cedar architectural features [GPHL90]. Users program Cedar by either writing directly in Cedar Fortran, or by starting from a sequential Fortran77 code and applying the auto-parallelizing Cedar Restructurer, outputting Cedar Fortran [EHJ... |

13 |
The behavior of conjugate gradient algorithms on a multivector processor with a hierarchical memory
- Meier, Sameh
- 1988
Citation Context ...luster's processors. Former experiments on an Alliant FX/8 (which is equivalent to a cluster of Cedar) have shown that the performance of iterative methods on one cluster is limited by its cache size [MS88]. But the use of this new architecture showed an improvement in performance. Data locality could be increased significantly by distributing the data across clusters and handling smaller chunks on each... |

13 |
The performance of Fortran implementations for preconditioned conjugate gradients on vector computers
- van der Vorst
- 1986
Citation Context ...ntial equations. Many efforts have been made to implement it with a variety of preconditioning techniques on different parallel computers, trying to take advantage of the various architectures [MS88] [vdV86]. As it consists mainly of vector operations, it turns out to be a very efficient method for a vector computer but due to the necessity of evaluating dot products in each iteration, it is not as well s... |

11 |
Cedar Fortran and Its Compiler
- Eigenmann, Hoeflinger, et al.
- 1990
Citation Context ...L90]. Users program Cedar by either writing directly in Cedar Fortran, or by starting from a sequential Fortran77 code and applying the auto-parallelizing Cedar Restructurer, outputting Cedar Fortran [EHJP90]. Important Cedar Fortran constructs, referred to in this paper, are: CTSKSTART forks a new task. The fork operation is costly, taking up to 200ms. It is used for initiating long-term parallel activitie... |

10 |
The kap/205: An advanced source-to-source vectorizer for the Cyber 205 supercomputer
- Huson, Macke, et al.
- 1986 |

6 |
Xylem: An Operating System for the Cedar Multiprocessor
- Emrath
- 1985
Citation Context ...edar configuration is expected to have 8 processors per cluster and a global memory of 64 megabytes. 2.2 Cedar system software The software that coordinates the clusters is the Xylem Operating System [Emr85], an extension of Unix. Its functionality is made available to the user through Cedar Fortran, the main application programming language. Cedar Fortran is basically Fortran77 with a few additional con... |

3 |
The block preconditioned conjugate gradient algorithm on vector computers
- Meurant
- 1984
Citation Context ...d system very efficiently [CGM85], are however highly recursive and not suited for parallel computation. Vectorization efforts improved the performance, worsened however the convergence of the method [Meu84]. Polynomial preconditioners [Saa85] were another attempt to combine higher convergence rates and a higher degree of parallelism, turned however out to be not as efficient as a high convergence rate requires a high degree polynomial whi... |
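
The citation contexts above mention dividing the loop index spaces into 4 chunks and the need to evaluate dot products in each iteration. A rough sketch of that partitioning idea, applied to a dot product, might look like the following. It is written in sequential Python rather than Cedar Fortran; the helper names `chunk_bounds` and `chunked_dot` are hypothetical, and the loop over chunks merely stands in for work that would be assigned to separate clusters.

```python
def chunk_bounds(n, num_chunks=4):
    """Split the index range 0..n-1 into contiguous, near-equal chunks."""
    base, extra = divmod(n, num_chunks)
    bounds = []
    start = 0
    for k in range(num_chunks):
        # The first `extra` chunks take one additional index.
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def chunked_dot(x, y, num_chunks=4):
    """Dot product computed as per-chunk partial sums plus a final reduction."""
    partials = [
        sum(x[i] * y[i] for i in range(lo, hi))   # local work on one chunk
        for lo, hi in chunk_bounds(len(x), num_chunks)
    ]
    return sum(partials)                          # combine the partial results
```

Each partial sum touches only its own slice of the vectors, which is the data-locality benefit the excerpts describe; only the final reduction over the partial results requires communication between chunks.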