A Programming Methodology for Dual-tier Multicomputers
- IEEE Transactions on Software Engineering
, 1999
Cited by 23 (6 self)
Hierarchically-organized ensembles of shared memory multiprocessors possess a richer and more complex model of locality than previous generation multicomputers with single processor nodes. These dual-tier computers introduce many new degrees of freedom into the programmer's performance model. We present a methodology for implementing block-structured numerical applications on dual-tier computers, and a run-time infrastructure, called KeLP2, that implements the methodology. KeLP2 supports two levels of locality and parallelism via hierarchical SPMD control flow, run-time geometric meta-data, and asynchronous collective communication. It effectively overlaps communication in cases where non-blocking point-to-point message passing can fail to tolerate communication latency, either due to an incomplete implementation or because the point-to-point model is inappropriate. KeLP's abstractions hide considerable detail without sacrificing performance, and dual-tier applications written in KeLP...
Future Research Directions In Problem Solving Environments For Computational Science
- Center for Supercomputing Research and Development
, 1991
Cited by 16 (4 self)
... this report was partially supported by Grant CCR-90-24549 from the National Science Foundation. This is a report to the National Science Foundation and other agencies; it is not a report by or of the National Science Foundation or any other agency. Participants at the Workshop on Research Directions in Integrating Numerical Analysis, Symbolic Computing, Computational Geometry, and Artificial Intelligence for Computational Science Conference Organizers
Runtime Support for Multi-Tier Programming of Block-Structured Applications on SMP Clusters
- International Scientific Computing in Object-Oriented Parallel Environments Conference (ISCOPE ’97)
, 1997
Cited by 16 (4 self)
We present a small set of programming abstractions to simplify efficient implementations for block-structured scientific calculations on SMP clusters. We have implemented these abstractions in KeLP 2.0, a C++ class library. KeLP 2.0 provides hierarchical SPMD control flow to manage two levels of parallelism and locality. Additionally, to tolerate high inter-node communication costs, KeLP 2.0 combines inspector/executor communication analysis with overlap of communication and computation. We illustrate how these programming abstractions hide the low-level details of thread management, scheduling, synchronization, and message-passing, but allow the programmer to express efficient algorithms with intuitive geometric primitives. 1 Introduction Multi-tier parallel computers, such as clusters of symmetric multiprocessors (SMPs), have emerged as important platforms for high-performance computing [1]. A multi-tier computer, with several levels of locality and parallelism, presents a more c...
A Programming Model for Block-Structured Scientific Calculations on SMP Clusters
- Ph.D. Dissertation, UCSD
, 1998
Success And Limitations In Automatic Parallelization Of The Perfect Benchmarks Programs
, 1992
Cited by 9 (1 self)
There has been much work on developing new techniques for parallelizing programs, yet there has been little empirical analysis to support these techniques. This thesis attempts to close this gap by measuring and analyzing the effectiveness of commercial parallelizing compilers on the Perfect Benchmarks™. The speedups of these codes that result from automatic parallelization will be reported. The performance gains attributed to the individual restructuring techniques, which assist the parallelization of the codes, are also given. The successes and failures of these transformations will be analyzed. This thesis will also closely examine each code in the Perfect Benchmarks™, so as to determine why each code parallelized poorly or well. Finally, potential improvements will be offered. ACKNOWLEDGEMENTS I would like to thank my thesis advisor, Rudolf Eigenmann, for his guidance and suggestions. His support and insight proved to be invaluable in the construction of this thesis. ...
Parallelization and Performance of Conjugate Gradient Algorithms on the Cedar hierarchical-memory Multiprocessor
- In 3rd Symp. Principles & Practice of Parallel Programming
, 1991
Cited by 6 (1 self)
The conjugate gradient method is a powerful algorithm for solving well-structured sparse linear systems that arise from partial differential equations. The broad application range makes it an interesting object for investigating novel architectures and programming systems. In this paper we analyze the computational structure of three different conjugate gradient schemes for solving elliptic partial differential equations. We describe their parallel implementation on the Cedar hierarchical memory multiprocessor from both angles, explicit manual parallelization and automatic compilation. We report performance measurements taken on Cedar, which allow us to draw a number of conclusions on the Cedar architecture, the programming methodology for hierarchical computer structures, and the contrast of manual vs automatic parallelization. 1 Introduction The preconditioned Conjugate Gradient Method is a powerful tool for solving sparse, well-structured, symmetric positive definite linear systems that arise...
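For orientation, the basic conjugate gradient iteration that this abstract refers to can be sketched as follows. This is a minimal, sequential, unpreconditioned version in plain Python and is not the Cedar implementation or any of the three schemes analyzed in the paper. The names `conjugate_gradient`, `matvec`, and `laplacian_1d` are hypothetical, and the 1-D Poisson operator is an assumed stand-in for an elliptic PDE discretization.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    The matrix is supplied only through matvec(v) = A v, so a sparse
    operator never has to be stored explicitly.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small
            break
        Ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


def laplacian_1d(v):
    """Apply the tridiagonal (-1, 2, -1) matrix of a 1-D Poisson problem."""
    n = len(v)
    return [
        2.0 * v[i]
        - (v[i - 1] if i > 0 else 0.0)
        - (v[i + 1] if i < n - 1 else 0.0)
        for i in range(n)
    ]


if __name__ == "__main__":
    solution = conjugate_gradient(laplacian_1d, [1.0] * 8)
    print(solution)
```

Each iteration consists of one matrix-vector product, two inner products, and three vector updates; these are the kernels a parallel version would have to distribute.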
Improving Memory Utilization in Cache Coherence Directories
, 1993
Cited by 3 (2 self)
Efficiently maintaining cache coherence is a major problem in large-scale shared memory multiprocessors. Hardware directory coherence schemes have very high memory requirements, while software-directed schemes must rely on imprecise compile-time memory disambiguation. Recently proposed dynamically tagged directory schemes allocate pointers to blocks only as they are referenced, which significantly reduces their memory requirements, but they still allocate pointers to blocks that do not need them. We present two compiler optimizations that exploit the high-level sharing information available to the compiler to further reduce the size of a tagged directory by allocating pointers only when necessary. Trace-driven simulations are used to show that the performance of this combined hardware-software approach is comparable to other coherence schemes, but with significantly lower memory requirements. In addition, these simulations suggest that this approach is less sensitive to the quality of ...
PARCEL and MIPRAC: Parallelizers for Symbolic and Numeric Programs
Semantics for MIL
The abstract interpretation that is MIL's interprocedural analysis algorithm is derived in several steps. We begin with a standard semantics over the following domains:
$S = Id \to Z \to B$ (stores)
$B = (Z \times Y) \to V$ (objects)
$V = L + C + Z$ (values)
$L = Id \times Z \times Z$ (locations)
$C = Proc \times E$ (closures)
$E = Id \to Z$ (environments)
$Z$ (integers)
$Y \subset Z$ (sizes)
$T$ (booleans)
$Proc$ (procedures)
$Expr$ (expressions)
$Id$ (identifiers)
The domains $Z$, $Y$, $T$, $Expr$, $Proc$, and $Id$ are flat: each has a least element ($\bot_Z$, $\bot_T$, etc.), and its other elements are incomparable. The structure of the non-bottom elements of $Expr$ and $Proc$ is given by the grammar of MIL above. None of the above domains is reflexive, since each is defined in terms of those strictly below it in the table. Said another way, the equation for each can be written (finitely) as products ($\times$), sums ($+$) and function spaces ($\to$) of the flat domains $Z$, $T$, $Expr$, $Proc$, and $Id$. For example, $B$ is equal to $(Z \times Y)$ ...
The Design of Automatic Parallelizers for Symbolic and Numeric Programs
, 1989
In this paper we will describe the strengths and weaknesses of Parcel, and outline the design of Miprac in light of our experience with Parcel.
Improving Memory Utilization in Cache . . .
- IEEE Transactions on Parallel and Distributed Systems, vol. 4, no. 10, October 1993, pp. 1130-1146
, 1993
Efficiently maintaining cache coherence is a major problem in large-scale shared memory multiprocessors. Hardware directory coherence schemes have very high memory requirements, while software-directed schemes must rely on imprecise compile-time memory disambiguation. Recently proposed dynamically tagged directory schemes allocate pointers to blocks only as they are referenced, which significantly reduces their memory requirements, but they still allocate pointers to blocks that do not need them. We present two compiler optimizations that exploit the high-level sharing information available to the compiler to further reduce the size of a tagged directory by allocating pointers only when necessary. Trace-driven simulations are used to show that the performance of this combined hardware-software approach is comparable to other coherence schemes, but with significantly lower memory requirements. In addition, these simulations suggest that this approach is less sensitive to the quality of the memory disambiguation and interprocedural analysis performed by the compiler than software-only coherence schemes.

