Results 1 - 10 of 18
The PARADIGM Compiler for Distributed-Memory Message Passing Multicomputers
IEEE Computer, 1994
Cited by 98 (9 self)
The PARADIGM compiler project provides an automated means to parallelize programs, written in a serial programming model, for efficient execution on distributed-memory multicomputers. In addition to performing traditional compiler optimizations, PARADIGM is unique in that it addresses many other issues within a unified platform: automatic data distribution, synthesis of high-level communication, communication optimizations, irregular computations, functional and data parallelism, and multithreaded execution. This paper describes the techniques used and provides experimental evidence of their effectiveness.
1 Introduction
Distributed-memory massively parallel multicomputers can provide the high levels of performance required to solve the Grand Challenge computational science problems [16]. Distributed-memory multicomputers such as the Intel iPSC/860, the Intel Paragon, the IBM SP-1 and the Thinking Machines CM-5 offer significant advantages over shared-memory multiprocessors in terms...
Communication Optimizations Used in the Paradigm Compiler for Distributed-Memory Multicomputers
1994
Cited by 17 (1 self)
The PARADIGM (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers) project at the University of Illinois provides a fully automated means to parallelize programs, written in a serial programming model, for execution on distributed-memory multicomputers. To provide efficient execution, PARADIGM automatically performs various optimizations to reduce the overhead and idle time caused by interprocessor communication. Optimizations studied in this paper include message coalescing, message vectorization, message aggregation, and coarse-grain pipelining. To separate the optimization algorithms from machine-specific details, parameterized models are used to estimate communication and computation costs for a given machine. The models are also used in coarse-grain pipelining to automatically select a task granularity that balances the available parallelism with the costs of communication. To determine the applicability of the optimizations on different machines, we analyzed their performance on an Intel iPSC/860, an Intel iPSC/2, and a Thinking Machines CM-5.
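The idea of choosing a pipelining granularity from a parameterized cost model can be illustrated with a minimal sketch. It assumes a simple linear message cost (a fixed startup charge plus a per-element charge) and a one-dimensional pipeline; the names `MachineParams`, `message_cost`, `pipeline_time`, and `best_block` are hypothetical, and neither the formulas nor the parameter values come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineParams:
    """Hypothetical machine parameters; the values are not from the paper."""
    startup: float   # fixed cost of sending one message
    per_elem: float  # added cost per element carried in a message
    compute: float   # cost of one loop iteration on one processor


def message_cost(m: MachineParams, n_elems: int) -> float:
    """Assumed linear model: one startup charge plus a per-element charge."""
    return m.startup + m.per_elem * n_elems


def pipeline_time(m: MachineParams, n_iters: int, n_procs: int, block: int) -> float:
    """Estimated time of a pipeline that forwards `block` iterations per stage."""
    stages = -(-n_iters // block)  # ceiling division
    stage_time = block * m.compute + message_cost(m, block)
    # The last processor waits n_procs - 1 stages, then runs all of its own stages.
    return (stages + n_procs - 1) * stage_time


def best_block(m: MachineParams, n_iters: int, n_procs: int) -> int:
    """Exhaustively pick the block size with the lowest estimated time."""
    return min(range(1, n_iters + 1),
               key=lambda b: pipeline_time(m, n_iters, n_procs, b))
```

Under this assumed model, a large startup cost favors larger blocks (fewer, bigger messages), while a zero startup cost favors a block size of 1 (maximum overlap between processors). The same `message_cost` function also shows why sending one message of n elements is cheaper than sending n single-element messages whenever the startup cost is nonzero.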
Processor Tagged Descriptors: A Data Structure for Compiling for Distributed-Memory Multicomputers
Proceedings of the Parallel Architectures and Compiler Technology Conference, 1994
Cited by 14 (9 self)
The computation partitioning, communication analysis, and optimization phases performed during compilation for distributed-memory multicomputers require an efficient way of describing distributed sets of iterations and regions of data. Processor Tagged Descriptors (PTDs) provide these capabilities through a single set representation parameterized by the processor location for each dimension of a virtual mesh. A uniform representation is maintained for every processor in the mesh, whether it is a boundary or an interior node. As a result, operations on the sets are very efficient because the effect on all processors in a dimension can be captured in a single symbolic operation. In addition, PTDs are easily extended to an arbitrary number of dimensions, necessary for describing iteration sets in multiply nested loops as well as sections of multidimensional arrays. Using the symbolic features of PTDs it is also possible to generate code for variable numbers of processors, thereby allowi...
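As a rough illustration of a set description parameterized by processor location, the sketch below models a one-dimensional block distribution in which one formula yields the index range for any processor, boundary or interior. This is not the PTD data structure from the paper; the class `Descriptor` and its fields and methods are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical 1-D descriptor: a block per processor, clipped to [lo, hi)."""
    block: int       # elements per processor
    lo: int          # global lower bound (inclusive)
    hi: int          # global upper bound (exclusive)
    offset: int = 0  # shift applied to every processor's block

    def for_proc(self, p: int) -> range:
        """Indices described for processor p; same formula for every p."""
        start = max(self.lo, p * self.block + self.offset)
        stop = min(self.hi, (p + 1) * self.block + self.offset)
        return range(start, stop)

    def shifted(self, k: int) -> "Descriptor":
        """Shift the set for all processors in one operation."""
        return replace(self, offset=self.offset + k)
```

For example, with `owned = Descriptor(block=4, lo=0, hi=16)` and `reads = owned.shifted(1)`, the elements processor 1 must fetch from a neighbor are `set(reads.for_proc(1)) - set(owned.for_proc(1))`, while for the boundary processor 3 the clipping against `hi` leaves nothing to fetch.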
Thal: An Actor System For Efficient And Scalable Concurrent Computing
1997
Cited by 11 (0 self)
Actors are a model of concurrent objects which unify synchronization and data abstraction boundaries. Because they hide details of parallel execution and present an abstract view of the computation, actors provide a promising building block for easy-to-use parallel programming systems. However, the practical success of the concurrent object model requires two conditions be satisfied. Flexible communication abstractions and their efficient implementations are the necessary conditions for the success of actors. This thesis studies how to support communication between actors efficiently. First, we discuss communication patterns commonly arising in many parallel applications in the context of an experimental actor-based language, THAL. The language provides as communication abstractions concurrent call/return communication, delegation, broadcast, and local synchronization constraints. The thesis shows how the abstractions are efficiently implemented on stock-hardware distributed memory mul...
On the Utility of Threads for Data Parallel Programming
Cited by 9 (3 self)
Threads provide a useful programming model for asynchronous behavior because of their ability to encapsulate units of work that can then be scheduled for execution at runtime, based on the dynamic state of a system. Recently, the threaded model has been applied to the domain of data parallel scientific codes, and initial reports indicate that the threaded model can produce performance gains over non-threaded approaches, primarily by overlapping useful computation with communication latency. However, overlapping computation with communication is possible without threads if the communication system supports asynchronous primitives, and this comparison has not been made in previous papers. This paper provides a critical look at the utility of lightweight threads as applied to data parallel scientific programming.
Load balancing HPF programs by migrating virtual processors
In Second International Workshop on High-Level Programming Models and Supportive Environments, HIPS '97, 1997
Cited by 9 (2 self)
This paper explores the integration of load balancing features in the data parallel language HPF, targeting semi-regular applications. We show that the HPF virtual processors are good candidates to be the unit of migration. Then, we compare three possible implementations and show that threads provide a good trade-off between efficiency and ease of implementation. We finally describe a preliminary implementation. The experimental results, obtained with Gaussian elimination with partial pivoting, are promising.
Actor Based Parallel VHDL Simulation Using Time Warp
In Proceedings of the 1996 Workshop on Parallel and Distributed Simulation, 1996
Cited by 7 (3 self)
One method of reducing the time spent simulating VHDL designs is to parallelize the simulation. In this paper, we describe the implementation of an object-oriented Time Warp simulator for VHDL on an actor-based environment. The actor model of computation allows the exploitation of fine-grained parallelism in a truly asynchronous manner and allows for the overlap of computation with communication. Some preliminary results obtained by simulating a set of multipliers and some ISCAS benchmark circuits are provided. In addition, the importance of placing processes based on circuit partitioning techniques for improving runtimes and scalability is demonstrated. Results are reported on a Sun SPARCServer 1000 and an Intel Paragon.
1 Introduction
The design of a digital VLSI system commonly begins with a description of the system being written in a Hardware Description Language, an example of which is VHDL [1]. Subsequent to verifying the functionality of the description, it is give...
Software-Based Communication Latency Hiding for Commodity Workstation Networks
IEEE International Conference on Parallel Processing, 1996
Cited by 5 (2 self)
A variety of latency hiding techniques has been investigated at the hardware level. However, except for multithreading, which may require substantial program structuring effort, software-based latency hiding methods have not been investigated. In this paper, we consider design alternatives for latency hiding other than multithreading. Furthermore, we present experimental evidence for the validity of a new technique for software-based communication latency hiding for commodity workstation networks: up to 80 percent of useful computational power can be squeezed out of a workstation CPU while communicating with TCP/IP via an Ethernet, and almost 90 percent while communicating across the Internet.
1 Introduction
Computer networks are becoming a primary compute resource for engineers and scientists. Technological advances and the widespread availability of these cost-effective systems are making them attractive for daily use. Although CPU speed and memory capacity of current workstations ...
Distributed object oriented data structures and algorithms for VLSI
Parallel Algorithms for Irregularly Structured Problems, volume LNCS 1117, 1996
Cited by 2 (0 self)
Abstract. ProperCAD II is a C++ object-oriented library supporting actor-based parallel program design. The library easily allows the design of data structures with parallel semantics for use in irregular applications. Inheritance mechanisms allow creation of the distributed data structures from standard C++ objects. This paper discusses the use of such distributed data structures in the context of a particular VLSI CAD application, standard cell placement. The library and associated runtime system currently run on a wide range of platforms.

