Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems
Cited by 47 (4 self)
We describe computation migration, a new technique that is based on compile-time program transformations, for accessing remote data in a distributed-memory parallel system. In contrast with RPC-style access, where the access is performed remotely, and with data migration, where the data is moved so that it is local, computation migration moves part of the current thread to the processor where the data resides. The access is performed at the remote processor, and the migrated thread portion continues to run on that same processor; this makes subsequent accesses in the thread portion local. We describe an implementation of computation migration that consists of two parts: an implementation that migrates single activation frames, and a high-level language annotation that allows a programmer to express when migration is desired. We performed experiments using two applications; these experiments demonstrate that computation migration is a valuable alternative to RPC and data migration.
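The contrast between RPC-style access and computation migration can be sketched as a toy message count. This is only an illustration under stated assumptions: the `Network` class, the function names, and the one-message-per-hop cost model are hypothetical and are not taken from the paper, which migrates activation frames through compile-time transformations.

```python
class Network:
    """Hypothetical network that only counts messages sent."""

    def __init__(self):
        self.messages = 0

    def send(self):
        self.messages += 1


def sum_with_rpc(net, remote_data, keys):
    """RPC-style access: every access is a request plus a reply."""
    total = 0
    for key in keys:
        net.send()                 # request travels to the data's processor
        value = remote_data[key]   # access is performed remotely
        net.send()                 # reply travels back to the caller
        total += value
    return total


def sum_with_migration(net, remote_data, keys):
    """Computation migration: move the thread portion once, then run locally."""
    net.send()                                    # ship the frame to the data
    total = sum(remote_data[key] for key in keys)  # all accesses are now local
    net.send()                                    # send the result back
    return total


if __name__ == "__main__":
    data = {"a": 1, "b": 2, "c": 3}
    rpc_net, mig_net = Network(), Network()
    sum_with_rpc(rpc_net, data, list(data))
    sum_with_migration(mig_net, data, list(data))
    print("RPC messages:", rpc_net.messages)
    print("Migration messages:", mig_net.messages)
```

In this simplified model the RPC cost grows with the number of accesses, while the migration cost stays fixed once the thread portion has moved. Real costs also depend on frame size and data size, which the sketch ignores.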
Lazy Updates for Distributed Search Structures
In SIGMOD, 1993
Cited by 33 (1 self)
Very large database systems require distributed storage, which means that they need distributed search structures for fast and efficient access to the data. In this paper, we present an approach to maintaining distributed data structures that uses lazy updates, which take advantage of the semantics of the search structure operations to allow for scalable and low-overhead replication. Lazy updates can be used to design distributed search structures that support very high levels of concurrency. The alternatives to lazy update algorithms (vigorous updates) use synchronization to ensure consistency. Hence, lazy update algorithms are a distributed analogue of shared-memory lock-free search structure algorithms. Since lazy updates avoid the use of synchronization, they are much easier to implement than vigorous update algorithms. We demonstrate the application of lazy updates to the dB-tree, which is a distributed B+ tree that replicates its interior nodes for highly parallel access. We d...
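One way to read "take advantage of the semantics of the search structure operations" is that some updates commute, so replicas can apply them in different orders without synchronizing and still converge. The sketch below assumes exactly that and nothing more: the `Replica` class, the `deliver` helper, and the insert-only update format are hypothetical and do not reproduce the dB-tree algorithm.

```python
import random


class Replica:
    """Hypothetical replica of an index node holding a set of keys."""

    def __init__(self):
        self.keys = set()

    def apply(self, update):
        op, key = update
        if op == "insert":
            # Inserts commute: the final set does not depend on arrival order.
            self.keys.add(key)


def deliver(replica, updates, seed):
    """Apply the same updates in a replica-specific (shuffled) order."""
    shuffled = list(updates)
    random.Random(seed).shuffle(shuffled)
    for update in shuffled:
        replica.apply(update)


if __name__ == "__main__":
    updates = [("insert", k) for k in (17, 4, 42, 8)]
    first, second = Replica(), Replica()
    deliver(first, updates, seed=1)
    deliver(second, updates, seed=2)
    print(first.keys == second.keys)
```

Operations that do not commute would need extra ordering information, which this sketch deliberately leaves out.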
PRELUDE: A System for Portable Parallel Software
1991
Cited by 18 (8 self)
In this paper we describe PRELUDE, a programming language and accompanying system support for writing portable MIMD parallel programs. PRELUDE supports a methodology for designing and organizing parallel programs that makes them easier to tune for particular architectures and to port to new architectures. It builds on earlier work on Emerald, Amber, and various Fortran extensions to allow the programmer to divide programs into architecture-dependent and architecture-independent parts, and then to change the architecture-dependent parts to port the program to a new machine or to tune its performance on a single machine. The architecture-dependent parts of a program are specified by annotations that describe the mapping of a program onto a machine. PRELUDE provides a variety of mapping mechanisms similar to those in other systems, including remote procedure call, object migration, and data replication and partitioning. In addition, PRELUDE includes novel migration mechanisms for computations based on a form of continuation passing. The implementation of object migration in PRELUDE uses a novel approach based on fixup blocks that is more efficient than previous approaches, and amortizes the cost of each migration so that the cost per migration drops as the frequency of migrations increases.
An Overview of Mermera: A System and Formalism for Non-coherent Distributed Parallel Memory
In Proc. 26th Hawaii International Conference on System Sciences, Maui, Hawaii, 1993
Cited by 15 (5 self)
Several non-coherent memories have been proposed to address the problem of poor scalability of traditional coherent shared memory systems. However, such memories make programming much more difficult than coherent memory. We give an overview of Mermera, a system that gives the programmer the choice of coherent and non-coherent behavior in the same program. We sketch a formal model that describes Mermera's non-coherent behavior. This model helps us identify a new non-coherent behavior that we call Local Consistency. We also report our measurements from a pilot implementation on a BBN Butterfly which show the response time of pipelined-RAM operations to be an order of magnitude faster than that of coherent operations. Thus, we begin to illuminate a new trade-off in which the programmer can significantly improve shared memory performance at the cost of tolerating varying degrees of non-coherence.
MERMERA: Non-Coherent Distributed Shared Memory for Parallel Computing
1993
Cited by 15 (4 self)
The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared-memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared-memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second,
Multipol: A Distributed Data Structure Library
In Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1995
Cited by 15 (2 self)
Applications with dynamic data structures, unpredictable computational costs, and irregular data access patterns require substantial effort to parallelize. Much of their programming complexity comes from the implementation of distributed data structures. We describe a library of such data structures, Multipol, which includes parallel versions of classic data structures such as trees, sets, lists, graphs, and queues. The library is built on a portable runtime layer that provides basic communication, synchronization, and caching. The data structures address the classic trade-off between locality and load balance through a combination of replication, partitioning, and dynamic caching. To tolerate remote communication latencies, some of the operations are split into a separate initiation and completion phase, allowing for computation and communication overlap at the library interface level. This leads to a form of relaxed consistency semantics for the data types. In this paper we give an o...
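The split into initiation and completion phases can be illustrated with a small sketch. It assumes a simulated remote lookup with artificial latency; `SplitPhaseTable`, `start_get`, and `finish_get` are hypothetical names and are not Multipol's interface. Only the standard `concurrent.futures` module is used.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SplitPhaseTable:
    """Hypothetical table whose lookups are split into two phases."""

    def __init__(self, data, latency=0.01):
        self._data = dict(data)
        self._latency = latency          # simulated remote latency in seconds
        self._pool = ThreadPoolExecutor(max_workers=4)

    def _remote_get(self, key):
        time.sleep(self._latency)        # stand-in for network delay
        return self._data.get(key)

    def start_get(self, key) -> Future:
        """Initiation phase: issue the request and return a handle at once."""
        return self._pool.submit(self._remote_get, key)

    @staticmethod
    def finish_get(handle: Future):
        """Completion phase: block until the value has arrived."""
        return handle.result()

    def close(self):
        self._pool.shutdown()


if __name__ == "__main__":
    table = SplitPhaseTable({"x": 10, "y": 20})
    handle = table.start_get("x")        # request is in flight
    local_work = sum(range(1000))        # overlap computation with the request
    print(table.finish_get(handle), local_work)
    table.close()
```

Between `start_get` and `finish_get` the caller cannot assume the value is available, which is one way the relaxed consistency mentioned above shows through at the interface.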
Distributed data structures and algorithms for Gröbner basis computation
Lisp and Symbolic Computation, 1994
Cited by 11 (4 self)
We present the design and implementation of a parallel algorithm for computing Gröbner bases on distributed memory multiprocessors. The parallel algorithm is irregular both in space and time: the data structures are dynamic pointer-based structures and the computations on the structures have unpredictable duration. The algorithm is presented as a series of refinements on a transition rule program, in which computation proceeds by nondeterministic invocations of guarded commands. Two key data structures, a set and a priority queue, are distributed across processors in the parallel algorithm. The data structures are designed for high throughput and latency tolerance, as appropriate for distributed memory machines. The programming style represents a compromise between shared-memory and message-passing models. The distributed nature of the data structures shows through their interface in that the semantics are weaker than with shared atomic objects, but they still provide a shared abstraction that can be used for reasoning about program correctness. In the data structure design there is a classic trade-off between locality and load balance. We argue that this is best solved by designing scheduling structures in tandem with the state data structures, since the decision to replicate or partition state affects the overhead of dynamically moving tasks.
Using Abstraction in Explicitly Parallel Programs
Dept. of Electrical Engineering and Computer Science, MIT, 1990
Cited by 8 (5 self)
Using Abstraction in Explicitly Parallel Programs, by Katherine Anne Yelick. © Massachusetts Institute of Technology, 1990. This report is a revised version of the author's thesis, which was submitted to the Department of Electrical Engineering and Computer Science on December 31, 1990 in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology. The thesis was supervised by John V. Guttag. The author's current address is the Computer Science Division, University of California, Berkeley, CA 94720. Abstract: It is well-known that writing parallel programs that are both fast and correct is significantly harder than writing sequential ones. In this thesis we introduce a transition-based approach to the design and implementation of parallel programs. This approach is aimed at applications whose complex data and control structures make them hard to parallelize by conventional means. It is based on a programming model with explicit pa...
A Distributed, Replicated, Data-Balanced Search Structure
International Journal of High Speed Computing, 1994
Cited by 5 (1 self)
Many concurrent dictionary data structures have been proposed, but usually in the context of shared memory multiprocessors. In this paper, we present an algorithm for a concurrent distributed B-tree that can be implemented on message passing computer systems. Our distributed B-tree (the dB-tree) replicates the interior nodes in order to improve parallelism and reduce message passing. The dB-tree stores some redundant information in its nodes to permit the use of lazy updates to maintain replica coherency. We show how the dB-tree algorithm can be used to build an efficient implementation of a highly parallel, data-balanced distributed dictionary, the dE-tree. Keywords: Concurrent dictionary data structures, Message passing multiprocessor systems, Balanced search trees, B-link trees, Replica coherency. 1. Introduction. We introduce a new balanced search tree algorithm for distributed memory architectures. The search tree uses the B-link tree [27] as a base, and distributes ow...
An Implementation of Mermera: A Shared Memory System That Mixes . . .
1993
Cited by 5 (0 self)
Coherent shared memory is a convenient, but inefficient, method of inter-process communication for parallel programs. By contrast, message passing can be less convenient, but more efficient. To get the benefits

