Results 1 - 10 of 26
Orca: A language for parallel programming of distributed systems
- IEEE Transactions on Software Engineering
, 1992
Cited by 307 (43 self)
Orca is a language for implementing parallel applications on loosely coupled distributed systems. Unlike most languages for distributed programming, it allows processes on different machines to share data. Such data are encapsulated in data-objects, which are instances of user-defined abstract data types. The implementation of Orca takes care of the physical distribution of objects among the local memories of the processors. In particular, an implementation may replicate and/or migrate objects in order to decrease access times to objects and increase parallelism. This paper gives a detailed description of the Orca language design and motivates the design choices. Orca is intended for applications programmers rather than systems programmers. This is reflected in its design goals to provide a simple, easy-to-use language that is type-secure and provides clean semantics. The paper discusses three example parallel applications in Orca, one of which is described in detail. It also describes one of the existing implementations, which is based on reliable broadcasting. Performance measurements of this system are given for three parallel applications. The measurements show that significant speedups can be obtained for all three applications. Finally, the paper compares Orca with several related languages and systems.
Lazy Release Consistency for Distributed Shared Memory
, 1995
Cited by 95 (0 self)
A software distributed shared memory (DSM) system allows shared memory parallel programs to execute on networks of workstations. This thesis presents a new class of protocols that has lower communication requirements than previous DSM protocols, and can consequently achieve higher performance. The lazy release consistent protocols achieve this reduction in communication by piggybacking consistency information on top of existing synchronization transfers. Some of the protocols also improve performance by speculatively moving data. We evaluate the impact of these features by comparing the performance of a software DSM using lazy protocols with that of a DSM using previous eager protocols. We found that seven of our eight applications performed better on the lazy system, and four of the applications showed performance speedups of at least 18%. As part of this comparison, we show that the cost of executing the slightly more complex code of the lazy protocols is far less important than the ...
Performance Evaluation of the Orca Shared Object System
- ACM Transactions on Computer Systems
, 1998
Cited by 63 (42 self)
Orca is a portable, object-based distributed shared memory system. This paper studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The paper gives a quantitative analysis of Orca's coherence protocol (based on write-updates with function shipping), the totally-ordered group communication protocol, the strategy for object placement, and the all-software, user-space architecture. Performance measurements for ten parallel applications illustrate the tradeoffs made in the design of Orca, and also show that essentially the right design decisions have been made. A write-update protocol with function shipping is effective for Orca, especially since it is used in combination with techniques that avoid replicating objects that have a low read/write ratio. The overhead of totally-ordered group communication on application performance is low. The Orca system is able to make near-optimal decisions for object placement and replication. In addition, the...
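The write-update protocol with function shipping mentioned in this abstract can be pictured with a small sketch. This is an illustration only, not code from the Orca system: the names `Replica` and `SharedObject` are hypothetical, the shared object is reduced to a single integer, and plain list iteration stands in for totally-ordered group communication. The point it shows is that a write ships the operation itself to every replica, while reads are served from the local copy.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Replica:
    """One machine's local copy of a shared object (hypothetical; state is an int)."""
    state: int = 0

    def read(self) -> int:
        # Reads touch only the local copy, so they need no communication.
        return self.state

    def apply(self, op: Callable[[int], int]) -> None:
        # Execute the shipped operation against the local copy.
        self.state = op(self.state)


@dataclass
class SharedObject:
    """Hypothetical stand-in for a replicated object under write-update."""
    replicas: List[Replica] = field(default_factory=list)

    def write(self, op: Callable[[int], int]) -> None:
        # Function shipping: send the operation, not the resulting data.
        # Assumption: iterating the list models a total order, so every
        # replica applies all writes in the same sequence.
        for replica in self.replicas:
            replica.apply(op)
```

Because every replica applies the same operations in the same order, the copies stay identical without ever transferring the object's contents.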
Relaxing Consistency in Recoverable Distributed Shared Memory
- In Proceedings of the Twenty-Third Annual International Symposium on Fault-Tolerant Computing: Digest of Papers
, 1993
Cited by 37 (3 self)
Relaxed memory consistency models tolerate increased memory access latency in both hardware and software distributed shared memory systems. In recoverable systems, relaxing consistency has the added benefit of reducing the number of checkpoints needed to avoid rollback propagation. In this paper, we introduce new checkpointing algorithms that take advantage of relaxed consistency to reduce the performance overhead of checkpointing. We also introduce a scheme based on lazy relaxed consistency that reduces both checkpointing overhead and the overhead of avoiding error propagation in systems with error latency. We use multiprocessor address traces to evaluate the relaxed consistency approach to checkpointing with distributed shared memory.
1 Introduction
Several parallel architectures use distributed shared memory to avoid the programming complexities of message passing. A distinguishing feature of these architectures is the distribution of memory across many processing nodes connected ...
A taxonomy-based comparison of several distributed shared memory systems
- ACM Operating Systems Review
, 1990
Cited by 37 (4 self)
Two possible modes of Input/Output (I/O) are "sequential" and "random-access", and there is an extremely strong conceptual link between I/O and communication. Sequential communication, typified in the I/O setting by magnetic tape, is typified in the communication setting by a stream, e.g., a UNIX pipe. Random-access communication, typified in the I/O setting by a drum or disk device, is typified in the communication setting by shared memory. In this paper, we study and survey the extension of the random-access model to distributed computer systems. A Distributed Shared Memory (DSM) is a memory area shared by processes running on computers connected by a network. DSM provides direct system support of the shared memory programming model. When assisted by hardware, it can also provide a low-overhead interprocess communication (IPC) mechanism to software. Shared pages are migrated on demand between the hosts. Since computer network latency is typically much larger than that of a shared bus, caching in DSM is necessary for performance. We use caching and issues such as address space structure and page replacement schemes to define a taxonomy. Based on the taxonomy we examine three DSM efforts in detail, namely: IVY, Clouds and MemNet.
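On-demand page migration, as described in this abstract, can be sketched as a toy simulation. Everything here is an assumption made for illustration: the class names `Host` and `MigratingDSM`, the single-owner policy, and the tiny page size are hypothetical and are not taken from IVY, Clouds or MemNet. The sketch keeps exactly one copy of each page and moves it to whichever host touches it; it deliberately omits the caching that the abstract says is necessary for performance.

```python
class Host:
    """A machine with local memory holding the pages it currently owns."""

    def __init__(self, name):
        self.name = name
        self.pages = {}  # page number -> bytearray with the page contents


class MigratingDSM:
    """Hypothetical single-copy DSM: a page lives on exactly one host at a time."""

    def __init__(self, hosts, num_pages, page_size=4):
        self.hosts = {h.name: h for h in hosts}
        self.owner = {}       # page number -> name of the host holding it
        self.migrations = 0   # number of page transfers between hosts
        first = hosts[0]
        for page in range(num_pages):
            # Assumption: all pages start zero-filled on the first host.
            first.pages[page] = bytearray(page_size)
            self.owner[page] = first.name

    def _fault_in(self, host_name, page):
        # Models a page fault: if the page is remote, move it to the accessor.
        current = self.owner[page]
        if current != host_name:
            self.hosts[host_name].pages[page] = self.hosts[current].pages.pop(page)
            self.owner[page] = host_name
            self.migrations += 1
        return self.hosts[host_name].pages[page]

    def read(self, host_name, page, offset):
        return self._fault_in(host_name, page)[offset]

    def write(self, host_name, page, offset, value):
        self._fault_in(host_name, page)[offset] = value
```

Repeated accesses from the same host incur no further transfers, while alternating accesses from two hosts move the page on every access, which is the cost that caching and replication schemes aim to reduce.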
A Multi-Level WDM Access Protocol for an Optically Interconnected Multiprocessor System
- IEEE/OSA Journal of Lightwave Technology
, 1999
Cited by 33 (14 self)
Scalable, hierarchical, all-optical WDM networks for processor interconnection in multiprocessor systems have been recently considered. The principal objective of this paper is to introduce an access protocol for this type of network which supports a distributed shared memory (DSM) environment. The objectives of the protocol are reduced average latency per packet, support of broadcast/multicast, collisionless communication, and exploitation of inherent DSM traffic characteristics. The protocol is based on a hybrid approach that combines reservation access and pre-allocated reception channels for a WDM system. The proposed approach trades maximum capacity for reduced communication latency to improve system response. The performance of the protocol is analyzed through semi-Markov analytic and simulation models with varying system parameters such as number of nodes and channels. The performance of the new protocol is compared to a TDM-based protocol and their relative merits are examined. ...
Performance Evaluation of the Orca Shared-Object System
- ACM Transactions on Computer Systems
, 1998
Cited by 32 (4 self)
Orca is a portable, object-based distributed shared memory (DSM) system. This article studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. The article gives a quantitative analysis of Orca’s coherence protocol (based on write-updates with function shipping), the totally ordered group communication protocol, the strategy for object placement, and the all-software, user-space architecture. Performance measurements for 10 parallel applications illustrate the trade-offs made in the design of Orca and show that essentially the right design decisions have been made. A write-update protocol with function shipping is effective for Orca, especially since it is used in combination with techniques that avoid replicating objects that have a low read/write ratio. The overhead of totally ordered group communication on application performance is low. The Orca system is able to make near-optimal decisions for object placement and replication. In addition, the article compares the performance of Orca with that of a page-based DSM (TreadMarks) and another object-based DSM (CRL). It also analyzes the communication overhead of the DSMs for several applications. All performance measurements are done on a 32-node Pentium Pro cluster with Myrinet and Fast Ethernet networks. The results show that the Orca programs
Media Access Protocols for WDM Networks with On-Line Scheduling
- IEEE/OSA Journal of Lightwave Technology
, 1999
Cited by 23 (7 self)
This paper studies media access protocols which support collision-free broadcast/multicast communication for optically connected star-coupled systems with Wavelength Division Multiple Access channels. An early hybrid access protocol, consisting of reservation and of receiver channels pre-allocation, allows reservation for exactly one WDM channel per node during the reservation phase. We extend this protocol by allowing reservations on multiple channels and applying scheduling algorithms to improve network performance. Existing scheduling algorithms for similar reservation problems, although providing optimal scheduling for network utilization, have unacceptable computational cost and high implementation complexity. We propose two on-line scheduling algorithms which run in linear time and are simple, making them amenable to hardware implementation. Performance of the protocol using scheduling with varying system parameters is evaluated through discrete-event simulation under both uniform and non-uniform traffic patterns. Our simulation results show that our approach achieves lower average packet latency and higher network utilization compared to the early hybrid access protocol with single channel, especially in the client-server environment.
MERMERA: Non-Coherent Distributed Shared Memory for Parallel Computing
, 1993
Cited by 15 (4 self)
The proliferation of inexpensive workstations and networks has prompted several researchers to use such distributed systems for parallel computing. Attempts have been made to offer a shared-memory programming model on such distributed memory computers. Most systems provide a shared memory that is coherent in that all processes that use it agree on the order of all memory events. This dissertation explores the possibility of a significant improvement in the performance of some applications when they use non-coherent memory. First, a new formal model to describe existing non-coherent memories is developed. I use this model to prove that certain problems can be solved using asynchronous iterative algorithms on shared memory in which the coherence constraints are substantially relaxed. In the course of the development of the model I discovered a new type of non-coherent behavior called Local Consistency. Second,
Virtual Shared Memory: A Survey of Techniques and Systems
, 1992
Cited by 13 (1 self)
Shared memory abstraction on distributed memory hardware has become very popular recently. The abstraction can be provided at various levels in the architecture, e.g., hardware or software, employing special mechanisms to maintain coherence of data. In this paper we present a survey of basic techniques and review a large number of architectures that provide such an abstraction. We also propose new terminology which is more consistent and orderly as compared with the existing use of terminology for such architectures.
1 Introduction
Virtual Shared Memory (VSM) in its most general sense refers to a provision of a shared address space on distributed memory hardware. Such architectures contain no physically shared memory. Instead the distributed local memories collectively provide a virtual address space shared by all the processors. VSM combines the benefits of the ease of programming found in shared-memory multiprocessors with the scalability of message-passing multiprocessors. The implemen...

