On the Design of Chant: A Talking Threads Package
- PROC. SUPERCOMPUTING 94, PP. 350-359, WASHINGTON, D.C.
, 1994
Cited by 71 (9 self)
Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. However, lightweight thread packages traditionally support only shared memory synchronization and communication primitives, limiting their use in distributed memory environments. We introduce the design of a runtime interface, called Chant, that supports lightweight threads with the capability of communication using both point-to-point and remote service request primitives, built from standard message passing libraries. This is accomplished by extending the POSIX pthreads interface with global thread identifiers, global thread operations, and message passing primitives. This paper introduces the Chant interface and describes the runtime issues in providing an efficient, portable implementation of such an interface. In particular, we present performance results of the initial portion of our runtime system: point-to-point message passing among threads. We examine the issue of thread scheduling in the presence of polling for messages, and measure the overhead incurred when using this interface as opposed to using the underlying communication layer directly. We show that our design can accommodate various polling methods, depending on the level of support present in the underlying thread system, and imposes little overhead in point-to-point message passing over the existing communication layer.
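The idea of addressing individual threads with global identifiers and exchanging point-to-point messages between them can be illustrated with a small sketch. This is not the Chant interface: the names `GlobalThreadId`, `Mailboxes`, `send`, and `recv` are hypothetical, the three-field identifier layout is an assumption, and everything runs inside one Python process rather than over a message passing library.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalThreadId:
    """Hypothetical global thread identifier (field layout is an assumption)."""
    node: int     # which machine or processing element
    process: int  # which process on that node
    local: int    # thread index within that process


class Mailboxes:
    """In-process stand-in for a message layer: one queue per global thread id."""

    def __init__(self):
        self._boxes = {}
        self._lock = threading.Lock()

    def _box(self, gid):
        # Create the destination queue lazily, under a lock.
        with self._lock:
            if gid not in self._boxes:
                self._boxes[gid] = queue.Queue()
            return self._boxes[gid]

    def send(self, dest, payload):
        """Point-to-point send addressed to a thread, not to a process."""
        self._box(dest).put(payload)

    def recv(self, me, timeout=1.0):
        """Blocking receive for the thread identified by `me`."""
        return self._box(me).get(timeout=timeout)


def demo():
    boxes = Mailboxes()
    a = GlobalThreadId(node=0, process=0, local=0)
    b = GlobalThreadId(node=1, process=0, local=3)

    def worker():
        msg = boxes.recv(b)         # thread b waits for its message
        boxes.send(a, msg.upper())  # and replies to thread a

    t = threading.Thread(target=worker)
    t.start()
    boxes.send(b, "ping")
    reply = boxes.recv(a)
    t.join()
    return reply
```

Calling `demo()` sends one message to thread `b` and returns the reply that `b` addresses back to thread `a`.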
Ariadne: Architecture of a Portable Threads system supporting Mobile Processes
- Software-Practice and Experience
, 1996
Cited by 50 (15 self)
Threads exhibit a simply expressed and powerful form of concurrency, easily exploitable in applications that run on both uni- and multi-processors, shared- and distributed-memory systems. This paper presents the design and implementation of Ariadne: a layered, C-based software architecture for multi-threaded computing on a variety of platforms. Ariadne is a portable user-space threads system that runs on shared- and distributed-memory multiprocessors. It can be used for parallel and distributed applications. Thread-migration is supported at the application level in homogeneous environments (e.g., networks of SPARCs and Sequent Symmetrys, Intel hypercubes). Threads may migrate between processes to access remote data, preserving locality of reference for computations with a dynamic data space. Ariadne can be tuned to specific applications through a customization layer. Support is provided for scheduling via a built-in or application-specific scheduler, and interfacing with any communicat...
Static Cache Simulation and its Applications
, 1994
Cited by 41 (13 self)
This work takes a fresh look at the simulation of cache memories. It introduces the technique of static cache simulation that statically predicts a large portion of cache references. To efficiently utilize this technique, a method to perform efficient on-the-fly analysis of programs in general is developed and proved correct. This method is combined with static cache simulation for a number of applications. The application of fast instruction cache analysis provides a new framework to evaluate instruction cache memories that outperforms even the fastest techniques published. Static cache simulation is shown to address the issue of predicting cache behavior, contrary to the belief that cache memories introduce unpredictability to real-time systems that cannot be efficiently analyzed. Static cache simulation for instruction caches provides a large degree of predictability for real-time systems. In addition, an architectural modification through bit-encoding is introduced that provides fu...
Experience with a Portability Layer for Implementing Parallel Programming Systems
- In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications
, 1996
Cited by 34 (11 self)
Panda is a virtual machine designed to support portable implementations of parallel programming systems. It provides communication primitives and thread support to higher-level layers (such as a runtime system). We have used Panda to implement four parallel programming systems: Orca, data parallel Orca, PVM, and SR. The paper describes our experiences in implementing these systems using Panda and it evaluates the performance of the Panda-based implementations. 1 Introduction Portability is one of the most important issues in designing parallel software. The portability of parallel applications can be enhanced by using portable programming systems, but this leaves many of the problems to the implementor of such systems. In particular, it is difficult to obtain both portability and efficiency. In our research on the Orca [2] programming system, we use the well-known implementation technique of a virtual machine to achieve portability. We have designed a virtual machine, called Panda [4]...
Monitors and Exceptions: How to implement Java efficiently
- IN ACM 1998 WORKSHOP ON JAVA FOR HIGH-PERFORMANCE NETWORK COMPUTING
, 1998
Cited by 30 (4 self)
Efficient implementation of monitors and exceptions is crucial for the performance of Java. One implementation of threads showed a factor of 30 difference in run time on some benchmark programs. This article describes an efficient implementation of monitors for Java as used in the CACAO just-in-time compiler. With this implementation the thread overhead is less than 40% for typical application programs and can be completely eliminated for some applications. This article also gives the implementation details of the new exception handling scheme in CACAO. The new approach reduces the size of the generated native code by a half and allows null pointers to be checked by hardware. By using these techniques, the CACAO system has become the fastest JavaVM implementation for the Alpha processor.
Space-Efficient Scheduling of Nested Parallelism
- ACM Transactions on Programming Languages and Systems
, 1999
Cited by 28 (5 self)
This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K · D · p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
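The role of the parameter K can be illustrated with a toy sketch. It is not the article's scheduler: `space_bound` only evaluates the shape S1 + c·K·D·p, where the constant `c` hidden by the O() notation is an assumption, and `run_with_quota` is a hypothetical single-thread model that ends a scheduling quantum once the net allocation in that quantum reaches K.

```python
def space_bound(s1, k, depth, procs, c=1):
    """Evaluate S1 + c*K*D*p. The constant c is an assumption; O() hides it."""
    return s1 + c * k * depth * procs


def run_with_quota(allocs, k):
    """Split one thread's net allocations into quanta of at most about k each.

    `allocs` is a list of net allocation sizes (a negative value models a free).
    A quantum ends, modelling a preemption, as soon as the net amount
    allocated within it reaches k.
    """
    quanta, current, used = [], [], 0
    for amount in allocs:
        current.append(amount)
        used += amount
        if used >= k:
            quanta.append(current)
            current, used = [], 0
    if current:
        quanta.append(current)  # trailing work that never hit the quota
    return quanta
```

In this model a smaller K gives more preemptions (more quanta) and a smaller bound, while a larger K gives fewer preemptions and a larger bound, which is the trade-off the abstract describes.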
Integrating Coherency and Recoverability in Distributed Systems
- IN PROCEEDINGS OF THE FIRST SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI'94)
, 1994
Cited by 26 (5 self)
We propose a technique for maintaining coherency of a transactional distributed shared memory, used by applications accessing a shared persistent store. Our goal is to improve support for fine-grained distributed data sharing in collaborative design applications, such as CAD systems and software development environments. In contrast, traditional research in distributed shared memory has focused on supporting parallel programs; in this paper, we show how distributed programs can benefit from this shared-memory abstraction as well. Our approach, called log-based coherency, integrates coherency support with a standard mechanism for ensuring recoverability of persistent data. In our system, transaction logs are the basis of both recoverability and coherency. We have prototyped log-based coherency as a set of extensions to RVM [Satyanarayanan et al. 94], a runtime package supporting recoverable virtual memory. Our prototype adds coherency support to RVM in a simple way that does not requir...
An overview of the OPUS language and runtime system
- Institute for Computer
, 1994
Cited by 23 (4 self)
We have recently introduced a new language, called Opus, which provides a set of Fortran language extensions that allow for integrated support of task and data parallelism. It also provides shared data abstractions (SDAs) as a method for communication and synchronization among these tasks. In this paper, we first provide a brief description of the language features and then focus on both the language-dependent and language-independent parts of the runtime system that support the language. The language-independent portion of the runtime system supports lightweight threads across multiple address spaces, and is built upon existing lightweight thread and communication systems. The language-dependent portion of the runtime system supports conditional invocation of SDA methods and distributed SDA argument handling.
Distributed Shared-Memory Threads: DSM-Threads
- In Workshop on Run-Time Systems for Parallel Programming
, 1997
Cited by 22 (4 self)
This paper is, to our knowledge, the first description of a system to support distributed threads on top of POSIX Threads (Pthreads) via distributed virtual shared memory (DSM). The aim of DSM-Threads is to provide an easy way for a programmer to migrate from a concurrent programming model with shared memory (Pthreads) to a distributed model with minimal changes to the application code. Thus, a programmer may continue to use shared-memory algorithms and exploit the processing power of a distributed system without dealing with the more complex (and harder to understand) models of distributed algorithms. This paper discusses design goals, design decisions, and implementation choices of DSM-Threads. As the DSM runtime system is itself implemented as a multi-threaded system over Pthreads on each node and copes without compiler or operating system modifications, several problems arise and their solutions are discussed. Several data consistency models are supported to facilitate ports fro...
A Multi-Threaded Architecture for Prefetching in Object Bases
- IN PROC. OF THE INT. CONF. ON EXTENDING DATABASE TECHNOLOGY
, 1994
Cited by 20 (0 self)
We propose a generic architectural framework, a multithreaded run-time system for client/server architectures, which facilitates the integration, exchange and extension of various prefetching techniques. To demonstrate the viability of this architecture two prefetching techniques are incorporated: a predictor-based technique---which consists of a separate predictor component in the run-time system---and a code-based technique---which provides an explicit prefetch statement at the programming interface. Our quantitative analysis indicates that (static) code-based techniques are a promising alternative to expensive monitoring-based predictors.

