Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations (Extended Abstract)
Cited by 84 (6 self)
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in being efficient, fault-tolerant, highly available, and dynamic as a system internally, and in supporting fault-tolerance and dynamicity for its application programs as well. Starfish achieves these goals by combining group communication technology with checkpoint/restart, and uses a novel architecture that is both flexible and portable and keeps group communication outside the critical data path, for maximum performance.
JESSICA2: A Distributed Java Virtual Machine with Transparent Thread Migration Support
In IEEE Fourth International Conference on Cluster Computing, 2002
Cited by 39 (6 self)
A distributed Java Virtual Machine (DJVM) spanning multiple cluster nodes can provide a true parallel execution environment for multi-threaded Java applications. Most existing DJVMs suffer from the slow Java execution in interpretive mode and thus may not be efficient enough for solving computation-intensive problems. We present JESSICA2, a new DJVM running in JIT compilation mode that can execute multi-threaded Java applications transparently on clusters. JESSICA2 provides a single system image (SSI) illusion to Java applications via an embedded global object space (GOS) layer. It implements a cluster-aware Java execution engine that supports transparent Java thread migration for achieving dynamic load balancing. We discuss the issues of supporting transparent Java thread migration in a JIT compilation environment and propose several lightweight solutions. An adaptive migrating-home protocol used in the implementation of the GOS is introduced. The system has been implemented on x86-based Linux clusters, and significant performance improvements over the previous JESSICA system have been observed.
Transparent Adaptive Parallelism on NOWs using OpenMP
1999
Cited by 24 (3 self)
We present a system that allows OpenMP programs to execute on a network of workstations with a variable number of nodes. The ability to adapt to a variable number of nodes allows a program to take advantage of additional nodes that become available after it starts execution, or to gracefully scale down when the number of available nodes is reduced. We demonstrate that the cost of adaptation is modest; the system allows a program to adapt at a moderate rate without much performance loss. Two ideas underlie the efficiency of our design. First, we recognize that OpenMP programs exhibit convenient adaptation points during their execution, points at which the cost of adaptation can be much reduced. Second, by allowing a process a certain grace period before it must leave a node, we ensure that most adaptations can occur at these adaptation points, and thus at low cost. Migration of a process, a much more expensive method for providing adaptivity, is used only as a back-up solution, when the...
DSM-PM2: A portable implementation platform for multithreaded DSM consistency protocols
In Proc. of the 6th Intl. HIPS Workshop, number 2026 in LNCS, 2001
Cited by 15 (4 self)
DSM-PM2 is a platform for designing, implementing and experimenting with multithreaded DSM consistency protocols. It provides a generic toolbox which facilitates protocol design and allows for easy experimentation with alternative protocols for a given consistency model. DSM-PM2 is portable across a wide range of clusters. We illustrate its power with figures obtained for different protocols implementing sequential consistency, release consistency and Java consistency, on top of Myrinet, Fast-Ethernet and SCI clusters.
JESSICA: Java-Enabled Single-System-Image Computing Architecture
2000
Cited by 13 (5 self)
JESSICA stands for "Java-Enabled Single-System-Image Computing Architecture", a middleware that runs on top of the standard UNIX operating system to support parallel execution of multi-threaded Java applications in a cluster of computers. JESSICA hides the physical boundaries between machines and makes the cluster appear as a single computer to applications--a single-system-image. JESSICA supports preemptive thread migration which allows a thread to freely move between machines during its execution, and global object sharing through the help of a distributed shared-memory subsystem. JESSICA implements location-transparency through a message-redirection mechanism. The result is a parallel execution environment where threads are automatically redistributed across the cluster for achieving the maximal possible parallelism. A JESSICA prototype that runs on a Linux cluster has been implemented and considerable speedups have been obtained for all the experimental applications tested.
Compile/Run-time Support for Thread Migration
In Proceedings of the 16th International Parallel and Distributed Processing Symposium, Fort Lauderdale, 2002
Cited by 12 (3 self)
This paper describes a generic mechanism to migrate threads in heterogeneous distributed environments. To maintain high portability and flexibility, thread migration is implemented at language level. At compile-time, a preprocessor scans the C and C++ programs to build thread state, detects possible thread migration points, and transforms the source code accordingly. Run-time support helps migrate threads physically. Since the physical thread state is transformed into a logical form, and pointers and dynamically allocated memory in the heap are supported, the proposed solution places no restriction on thread types and migration-enabled systems. We implemented this approach in Strings, a multithreaded software distributed shared memory system. Some microbenchmarks and performance measurements on the SPLASH-2 suite are reported.
Efficient Fine-Grain Thread Migration with Active Threads
In Proceedings of the 12th International Parallel Processing Symposium, 1998
Cited by 9 (1 self)
Thread migration is established as a mechanism for achieving dynamic load sharing. However, fine-grained migration has not been used due to the high thread and messaging overheads. This paper describes a fine-grained thread migration system whose extensible event mechanism permits an efficient interface between threads and communications without compromising the modularity and performance of either. Migration is supported by user level primitives based on which applications may implement different migration policies. The system is portable and can be used directly or serve as a compilation target for parallel languages. The system runs on a cluster of SMPs and observed performance is orders of magnitude better than other reported measurements.
1 Introduction
In recent years, the multi-threaded programming model has grown increasingly popular, propelled in part by the availability of SMPs. Programming languages make use of threads both for expressiveness (particularly in GUI) and to e...
On Improving Thread Migration: Safety and Performance
In Proceedings: 9th International Conference on High Performance Computing 2002, volume 2552 of LNCS, 2002
Cited by 9 (1 self)
Application-level migration schemes have been paid more attention recently because of their great potential for heterogeneous migration.
Generic Distributed Shared Memory: the DSM-PM2 Approach
2000
Cited by 6 (2 self)
This paper describes DSM-PM2, a generic, multi-protocol distributed shared memory library built for PM2, a multithreaded runtime system with preemptive thread migration. DSM-PM2 allows threads running on different nodes to communicate via a virtually shared address space. DSM-PM2 supports multiple consistency models, which may coexist within the same application. For a given model, the user can select among several alternative protocols, based for instance on page migration, thread migration, or an adaptive combination of the two approaches. Moreover, new consistency protocols can be easily added using the available library routines. DSM-PM2 is available on top of several UNIX systems and can use a large variety of network protocols (BIP, SCI, VIA, MPI, TCP, etc.). We report performance figures for three platforms using different network protocols: SISCI/SCI, BIP/Myrinet and TCP/Myrinet.
1 Introduction
Distributed Shared Memory (DSM) libraries have been available for a dozen y...
Dynamic data replication: an approach to providing fault-tolerant shared memory clusters
In Proceedings of the Ninth Annual Symposium on High Performance Computer Architecture, 2003
Cited by 5 (1 self)
A challenging issue in today’s server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address this problem in shared virtual memory (SVM) clusters at the programming abstraction layer. We design extensions to an existing SVM protocol that has been tuned for low-latency, high-bandwidth interconnects and SMP nodes, and we achieve reliability through dynamic replication of application shared data and protocol information. Our extensions allow us to tolerate single (or multiple, but not simultaneous) node failures. We implement our extensions on a state-of-the-art cluster and we evaluate the common, failure-free case. We find that, although the complexity of our protocol is substantially higher than its failure-free counterpart, by taking advantage of architectural features of modern systems our approach imposes low overhead and can be employed for transparently dealing with system failures.

