Results 1 - 10 of 52
SoftFLASH: Analyzing the Performance of Clustered Distributed Virtual Shared Memory
- In Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems, 1996
Abstract - Cited by 81 (0 self)
One potentially attractive way to build large-scale shared-memory machines is to use small-scale to medium-scale shared-memory machines as clusters that are interconnected with an off-the-shelf network. To create a shared-memory programming environment across the clusters, it is possible to use a virtual shared-memory software layer. Because of the low latency and high bandwidth of the interconnect available within each cluster, there are clear advantages in making the clusters as large as possible. The critical question then becomes whether the latency and bandwidth of the top-level network and the software system are sufficient to support the communication demands generated by the clusters. To explore these questions, we have built an aggressive kernel implementation of a virtual shared-memory system using SGI multiprocessors and 100Mbyte/sec HIPPI interconnects. The system obtains speedups on 32 processors (four nodes, eight
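The virtual shared-memory software layer described above is classically built on a per-page directory: read and write faults are turned into messages that fetch copies and invalidate stale ones. The following is a minimal simulation of that directory-based invalidate protocol; the class and method names are made up for illustration, and a real system such as SoftFLASH implements this in the kernel using page-protection faults.

```python
# Minimal simulation of a directory-based, page-granularity invalidate
# protocol -- the classic mechanism behind software virtual shared memory.
# Illustrative only: names are hypothetical, and real systems trigger these
# paths from page-protection faults rather than explicit calls.

class PageDirectory:
    def __init__(self):
        # page id -> {"owner": node, "sharers": nodes holding a read copy}
        self.entries = {}

    def read(self, node, page):
        """A read fault: fetch a read-only copy and record the sharer."""
        e = self.entries.setdefault(page, {"owner": node, "sharers": set()})
        e["sharers"].add(node)
        return e["owner"]          # the data would be supplied by the owner

    def write(self, node, page):
        """A write fault: invalidate every other copy, take ownership."""
        e = self.entries.setdefault(page, {"owner": node, "sharers": set()})
        invalidated = e["sharers"] - {node}
        e["sharers"] = {node}      # only the writer keeps a valid copy
        e["owner"] = node
        return invalidated          # messages a real system would send

d = PageDirectory()
d.read("A", 0); d.read("B", 0)     # two nodes share page 0
victims = d.write("C", 0)          # C writes: A and B lose their copies
print(sorted(victims))             # ['A', 'B']
```

The cost of those invalidation messages is exactly where the cluster interconnect's latency and bandwidth, the paper's critical question, come into play.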
COMET: Code Offload by Migrating Execution Transparently
Abstract - Cited by 23 (0 self)
In this paper we introduce a runtime system to allow unmodified multi-threaded applications to use multiple machines. The system allows threads to migrate freely between machines depending on the workload. Our prototype, COMET (Code Offload by Migrating Execution Transparently), is a realization of this design built on top of the Dalvik Virtual Machine. COMET leverages the underlying memory model of our runtime to implement distributed shared memory (DSM) with as few interactions between machines as possible. Making use of a new VM-synchronization primitive, COMET imposes little restriction on when migration can occur. Additionally, enough information is maintained so one machine may resume computation after a network failure. We target our efforts towards augmenting smartphones or tablets with machines available in the network. We demonstrate the effectiveness of COMET on several real applications available on Google Play. These applications include image editors, turn-based games, a trip planner, and math tools. Utilizing a server-class machine, COMET can offer significant speed-ups on these real applications when run on a modern smartphone. With WiFi and 3G networks, we observe geometric mean speed-ups of 2.88X and 1.27X relative to the Dalvik interpreter across the set of applications with speed-ups as high as 15X on some applications.
Strings: A High-Performance Distributed Shared Memory for Symmetrical Multiprocessor Clusters
- In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, 1998
Abstract - Cited by 21 (11 self)
This paper describes Strings, a multi-threaded DSM developed by us. The distinguishing feature of Strings is that it incorporates POSIX.1c threads multiplexed on kernel light-weight processes for better performance. The kernel can schedule multiple threads across multiple processors using these lightweight processes. Thus, Strings is designed to exploit data parallelism at the application level and task parallelism at the DSM system level. We show how using multiple kernel threads can improve performance even in the presence of false sharing, using matrix multiplication as a case study. We also show performance results with benchmark programs from the SPLASH-2 suite [17]. Though similar work has been demonstrated with SoftFLASH [18], our implementation is completely in user space and thus more portable. Some other research has studied the effect of clustering in SMPs using simulations [19]. We have shown results from runs on an actual network of SMPs.
DSM-PM2: A portable implementation platform for multithreaded DSM consistency protocols
- In Proc. of the 6th Intl. HIPS Workshop, number 2026 in LNCS, 2001
Abstract - Cited by 21 (7 self)
DSM-PM2 is a platform for designing, implementing and experimenting with multithreaded DSM consistency protocols. It provides a generic toolbox that facilitates protocol design and allows easy experimentation with alternative protocols for a given consistency model. DSM-PM2 is portable across a wide range of clusters. We illustrate its power with performance figures obtained for different protocols implementing sequential consistency, release consistency and Java consistency, on top of Myrinet, Fast Ethernet and SCI clusters.
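One of the models named above, release consistency, can be summarized in a few lines: writes made between an acquire and a release need only become visible to other nodes at the release. A toy single-process model of that buffering idea follows; it is a sketch under simplifying assumptions (no real concurrency, hypothetical class names), not DSM-PM2's actual API or protocol code.

```python
# Toy model of the release consistency idea: writes made inside an
# acquire/release section are buffered locally and only propagated to the
# shared "home" copy at release time. Hypothetical names, illustration only.

class RCNode:
    def __init__(self, home):
        self.home = home       # shared home copy, a plain dict here
        self.buffer = {}       # writes pending until the next release

    def write(self, key, value):
        self.buffer[key] = value        # visible only locally for now

    def read(self, key):
        return self.buffer.get(key, self.home.get(key))

    def release(self):
        self.home.update(self.buffer)   # flush buffered writes at release
        self.buffer.clear()

home = {}
n1, n2 = RCNode(home), RCNode(home)
n1.write("x", 1)
before = n2.read("x")    # None: n1 has not released yet
n1.release()
after = n2.read("x")     # 1: the release made the write visible
print(before, after)
```

Batching writes this way is precisely what lets release consistency send far fewer messages than sequential consistency, which must propagate each write as it happens.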
A Programming Model for Block-Structured Scientific Calculations on SMP Clusters
- Ph.D. Dissertation, UCSD, 1998
vNUMA: Virtual Shared-Memory Multiprocessors
2008
Abstract - Cited by 12 (1 self)
Shared memory systems, such as SMP and ccNUMA topologies, simplify programming and administration. On the other hand, systems without hardware support for shared memory, such as clusters of commodity workstations, are commonly used due to cost and flexibility considerations. In this thesis, virtualisation is proposed as a technique that can bridge the gap
Design and Performance Analysis of a Distributed Java Virtual Machine
- IEEE Transactions on Parallel and Distributed Systems, 2002
Abstract - Cited by 9 (0 self)
This paper introduces DISK, a distributed Java Virtual Machine for networks of heterogeneous workstations. Several research issues are addressed. A novelty of the system is its object-based, multiple-writer memory consistency protocol (OMW). The correctness of the protocol and its Java compliance is demonstrated by comparing the non-operational definitions of Release Consistency, the consistency model implemented by OMW, with the Java Virtual Machine memory consistency model (JVMC), as defined in the Java Virtual Machine Specification. An analytical performance model was developed to study and compare the design trade-offs between OMW and the lazy invalidate Release Consistency (LI) protocols as a function of the number of processors, network characteristics, and application types. The DISK system has been implemented and runs on a network of 16 Pentium III computers interconnected by a 100 Mbps Ethernet network. Experiments performed with two applications, parallel matrix multiplication and the traveling salesman problem, confirm the analytical model. Index Terms: Object-oriented distributed shared memory, Java Virtual Machine, performance analysis, memory consistency protocols, consistency models.
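Multiple-writer protocols of the kind mentioned above are commonly realized with "twins" and "diffs": each writer copies a page before modifying it, and at synchronization only the words that changed are merged back, so two nodes can write disjoint parts of the same page without ping-ponging it. A minimal sketch of that generic technique (not DISK's OMW protocol specifically), assuming the writers modify disjoint words:

```python
# Twin/diff merging for a multiple-writer DSM protocol. Each writer takes a
# pristine copy ("twin") before writing; at sync time, only changed words
# are folded into the master copy. Assumes writers touch disjoint words.

def make_twin(page):
    return list(page)                 # pristine copy taken on first write

def diff(twin, page):
    """Word-level diff: positions where the writer changed the page."""
    return {i: w for i, (t, w) in enumerate(zip(twin, page)) if t != w}

def apply_diffs(master, diffs):
    """Merge diffs from several writers into the master copy."""
    for d in diffs:
        for i, w in d.items():
            master[i] = w
    return master

master = [0, 0, 0, 0]                 # one shared "page" of four words
p1, p2 = list(master), list(master)   # two nodes' local copies
t1, t2 = make_twin(p1), make_twin(p2)
p1[0] = 7                             # writer 1 touches word 0
p2[3] = 9                             # writer 2 touches word 3
apply_diffs(master, [diff(t1, p1), diff(t2, p2)])
print(master)                         # [7, 0, 0, 9]
```

The payoff is that false sharing no longer forces the page back and forth: each writer proceeds independently and only the small diffs travel over the network.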
Distributed Cactus Stacks: Runtime Stack-Sharing Support for Distributed Parallel Programs
1998
Abstract - Cited by 9 (3 self)
Parallel programming systems based on the distributed shared memory technique have been promoted as easy to program, natural, and equivalent to multiprocessor systems. However, most programmers find this is not the case. The shared memory in DSM systems does not have the same access and sharing semantics as shared memory in real multiprocessor systems (shared-memory multiprocessors). We present a scheme, implemented as part of the Chime parallel processing system, that provides true shared-memory multiprocessor semantics in a distributed system. Programs are written as parallel programs with constructs for a parallel for-loop and a parallel compound statement. A runtime system (middleware) provides the above features using a unique multithreaded architecture. In addition to providing stack-sharing support, the runtime system provides nested parallelism, task synchronization, load balancing, and fault tolerance. The software is available at http://milan.e...
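The stack sharing in the title refers to a cactus stack: a tree of activation frames in which each parallel task pushes frames onto its own branch while still seeing the variables of shared ancestor frames, which is what gives child tasks the parent's stack semantics. A small illustrative data-structure sketch (hypothetical names, not the Chime runtime's implementation):

```python
# A cactus (spaghetti) stack: a tree of activation frames. Parallel tasks
# branch off a shared parent frame, so each child sees the parent's locals
# while keeping its own. Illustrative sketch with hypothetical names.

class Frame:
    def __init__(self, parent=None, **locals_):
        self.parent = parent
        self.locals = dict(locals_)

    def lookup(self, name):
        """Resolve a variable by walking up the branch toward the root."""
        frame = self
        while frame is not None:
            if name in frame.locals:
                return frame.locals[name]
            frame = frame.parent
        raise NameError(name)

root = Frame(n=10)                 # parent's frame, shared by all children
child_a = Frame(parent=root, i=0)  # two parallel tasks branch off the root
child_b = Frame(parent=root, i=5)
print(child_a.lookup("n"), child_a.lookup("i"), child_b.lookup("i"))
# both children see the shared n; each keeps a private i
```

In a distributed setting the hard part, which the paper addresses, is keeping the ancestor frames consistently accessible when the branches run on different machines.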
Preemptive Scheduling for Distributed Systems
1998
Abstract - Cited by 8 (0 self)
Preemptive scheduling is widespread in operating systems and in parallel processing on symmetric multiprocessors. However, in distributed systems it is practically unheard of. Scheduling in distributed systems is an important issue, and has a performance impact on parallel processing, load balancing, and metacomputing. Non-preemptive scheduling can perform well if the task lengths and processor speeds are known in advance, so that job placement can be done intelligently. Although obtaining optimal schedules is NP-complete, many good heuristics exist. In most practical cases, non-preemptive scheduling leads to poor performance due to excessive idle times or a long job getting assigned to a slow machine. We show how to use preemptive scheduling in distributed systems. Surprisingly, the benefits outweigh the increased overhead. However, the implementation of preemptive scheduling is complicated by the need for process migration. This paper presents preemptive scheduling algorithms, their imp...
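The central claim, that migrating a job beats leaving it stranded on a slow machine, is easy to see numerically. The following two-machine, two-job sketch uses hypothetical speeds and is only an illustration of the argument, not the paper's algorithms:

```python
# With unknown job lengths, a blind non-preemptive placement can strand a
# long job on a slow machine; preemption lets it migrate once the fast
# machine goes idle. Two machines, two jobs, hypothetical speeds.

FAST, SLOW = 1.0, 0.25          # work units processed per time unit

def nonpreemptive_makespan(long_job, short_job):
    # Blind placement happened to put the long job on the slow machine,
    # the short one on the fast machine; neither can move afterwards.
    return max(long_job / SLOW, short_job / FAST)

def preemptive_makespan(long_job, short_job):
    # Same initial placement, but when the fast machine finishes the short
    # job, the long job's remainder is migrated onto it.
    t_idle = short_job / FAST                  # fast machine frees up here
    done_on_slow = t_idle * SLOW               # long job's progress so far
    remaining = long_job - done_on_slow
    return t_idle + remaining / FAST

print(nonpreemptive_makespan(8.0, 1.0))   # 32.0
print(preemptive_makespan(8.0, 1.0))      # 8.75
```

The roughly 3.7x gap is what makes the migration overhead worth paying in cases like this.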
Murks - A POSIX Threads Based DSM System
- In Proceedings of the International Conference on Parallel and Distributed Computing Systems, 2001
Abstract - Cited by 5 (1 self)
The shared memory paradigm provides an easy-to-use programming model for communicating processes. Its implementation in distributed environments has proved to be harder than expected. Most distributed shared memory (DSM) systems suffer from either poor performance or being very complicated to use. The DSM system Murks, presented in this paper, is the result of the sobering experience we gained while trying to integrate an existing DSM system into a distributed OS.