Software Versus Hardware Shared-Memory Implementation: A Case Study
In Proceedings of the 21st Annual International Symposium on Computer Architecture, 1994
Cited by 66 (1 self)
We compare the performance of software-supported shared memory on a general-purpose network to hardware-supported shared memory on a dedicated interconnect. Up to eight processors, our results are based on the execution of a set of application programs on a SGI 4D/480 multiprocessor and on TreadMarks, a distributed shared memory system that runs on a Fore ATM LAN of DECstation-5000/240s. Since the DECstation and the 4D/480 use the same processor, primary cache, and compiler, the shared-memory implementation is the principal difference between the systems. Our results show that TreadMarks performs comparably to the 4D/480 for applications with moderate amounts of synchronization, but the difference in performance grows as the synchronization frequency increases. For applications that require a large amount of memory bandwidth, TreadMarks can perform better than the SGI 4D/480. Beyond eight processors, our results are based on execution-driven simulation. Specifically, we compare a software implementation on a general-purpose network of uniprocessor nodes, a hardware implementation using a directory-based protocol on a dedicated interconnect, and a combined implementation using software to provide shared memory between multiprocessor nodes with hardware implementing shared memory within a node. For the modest size of the problems that we can simulate, the hardware implementation scales well and the software implementation scales poorly. The combined approach delivers performance close to that of the hardware implementation for applications with small to moderate synchronization rates and good locality. Reductions in communi-
Software Cache Coherence for Large Scale Multiprocessors
In Proceedings of the First International Symposium on High Performance Computer Architecture, 1994
Cited by 23 (9 self)
Shared memory is an appealing abstraction for parallel programming. It must be implemented with caches in order to perform well, however, and caches require a coherence mechanism to ensure that processors reference current data. Hardware coherence mechanisms for large-scale machines are complex and costly, but existing software mechanisms for message-passing machines have not provided a performance-competitive solution. We claim that an intermediate hardware option (memory-mapped network interfaces that support a global physical address space) can provide most of the performance benefits of hardware cache coherence. We present a software coherence protocol that runs on this class of machines and greatly narrows the performance gap between hardware and software coherence. We compare the performance of the protocol to that of existing software and hardware alternatives and evaluate the tradeoffs among various cache-write policies. We also observe that simple program changes can greatly...
Multiprocessor Cache Coherence Based on Virtual Memory Support
1995
Cited by 17 (1 self)
Virtual memory based cache coherence is a mechanism that relies only on hardware that already exists on the microprocessors of a shared memory multiprocessor system, yet dynamically detects and resolves potential cache inconsistencies using virtual memory techniques. The key feature of the approach is that the virtual memory translation hardware on each processor is used to detect shared accesses that could lead to memory incoherencies, and VM page fault handlers execute the appropriate actions to maintain cache coherence. VM-based cache coherence basically trades off design simplicity against increased software overheads. The work presented in this paper evaluates this tradeoff. We show that VM-based cache coherence performs well for scientific applications that require significant aggregate memory bandwidth.
Keywords: shared memory, multiprocessors, cache coherence, virtual memory, performance evaluation.
Biographies: Karin Petersen is a Member of the Research Staff at Xe...
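As an illustration of the mechanism this abstract describes (page-level access checks that trap into a fault handler, which then restores coherence), here is a minimal toy sketch. It is not the protocol from the paper: the class `VMCoherenceSim`, the `Access` states, and the single-writer, invalidate-on-write policy are all assumptions chosen for illustration, and per-processor permission tables merely stand in for real page-table and TLB state.

```python
from enum import Enum


class Access(Enum):
    NONE = 0   # page not mapped for this processor: any access faults
    READ = 1   # read-only mapping: writes fault
    WRITE = 2  # read-write mapping: no faults


class VMCoherenceSim:
    """Hypothetical toy model of page-granularity, fault-driven coherence."""

    def __init__(self, num_procs, num_pages):
        # perms[p][page] is what processor p's page table currently allows.
        self.perms = [[Access.NONE] * num_pages for _ in range(num_procs)]
        self.faults = 0  # counts software handler invocations (the overhead)

    def read(self, proc, page):
        if self.perms[proc][page] is Access.NONE:
            self._read_fault(proc, page)

    def write(self, proc, page):
        if self.perms[proc][page] is not Access.WRITE:
            self._write_fault(proc, page)

    def _read_fault(self, proc, page):
        self.faults += 1
        # Assumed policy: downgrade any current writer to read-only.
        # A real handler would also write back that processor's dirty lines.
        for other in range(len(self.perms)):
            if self.perms[other][page] is Access.WRITE:
                self.perms[other][page] = Access.READ
        self.perms[proc][page] = Access.READ

    def _write_fault(self, proc, page):
        self.faults += 1
        # Assumed policy: revoke every other mapping before granting write.
        # A real handler would also invalidate the other caches' copies.
        for other in range(len(self.perms)):
            if other != proc:
                self.perms[other][page] = Access.NONE
        self.perms[proc][page] = Access.WRITE
```

In this sketch, accesses that already hold the needed permission cost nothing, while each change in sharing pattern costs one handler invocation, which mirrors the stated tradeoff between design simplicity and software overhead.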
High performance software coherence for current and future architectures
Journal of Parallel and Distributed Computing, 1995
Cited by 15 (9 self)
Shared memory provides an attractive and intuitive programming model for large-scale parallel computing, but requires a coherence mechanism to allow caching for performance while ensuring that processors do not use stale data in their computation. Implementation options range from distributed shared memory emulations on networks of workstations to tightly coupled fully cache-coherent distributed shared memory multiprocessors. Previous work indicates that performance varies dramatically from one end of this spectrum to the other. Hardware cache coherence is fast, but also costly and time-consuming to design and implement, while DSM systems provide acceptable performance on only a limited class of applications. We claim that an intermediate hardware option (memory-mapped network interfaces that support a global physical address space, without cache coherence) can provide most of the performance benefits of fully cache-coherent hardware, at a fraction of the cost. To support this claim we present a software coherence protocol that runs on this class of machines, and use simulation to conduct a performance study. We look at both programming and architectural issues in the context of software and hardware coherence protocols. Our results suggest that software coherence on NCC-NUMA machines is a more cost-effective approach to large-scale shared-memory multiprocessing than either pure distributed shared memory or hardware cache coherence. © 1995 Academic Press, Inc.
Shared Regions: A strategy for efficient cache management in shared-memory multiprocessors
1995
Cited by 2 (1 self)
Dealing effectively with memory access latency is one of the key challenges in the design of shared-memory multiprocessors. Processor caches offer a way to reduce this latency but also give rise to the problem of cache coherence. Existing software solutions to the cache coherence problem are usually inefficient, while hardware solutions are typically complex and expensive to implement. In this thesis, we present a new class of software cache coherence strategies based upon the integration of a program-level abstraction for shared data, called Shared Regions (SR), with run-time cache management. In this approach, users define a set of shared regions, the unit of sharing in the application. Data is managed at the granularity of shared regions, and coherence enforcement decisions are made dynamically through software. Within the shared regions framework, we present two types of coherence solutions. In the first, called SR-Prog, program annotations mark the beginning and end of a series o...

