Results 1 - 6 of 6
Memory Management for Large-Scale NUMA Multiprocessors, submitted for publication, 1989
Abstract - Cited by 14 (4 self)
Large-scale shared-memory multiprocessors such as the BBN Butterfly and IBM RP3 introduce a new level in the memory hierarchy: multiple physical memories with different memory access times. An operating system for these NUMA (NonUniform Memory Access) multiprocessors should provide traditional virtual memory management, facilitate dynamic and widespread memory sharing, and minimize the apparent disparity between local and nonlocal memory. In addition, the implementation must be scalable to configurations with hundreds or thousands of processors. This paper describes memory management in the Psyche multiprocessor operating system, under development at the University of Rochester. The Psyche kernel manages a multi-level memory hierarchy consisting of local memory, nonlocal memory, and backing store. Local memory stores private data and serves as a cache for shared data; nonlocal memory stores shared data and serves as a disk cache. The system structure isolates the policies and mechanisms that manage different layers in the memory hierarchy, so that customized data structures and policies can be constructed for each layer. Local memory management policies
UNified Instruction/Translation/Data (UNITD) Coherence: One Protocol to Rule Them All
In Proc. Fifteenth Int’l Symposium on High-Performance Computer Architecture, 2010
Abstract - Cited by 9 (4 self)
We propose UNITD, a unified hardware coherence framework that integrates translation coherence into the existing cache coherence protocol. In UNITD coherence protocols, the TLBs participate in the cache coherence protocol just like the instruction and data caches, without requiring any changes to the existing coherence protocol. UNITD eliminates the need for the software TLB shootdown routine, a procedure known to be performance costly and non-scalable. We evaluate snooping and directory UNITD coherence protocols on multicore processors with 2-16 cores, and we demonstrate that UNITD reduces the performance penalty associated with TLB coherence to almost zero.
Sentry: Light-Weight Auxiliary Memory Access Control
Abstract - Cited by 3 (0 self)
Light-weight, flexible access control, which allows software to regulate reads and writes to any granularity of memory region, can help improve the reliability of today’s multi-module multiprogrammer applications, as well as the efficiency of software debugging tools. Unfortunately, access control in today’s processors is tied to support for virtual memory, making its use both heavy weight and coarse grain. In this paper, we propose Sentry, an auxiliary level of virtual memory tagging that is entirely subordinate to existing virtual memory-based protection mechanisms and can be manipulated at the user level. We implement these tags in a complexity-effective manner using an M-cache (metadata cache) structure that only intervenes on L1 misses, thereby minimizing changes to the processor core. Existing cache coherence states are repurposed to implicitly validate permissions for L1 hits. Sentry achieves its goal of flexible and light-weight access control without
LARGE-SCALE NUMA MULTIPROCESSORS, 1989
Abstract
Large-scale shared-memory multiprocessors such as the BBN Butterfly and IBM RP3 introduce a new level in the memory hierarchy: multiple physical memories with different memory access times. An operating system for these NUMA (NonUniform Memory Access) multiprocessors should provide traditional virtual memory management, facilitate dynamic and widespread memory sharing, and minimize the apparent disparity between local and nonlocal memory. In addition, the implementation must be scalable to configurations with hundreds or thousands of processors. This paper describes memory management in the Psyche multiprocessor operating system, under development at the University of Rochester. The Psyche kernel manages a multi-level memory hierarchy consisting of local memory, nonlocal memory, and backing store. Local memory stores private data and serves as a cache for shared data; nonlocal memory stores shared data and serves as a disk cache. The system structure isolates the policies and mechanisms that manage different layers in the memory hierarchy, so that customized data structures and policies can be constructed for each layer. Local memory management policies are implemented using mechanisms that are independent of the architectural configuration; global policies are implemented using multiple processes that increase in number as the architecture scales. Psyche currently runs on the BBN Butterfly Plus multiprocessor.
Address Translation for Manycore Systems
Abstract
One of the many challenges of designing efficient manycore systems is to determine where and to what degree shared information is cached locally. In this study we specifically address efficient solutions for distributing virtual-to-physical address translations and keeping them coherent throughout a chip multiprocessor system with hundreds of cores. We evaluate multiple mechanisms in terms of their performance and overhead with the aid of software simulation. Since TLB information is invalidated rarely, we find that the mechanisms with a fast common case performed much better, and that TLB reload overhead (and not communication) was a significant factor in the performance of many benchmarks.
Cost-effective Designs for Supporting Correct Execution and Scalable Performance in Many-core Processors, 2010
"... (Computer engineering) ..."