Results 1 - 10 of 15
Efficient Virtual Memory for Big Memory Servers
Cited by 15 (1 self)
Our analysis shows that many “big-memory” server workloads, such as databases, in-memory caches, and graph analytics, pay a high cost for page-based virtual memory. They consume as much as 10% of execution cycles on TLB misses, even when using large pages. On the other hand, we find that these workloads use read-write permission on most pages, are provisioned not to swap, and rarely benefit from the full flexibility of page-based virtual memory. To remove the TLB miss overhead for big-memory workloads, we propose mapping part of a process’s linear virtual address space with a direct segment, while page-mapping the rest of the virtual address space. Direct segments use minimal hardware (base, limit, and offset registers per core) to map contiguous virtual memory regions directly to contiguous physical memory. They eliminate the possibility of TLB misses for key data structures such as database buffer pools and in-memory key-value stores. Memory mapped by a direct segment may be converted back to paging when needed. We prototype direct-segment software support for x86-64 in Linux and emulate direct-segment hardware. For our workloads, direct segments eliminate almost all TLB misses and reduce the execution time wasted on TLB misses to less than 0.5%.
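The base/limit/offset translation the abstract describes can be sketched as follows; the register names and struct layout here are illustrative, not the paper's actual hardware interface:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-core direct-segment registers (names illustrative). */
typedef struct {
    uint64_t base;   /* start of the direct-mapped virtual region */
    uint64_t limit;  /* end (exclusive) of that region            */
    uint64_t offset; /* VA-to-PA displacement for the region      */
} direct_segment_t;

/* Addresses inside [base, limit) bypass the TLB entirely; everything
 * else falls back to conventional page-table translation. */
bool ds_translate(const direct_segment_t *ds, uint64_t va, uint64_t *pa) {
    if (va >= ds->base && va < ds->limit) {
        *pa = va + ds->offset;   /* contiguous VA maps to contiguous PA */
        return true;             /* no TLB lookup, so no TLB miss */
    }
    return false;                /* caller uses ordinary paging */
}
```

Because the mapped region is contiguous in both address spaces, translation is a single add-and-compare, which is why a TLB miss becomes impossible for data placed in the segment.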
Reducing Memory Reference Energy With Opportunistic Virtual Caching
- Proceedings of the 39th annual international symposium on Computer architecture, 2012
Cited by 11 (1 self)
Most modern cores perform a highly-associative translation lookaside buffer (TLB) lookup on every memory access. These designs often hide the TLB lookup latency by overlapping it with L1 cache access, but this overlap does not hide the power dissipated by TLB lookups. It can even exacerbate the power dissipation by requiring a higher-associativity L1 cache. With today's concern for power dissipation, designs could instead adopt a virtual L1 cache, wherein TLB access power is dissipated only after L1 cache misses. Unfortunately, virtual caches have compatibility issues, such as supporting writeable synonyms and x86’s physical page table walker. This work proposes an Opportunistic Virtual Cache (OVC) that exposes virtual caching as a dynamic optimization by allowing some memory blocks to be cached with virtual addresses and others with physical addresses. OVC relies on small OS changes to signal which pages can use virtual caching (e.g., no writeable synonyms), but defaults to physical caching for compatibility. We show OVC's promise with analysis that finds virtual-cache problems exist, but are dynamically rare. We change 240 lines in Linux 2.6.28 to enable OVC. In experiments with Parsec and commercial workloads, the resulting system saves 94-99% of TLB lookup energy and nearly 23% of L1 cache dynamic lookup energy.
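A minimal model of the opportunistic policy, assuming an invented per-page flag the OS would set for pages that are safe to cache virtually (e.g., no writeable synonyms):

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented per-page metadata bit: set by the OS when virtual caching
 * is safe for this page; clear means the compatible physical default. */
#define PG_VCACHE_OK 0x1u

typedef struct {
    unsigned tlb_lookups;    /* accesses that paid for a TLB probe */
    unsigned total_accesses;
} ovc_stats_t;

/* Returns true if this access needed a TLB lookup before the L1 probe.
 * A virtual-cache hit on a flagged page skips translation entirely;
 * misses and physical-only pages translate as usual. */
bool ovc_access(uint32_t page_flags, bool l1_hit, ovc_stats_t *s) {
    s->total_accesses++;
    if ((page_flags & PG_VCACHE_OK) && l1_hit) {
        return false;        /* virtual hit: no TLB energy spent */
    }
    s->tlb_lookups++;        /* translate, then (re)probe physically */
    return true;
}
```

The point of the dynamic split is visible in the counters: TLB energy is only paid on the rare paths (misses and incompatible pages), not on every access.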
Generating physical addresses directly for saving instruction TLB energy
, 2002
Cited by 11 (1 self)
Power consumption and power density for the Translation Lookaside Buffer (TLB) are important considerations not only in its own design, but can have consequences for cache design as well. This paper embarks on a new philosophy for reducing the number of accesses to the instruction TLB (iTLB) for power and performance optimizations. The overall idea is to keep the translation currently in use in a register and avoid going to the iTLB for as long as possible, until there is a page change. We propose four different approaches for achieving this, and experimentally demonstrate that one of these schemes, which uses a combination of compiler and hardware enhancements, can reduce iTLB dynamic power by over 85% in most cases. These mechanisms can work with different instruction-cache (iL1) lookup mechanisms and achieve significant iTLB power savings without compromising performance. Their importance grows with higher iL1 miss rates and larger page sizes. They can work very well with large iTLB structures, which can consume more power and take longer to look up, by keeping the iTLB off the common-case path. Further, we also experimentally demonstrate that they can provide performance savings for virtually-indexed, virtually-tagged iL1 caches, and can even make physically-indexed, physically-tagged iL1 caches a viable implementation choice.
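The keep-the-current-translation idea can be modeled roughly as below; the register structure, probe counter, and stand-in lookup function are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assume 4 KiB pages for this sketch */

/* Hypothetical "current translation" register: the mapping for the
 * page the instruction stream is currently executing from. */
typedef struct {
    uint64_t vpn;           /* virtual page number held in the register */
    uint64_t pfn;           /* its physical frame number                */
    bool     valid;
    unsigned itlb_lookups;  /* count of real iTLB probes performed      */
} cur_xlat_t;

/* Stand-in for a real iTLB probe; an arbitrary fixed mapping here. */
static uint64_t itlb_lookup(uint64_t vpn) { return vpn + 0x100; }

/* Translate an instruction fetch: the iTLB is consulted only when the
 * fetch crosses into a different page than the last one. */
uint64_t fetch_translate(cur_xlat_t *cx, uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;
    if (!cx->valid || vpn != cx->vpn) {
        cx->pfn = itlb_lookup(vpn);   /* page change: one real probe */
        cx->vpn = vpn;
        cx->valid = true;
        cx->itlb_lookups++;
    }
    return (cx->pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
}
```

Since instruction fetch is overwhelmingly sequential within a page, almost all fetches hit the register and the iTLB is probed only at page boundaries, which is where the dynamic-power savings come from.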
Software virtual memory management for MMU-less embedded systems
, 2005
Cited by 7 (0 self)
For an embedded system designer, the rise in processing speeds of embedded processors and micro-controller evolution has led to the possibility of running computation- and data-intensive applications on small embedded devices that earlier ran only on desktop-class systems. From a memory standpoint, there is a similar need for running larger and more data-intensive applications on embedded devices. However, support for large memory address spaces, specifically virtual memory, for MMU-less embedded systems is lacking. In this paper, we present a software virtual memory scheme for MMU-less systems based on an application-level virtual memory library and a virtual-memory-aware assembler. Our virtual memory support is transparent to the programmer, can be tuned for a specific application, is correct by construction, and is fully automated. Our experiments validate the feasibility of virtual memory for MMU-less embedded systems using benchmark programs.
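One way such an application-level translation library might look, with all names and sizes invented for the sketch (the paper's actual assembler-rewriting scheme is not shown):

```c
#include <stdint.h>

/* A tiny software page table: with no MMU, every load/store that the
 * tool chain rewrites goes through a lookup like this one. */
#define SVM_PAGES     16
#define SVM_PAGE_SIZE 256

static uint8_t svm_frames[SVM_PAGES][SVM_PAGE_SIZE]; /* backing "RAM" */
static int     svm_map[SVM_PAGES]; /* virtual page -> frame, -1 if absent */

void svm_init(void) {
    for (int i = 0; i < SVM_PAGES; i++) svm_map[i] = -1;
}

/* Translate a virtual offset into a real pointer, "faulting" absent
 * pages in on demand (trivially frame = page here; a real system
 * would fetch the page from secondary storage). */
uint8_t *svm_resolve(uint32_t vaddr) {
    uint32_t page = vaddr / SVM_PAGE_SIZE;
    if (svm_map[page] < 0)
        svm_map[page] = (int)page;  /* software page fault */
    return &svm_frames[svm_map[page]][vaddr % SVM_PAGE_SIZE];
}
```

The cost model is clear from the sketch: every rewritten access pays for a division, a table lookup, and a branch in software, which is the price of virtual memory without an MMU.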
Legba: Fast Hardware Support for Fine-Grained Protection
- In Proceedings of the 8th Australia-Pacific Computer Systems Architecture Conference (ACSAC’2003)
, 2003
Cited by 7 (2 self)
Fine-grained hardware protection, if it can be done without slowing down the processor, could deliver significant benefits to software, enabling the implementation of strongly encapsulated light-weight objects. In this paper we introduce Legba, a new caching architecture that aims to support fine-grained memory protection and protected procedure calls without slowing down the processor's clock speed.
A Survey on the Interaction between Caching, Translation and Protection
, 2003
Cited by 6 (0 self)
Fine-grained hardware protection could deliver significant benefits to software, enabling the implementation of strongly encapsulated light-weight objects, but only if it can be done without slowing down the processor.
Reducing Data TLB Power via Compiler-Directed Address Generation
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 2007
Cited by 1 (0 self)
Address translation using the translation lookaside buffer (TLB) consumes as much as 16% of the chip power on some processors because of its high associativity and access frequency. While prior work has looked into optimizing this structure at the circuit and architectural levels, this paper takes a different approach to optimizing its power by reducing the number of data TLB (dTLB) lookups for data references. The main idea is to keep translations in a set of translation registers (TRs) and intelligently use them in software to directly generate the physical addresses without going through the dTLB. The software has to work within the confines of the TRs provided by the hardware and has to maximize the reuse of such translations to be effective. The authors propose strategies and code transformations for achieving this in array-based and pointer-based codes, looking to optimize data accesses. Results with a suite of Spec95 array-based and pointer-based codes show dTLB energy savings of up to 73% and 88%, respectively, compared to directly using the dTLB for all references. Despite the small increase in instructions executed with these mechanisms, the approach can, in fact, provide performance benefits under certain cache-addressing strategies.
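A rough model of translation-register reuse, with invented names; the idea is that the compiler allocates a TR slot to each hot reference stream, so an array walk refills its TR from the dTLB only at page boundaries:

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define NUM_TRS    4    /* small fixed set of translation registers */

/* Illustrative TR file plus a probe counter standing in for energy. */
typedef struct { uint64_t vpn, pfn; bool valid; } tr_t;

static tr_t     trs[NUM_TRS];
static unsigned dtlb_lookups;

/* Stand-in for a real dTLB probe; an arbitrary fixed mapping here. */
static uint64_t dtlb_lookup(uint64_t vpn) { dtlb_lookups++; return vpn + 0x200; }

/* Generate a physical address through TR slot `slot`, refilling it
 * from the dTLB only when the reference leaves the page held there. */
uint64_t tr_translate(unsigned slot, uint64_t va) {
    uint64_t vpn = va >> PAGE_SHIFT;
    if (!trs[slot].valid || trs[slot].vpn != vpn) {
        trs[slot].pfn = dtlb_lookup(vpn);   /* page change: one probe */
        trs[slot].vpn = vpn;
        trs[slot].valid = true;
    }
    return (trs[slot].pfn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
}
```

For a sequential walk over two 4 KiB pages this issues two dTLB probes instead of one per reference, which is the source of the reported energy savings.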
VIRTUAL MEMORY SYSTEMS AND TLB STRUCTURES
Cited by 1 (0 self)
...p designers have focused on improving storage size and, as a result, memory is now extremely slow compared to processor speeds. Due to rapidly decreasing memory prices, it is usually possible to have enough memory in one's machine to avoid using the disk as a back-up memory space. Many of today's machines generate 64-bit addresses, some even larger; most modern machines can therefore reference 16 exabytes (16 giga-gigabytes) or more of data in their address space directly. The list goes on. In fact, one of the few things that has not changed since the development of virtual memory is the basic design of the virtual memory mechanism itself, and the one problem it was invented to solve (too little memory) is no longer a factor in most systems. However, the virtual memory mechanism has proven itself valuable in other areas besides extending the...
Segment Protection for Embedded Systems Using Run-time Checks
The lack of virtual memory protection is a serious source of unreliability in many embedded systems. Without the segment-level protection it provides, these systems are subject to memory access violations, stemming from programmer error, whose results can be dangerous and catastrophic in safety-critical systems. The traditional method of testing embedded software before its deployment is an insufficient means of detecting and debugging all software errors, and reliance on this practice is a severe gamble when the reliable performance of the embedded device is critical. Additionally, the use of safe languages and programming-semantic restrictions as prevention mechanisms is often infeasible when considering the adoptability and compatibility of these languages, since most embedded applications are written in C and C++. This work improves system reliability by providing a completely automatic software technique for guaranteeing segment protection...
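The kind of compiler-inserted run-time check the abstract describes might look like this; the segment descriptor and function names are assumptions for illustration, not the paper's actual design:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A per-object segment descriptor such a scheme might carry. */
typedef struct {
    uintptr_t base;  /* lowest valid address of the segment */
    size_t    size;  /* segment length in bytes             */
} segment_t;

/* The guard inserted before each memory access: true iff
 * [addr, addr + len) lies wholly inside the segment. The upper-bound
 * test is phrased by subtraction to avoid address-arithmetic overflow. */
bool access_ok(const segment_t *seg, uintptr_t addr, size_t len) {
    return addr >= seg->base &&
           len <= seg->size &&
           addr - seg->base <= seg->size - len;
}
```

On a failed check, the inserted code would divert to a fault handler instead of performing the access, converting silent memory corruption into a detectable error.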