Adaptive Cache Compression for High-Performance Processors
- In Proc. ISCA, 2004
Cited by 35 (3 self)
Modern processors use two or more levels of cache memories to bridge the rising disparity between processor and memory speeds. Compression can improve cache performance by increasing effective cache capacity and eliminating misses. However, decompressing cache lines also increases cache access latency, potentially degrading performance. In this paper, we develop an adaptive policy that dynamically adapts to the costs and benefits of cache compression. We propose a two-level cache hierarchy where the L1 cache holds uncompressed data and the L2 cache dynamically selects between compressed and uncompressed storage. The L2 cache is 8-way set-associative with LRU replacement, where each set can store up to eight compressed lines but has space for only four uncompressed lines. On each L2 reference, the LRU stack depth and compressed size determine whether compression eliminated (or could have eliminated) a miss or incurred an unnecessary decompression overhead. Based on this outcome, the adaptive policy updates a single global saturating counter, which predicts whether to allocate lines in compressed or uncompressed form. We evaluate adaptive cache compression using full-system simulation and a range of benchmarks. We show that compression can improve performance for memory-intensive commercial workloads by up to 17%. However, always using compression hurts performance for low-miss-rate benchmarks (due to unnecessary decompression overhead), degrading performance by up to 18%. By dynamically monitoring workload behavior, the adaptive policy achieves comparable benefits from compression, while never degrading performance by more than 0.4%.
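The counter-based policy described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the class name, method names, penalty weights, saturation limit, and the zero-based stack depth are all assumptions made for the example; only the 8-way/4-way set geometry and the single global saturating counter come from the abstract.

```python
UNCOMPRESSED_WAYS = 4  # from the abstract: a set has space for four uncompressed lines


class AdaptiveCompressionPredictor:
    """Hypothetical global saturating counter that votes for or against compression."""

    def __init__(self, miss_penalty=400, decompression_penalty=5, limit=1 << 18):
        # All three values are illustrative assumptions, not figures from the paper.
        self.miss_penalty = miss_penalty
        self.decompression_penalty = decompression_penalty
        self.limit = limit
        self.counter = 0

    def _add(self, delta):
        # Saturate at +/- limit instead of overflowing.
        self.counter = max(-self.limit, min(self.limit, self.counter + delta))

    def on_hit(self, stack_depth, line_is_compressed):
        """stack_depth is the zero-based LRU position of the line that hit."""
        if stack_depth < UNCOMPRESSED_WAYS:
            # The line would have hit without compression, so any
            # decompression here was unnecessary overhead.
            if line_is_compressed:
                self._add(-self.decompression_penalty)
        else:
            # The line only fit in the set because lines were compressed:
            # compression eliminated a miss.
            self._add(self.miss_penalty)

    def on_avoidable_miss(self):
        """A miss that compression could have eliminated."""
        self._add(self.miss_penalty)

    def should_compress(self):
        # A positive counter means compression has been paying off.
        return self.counter > 0
```

With these assumed weights, a single avoided miss outweighs many unnecessary decompressions, which mirrors the intuition that a miss costs far more than decompressing a line.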
Using Compression to Improve Chip Multiprocessor Performance
- 2006
Cited by 7 (2 self)
Chip multiprocessors (CMPs) combine multiple processors on a single die, typically with private level-one caches and a shared level-two cache. However, the increasing number of processor cores on a single chip increases the demand on two critical resources: the shared L2 cache capacity and the off-chip pin bandwidth. Demand on these critical resources is further exacerbated by latency-hiding techniques such as hardware prefetching. In this dissertation, we explore using compression to effectively increase cache and pin bandwidth resources and ultimately CMP performance. We identify two distinct and complementary designs where compression can help improve CMP performance: Cache Compression and Link Compression. Cache compression stores compressed lines in the cache, potentially increasing the effective cache size, reducing off-chip misses and improving performance. On the downside, decompression overhead can slow down cache hit latencies, possibly degrading performance. Link (i.e., off-chip interconnect) compression compresses communication messages before sending to or receiving from off-chip system components, thereby increasing the effective off-chip pin bandwidth, reducing contention and improving performance for bandwidth-limited configurations. While compression can have a positive impact on CMP performance, practical implementations of compression
Measuring the Compressibility of Metadata and Small Files for Disk/NVRAM Hybrid Storage Systems
- In Proceedings of the 2004 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS ’04), 2004
Cited by 1 (1 self)
File systems combining disk storage with non-volatile RAM (NVRAM) promise large improvements in file system performance. However, current technology allows for a relatively limited amount of NVRAM, limiting the effectiveness of such an approach. We are examining in-memory compression techniques that allow for significantly more efficient utilization of this limited resource. We focus on small objects (metadata and small files), and we have measured the compressibility of these objects for a set of representative file systems. Our results show that inodes are compressible by at least 76-90% at a rate of 270-900 thousand inodes per second for the best algorithms. For files in the range of 4-128 KB, we achieved an average compressibility of 40-60% at rates of 20-40 megabytes per second. Based on these measurements, we believe that compression of both metadata and small files should be included in any disk/NVRAM hybrid file system.
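As a rough illustration of the two quantities reported above (space saved and objects compressed per second), the sketch below measures both for a batch of small in-memory objects. It assumes zlib as the compressor and a hypothetical helper name, `measure_compressibility`; the abstract does not say which algorithms were evaluated, so this does not reproduce the authors' setup or numbers.

```python
import time
import zlib


def measure_compressibility(objects, level=6):
    """Return (fraction of bytes saved, objects compressed per second).

    `objects` is a list of bytes values, e.g. serialized inodes or small files.
    zlib and the default level are assumptions made for this sketch.
    """
    original = sum(len(o) for o in objects)
    start = time.perf_counter()
    compressed = sum(len(zlib.compress(o, level)) for o in objects)
    elapsed = time.perf_counter() - start

    saved = 1.0 - compressed / original if original else 0.0
    rate = len(objects) / elapsed if elapsed > 0 else float("inf")
    return saved, rate
```

Each object is compressed independently here, which matches the small-object setting: a per-object compressor cannot exploit redundancy across objects, so results for very small inputs depend heavily on the algorithm's fixed overhead.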
Quasi-Static Shared Libraries and XIP for Memory Footprint Reduction in MMU-less Embedded Systems
Despite a rapid decrease in the price of solid state memory devices, system memory is still a very precious resource in embedded systems. The use of shared libraries and XIP (eXecution-In-Place) is known to be effective in significantly reducing memory usage. Unfortunately, many resource-constrained embedded systems lack an MMU, making it extremely difficult to support these techniques. To address this problem, we propose a novel shared library technique called a quasi-static shared library and an XIP technique, both based on our enhanced position independent code technique. In our quasi-static shared libraries, global symbols are bound to pseudo-addresses at linking time and actual physical addresses are bound at loading time. Unlike conventional shared libraries, they do not require symbol tables that take up valuable memory space and, therefore, allow for expedited address translation at runtime. Our XIP technique is facilitated by our enhanced position independent code where a data section can be arbitrarily located. Both the shared library and XIP techniques are made possible by emulating an MMU’s memory mapping feature with a Data Section Base Register (DSBR) and a Data Section Base Table (DSBT). We have implemented these proposed techniques in a commercial ADSL (Asymmetric Digital Subscriber Line) home network gateway equipped with an MMU-less ARM7TDMI processor core, 2-MB flash memory, and 16-MB RAM. We measured its memory usage and evaluated its performance overhead by conducting a series of experiments. These
ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency
- In 2010 International Workshop on Storage Network Architecture and Parallel I/Os
In this work we examine how transparent compression in the I/O path can improve space efficiency for online storage. We extend the block layer with the ability to compress and decompress data as they flow between the file-system and the disk. Achieving transparent compression requires extensive metadata management for dealing with variable block sizes, dynamic block mapping, block allocation, explicit work scheduling and I/O optimizations to mitigate the impact of additional I/Os and compression overheads. Preliminary results show that online transparent compression is a viable option for improving effective storage capacity, that it can improve I/O performance by reducing I/O traffic and seek distance, and that it has a negative impact on performance only when single-thread I/O latency is critical.
Keywords: online block-level compression; I/O performance; log-structured block device
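The variable block sizes and dynamic block mapping mentioned in the ZBD abstract can be pictured with a toy, in-memory log-structured device. This is only a sketch under stated assumptions: the class `CompressedBlockLog`, the 4096-byte logical block size, and the use of zlib are all hypothetical choices for illustration, and the sketch omits the allocation, scheduling, and space-reclamation work the abstract describes.

```python
import zlib

BLOCK_SIZE = 4096  # assumed logical block size for this sketch


class CompressedBlockLog:
    """Toy block device: fixed-size logical blocks, variable-size stored extents."""

    def __init__(self):
        self.log = bytearray()  # append-only backing store
        self.block_map = {}     # logical block number -> (offset, length)

    def write_block(self, lbn, data):
        if len(data) != BLOCK_SIZE:
            raise ValueError("logical blocks must be exactly BLOCK_SIZE bytes")
        payload = zlib.compress(data)
        # Compressed size varies per block, so each write is appended to the
        # log and the mapping is updated to point at the new extent.
        self.block_map[lbn] = (len(self.log), len(payload))
        self.log += payload

    def read_block(self, lbn):
        offset, length = self.block_map[lbn]
        return zlib.decompress(bytes(self.log[offset:offset + length]))
```

Overwriting a block leaves its old extent behind in the log; a real design would need some form of cleaning to reclaim that space, which this sketch does not attempt.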
Content-Based Block Caching
- 2006
In this paper we propose a novel cache management mechanism termed the Content-Based Buffer Cache. The Content-Based Buffer Cache (CBBC) attempts to maintain a single copy of any block in memory according to its contents. In the presence of repeated content, this mechanism increases the effective size of the buffer cache. Overheads for maintaining this extra state information are small and bounded, providing an overall system performance improvement. Additionally, we eliminate writes to blocks where the new and old content are the same, reducing pressure on the I/O subsystems in the presence of these “Silent Writes”. We have logged traces of block-level disk access for a group of workstations over a several-month period using a modified Linux kernel designed to boot off of an iSCSI target. We have analyzed single client access, as well as multiple client access to distinct logical disks using a unified block cache. There is significant replication of content and significant numbers of “Silent Writes” within a single workstation trace, improving the Content-Based Buffer Cache read hit rate by as much as 80% over the traditional buffer cache design. We have also found that there is significant sharing of content between disks, which benefits content-based caching performance in the presence of a unified cache. For our workloads, these results indicate that content-based buffer caches dramatically improve I/O performance when used to manage a cluster of similar storage.
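The two ideas in this abstract, keeping one in-memory copy per distinct block content and skipping writes whose content is unchanged, can be sketched as below. The class `ContentBasedCache`, its method names, and the choice of SHA-256 as the content fingerprint are assumptions for illustration; the abstract does not specify how the CBBC identifies identical content, and the sketch omits eviction and reference counting.

```python
import hashlib


class ContentBasedCache:
    """Toy content-addressed block cache with silent-write detection."""

    def __init__(self):
        self.by_digest = {}     # content digest -> single stored copy of the data
        self.block_digest = {}  # block address -> digest of its current content
        self.silent_writes = 0

    def write(self, address, data):
        """Return True if the write must reach disk, False if it is silent."""
        digest = hashlib.sha256(data).digest()
        if self.block_digest.get(address) == digest:
            # New and old content are identical: nothing to write.
            self.silent_writes += 1
            return False
        self.block_digest[address] = digest
        # Blocks with the same content share one cached copy.
        self.by_digest.setdefault(digest, data)
        return True

    def read(self, address):
        return self.by_digest[self.block_digest[address]]
```

Because the address map stores only digests, many addresses holding the same content cost one data copy plus a small fixed amount of metadata each, which is the source of the larger effective cache size.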

