Zero-content augmented caches
In ICS ’09: Proceedings of the 23rd annual International Conference on Supercomputing, 2009
Cited by 2 (1 self)
Abstract
It has been observed that some applications manipulate large amounts of null data. Moreover, these zero data often exhibit high spatial locality. In some applications, more than 20% of data accesses concern null data blocks. Representing a null block on a standard cache line wastes cache resources. In this paper, we propose the Zero-Content Augmented cache, the ZCA cache. A ZCA cache consists of a conventional cache augmented with a specialized cache for memorizing null blocks, the Zero-Content cache or ZC cache. In the ZC cache, a data block is represented by its address tag and a validity bit. Moreover, as null blocks generally exhibit high spatial locality, several null blocks can be associated with a single address tag in the ZC cache. For instance, a ZC cache mapping 32MB of zero 64-byte lines uses less than 80KB of storage. Decompression of a null block is very simple, so read access time on the ZCA cache is comparable to that of a conventional cache. On applications manipulating large numbers of null data blocks, such a ZC cache significantly reduces the miss rate and memory traffic, and therefore increases performance for a small hardware overhead. In particular, write-back traffic on null blocks is limited. For applications with a low null block rate, no performance loss is observed.
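The abstract describes the ZC cache only as one address tag shared by several null lines, each with a validity bit. The following is a minimal Python sketch of that idea under assumed parameters (set count, associativity, LRU replacement, lines per tag); `ZCCache` and `zc_storage_bytes` are hypothetical names, not the paper's design. A hit simply returns a zero-filled line, which illustrates why decompression is cheap.

```python
from collections import OrderedDict


class ZCCache:
    """Hypothetical Zero-Content cache model: each entry holds one sector
    tag plus one validity bit per line; a set bit means "this line is all
    zeros". No data bytes are stored anywhere."""

    def __init__(self, num_sets=64, ways=4, lines_per_sector=16, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.lines_per_sector = lines_per_sector
        self.line_size = line_size
        # Per set: sector tag -> validity bit vector, ordered oldest-first (LRU).
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def _split(self, addr):
        """Map a byte address to (set index, sector tag, line offset in sector)."""
        line = addr // self.line_size
        sector, offset = divmod(line, self.lines_per_sector)
        return sector % self.num_sets, sector // self.num_sets, offset

    def lookup(self, addr):
        """Return a zero-filled line on a hit, or None on a miss."""
        index, tag, offset = self._split(addr)
        cset = self.sets[index]
        bits = cset.get(tag)
        if bits is not None and (bits >> offset) & 1:
            cset.move_to_end(tag)  # mark sector as most recently used
            return bytes(self.line_size)  # "decompression" is a zero fill
        return None

    def insert_null(self, addr):
        """Record that the line containing addr holds only zeros."""
        index, tag, offset = self._split(addr)
        cset = self.sets[index]
        if tag not in cset and len(cset) >= self.ways:
            cset.popitem(last=False)  # evict the least recently used sector
        cset[tag] = cset.get(tag, 0) | (1 << offset)
        cset.move_to_end(tag)

    def invalidate(self, addr):
        """A non-zero write to the line clears its validity bit."""
        index, tag, offset = self._split(addr)
        cset = self.sets[index]
        if tag in cset:
            cset[tag] &= ~(1 << offset)


def zc_storage_bytes(mapped_bytes, line_size, lines_per_sector, tag_bits):
    """Storage for a ZC cache mapping mapped_bytes of zero lines: one tag
    plus lines_per_sector validity bits per entry (LRU state ignored)."""
    entries = mapped_bytes // (line_size * lines_per_sector)
    return entries * (tag_bits + lines_per_sector) // 8
```

With the hypothetical choice of 64 lines (4KB) per tag and a 13-bit tag, `zc_storage_bytes(32 * 2**20, 64, 64, 13)` gives 78,848 bytes: 8,192 entries of 77 bits each, or roughly 1.2 bits per mapped 64-byte line. The paper's actual organization and tag width may differ; the sketch only shows how sharing one tag across many lines keeps storage small.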
Decoupled Zero-Compressed Memory
Cited by 1 (0 self)
Abstract
For each computer system generation, there are always applications or workloads for which main memory size is the major limitation. On the other hand, in many cases, a very significant portion of memory space could be freed by storing data in compressed form. A hardware compressed memory is therefore an attractive way to artificially increase the amount of data accessible within a reasonable delay. Among the most highly compressible data are null data blocks. Previous work has shown that, in many applications, null blocks represent a significant fraction of the working set resident in main memory. We propose to leverage this property through a hardware compressed memory that targets only null data blocks: the decoupled zero-compressed memory. Borrowing ideas from the decoupled sectored cache [12] and the zero-content augmented cache [7], the decoupled zero-compressed memory, or DZC memory, manages main memory as a decoupled sectored set-associative cache in which null blocks are represented only by a validity bit. Our experiments show that, for many applications, the DZC memory artificially enlarges main memory, i.e., it reduces the effective physical memory size needed to accommodate an application's working set without excessive page swapping. Moreover, the DZC memory can be combined with a zero-content augmented cache to manage null blocks across the whole memory hierarchy. On some applications, such management significantly decreases memory traffic and can therefore significantly improve performance.
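As a rough illustration of how representing a null block by a bit alone frees memory, the following Python model gives each page a per-block slot that either points to a frame in a shared pool or is empty, the empty slot standing in for the validity bit meaning "null". This is a deliberately simplified, hypothetical model: `ZeroCompressedMemory` is an illustrative name, and it omits the decoupled sectored set-associative organization, tag arrays, and page swapping of the actual DZC memory.

```python
class ZeroCompressedMemory:
    """Hypothetical simplified model: a null block costs no data storage,
    only an empty per-block slot; non-null blocks occupy frames drawn
    from a shared pool."""

    def __init__(self, num_frames, block_size=64, blocks_per_page=64):
        self.block_size = block_size
        self.blocks_per_page = blocks_per_page
        self.free_frames = list(range(num_frames))
        self.frames = {}  # frame number -> block contents
        self.pages = {}   # page number -> per-block frame number, None = null block

    def _locate(self, addr):
        """Return (page number, block index within the page)."""
        return divmod(addr // self.block_size, self.blocks_per_page)

    def write_block(self, addr, data):
        assert len(data) == self.block_size
        page, idx = self._locate(addr)
        slots = self.pages.setdefault(page, [None] * self.blocks_per_page)
        frame = slots[idx]
        if not any(data):
            # Null block: release its frame (if any); the empty slot alone records it.
            if frame is not None:
                del self.frames[frame]
                self.free_frames.append(frame)
                slots[idx] = None
            return
        if frame is None:
            if not self.free_frames:
                # A real system would evict or swap here; the model just fails.
                raise MemoryError("no free block frame")
            frame = self.free_frames.pop()
            slots[idx] = frame
        self.frames[frame] = bytes(data)

    def read_block(self, addr):
        page, idx = self._locate(addr)
        slots = self.pages.get(page)
        if slots is None or slots[idx] is None:
            return bytes(self.block_size)  # null (or never-written) block reads as zeros
        return self.frames[slots[idx]]

    def frames_used(self):
        return len(self.frames)
```

For example, writing eight blocks of which only two are non-zero consumes two frames, and overwriting a stored block with zeros returns its frame to the pool. Because any non-null block may occupy any free frame, storage scales with the number of non-null blocks rather than with the address range.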
Integrating Memory Compression and Decompression with Coherence Protocols in Distributed Shared Memory Multiprocessors
Abstract
The ever-increasing memory footprint of applications and the growing mainstream popularity of shared-memory parallel computing motivate us to explore the potential of memory compression in distributed shared memory (DSM) multiprocessors. For the first time, this paper integrates on-the-fly cache block compression/decompression algorithms into the cache coherence protocol by leveraging the directory structure already present in these scalable machines. Our proposal is unique in that, instead of employing custom compression/decompression hardware, we use a simple on-die protocol processing core in dual-core nodes to run our directory-based coherence protocol, suitably extended with compression/decompression algorithms. We design a low-overhead compression scheme based on frequent patterns and zero runs present in evicted dirty L2 cache blocks. Our compression algorithm examines the first eight bytes of an evicted dirty L2 block arriving at the home memory controller and speculates which compression scheme to invoke for the rest of the block. Our customized algorithm for handling completely zero cache blocks helps hide a significant amount of memory access latency. Our simulation-based experiments on a 16-node DSM multiprocessor with seven scientific computing applications show that our best design achieves, on average, 16% to 73% storage savings per evicted dirty L2 cache block for four of the seven applications, at the expense of at most 15% increased parallel execution time.
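The speculative choice of compression scheme can be sketched in Python as follows. The abstract does not give the actual patterns or encodings, so everything here is an assumption: a 64-byte block, a zero-run code (1 flag bit plus a 6-bit run length or an 8-bit literal), a reduced four-pattern variant in the spirit of frequent pattern compression over 32-bit words with a 2-bit prefix, and a 1-bit marker for an all-zero block. `choose_encoding`, `zero_run_bits`, and `word_cost_bits` are hypothetical names. Only the first eight bytes decide between zero-run and frequent-pattern coding; the fallback to storing the block uncompressed is added for completeness.

```python
import struct

BLOCK_SIZE = 64  # assumed L2 block size in bytes


def zero_run_bits(block):
    """Assumed zero-run code: each zero run costs 1 flag bit + 6-bit (length - 1),
    each non-zero byte costs 1 flag bit + the 8-bit literal."""
    bits, i = 0, 0
    while i < len(block):
        if block[i] == 0:
            run = 1
            while i + run < len(block) and block[i + run] == 0:
                run += 1
            bits += 1 + 6  # runs are at most 64 bytes, so length - 1 fits in 6 bits
            i += run
        else:
            bits += 1 + 8
            i += 1
    return bits


def word_cost_bits(word):
    """Assumed 4-pattern code for a signed 32-bit word: 2-bit prefix plus payload."""
    if word == 0:
        return 2
    if -128 <= word <= 127:
        return 2 + 8   # sign-extended byte
    if -32768 <= word <= 32767:
        return 2 + 16  # sign-extended halfword
    return 2 + 32      # word stored as-is


def choose_encoding(block):
    """Speculate from the first 8 bytes, then return (scheme, size in bits)."""
    assert len(block) == BLOCK_SIZE
    raw_bits = BLOCK_SIZE * 8
    if not any(block[:8]):
        # Zero head: guess the block is zero-heavy.
        if not any(block):
            return "zero_block", 1  # assumed 1-bit marker for an all-zero block
        scheme, bits = "zero_run", zero_run_bits(block)
    else:
        words = struct.unpack(f"<{BLOCK_SIZE // 4}i", block)  # little-endian 32-bit words
        scheme, bits = "frequent_pattern", sum(word_cost_bits(w) for w in words)
    if bits >= raw_bits:
        return "uncompressed", raw_bits  # compression would not save space
    return scheme, bits
```

A mis-speculation in this sketch costs only compression ratio, not correctness: a block with a zero head but a dense body is still encoded, just less compactly, and the uncompressed fallback bounds the worst case.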
Scalable Virtual Machine Multiplexing
2009

