Synergistic Caching in Single-Chip Multiprocessors (2005)

by S Harris

Cooperative caching for chip multiprocessors

by Jichuan Chang - In Proceedings of the 33rd Annual International Symposium on Computer Architecture, 2006
Cited by 87 (1 self)
Chip multiprocessor (CMP) systems have made the on-chip caches a critical resource shared among co-scheduled threads. Limited off-chip bandwidth, increasing on-chip wire delay, destructive inter-thread interference, and diverse workload characteristics pose key design challenges. To address these challenges, we propose CMP cooperative caching (CC), a unified framework to efficiently organize and manage on-chip cache resources. By forming a globally managed, shared cache using cooperative private caches, CC can effectively support two important caching applications: (1) reduction of average memory access latency and (2) isolation of destructive inter-thread interference. CC reduces the average memory access latency by balancing between cache latency and capacity optimizations. Based on private caches, CC naturally exploits their access latency benefits. To improve the effective cache capacity, CC forms a “shared” cache using replication control and LRU-based global replacement policies. Via cooperation throttling, CC provides a spectrum of caching behaviors between the two extremes of private and shared caches, thus enabling dynamic adaptation to suit workload requirements. We show that CC can achieve a robust performance advantage over private and shared cache schemes across different processor, cache and memory configurations, and a wide selection of multithreaded and multiprogrammed
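
The cooperation throttling idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say how throttling is implemented; the sketch assumes a single probability knob that decides whether a block evicted from one private cache is handed to a peer cache or simply dropped. The names `handle_eviction`, `peers`, and `cooperation_probability` are hypothetical and not taken from the paper.

```python
import random


def handle_eviction(block, peers, cooperation_probability, rng=random):
    """Decide where a block evicted from a private cache goes.

    Hypothetical model, not the paper's mechanism:
    - cooperation_probability = 0.0 never spills, so each cache
      behaves like an isolated private cache.
    - cooperation_probability = 1.0 always spills to a peer, so the
      private caches together behave more like one shared cache.

    Returns the chosen peer, or None if the block is dropped.
    """
    if peers and rng.random() < cooperation_probability:
        # Assumption: the receiving peer is picked uniformly at random.
        return rng.choice(peers)
    return None
```

Values of the knob between 0.0 and 1.0 give intermediate behaviors, which is one way to read the "spectrum" between private and shared caches that the abstract describes.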

Managing Wire Delay in Chip Multiprocessor Caches

by Bradford M. Beckmann, 2006
Cited by 4 (1 self)
Increasing on-chip wire delay and growing off-chip miss latency present two key challenges in designing large Level-2 (L2) CMP caches. Currently, some CMPs use a shared L2 cache to maximize cache capacity and minimize off-chip misses. Others use private L2 caches, replicating data to limit the delay from slow on-chip wires and minimize cache access time. Ideally, to improve performance for a wide variety of workloads, CMPs prefer both the capacity of a shared cache and the access latency of private caches. In this thesis, we propose three techniques that combine the benefits of shared and private caches. In particular, to reduce access latency in a shared cache, we investigate cache block migration and on-chip transmission lines. Migration reduces access latency by moving frequently used blocks towards the lower-latency banks. We show migration successfully reduces latency to blocks requested by only one processor, but doesn’t reduce the latency to shared blocks. In contrast, transmission lines can reduce on-chip wire delay by an order of magnitude versus conventional wires and provide low latency to all shared cache banks. We demonstrate on-chip transmission lines consistently improve performance versus a baseline shared cache, but bandwidth contention can limit them from reaching their full potential. To improve the effective capacity of private caches, we propose Adaptive Selective Replication (ASR). ASR dynamically monitors workload behavior and replicates cache blocks only when it estimates the benefit of replication (lower L2 hit latency) exceeds the cost (more L2 misses). When ASR detects replication is less beneficial, processors coordinate writebacks with remote on-chip caches to conserve cache storage. ASR provides a robust CMP cache hierarchy, improving performance versus both shared and private caches.
Additionally, ASR can leverage the fast remote cache access latency provided by transmission lines and reduce off-chip misses versus a design using conventional wires. We demonstrate the combination of transmission lines and ASR outperforms either isolated technique and performs similarly to a shared cache using four times the transmission line bandwidth.
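
The replication rule stated in the abstract (replicate only when the estimated benefit of lower L2 hit latency exceeds the cost of more L2 misses) can be written as a simple comparison. This is a sketch under stated assumptions, not the thesis's estimator: the abstract does not say how ASR measures benefit or cost, so the field names and the cycle-based accounting below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReplicationEstimate:
    """Hypothetical per-interval estimates; all fields are assumptions."""
    extra_local_hits: float    # remote hits that would become local hits
    cycles_saved_per_hit: float  # remote L2 latency minus local L2 latency
    extra_misses: float        # additional L2 misses from lost capacity
    miss_penalty_cycles: float   # cycles per additional off-chip miss


def should_replicate(estimate: ReplicationEstimate) -> bool:
    """Return True when estimated benefit exceeds estimated cost."""
    # Benefit: cycles saved because more hits are served locally.
    benefit = estimate.extra_local_hits * estimate.cycles_saved_per_hit
    # Cost: cycles lost because replicas displace other useful blocks.
    cost = estimate.extra_misses * estimate.miss_penalty_cycles
    return benefit > cost
```

For example, with the made-up numbers `ReplicationEstimate(1000, 20, 50, 300)` the benefit is 20,000 cycles against a cost of 15,000, so the sketch would favor replication; doubling `extra_misses` to 100 raises the cost to 30,000 and reverses the decision.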