Results 1 - 10 of 981
Provably good multicore cache performance for divide-and-conquer algorithms
- In Proc. 19th ACM-SIAM Sympos. Discrete Algorithms, 2008
"... This paper presents a multicore-cache model that reflects the reality that multicore processors have both per-processor private (L1) caches and a large shared (L2) cache on chip. We consider a broad class of parallel divide-and-conquer algorithms and present a new on-line scheduler, controlled-pdf, t ..."
Cited by 48 (14 self)
Multi-Execution: Multicore Caching for Data-Similar Executions
"... While microprocessor designers turn to multicore architectures to sustain performance expectations, the dramatic increase in parallelism of such architectures will put substantial demands on off-chip bandwidth and make the memory wall more significant than ever. This paper demonstrates that one profitable application of multicore processors is the execution of many similar instantiations of the same program. We identify that this model of execution is used in several practical scenarios and term it as “multi-execution.” Often, each such instance utilizes very similar data. In conventional cache ..."
Cited by 14 (3 self)
A Software Approach to Unifying Multicore Caches
"... Multicore chips will have large amounts of fast on-chip cache memory, along with relatively slow DRAM interfaces. The on-chip cache memory, however, will be fragmented and spread over the chip; this distributed arrangement is hard for certain kinds of applications to exploit efficiently, and can lead ..."
Cited by 1 (0 self)
Conflict-Avoidance in Multicore Caching for Data-Similar Executions
"... Power density constraints have affected the scaling of clock speed in processors, but following Moore’s law we have entered the multicore domain and we are about to step in the era of manycores. Harnessing the full potential of large number of cores is a challenging problem as shared on-chip resourc ..."
Cited by 1 (0 self)
Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems
- International Symposium on High Performance Computer Architecture, 2008
"... Cache partitioning and sharing is critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation that often has several limitations, such as excessive simulation time, absence of OS activities and proneness to simulation inaccura ..."
Cited by 84 (10 self)
Enabling Software Management for Multicore Caches with a Lightweight Hardware Support
"... The management of shared caches in multicore processors is a critical and challenging task. Many hardware and OS-based methods have been proposed. However, they may be hardly adopted in practice due to their non-trivial overheads, high complexities, and/or limited abilities to handle increasingly co ..."
Cited by 7 (0 self)
NUcache: An efficient multicore cache organization based on next-use distance
- in IEEE 17th International Symposium on High Performance Computer Architecture
NUcache: A Multicore Cache Organization Based on Next-Use Distance
"... In this work, we propose a new organization for the last level shared cache of a multicore system. Our design is based on the observation that the Next-Use distance, measured in terms of intervening misses between the eviction of a line and its next use, for lines brought in by a given delinquent PC ..."
Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis
- In Proceedings of the ACM SIGPLAN Workshop on Memory System Performance and Correctness, 2012
"... Understanding multicore memory behavior is crucial, but can be challenging due to the complex cache hierarchies employed in modern CPUs. In today’s hierarchies, performance is determined by complicated thread interactions, such as interference in shared caches and replication and communication in ..."
Cited by 4 (0 self)
Understanding Multicore Cache Behavior of Loop-based Parallel Programs via Reuse Distance Analysis
"... Understanding multicore memory behavior is crucial, but can be challenging due to the cache hierarchies employed in modern CPUs. In today’s hierarchies, performance is determined by complex thread interactions, such as interference in shared caches and replication and communication in private caches ..."
Cited by 2 (1 self)