Results 1 - 10 of 3,498
Dynamic Partitioning of Shared Cache Memory
- Journal of Supercomputing
, 2002
"... This paper proposes dynamic cache partitioning amongst simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches.
Since memory reference characteristics of processes/threads can change over time, our method collects the cache ..."
Cited by 99 (0 self)
... IPC over standard LRU. Our results show that smart cache management and scheduling are essential to achieve high performance with shared cache memory.
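As a rough illustration of what partitioning a set-associative cache means in practice, the sketch below (C++; names and structure invented, not the paper's mechanism) confines each thread's evictions to its allotted ways, letting a thread grow its share only while it is under quota:

// Sketch: enforcing a way partition inside one set of a set-associative
// cache. Names and structure are illustrative, not the paper's hardware.
#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr int kWays = 8;

struct Line {
    uint64_t tag = 0;
    bool valid = false;
    int owner = -1;        // thread that allocated this line
    uint64_t lruStamp = 0; // larger = more recently used
};

struct Set {
    std::array<Line, kWays> lines;

    // Pick a victim for a miss by `thread`, where waysOf[thread] is the
    // number of ways the current partition grants that thread.
    int victimFor(int thread, const std::vector<int>& waysOf) const {
        int owned = 0, victim = -1;
        uint64_t oldest = UINT64_MAX;
        for (int w = 0; w < kWays; ++w) {
            if (!lines[w].valid) return w;          // free way: no eviction
            if (lines[w].owner != thread) continue;
            ++owned;
            if (lines[w].lruStamp < oldest) { oldest = lines[w].lruStamp; victim = w; }
        }
        if (owned >= waysOf[thread] && victim >= 0)
            return victim;                          // at quota: evict own LRU line
        victim = 0; oldest = UINT64_MAX;            // under quota: take global LRU
        for (int w = 0; w < kWays; ++w)
            if (lines[w].lruStamp < oldest) { oldest = lines[w].lruStamp; victim = w; }
        return victim;
    }
};

int main() {
    Set set;                                // all ways start invalid
    std::vector<int> waysOf = {6, 2};       // e.g. 6 ways to thread 0, 2 to thread 1
    std::printf("victim for thread 0: way %d\n", set.victimFor(0, waysOf));
}

Dynamic repartitioning then amounts to periodically recomputing waysOf from the miss information the scheme collects.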
Control Shared Cache
"... The Setting: Computer science and engineering is undergoing a revolution. Computationally very powerful parallel computers, which used to be the luxury of the government and billion-dollar corporations, are already in the laptops and desktops of millions of ordinary users. Computer architects are bu ..."
Abstract
- Add to MetaCart
The Setting: Computer science and engineering is undergoing a revolution. Computationally very powerful parallel computers, which used to be the luxury of governments and billion-dollar corporations, are already in the laptops and desktops of millions of ordinary users. Computer architects are building computer chips with multiple processing cores inside them, as opposed to the single processing core that was the traditional way of designing mainstream computer chips until around 2004. Essentially, a processing core is analogous to the brain of the computing system: the more cores there are, the more tasks the system can perform in parallel. Chips with multiple processing cores are commonly called multi-core chips. Existing Intel and AMD chips on the market already have 4 cores, IBM and Sun Microsystems have chips with 9 and 16 cores respectively, and Intel has demonstrated prototypes of an 80-core chip. Both academic and industrial researchers, including us at Carnegie Mellon, are envisioning and charting out designs of 1,000-core chips on a 10-20-year timeframe [1, 2]. Soon, unprecedented amounts of computing power will be in the hands of almost every single computer user and programmer, and both programmers and users need to be aware of how to harness this power. To aid understanding, Figure 1 shows an example single-core system and an example multi-core system with nine processing cores, highlighting the major differences between the two from the designers', programmers', and users' perspectives.
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches
- IEEE/ACM International Symposium on Microarchitecture
, 2006
"... This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application that has a high demand and fewer cache resource ..."
Cited by 260 (5 self)
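A minimal sketch of the utility idea, assuming per-application miss curves such as a set-sampling monitor (UMON-style) might provide; all numbers are invented, and the paper's actual allocation algorithm is more refined. The greedy variant simply gives each way to the application with the largest marginal miss reduction:

// Greedy utility-based way allocation: repeatedly give the next cache way
// to whichever application gets the largest miss reduction from it.
// missCurve[a][w] = misses of app `a` when it owns `w` ways (w = 0..kWays);
// the values here are invented for illustration.
#include <cstdio>
#include <vector>

int main() {
    const int kWays = 8;
    std::vector<std::vector<long>> missCurve = {
        {900, 500, 300, 220, 180, 160, 150, 145, 142},  // app 0: cache-friendly
        {800, 790, 780, 775, 772, 770, 769, 768, 768},  // app 1: streaming-like
    };
    std::vector<int> alloc(missCurve.size(), 0);

    for (int given = 0; given < kWays; ++given) {
        int best = 0; long bestGain = -1;
        for (std::size_t a = 0; a < missCurve.size(); ++a) {
            long gain = missCurve[a][alloc[a]] - missCurve[a][alloc[a] + 1];
            if (gain > bestGain) { bestGain = gain; best = (int)a; }
        }
        ++alloc[best];
    }
    for (std::size_t a = 0; a < alloc.size(); ++a)
        std::printf("app %zu -> %d ways\n", a, alloc[a]);
}

With these curves the cache-friendly application ends up with six ways and the streaming-like one with two, rather than the demand-based split LRU would drift toward.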
Cache pirating: Measuring the curse of the shared cache
- In Proc. of ICPP
, 2011
"... We present a low-overhead method for accurately measuring application performance (CPI) and off-chip bandwidth (GB/s) as a function of its the available shared cache capacity, on real hardware, with no modifications to the application or operating system. We accomplish this by co-running a Pirate ap ..."
Cited by 18 (7 self)
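The pirating idea lends itself to a small sketch: co-run a thread that keeps a working set of a chosen size hot, so the target effectively loses that much shared cache, and time the target at each stolen size. Everything below (sizes, the stand-in target) is illustrative; the real method also pins threads to cores and uses hardware performance counters to verify the pirate actually holds its footprint:

// Sketch of a cache "pirate": a co-running thread that keeps a working set
// of `stolenBytes` hot so the target application sees a smaller shared cache.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

std::atomic<bool> stop{false};

void pirate(std::size_t stolenBytes) {
    std::vector<char> ws(stolenBytes, 1);
    volatile char sink = 0;
    while (!stop.load(std::memory_order_relaxed))
        for (std::size_t i = 0; i < ws.size(); i += 64)  // one touch per 64 B line
            sink += ws[i];
}

void target() {                       // stand-in for the measured application
    std::vector<int> data(1 << 22);   // 16 MiB of ints
    volatile long sum = 0;
    for (int pass = 0; pass < 8; ++pass)
        for (std::size_t i = 0; i < data.size(); i += 16) sum += data[i];
}

int main() {
    for (std::size_t mb : {0, 1, 2, 4, 8}) {     // swept stolen-cache sizes
        stop = false;
        std::thread t(pirate, mb << 20);
        auto t0 = std::chrono::steady_clock::now();
        target();
        auto t1 = std::chrono::steady_clock::now();
        stop = true;
        t.join();
        std::printf("stolen %zu MiB -> target %.0f ms\n", mb,
                    std::chrono::duration<double, std::milli>(t1 - t0).count());
    }
}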
Summary cache: A scalable wide-area web cache sharing protocol
, 1998
"... The sharing of caches among Web proxies is an important technique to reduce Web traffic and alleviate network bottlenecks. Nevertheless it is not widely deployed due to the overhead of existing protocols. In this paper we propose a new protocol called "Summary Cache"; each proxy keeps a su ..."
Cited by 894 (3 self)
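The summary itself is a Bloom filter over the URLs a proxy caches: a peer whose filter says "absent" is definitely not worth querying, while "present" may occasionally be a false positive. A minimal sketch follows (filter size and hash count are illustrative, not the paper's tuned values; the paper also introduces counting Bloom filters so a proxy can delete entries from its own summary):

// Sketch of a Summary Cache summary: a Bloom filter of cached URLs that a
// proxy advertises to its peers, so membership checks are cheap and local.
#include <bitset>
#include <cstdint>
#include <cstdio>
#include <string>

struct Summary {
    static constexpr std::size_t kBits = 1 << 16;
    std::bitset<kBits> bits;

    static uint64_t hash(const std::string& s, uint64_t seed) {
        uint64_t h = 1469598103934665603ULL ^ seed;     // FNV-1a variant
        for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
        return h;
    }
    void insert(const std::string& url) {
        for (uint64_t k = 0; k < 4; ++k) bits.set(hash(url, k) % kBits);
    }
    bool maybeContains(const std::string& url) const {
        for (uint64_t k = 0; k < 4; ++k)
            if (!bits.test(hash(url, k) % kBits)) return false;
        return true;   // may be a false positive; never a false negative
    }
};

int main() {
    Summary peer;                       // summary received from a neighbor proxy
    peer.insert("http://example.com/a.html");
    std::printf("a.html: %d\n", (int)peer.maybeContains("http://example.com/a.html"));
    std::printf("b.html: %d\n", (int)peer.maybeContains("http://example.com/b.html"));
}

A false positive only costs one wasted query to a neighbor; a missed hit only costs fetching from the origin server, which is why an approximate summary suffices.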
Performance of Shared Caches on Multithreaded Architectures
- Journal of Information Science and Engineering
, 1996
"... A multithreaded computer maintains multiple program counters and register files to support concurrent or overlapped execution of multiple threads of context, and to provide fast context switching for tolerance of memory latency. In this paper, we apply trace-driven simulation to study the perform ..."
Cited by 3 (0 self)
... the performance impact of a multithreaded architecture on the storage hierarchy. In particular, we examined the effects of different multithread scheduling techniques on cache performance. Using several program traces representing a typical server/workstation workload mix, we found that the cache performance can ...
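A toy version of such a trace-driven experiment, with invented traces and cache geometry: replay two threads' address streams through one shared cache under a fine-grained and a coarse-grained switching schedule, and compare miss counts:

// Toy trace-driven simulation: two threads share one direct-mapped cache;
// the schedule switches threads every `quantum` accesses.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

long simulate(const std::vector<std::vector<uint64_t>>& traces, int quantum) {
    const int kSets = 1024;                       // 64 KiB cache, 64 B lines
    std::vector<uint64_t> tag(kSets, ~0ULL);
    std::vector<std::size_t> pos(traces.size(), 0);
    long misses = 0;
    for (bool work = true; work; ) {
        work = false;
        for (std::size_t t = 0; t < traces.size(); ++t)
            for (int q = 0; q < quantum && pos[t] < traces[t].size(); ++q) {
                uint64_t line = traces[t][pos[t]++] >> 6;
                uint64_t set = line % kSets;
                if (tag[set] != line) { ++misses; tag[set] = line; }
                work = true;
            }
    }
    return misses;
}

int main() {
    // Thread 0 streams; thread 1 loops over a small hot region.
    std::vector<uint64_t> t0, t1;
    for (uint64_t i = 0; i < 100000; ++i) t0.push_back(i * 64);
    for (uint64_t i = 0; i < 100000; ++i) t1.push_back((i % 512) * 64);
    std::vector<std::vector<uint64_t>> traces = {t0, t1};
    std::printf("switch every access:  %ld misses\n", simulate(traces, 1));
    std::printf("switch every 64:      %ld misses\n", simulate(traces, 64));
}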
Paging for Multi-Core Shared Caches
"... Paging for multi-core processors extends the classical paging problem to a setting in which several processes simultaneously share the cache. Recently, Hassidim proposed a model for multi-core paging [25], studying cache eviction policies for multi-cores under the traditional competitive analysis me ..."
Cited by 2 (0 self)
Optimal footprint symbiosis in shared cache
- In CCGRID
, 2015
"... Abstract—On multicore processors, applications are run shar-ing the cache. This paper presents online optimization to co-locate applications to minimize cache interference to maximize performance. The paper formulates the optimization problem and solution, presents a new sampling technique for local ..."
Cited by 1 (1 self)
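The optimization step can be pictured with a small sketch: given a predicted miss count for every pair of applications sharing a cache (a stub table below; the paper instead derives predictions from sampled footprints), enumerate the possible pairings and keep the cheapest:

// Sketch of the co-location decision: split four applications into two
// co-run pairs so the total predicted misses are minimized.
#include <cstdio>

int main() {
    // predictedMisses[i][j]: combined misses when apps i and j share a cache
    // (symmetric; invented numbers standing in for footprint-based predictions).
    double predictedMisses[4][4] = {
        {0, 9, 4, 7},
        {9, 0, 8, 3},
        {4, 8, 0, 6},
        {7, 3, 6, 0},
    };
    // The three ways to pair up {0,1,2,3}.
    int pairings[3][4] = {{0,1, 2,3}, {0,2, 1,3}, {0,3, 1,2}};
    int best = 0; double bestCost = 1e300;
    for (int p = 0; p < 3; ++p) {
        double cost = predictedMisses[pairings[p][0]][pairings[p][1]]
                    + predictedMisses[pairings[p][2]][pairings[p][3]];
        std::printf("pairing (%d,%d)+(%d,%d): cost %.1f\n",
                    pairings[p][0], pairings[p][1], pairings[p][2], pairings[p][3], cost);
        if (cost < bestCost) { bestCost = cost; best = p; }
    }
    std::printf("best: (%d,%d)+(%d,%d)\n", pairings[best][0], pairings[best][1],
                pairings[best][2], pairings[best][3]);
}

Exhaustive enumeration only works at this toy scale; with many applications the same objective becomes a minimum-weight matching problem, which is where the paper's online formulation comes in.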
Exploring the Design Space for a Shared-Cache Multiprocessor
, 1994
"... In the near future, semiconductor technology will allow the integration of multiple processors on a chip or multichipmodule (MCM). In this paper we investigate the architecture and partitioning of resources between processors and cache memory for single chip and MCM-based multiprocessors. We study t ..."
Cited by 36 (2 self)
... the performance of a cluster-based multiprocessor architecture in which processors within a cluster are tightly coupled via a shared cluster cache, for various processor-cache configurations. Our results show that for parallel applications, clustering via shared caches provides an effective mechanism ...
Performance Modeling for Helper Thread on Shared Cache CMPs
"... Abstract---In data intensive applications of Cloud Computing such as XML parsing, large graph traversing and so on, there are a lot of operations to access irregular data. These data need be timely prefetched into the shared cache in CMPs by helper thread. However, a bad prefetching strategy of help ..."
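A minimal sketch of the helper-thread pattern itself, with an invented prefetch distance: one thread runs ahead of the computation issuing prefetches for irregularly indexed data, throttled so it does not run so far ahead that prefetched lines are evicted from the shared cache before use (the "bad strategy" failure mode the paper analyzes):

// Helper-thread prefetching sketch: the helper prefetches data[index[i]]
// at most `dist` elements ahead of the main computation's progress.
#include <atomic>
#include <cstddef>
#include <cstdio>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 22;
    std::vector<int> data(n, 1);
    std::vector<std::size_t> index(n);          // irregular access pattern
    for (std::size_t i = 0; i < n; ++i) index[i] = (i * 2654435761u) % n;

    std::atomic<std::size_t> progress{0};
    std::thread helper([&] {                    // ideally on a core sharing the LLC
        const std::size_t dist = 32;            // prefetch distance (assumed knob)
        for (std::size_t i = 0; i < n; ++i) {
            while (i > progress.load(std::memory_order_relaxed) + dist) {
                /* spin: don't run too far ahead of the consumer */
            }
            __builtin_prefetch(&data[index[i]]);  // GCC/Clang builtin
        }
    });

    long sum = 0;                               // main computation
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[index[i]];
        progress.store(i, std::memory_order_relaxed);
    }
    helper.join();
    std::printf("sum = %ld\n", sum);
}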