Results 1 - 10 of 1,135
An adaptive, nonuniform cache structure for wire-delay dominated on-chip caches
- In International Conference on Architectural Support for Programming Languages and Operating Systems
, 2002
"... Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a ..."
Cited by 314 (39 self)
...within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, and outperforms the best three-level hierarchy, while using less ...
An Adaptive, Non-Uniform Cache Structure for . . .
- In Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)
, 2002
"... Growing wire delays will force substantive changes in the designs of large caches. Traditional cache architectures assume that each level in the cache hierarchy has a single, uniform access time. Increases in on-chip communication delays will make the hit time of large on-chip caches a function of a ..."
...within the same level of the cache. We show that, for multi-megabyte level-two caches, an adaptive, dynamic NUCA design achieves 1.5 times the IPC of a Uniform Cache Architecture of any size, outperforms the best static NUCA scheme by 11%, and outperforms the best three-level hierarchy, while using less ...
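In a NUCA cache, a line's hit latency depends on which bank holds it. A toy Python sketch of one way a dynamic design can adapt: on each hit, the line swaps places with the line one bank closer to the core, so frequently used lines drift toward low-latency banks. The bank count, the latency formula, the miss penalty, and inserting missed lines at the farthest bank are all hypothetical choices for illustration, not parameters taken from the paper.

```python
from typing import List, Optional


class DNucaBankSet:
    """Toy model of one bank set in a dynamic NUCA cache (hypothetical parameters).

    banks[0] is the bank closest to the core. Bank i is assumed to cost
    base + i * hop cycles; a miss is assumed to cost miss_penalty cycles.
    """

    def __init__(self, num_banks: int, base: int = 3, hop: int = 2,
                 miss_penalty: int = 50) -> None:
        self.banks: List[Optional[int]] = [None] * num_banks
        self.base = base
        self.hop = hop
        self.miss_penalty = miss_penalty

    def latency(self, bank: int) -> int:
        # Farther banks pay more wire delay.
        return self.base + bank * self.hop

    def access(self, tag: int) -> int:
        """Return the access cost in cycles and update line placement."""
        if tag in self.banks:
            i = self.banks.index(tag)
            cost = self.latency(i)
            if i > 0:
                # Promote the hit line one bank closer; the displaced line moves out one bank.
                self.banks[i - 1], self.banks[i] = self.banks[i], self.banks[i - 1]
            return cost
        # Miss: place the line in the farthest bank, evicting its previous occupant.
        self.banks[-1] = tag
        return self.miss_penalty
```

With four banks, a line that is hit repeatedly costs 9, 7, 5, then 3 cycles as it migrates to bank 0.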
Effective Use of The Level-Two Cache for Skewed Tiling
, 2001
"... Tiling is a well-known loop transformation technique to enhance temporal data locality. In our previous work, we have developed a skewed tiling technique for relaxation codes, which requires applying loop skewing before loop tiling. In this paper, we study how to effectively use the level-two cache ..."
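To illustrate why skewing must come before tiling, here is a minimal Python sketch of the general skew-then-tile idea on an in-place, Gauss-Seidel-style 1D relaxation. It is not the authors' code. In the original (t, i) space the inner loop reads a neighbour from the previous time step at i+1, so tiling i across t directly would be illegal. After skewing with j = i + t, every dependence distance is non-negative, so rectangular tiles over j with all time steps inside each tile give exactly the same result as the untiled loop. The function names and tile size are hypothetical.

```python
from typing import List


def relax_naive(a: List[float], steps: int) -> List[float]:
    """Reference: in-place 3-point relaxation, one full sweep per time step."""
    a = list(a)
    n = len(a)
    for _ in range(steps):
        for i in range(1, n - 1):
            a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a


def relax_skewed_tiled(a: List[float], steps: int, tile: int) -> List[float]:
    """Same computation after skewing (j = i + t) and tiling the skewed j axis."""
    a = list(a)
    n = len(a)
    # For a fixed t, i in [1, n-2] maps to j in [t+1, t+n-2].
    j_max = (steps - 1) + (n - 2)
    for jj in range(1, j_max + 1, tile):       # one tile of the skewed space
        for t in range(steps):                  # all time steps inside the tile
            lo = max(jj, t + 1)
            hi = min(jj + tile - 1, t + n - 2)
            for j in range(lo, hi + 1):
                i = j - t                       # un-skew to the array index
                a[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return a
```

Each tile touches roughly tile + steps array elements across all time steps, which is what lets a suitably sized tile stay resident in a given cache level.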
An Analysis of Adding a Backside Level-Two Cache to an Existing Microprocessor
Prefetching using Markov predictors
- In ISCA
, 1997
"... Prefetching is one approach to reducing the latency of memory operations in modern computer systems. In this paper, we describe the Markov prefetcher. This prefetcher acts as an interface between the on-chip and off-chip cache, and can be added to existing computer designs. The Markov prefetcher is ..."
Cited by 308 (1 self)
...by the processor. In our cycle-level simulations, the Markov prefetcher reduces the overall execution stalls due to instruction and data memory operations by an average of 54% for various commercial benchmarks while using only two-thirds of the memory of a demand-fetch cache organization.
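A Markov prefetcher treats the stream of miss addresses as a Markov chain: it remembers which addresses followed a given miss and prefetches them the next time that address misses. The Python sketch below is a simplified first-order version. The table size, the number of successors kept per entry (`fanout`), and the most-recently-used ordering of successors are hypothetical parameters for illustration, not the exact organization from the paper.

```python
from collections import OrderedDict
from typing import List, Optional


class MarkovPrefetcher:
    """First-order Markov predictor over the miss-address stream (a sketch).

    Hypothetical parameters: up to `entries` table rows with LRU replacement,
    each keeping up to `fanout` successor addresses, most recent first.
    """

    def __init__(self, entries: int = 1024, fanout: int = 4) -> None:
        self.table: "OrderedDict[int, List[int]]" = OrderedDict()
        self.entries = entries
        self.fanout = fanout
        self.last_miss: Optional[int] = None

    def on_miss(self, addr: int) -> List[int]:
        """Record a miss and return the addresses to prefetch."""
        # Learn the transition previous miss -> addr.
        if self.last_miss is not None:
            succ = self.table.setdefault(self.last_miss, [])
            if addr in succ:
                succ.remove(addr)
            succ.insert(0, addr)          # most recent successor first
            del succ[self.fanout:]        # keep at most `fanout` successors
            self.table.move_to_end(self.last_miss)
            if len(self.table) > self.entries:
                self.table.popitem(last=False)  # evict least recently used row
        self.last_miss = addr
        # Predict the successors previously observed after this address.
        if addr in self.table:
            self.table.move_to_end(addr)
            return list(self.table[addr])
        return []
```

For the miss sequence 1, 2, 3, 1 the last call returns `[2]`, because 2 followed 1 the first time.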
Tempest and Typhoon: User-level Shared Memory
- In Proceedings of the 21st Annual International Symposium on Computer Architecture
, 1994
"... Future parallel computers must efficiently execute not only hand-coded applications but also programs written in high-level, parallel programming languages. Today’s machines limit these programs to a single communication paradigm, either message-passing or shared-memory, which results in uneven perf ..."
Cited by 309 (27 self)
...programmable, user-level processor in the network interface. We demonstrate the utility of Tempest with two examples. First, the Stache protocol uses Tempest’s fine-grain access control mechanisms to manage part of a processor’s local memory as a large, fully-associative cache for remote data. We simulated Typhoon ...
The filter cache: An energy efficient memory structure
- In Proceedings of the 1997 International Symposium on Microarchitecture
, 1997
"... Most modern microprocessors employ one or two levels of on-chip caches in order to improve performance. These caches are typically implemented with static RAM cells and often occupy a large portion of the chip area. Not surprisingly, these caches often consume a significant amount of power. In many ..."
Cited by 222 (4 self)
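The filter-cache idea is to put a very small cache in front of the level-one cache, so most accesses are served by a structure that costs less energy per access, while misses pay for both. A Python sketch of that trade-off, assuming a direct-mapped filter cache and a simple energy model in which every access probes the filter cache and only misses also access L1. The line count, line size, and per-access energies are hypothetical placeholders, not values from the paper, and the extra latency on filter-cache misses is not modelled.

```python
from typing import Iterable, List, Optional, Tuple


def filter_cache_stats(addrs: Iterable[int], lines: int = 16,
                       line_bytes: int = 32) -> Tuple[int, int]:
    """Simulate a direct-mapped filter cache; return (hits, misses)."""
    tags: List[Optional[int]] = [None] * lines
    hits = misses = 0
    for a in addrs:
        block = a // line_bytes          # cache-line number
        idx = block % lines              # direct-mapped index
        if tags[idx] == block:
            hits += 1
        else:
            misses += 1
            tags[idx] = block            # fill from L1 on a miss
    return hits, misses


def relative_energy(hits: int, misses: int, e_filter: float, e_l1: float) -> float:
    """Energy with a filter cache relative to accessing L1 on every reference.

    Assumed model: each access costs e_filter; each miss additionally costs e_l1.
    """
    total = hits + misses
    return (total * e_filter + misses * e_l1) / (total * e_l1)
```

A sequential walk over 256 bytes in 4-byte steps misses once per 32-byte line (8 misses, 56 hits); with the placeholder energies 0.1 and 1.0 that gives a relative energy of about 0.225. Two addresses that map to the same index, such as 0 and 512, evict each other on every access.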
UNIX Disk Access Patterns
, 1993
"... Disk access patterns are becoming ever more important to understand as the gap between processor and disk performance increases. The study presented here is a detailed characterization of every low-level disk access generated by three quite different systems over a two-month period. The contributions ..."
Cited by 277 (20 self)
...of write caching at the disk level, we found that using a small non-volatile cache at each disk allowed writes to be serviced considerably faster than with a regular disk. In particular, short bursts of writes go much faster, and such bursts are common: writes rarely come singly. Adding even 8 KB of non ...
Evaluating Stream Buffers as a Secondary Cache Replacement
- In Proceedings of the 21st Annual International Symposium on Computer Architecture
, 1994
"... Today’s commodity microprocessors require a low latency memory system to achieve high sustained performance. The conventional high-performance memory system provides fast data access via a large secondary cache. But large secondary caches can be expensive, particularly in large-scale parallel system ..."
Cited by 204 (0 self)
...systems with many processors (and thus many caches). We evaluate a memory system design that can both be cost-effective and provide better performance, particularly for scientific workloads: a single level of (on-chip) cache backed up only by Jouppi’s stream buffers [10] and a main memory ...
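A stream buffer, as introduced by Jouppi, is a small FIFO that prefetches the cache lines following a miss; on a later miss, if the line is at the head of a buffer it is taken from there instead of memory, and the buffer fetches one more sequential line. A Python sketch of a set of unit-stride stream buffers, assuming only the head of each buffer is compared and an LRU buffer is reallocated when no buffer matches. The number of buffers and their depth are hypothetical parameters.

```python
from collections import deque
from typing import Deque, List


class StreamBuffers:
    """Unit-stride stream buffers consulted on cache misses (a sketch)."""

    def __init__(self, num_buffers: int = 4, depth: int = 4) -> None:
        self.buffers: List[Deque[int]] = [deque() for _ in range(num_buffers)]
        self.lru = list(range(num_buffers))   # front = least recently used buffer
        self.depth = depth

    def _touch(self, b: int) -> None:
        self.lru.remove(b)
        self.lru.append(b)

    def on_cache_miss(self, line: int) -> bool:
        """Return True if the missing line was found at a stream-buffer head."""
        for b, buf in enumerate(self.buffers):
            if buf and buf[0] == line:
                buf.popleft()
                # Refill: prefetch the next sequential line to keep the FIFO full.
                buf.append(buf[-1] + 1 if buf else line + 1)
                self._touch(b)
                return True
        # No buffer matched: restart the LRU buffer at the lines after `line`.
        b = self.lru[0]
        self.buffers[b] = deque(line + k for k in range(1, self.depth + 1))
        self._touch(b)
        return False
```

Interleaved sequential streams (lines 0, 500, 1, 501, ...) are each caught by their own buffer after the first miss, while a stride-2 pattern is never matched by these unit-stride buffers.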
The HP AutoRAID hierarchical storage system
- ACM Transactions on Computer Systems
, 1995
"... Configuring redundant disk arrays is a black art. To configure an array properly, a system administrator must understand the details of both the array and the workload it will support. Incorrect understanding of either, or changes in the workload over time, can lead to poor performance. We present a ..."
Cited by 263 (15 self)
...a solution to this problem: a two-level storage hierarchy implemented inside a single disk-array controller. In the upper level of this hierarchy, two copies of active data are stored to provide full redundancy and excellent performance. In the lower level, RAID 5 parity protection is used to provide ...
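The two-level hierarchy keeps active data mirrored and moves less active data to RAID 5. A Python sketch of only the placement policy, assuming that a write promotes a block into the mirrored level and that, when the mirrored level is over capacity, the least recently written block is demoted to RAID 5. Capacity is counted in blocks; parity computation, disk layout, and the class and method names are hypothetical simplifications.

```python
from collections import OrderedDict
from typing import Dict, Optional


class TwoLevelArray:
    """Placement policy of a mirrored-over-RAID-5 hierarchy (a sketch)."""

    def __init__(self, mirrored_capacity: int) -> None:
        # Oldest-written block first, so the front is the demotion candidate.
        self.mirrored: "OrderedDict[int, bytes]" = OrderedDict()
        self.raid5: Dict[int, bytes] = {}
        self.capacity = mirrored_capacity

    def write(self, block: int, data: bytes) -> None:
        # Writing a block promotes it into the mirrored (upper) level.
        self.raid5.pop(block, None)
        self.mirrored[block] = data
        self.mirrored.move_to_end(block)
        # Demote least recently written blocks when mirrored space runs out.
        while len(self.mirrored) > self.capacity:
            old, old_data = self.mirrored.popitem(last=False)
            self.raid5[old] = old_data

    def read(self, block: int) -> bytes:
        # Reads are served from whichever level holds the block.
        if block in self.mirrored:
            return self.mirrored[block]
        return self.raid5[block]

    def level(self, block: int) -> Optional[str]:
        if block in self.mirrored:
            return "mirrored"
        if block in self.raid5:
            return "raid5"
        return None
```

With room for two mirrored blocks, writing blocks 1, 2, 3 demotes block 1 to RAID 5; rewriting block 1 promotes it again and demotes block 2.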