Results 1 -
8 of
8
Cache Write Policies and Performance
, 1991
"... This paper investigates issues involving writes and caches. First, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching is proposed. Write caching places a small fully-associative cache behind a wr ..."
Abstract
-
Cited by 122 (3 self)
- Add to MetaCart
This paper investigates issues involving writes and caches. First, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching is proposed. Write caching places a small fully-associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache. Second, tradeoffs on writes that miss in the cache are investigated. In particular, whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line accessed is invalidated are considered. Depending on the combination of these polices chosen, the entire cache miss rate can vary by a factor of two on some applications. Furthermore, the combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Finally, the traffic at the back side of write-through and wr...
Efficient Procedure Mapping using Cache Line Coloring
- IN PROCEEDINGS OF THE SIGPLAN'97 CONFERENCE ON PROGRAMMING LANGUAGE DESIGN AND IMPLEMENTATION
, 1997
"... As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replace ..."
Abstract
-
Cited by 67 (12 self)
- Add to MetaCart
As the gap between memory and processor performance continues to widen, it becomes increasingly important to exploit cache memory effectively. Both hardware and software approaches can be explored to optimize cache performance. Hardware designers focus on cache organization issues, including replacement policy, associativity, line size and the resulting cache access time. Software writers use various optimization techniques, including software prefetching, data scheduling and code reordering. Our focus is on improving memory usage through code reordering compiler techniques. In this
Experimental Studies of Access Graph Based Heuristics: Beating the LRU standard?
- In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms
, 1997
"... In this paper we devise new paging heuristics motivated by the access graph model of paging [2]. Unlike the access graph model [2, 9, 4] and the related Markov paging model [11] our heuristics are truly online in that we do not assume any prior knowledge of the program just about to be executed. Th ..."
Abstract
-
Cited by 29 (2 self)
- Add to MetaCart
In this paper we devise new paging heuristics motivated by the access graph model of paging [2]. Unlike the access graph model [2, 9, 4] and the related Markov paging model [11] our heuristics are truly online in that we do not assume any prior knowledge of the program just about to be executed. The Least Recently Used heuristic for paging is remarkably good, and is known experimentally to be superior to many of the suggested alternatives on real program traces [24]. Experiments we've performed suggest that our heuristics beat LRU consistently, over a wide range of cache sizes and programs. The number of page faults can be as low as 75% less than the number of page faults for LRU and is typically 5%--30% less than that of LRU. We have built a program tracer that gives the page access sequence for real program executions of 200 -- 1,500 thousand page access requests, and our simulations are based on these real program traces. While we have no real evidence to suggest that the programs...
The Impact of Software Structure and Policy on CPU and Memory System Performance
, 1994
"... Operating systems, when compared to application programs, have received disappointingly little benefit from the performance improvements of the most recent generation of microprocessors. This thesis used complete traces of software activity from a RISCbased uniprocessor to expose the dynamic behavio ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Operating systems, when compared to application programs, have received disappointingly little benefit from the performance improvements of the most recent generation of microprocessors. This thesis used complete traces of software activity from a RISCbased uniprocessor to expose the dynamic behavior of operating system execution and explore the sources of poor performance. Traces from both Mach 3.0 and Ultrix implementations of UNIX permitted a study of performance differences between microkernel and monolithic implementations of the same operating system interface. The comparison showed that both system structure and policy implemented in the system have a significant impact on performance. Measurements of X11 workloads showed that memory system behavior for these large workloads differs significantly from the kinds of workloads traditionally used for performance analysis. Structural and behavioral similarities between large X11 workloads and the operating system are reflected in the...
Strategic Directions in Computer Architecture
- ACM Computing Surveys
, 1996
"... Looking back on the last 30 years, we have seen the remarkable developments in semiconductor technology enabling the implementation of ideas that were previously ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
Looking back on the last 30 years, we have seen the remarkable developments in semiconductor technology enabling the implementation of ideas that were previously
VLIW Processor Codesign for Video Processing
- University of California, Berkeley, in
, 1997
"... . A codesign approach for complex video compression systems is presented. The system is based on a flexible and programmable VLIW (Very Long Instruction Word) architecture. The design approach can be subdivided into two phases: a quantitative analysis for deriving the main processor structure and a ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
. A codesign approach for complex video compression systems is presented. The system is based on a flexible and programmable VLIW (Very Long Instruction Word) architecture. The design approach can be subdivided into two phases: a quantitative analysis for deriving the main processor structure and a cosynthesis for generating the processor hardware and the compiler back-end. The analysis results of different video compression algorithms are summarized. This permits to adapt the processor to a set of related applications rather than to a particular task. A compiled instruction-set simulator for analyzing large data sets is presented. An HTML-based codesign framework is shown which documents and organizes the analysis data. Keywords: Embedded system design, video processing, VLIW processors, performance analysis 1. Introduction A short-time-to market, improvements of design productivity, and a trend towards evermore complex systems are the main motivations for the recent interest in desi...
WRL Research Report 91/2
- In Proc. 19th Annual Intl. Symposium on Computer Architecture
, 1989
"... This paper presents the results of a simulation-based study of various translation lookaside buffer (TLB) architectures, in the context of a modern VLSI RISC processor. The simulators used address traces, generated by instrumented versions of the SPECmarks and several other programs running on a DEC ..."
Abstract
- Add to MetaCart
This paper presents the results of a simulation-based study of various translation lookaside buffer (TLB) architectures, in the context of a modern VLSI RISC processor. The simulators used address traces, generated by instrumented versions of the SPECmarks and several other programs running on a DECstation 5000. The performance of two-level TLBs and fullyassociative TLBs were investigated. The amount of memory mapped was found to be the dominant factor in TLB performance. Small first-level FIFO instruction TLBs can be effective in two level TLB configurations. For some applications, the cycles-per-instruction (CPI) loss due to TLB misses can be reduced from as much as 5 CPI to negligible levels with typical TLB parameters through the use of variable-sized pages. This research report is a preprint of a paper to appear in the Proceedings of the 19th International Symposium on Computer Architecture. Copyright 1992 Digital Equipment Corporation A Simulation-Based Study of TLB Performa...

