Results 1 - 10 of 59
Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers
, 1990
Cited by 747 (4 self)
Abstract
Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches. Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one-cycle miss penalty, as opposed to a many-cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. St...
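A minimal trace-driven sketch of the victim-caching idea, for illustration only: it is not the paper's simulator, and the names `DirectMappedWithVictim` and `run` are hypothetical. It assumes the trace already consists of line (block) addresses, that the victim cache uses LRU replacement, and that a victim-cache hit swaps the requested line with the line it displaces from the direct-mapped cache. Two lines that map to the same set ping-pong; a single victim entry turns those conflict misses into victim-cache hits.

```python
from collections import OrderedDict


class DirectMappedWithVictim:
    """Direct-mapped cache backed by a small fully-associative victim cache.

    Illustrative sketch: inputs are line addresses, the victim cache is LRU,
    and a victim-cache hit swaps the requested line with the displaced one.
    """

    def __init__(self, num_sets, victim_entries):
        self.num_sets = num_sets
        self.lines = [None] * num_sets      # one resident line address per set
        self.victim = OrderedDict()         # line address -> None, oldest first
        self.victim_entries = victim_entries
        self.hits = self.victim_hits = self.misses = 0

    def _insert_victim(self, line):
        # Nothing to save if the set was empty or there is no victim cache.
        if line is None or self.victim_entries == 0:
            return
        self.victim[line] = None            # new key lands at the MRU end
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False) # drop the least recently used entry

    def access(self, line):
        s = line % self.num_sets
        if self.lines[s] == line:
            self.hits += 1
            return "hit"
        displaced = self.lines[s]
        self.lines[s] = line
        if line in self.victim:
            # Victim-cache hit: the line moves back and the displaced line takes its slot.
            del self.victim[line]
            self._insert_victim(displaced)
            self.victim_hits += 1
            return "victim_hit"
        # Full miss: the line is fetched from the next level; the displaced line is the victim.
        self._insert_victim(displaced)
        self.misses += 1
        return "miss"


def run(trace, num_sets, victim_entries):
    cache = DirectMappedWithVictim(num_sets, victim_entries)
    for line in trace:
        cache.access(line)
    return cache.hits, cache.victim_hits, cache.misses


# Lines 0 and 8 both map to set 0 of an 8-set cache.
trace = [0, 8] * 5
print(run(trace, 8, 0))  # (0, 0, 10): every access is a conflict miss
print(run(trace, 8, 1))  # (0, 8, 2): only the two cold misses remain
```

In this toy model, three lines competing for one set (`[0, 8, 16] * 4`) defeat a one-entry victim cache but are captured by two entries, which hints at why only a few entries are needed against mapping conflicts.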
Limits of instruction-level parallelism
, 1991
Cited by 339 (7 self)
Abstract
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There are two other research laboratories located in Palo Alto, the Network Systems
An enhanced access and cycle time model for on-chip caches
, 1994
Cited by 230 (5 self)
Abstract
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There is a second research laboratory located in Palo Alto, the Systems Research Center (SRC). Other Digital research groups are located in Paris (PRL) and in Cambridge,
Potential benefits of delta encoding and data compression for HTTP (Corrected version)
, 1997
The Effect of Context Switches on Cache Performance
- Jeffrey C. Mogul and Anita
, 1990
Cited by 156 (1 self)
Using the SimOS Machine Simulator to Study Complex Computer Systems
- ACM Transactions on Modeling and Computer Simulation
, 1997
Cited by 144 (5 self)
Abstract
... This paper identifies two challenges that machine simulators such as SimOS must overcome in order to effectively analyze large complex workloads: handling long workload execution times and collecting data effectively. To study long-running workloads, SimOS includes multiple interchangeable simulation models for each hardware component. By selecting the appropriate combination of simulation models, the user can explicitly control the tradeoff between simulation speed and simulation detail. To handle the large amount of low-level data generated by the hardware simulation models, SimOS contains flexible annotation and event classification mechanisms that map the data back to concepts meaningful to the user. SimOS has been extensively used to study new computer hardware designs, to analyze application performance, and to study operating systems. We include two case studies that demonstrate how a low-level machine simulator such as SimOS can be used to study large and complex workloads.
Predicting Program Behavior Using Real or Estimated Profiles
, 1990
Cited by 140 (4 self)
Abstract
There is a growing interest in optimizations that depend on or benefit from an execution profile that tells where time is spent. How well does a profile from one run describe the behavior of a different run, and how does this compare with the behavior predicted statically by examining the program itself? This paper defines two abstract measures of how well a profile predicts actual behavior. According to these measures, real profiles indeed do better than estimated profiles, usually. A perfect profile from an earlier run with the same data set, however, does better still, sometimes by a factor of two. Using such a profile is unrealistic and can lead to inflated expectations of a profile-driven optimization.
1. Introduction
Many people have built or speculated on systems that use a run-time profile to guide code optimization. Applications include the selection of variables to promote to registers [7,8], placement of code sequences to improve cache behavior [3,6], and pre...
Trace-Driven Memory Simulation: A Survey
- ACM Computing Surveys
, 2004
Cited by 134 (0 self)
Abstract
This article surveys and analyzes these developments by establishing criteria for evaluating trace-driven methods, and then applies these criteria to describe, categorize, and compare over 50 trace-driven simulation tools. We discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use, are considered. In a concluding section, we examine fundamental limitations to trace-driven simulation, and survey some recent developments in memory simulation that may overcome these bottlenecks.
Observing TCP Dynamics in Real Networks
, 1992
Cited by 106 (0 self)
Abstract
The behavior of the TCP protocol in simple situations is well understood, but when multiple connections share a set of network resources the protocol can exhibit surprising phenomena. Earlier studies have identified several such phenomena, and have analyzed them using simulation or observation of contrived situations. This paper shows how, by analyzing traces of a busy segment of the Internet, it is possible to observe these phenomena in "real life" and measure both their frequency and their effects on performance. A TCP implementation might use similar techniques to support rate-based congestion control.
Tradeoffs in Two-Level On-Chip Caching
- In Proceedings of the 21st Annual International Symposium on Computer Architecture
, 1993
Cited by 94 (4 self)
Abstract
The performance of two-level on-chip caching is investigated for a range of technology and architecture assumptions. The area and access time of each level of cache is modeled in detail. The results indicate that for most workloads, two-level cache configurations (with a set-associative second level) perform marginally better than single-level cache configurations that require the same chip area once the first-level cache sizes are 64KB or larger. Two-level configurations become even more important in systems with no off-chip cache and in systems in which the memory cells in the first-level caches are multiported and hence larger than those in the second-level cache. Finally, a new replacement policy called two-level exclusive caching is introduced. Two-level exclusive caching improves the performance of two-level caching organizations by increasing the effective associativity and capacity.
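The capacity effect of exclusive caching can be illustrated with a small trace-driven sketch. Everything here is an assumption for illustration, not the paper's model: the class `TwoLevel` and the function `memory_misses` are hypothetical, L1 is direct-mapped, L2 is fully associative with LRU replacement (the paper studies set-associative second levels), and writebacks and timing are ignored. In the conventional mode a miss fills both levels, so L2 duplicates L1; in the exclusive mode a line lives in exactly one level and each L1 victim moves into L2.

```python
from collections import OrderedDict


class TwoLevel:
    """Direct-mapped L1 over a small fully-associative LRU L2 (hypothetical model).

    exclusive=True : a line lives in L1 or L2, never both; L1 victims move to L2.
    exclusive=False: conventional fill; misses load both levels, L1 victims are dropped.
    Addresses are line addresses; writebacks and timing are ignored.
    """

    def __init__(self, l1_sets, l2_lines, exclusive):
        self.l1 = [None] * l1_sets
        self.l2 = OrderedDict()                 # line -> None, least recently used first
        self.l2_lines = l2_lines
        self.exclusive = exclusive
        self.l1_hits = self.l2_hits = self.mem_misses = 0

    def _l2_put(self, line):
        self.l2[line] = None
        self.l2.move_to_end(line)               # mark most recently used
        if len(self.l2) > self.l2_lines:
            self.l2.popitem(last=False)         # evict least recently used

    def access(self, line):
        s = line % len(self.l1)
        if self.l1[s] == line:
            self.l1_hits += 1
            return
        victim = self.l1[s]
        self.l1[s] = line
        if line in self.l2:
            self.l2_hits += 1
            if self.exclusive:
                del self.l2[line]               # line moves up; L2 no longer holds it
            else:
                self.l2.move_to_end(line)       # copy stays in L2; refresh its LRU position
        else:
            self.mem_misses += 1
            if not self.exclusive:
                self._l2_put(line)              # conventional: fill both levels
        if self.exclusive and victim is not None:
            self._l2_put(victim)                # exclusive: the L1 victim drops into L2


def memory_misses(exclusive, passes=3):
    c = TwoLevel(l1_sets=4, l2_lines=4, exclusive=exclusive)
    for _ in range(passes):
        for line in range(8):                   # cyclic loop over 8 distinct lines
            c.access(line)
    return c.mem_misses


print(memory_misses(exclusive=False))  # 24: the 8-line loop thrashes the 4-line L2
print(memory_misses(exclusive=True))   # 8: only the cold misses of the first pass
```

For this trace, the exclusive run pays only the 8 cold misses because the two levels together hold all 8 lines, while the conventional run misses on every access because its L2 holds copies of what L1 already contains.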

