Results 1 - 10 of 11,070
Table 1: Sparse Cholesky, 494bps - Effects of private cache miss rates.
"... In PAGE 4: ... approach less attractive than the lock mechanism by itself. Varying Private Miss Rate Table1 shows the e ects of varying the miss rate for private data and instructions between 0% and 10%. The model for private ( rst-level) cache misses is a statistical one, which makes no distinction be- tween instruction and data references.... ..."
Table 6. Throughput, response time, and hit rate for the browsing mix for 4 Web servers, with a single shared cache, a private cache on each Web server, and a two-level cache consisting of a private cache on each front-end and a shared cache on a dedicated machine
2005
"... In PAGE 10: ...ocating the cache on the front-end vs. locating it elsewhere. This result has to be re- examined when there are multiple front-ends, because of the need to enforce consis- tency between the front-end caches and because of the fact that each front-end cache only sees the traffic going through its front-end. Table6 shows throughput, response time and hit rate for the browsing mix for a single shared cache on a dedicated ma- chine, a private cache on each of the front-ends, and a two-level cache consisting of a private cache on each front-end and a shared cache on a dedicated machine. Table 7... ..."
Cited by 2
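To make the two-level configuration compared in Tables 6 and 7 concrete, here is a minimal sketch, not the authors' implementation: cache sizes, the LRU policy, and the fetch callback are assumptions. A private front-end cache is consulted first, then a shared cache on a dedicated machine, and only then the backing store, with both levels filled on a miss. Consistency between the front-end caches, the issue the excerpt raises, is deliberately not modeled.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache standing in for either a front-end (private) or a
    dedicated shared cache in the configurations of Tables 6 and 7."""
    def __init__(self, capacity):
        self.capacity, self.data = capacity, OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used

def two_level_get(private, shared, key, fetch):
    """Two-level lookup: private cache, then shared cache, then backing store."""
    value = private.get(key)
    if value is not None:
        return value, "private hit"
    value = shared.get(key)
    if value is not None:
        private.put(key, value)             # fill the private level
        return value, "shared hit"
    value = fetch(key)                      # e.g. run the database query
    shared.put(key, value)
    private.put(key, value)
    return value, "miss"

# Usage: one private cache per front-end, one shared cache behind all of them.
shared = LRUCache(10_000)
front_ends = [LRUCache(1_000) for _ in range(4)]
value, outcome = two_level_get(front_ends[0], shared, "page:home",
                               fetch=lambda k: f"<rendered {k}>")
```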
Table 7. Throughput, response time, and hit rate for the shopping mix for 4 Web servers, with a single shared cache, a private cache on each Web server, and a two-level cache consisting of a private cache on each front-end and a shared cache on a dedicated machine
2005
Cited by 2
Table 6. Throughput, response time, and hit rate for the shopping mix for 4 Web servers, with a single shared cache, a private cache on each Web server, and a two-level cache consisting of a private cache on each front-end and a shared cache on a dedicated machine
Table 7. Throughput, response time, and hit rate for the browsing mix for 4 Web servers, with a single shared cache, a private cache on each Web server, and a two-level cache consisting of a private cache on each front-end and a shared cache on a dedicated machine
Table 1. Benchmarks used in our experiments, their input parameters, and cache energy consumptions (for a private-cache-based system). We observe that in L1, leakage and dynamic energy consumption are of similar magnitude, whereas in L2, dynamic energy consumption dominates (due to the leakage control mechanism).
"... In PAGE 4: ... We use a set of benchmarks from the SPLASH-2 suite [13]: barnes, ocean1, ocean2, radix, raytrace, and water. The important characteristics of these benchmarks are listed in Table1 . These codes differ from each other in their degree of instruction and data sharing (as pointed out earlier).... ..."
Table II gives further insight into the individual workloads that are consolidated in this study. The percentage of misses to the last level of private cache that result in an on-chip cache-to-cache transfer is given, as well as the number of cache-line-sized blocks that are touched during the simulation. These workloads exhibit a range of misses that are satisfied by cache-to-cache transfers and different working set sizes; combining workloads gives insight into the different stresses placed on an architecture.
2007
Cited by 4
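The per-workload metric described above, the fraction of last-level private-cache misses satisfied by on-chip cache-to-cache transfers, is just a ratio of two counters. A hedged sketch of tallying it from a simulator's miss log follows; the event names are invented for illustration.

```python
from collections import Counter

def cache_to_cache_share(miss_events):
    """miss_events: iterable naming where each last-level private-cache miss
    was satisfied, e.g. "c2c" (another core's on-chip cache) or "memory"."""
    counts = Counter(miss_events)
    total = sum(counts.values())
    return 100.0 * counts["c2c"] / total if total else 0.0

# Toy trace: 3 of 8 misses are satisfied on chip by another core's cache.
trace = ["memory", "c2c", "memory", "c2c", "memory", "memory", "c2c", "memory"]
print(f"{cache_to_cache_share(trace):.1f}% of misses served by cache-to-cache transfers")
```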
Table 3: Important working sets and their growth rates. DS represents the data set size and C is the number of cores. Working set sizes are taken from Figure 3. Values for the native input set are analytically derived estimates. Working sets that grow proportionally to the number of cores C are aggregated private working sets and can be split up to fit into correspondingly smaller private caches.
"... In PAGE 16: ... Our results are pre- sented in Figure 3. In Table3 we summarize the important characteristics of the identified working sets. Most work- loads exhibit well-defined working sets with clearly identifi- able points of inflection.... In PAGE 17: ... Data assumes a shared 4-way associative cache with 64 byte lines. WS1 and WS2 refer to important working sets which we analyze in more detail in Table3 . Cache requirements of PARSEC benchmark programs can reach hundreds of megabytes.... In PAGE 19: ... Figure 6 shows a large amount of writes to shared data, but contrary to intuition its share di- minishes rapidly as the number of cores is increased. This effect is caused by a growth of the working sets of x264: Table3 shows that both working set WS1 and WS2 grow pro- portional to the number of cores. WS1 is mostly composed of thread-private data and is the one which is used more intensely.... ..."
Table 5-2. Microarchitecture configuration: single processor core (PE); slipstream memory hierarchy; private L1 instr. cache (see memory hier. column), size = 64 KB; caches ...
"... In PAGE 10: ...able 4-4. IR-misprediction rate, recovery latency, slack, and delay buffer length............................................ 45 Table5 -1.... In PAGE 10: ...able 5-1. Qualitative comparisons of duplication and recovery methods. ........................................................ 64 Table5 -2.... In PAGE 10: ...able 5-2. Microarchitecture configuration....................................................................................................... 67 Table5 -3.... In PAGE 73: ...63 5.4 Qualitative Comparisons of Duplication and Recovery Methods Table5 -1 summarizes the advantages, disadvantages, and required hardware support of the two memory duplication methods (top-half) and three memory recovery methods (bottom-half). Notice the four useful measurements introduced in Sections 5.... In PAGE 73: ... Results in Section 5.6 quantify much of the information summarized in Table5 -1. Note that the cache-based value prediction technique is not listed in Table 5-1, but is used in conjunction with either invalidation-based recovery model to reduce the performance impact of recovery-induced misses.... In PAGE 73: ...kipped-write relate to recovery. Results in Section 5.6 quantify much of the information summarized in Table 5-1. Note that the cache-based value prediction technique is not listed in Table5 -1, but is used in conjunction with either invalidation-based recovery model to reduce the performance impact of recovery-induced misses. Figure 5-4 shows the original slipstream microarchitecture with software-based memory duplication.... In PAGE 74: ...64 Table5 -1. Qualitative comparisons of duplication and recovery methods.... In PAGE 76: ... The functional simulator checks retired R-stream control flow and data flow outcomes. Microarchitecture parameters are listed in Table5 -2. The top-left portion of the table lists parameters for individual processors within a CMP.... In PAGE 77: ...nv.-dirty, or inv./inv.-dirty with value prediction The Simplescalar [5] compiler and ISA are used. We use eight SPEC2000 integer benchmarks compiled with -O3 optimization and run with ref input datasets ( Table5 -3). The first billion instructions are skipped and then 100 million instructions are simulated.... In PAGE 78: ...68 have to maintain several full memory images to measure the number of stale, self-repair, persistent-stale, and persistent-skipped-write references (this is a statistics-gathering issue). Table5 -3. Benchmarks.... ..."