Results 1 - 5 of 5
A Unified Vector/Scalar Floating-Point Architecture, 1989
Cited by 31 (9 self)
research relevant to the design and application of high performance scientific computers. We test our ideas by designing, building, and using real systems. The systems we build are research prototypes; they are not intended to become products. There is a second research laboratory located in Palo Alto, the Systems Research Center (SRC). Other Digital research groups are located in Paris (PRL) and in Cambridge,
Cache Performance in Vector Supercomputers, 1994
Cited by 25 (0 self)
Traditional supercomputers use a flat multi-bank SRAM memory organization to supply high bandwidth at low latency. Most other computers use a hierarchical organization with a small SRAM cache and slower, cheaper DRAM for main memory. Such systems rely heavily on data locality to achieve optimum performance. This paper evaluates cache-based memory systems for vector supercomputers. We develop a simulation model for a cache-based version of the Cray Research C90 and use the NAS parallel benchmarks to provide a large-scale workload. We show that while caches reduce memory traffic and improve the performance of plain DRAM memory, they still lag behind cacheless SRAM. We identify the performance bottlenecks in DRAM-based memory systems and quantify their contribution to program performance degradation. We find the data fetch strategy to be a significant parameter affecting performance, evaluate the performance of several fetch policies, and show that small fetch sizes improve performance...
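The remark about fetch sizes can be illustrated with a toy model. The sketch below is not the paper's C90 simulator: the function `memory_traffic`, the direct-mapped organization, the 64-line cache, and the stride-16 access pattern are all assumptions chosen for illustration. It shows one plausible reason why smaller fetch sizes can help on strided vector accesses: each miss pulls in a whole line, so words that are never used still consume memory bandwidth.

```python
def memory_traffic(addresses, line_words, num_lines=64):
    """Count words fetched from memory by a toy direct-mapped cache.

    `addresses` are word addresses; `line_words` is the fetch (line) size
    in words. The cache geometry is hypothetical, not taken from the paper.
    """
    tags = [None] * num_lines          # line number currently held in each slot
    fetched = 0
    for addr in addresses:
        line = addr // line_words      # which memory line holds this word
        index = line % num_lines       # direct-mapped placement
        if tags[index] != line:        # miss: fetch the whole line
            tags[index] = line
            fetched += line_words
    return fetched


# A stride-16 sweep touches one word out of every 16.
strided = [i * 16 for i in range(256)]
traffic = {w: memory_traffic(strided, w) for w in (1, 4, 16)}
```

In this model the 256 accesses fetch 256, 1024, and 4096 words for fetch sizes of 1, 4, and 16 words respectively. A unit-stride sweep would favor the larger sizes, which is why the abstract treats the fetch policy as a parameter to be evaluated against a real workload.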
An Empirical Study of Cross-loop Reuse in the NAS benchmarks, 1995
Cited by 3 (0 self)
This paper describes an empirical study designed to quantify the level of cross-loop reuse occurring in a set of scientific Fortran programs, the NAS Benchmarks. Cross-loop reuse takes place when a set of data items or cache lines is accessed in a given loop nest and then accessed again within some subsequent portion of the program (usually another outer loop nest). In contrast to intra-loop reuse, which takes place during the execution of a single loop nest, cross-loop reuse is not always detectable by traditional compile-time reuse analysis techniques. In this study, the benchmark programs are instrumented and run through a cache simulator. The simulator gathers statistics on cross-loop reuse using a novel classification scheme that clearly identifies the different types of reuse. According to the simulation data, the level of cross-loop reuse varies widely from program to program, and depends greatly on the problem size and cache size. Some programs exhibit almost no cross-loop reu...
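The distinction between intra-loop and cross-loop reuse can be made concrete with a small trace classifier. This is only a sketch of the definition given in the abstract, not the paper's classification scheme, which the abstract does not spell out. The function `classify_reuse`, the `(loop_id, byte_address)` trace format, and the 32-byte line size are assumptions, and the model ignores cache capacity, so it counts potential reuse rather than actual hits.

```python
from collections import Counter

LINE_SIZE = 32  # bytes per cache line; illustrative value, not from the paper


def classify_reuse(trace, line_size=LINE_SIZE):
    """Classify each access in `trace`, an iterable of (loop_id, byte_address).

    An access is "cold" on the first touch of a line, "intra-loop" when the
    line was last touched by the same loop nest, and "cross-loop" when it
    was last touched by a different (earlier) loop nest.
    """
    last_loop = {}       # cache line number -> loop nest that touched it last
    counts = Counter()
    for loop_id, address in trace:
        line = address // line_size
        if line not in last_loop:
            counts["cold"] += 1
        elif last_loop[line] == loop_id:
            counts["intra-loop"] += 1
        else:
            counts["cross-loop"] += 1
        last_loop[line] = loop_id
    return counts


# Two loop nests sweep the same array of eight 8-byte values.
trace = [(0, 8 * i) for i in range(8)] + [(1, 8 * i) for i in range(8)]
result = classify_reuse(trace)
```

For this trace the array spans two cache lines, so the classifier reports 2 cold accesses, 12 intra-loop reuses, and 2 cross-loop reuses. A real simulator would also model a finite cache, which is why the abstract notes that the measured level depends on problem size and cache size.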
Using A Cache In Place Of A Cedar-Like Vector Prefetch Unit, 1993
CONTENTS
1 INTRODUCTION
1.1 Related Work
2 SYSTEM ORGANIZATION
2.1 Introduction
2.2 Overall System Architecture
2.3 Processor Model
2.3.1 Using a Vector Prefetch Unit
2.3.2 Using a Cache
2.3.3 Overlapping of Vector Load and Computation Instructions
2.4 Cache Model

