Design and Evaluation of a Compiler Algorithm for Prefetching
- In Proceedings of the Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 1992
Cited by 451 (21 self)
Software-controlled data prefetching is a promising technique for improving the performance of the memory subsystem to match today's high-performance processors. While prefetching is useful in hiding the latency, issuing prefetches incurs an instruction overhead and can increase the load on the memory subsystem. As a result, care must be taken to ensure that such overheads do not exceed the benefits.
Compiler-directed Data Prefetching in Multiprocessors with Memory Hierarchies
- In International Conference on Supercomputing, 1990
Cited by 87 (7 self)
Memory hierarchies are used by multiprocessor systems to reduce large memory access times. It is necessary to automatically manage such a hierarchy, to obtain effective memory utilization. In this paper, we discuss the various issues involved in obtaining an optimal memory management strategy for a memory hierarchy. We present an algorithm for finding the earliest point in a program that a block of data can be prefetched. This determination is based on the control and data dependences in the program. Such a method is an integral part of more general memory management algorithms. We demonstrate our method's potential by using static analysis to estimate the performance improvement afforded by our prefetching strategy and to analyze the reference patterns in a set of Fortran benchmarks. We also study the effectiveness of prefetching in a realistic shared-memory system using an RTL-level simulator and real codes. This differs from previous studies by considering prefetching benefits in th...
Adaptive And Integrated Data Cache Prefetching For Shared-Memory Multiprocessors
- 1995
Cited by 18 (0 self)
... yield a better overall scheme. We give a detailed description of the compiler analysis necessary for integrated prefetching. The performance of integrated prefetching is compared to software and hardware prefetching, and we show the effect of adapting the scheduling of prefetches at compile time. Finally, we discuss approaches that combine integrated prefetching with the adaptive hardware prefetching technique.
An Integrated Hardware/Software Solution for Effective Management of Local Storage in High-Performance Systems
- In Proceedings of the International Conference on Parallel Processing, volume II, 1991
Cited by 11 (2 self)
The potential of high-performance systems, especially vector and parallel machines, is generally limited by the bandwidth between processors and memory. To achieve the performance of which these machines should be capable, greater emphasis must be placed on optimizing array accesses. We propose a practical, integrated hardware/software strategy for increasing the effectiveness of local storage management. Our scheme provides many of the advantages of both compile-time and run-time memory management techniques. In this paper, we describe our local storage facility: the priority data cache. We also describe compile-time techniques for easily and effectively utilizing this level of local storage.
1 Introduction
The potential of high-performance systems is generally limited by the bandwidth between processors and memory. To achieve the performance of which these machines should be capable, greater emphasis must be placed on optimizing array accesses. At compile time, many safe and profita...
Efficient Integration of Compiler-directed Cache Coherence and Data Prefetching
- 2000
Cited by 2 (0 self)
Cache coherence enforcement and memory latency reduction and hiding are very important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated framework to solve these problems through a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coherence by prefetching the potentially stale references in a parallel program. It also prefetches the nonstale references to hide their memory latencies. To optimize the performance of the CCDP scheme, some prefetch hardware support is provided to efficiently handle these two forms of data prefetching operations. We also developed the compiler techniques utilized by the CCDP scheme for stale reference detection, prefetch target analysis and prefetch scheduling. We evaluated the performance of the CCDP scheme via execution-driven simulations of several applications from the SPEC CFP95 and the Perfe...
A Compiler-Directed Cache Coherence Scheme Using Data Prefetching
- Proceedings of the 1997 International Parallel Processing Symposium, 1997
Cited by 2 (2 self)
Cache coherence enforcement and memory latency reduction and hiding are very important problems in the design of large-scale shared-memory multiprocessors. In this paper, we propose a compiler-directed cache coherence scheme which makes use of data prefetching. The Cache Coherence with Data Prefetching (CCDP) scheme uses compiler analysis techniques to identify potentially-stale data references, which are references to invalid copies of cached data. The key idea of the CCDP scheme is to enforce cache coherence by prefetching the up-to-date data corresponding to these potentially-stale references from the main memory. Application case studies were conducted to gain a quantitative idea of the performance potential of the CCDP scheme on a real system. We applied the CCDP scheme on four benchmark programs from the SPEC CFP95 and CFP92 suites, and executed them on the Cray T3D. The experimental results show that for the programs studied, our scheme provides significant performance improveme...
Maintaining Cache Coherence through Compiler-Directed Data Prefetching
- 1998
Cited by 1 (1 self)
In this paper, we propose a compiler-directed cache coherence scheme which makes use of data prefetching to enforce cache coherence in large-scale distributed shared-memory (DSM) systems. The Cache Coherence with Data Prefetching (CCDP) scheme uses compiler analyses to identify potentially-stale and non-stale data references in a parallel program and enforces cache coherence by prefetching the potentially-stale references. In this manner, the CCDP scheme brings up-to-date data into the caches to avoid stale references and also hides the latency of these memory accesses. Furthermore, the scheme also prefetches the non-stale references to hide their memory latencies. To evaluate the performance impact of the CCDP scheme on a real system, we applied the scheme on five applications from the SPEC CFP95 and CFP92 benchmark suites, and executed the resulting codes on the Cray T3D. The experimental results indicate that for all of the applications studied, our scheme provides significant pe...
Using A Cache In Place Of A Cedar-Like Vector Prefetch Unit
- 1993
CONTENTS
1 INTRODUCTION
1.1 Related Work
2 SYSTEM ORGANIZATION
2.1 Introduction
2.2 Overall System Architecture
2.3 Processor Model
2.3.1 Using a Vector Prefetch Unit
2.3.2 Using a Cache
2.3.3 Overlapping of Vector Load and Computation Instructions
2.4 Cache Model
Compiler Support for Maintaining Cache Coherence Using Data Prefetching (Extended Abstract)
Hock-Beng Lim (1), Lynn Choi (2) and Pen-Chung Yew (3)
(1) Center for Supercomputing R & D, Univ. of Illinois, Urbana, IL 61801
(2) Microprocessor Group, Intel Corporation, Santa Clara, CA 95095
(3) Dept. of Computer Science, Univ. of Minnesota, Minneapolis, MN 55455
1 Introduction and Motivation
A major performance limitation in large-scale shared-memory multiprocessors is the large remote main memory latencies encountered by the processors. Private caches for processors have been used to reduce the number of main memory accesses. However, the use of private caches leads to the classic cache coherence problem. Compiler-directed cache coherence schemes [2] offer a viable solution to this problem for large-scale shared-memory multiprocessors. Although compiler-directed cache coherence schemes can improve multiprocessor cache performance, they cannot totally eliminate main memory accesses. Thus, data prefetching schemes have been developed to hide the memory latency. Actually, data pr...

