Evaluation Of Programs And Parallelizing Compilers Using Dynamic Analysis Techniques
, 1993
Cited by 15 (1 self)
... results for an unlimited number of processors. Upper and lower bounds of the inherent parallelism, for the case of limited processors, can be derived from the processor activity histogram, which records the number of concurrent operations during each time period. Stress analysis is a derivative of critical path analysis that determines the locations in a program that contribute most to the critical path. Inductions are computations that introduce an internal stress. A specific method is presented that measures the effect on the inherent parallelism of removing the serializing effects of inductions. Dependence analysis is crucial to the effective operation of parallelizing compilers. Static and dynamic evaluation of the effectiveness of compile-time data dependence analysis is presented; the evaluation compares the existing techniques against each other and against the theoretical optimal results. Special attention is paid to the dependences which serialize interproce
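The relationship between a processor activity histogram and limited-processor bounds can be illustrated with a small sketch. This is not the method of the work above, only one plausible reading under stated assumptions: every operation takes one time step, the dependence graph is acyclic, and operations are scheduled as early as possible. The function names (`asap_levels`, `activity_histogram`, `time_bounds`) and the example graph are hypothetical.

```python
from collections import Counter
from math import ceil


def asap_levels(deps):
    """Earliest time step of each operation, assuming unit latency.

    deps maps every operation to the list of operations it depends on;
    every operation must appear as a key and the graph must be acyclic.
    """
    level = {}

    def visit(op):
        if op not in level:
            level[op] = 1 + max((visit(p) for p in deps[op]), default=0)
        return level[op]

    for op in deps:
        visit(op)
    return level


def activity_histogram(deps):
    """Number of concurrent operations in each time step (unlimited processors)."""
    counts = Counter(asap_levels(deps).values())
    return [counts[t] for t in range(1, max(counts) + 1)]


def time_bounds(hist, p):
    """(lower, upper) bounds on execution time with p processors."""
    total = sum(hist)
    # No schedule can beat the critical path or the work divided over p processors.
    lower = max(len(hist), ceil(total / p))
    # Executing the histogram one time step after another is always feasible.
    upper = sum(ceil(h / p) for h in hist)
    return lower, upper


# Hypothetical dependence graph: c needs a and b, d needs c, f needs e.
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": [], "f": ["e"]}
hist = activity_histogram(deps)   # [3, 2, 1]
print(sum(hist) / len(hist))      # average parallelism: 2.0
print(time_bounds(hist, 2))       # (3, 4)
```

With two processors the six operations need between 3 and 4 time steps in this sketch, so the achievable speedup lies between 1.5 and 2.0.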
Reducing The Impact Of Register Pressure On Software Pipelined Loops
, 1996
Cited by 12 (8 self)
This work deals with the problems caused by the high register requirements of software pipelined loops. The main contributions of this work are:
* Register requirements of software pipelined loops are evaluated.
* Several heuristics to perform register-constrained software pipelining are proposed.
* The effects of register requirements on performance under register constraints are evaluated.
* HRMS is proposed to perform software pipelining with resource constraints and reduced register requirements.
* Two new register file organizations are proposed to allow for a large number of registers with low area cost and fast access time.
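To make "register requirements of a software pipelined loop" concrete, the sketch below computes a common estimate, the maximum number of simultaneously live values (often called MaxLive) in a modulo schedule. It is an illustration under assumptions, not the evaluation method or the HRMS scheduler of the work above: value lifetimes are given as half-open cycle intervals for one iteration, a new iteration starts every `ii` cycles, and the function name `max_live` and the example lifetimes are hypothetical.

```python
def max_live(lifetimes, ii):
    """Peak number of live values in the steady state of a modulo schedule.

    lifetimes: (start, end) cycle pairs for one iteration; a value is live
               in cycles start .. end - 1.
    ii:        initiation interval, the number of cycles between iterations.
    """
    pressure = [0] * ii
    for start, end in lifetimes:
        # Overlapped iterations fold every cycle onto one of ii slots.
        for cycle in range(start, end):
            pressure[cycle % ii] += 1
    return max(pressure)


# Hypothetical lifetimes for three values produced by one loop iteration.
lifetimes = [(0, 3), (1, 2), (2, 6)]
print(max_live(lifetimes, 2))  # 4
print(max_live(lifetimes, 3))  # 3
```

In this toy case a larger initiation interval lowers the register estimate at the cost of throughput. A real scheduler would also move operations when the interval changes, which the fixed lifetimes here ignore.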
Global Value Propagation Through Value Flow Graph and Its Use in Dependence Analysis
As recent studies show, state-of-the-art parallelizing compilers produce no noticeable speedup for 9 out of 12 PERFECT benchmark codes, while the speedup that was reached by manually applying certain automatable techniques ranges from 10 to 50. In this paper we introduce the Global Value Propagation algorithm, which unifies several of these techniques. Global propagation is performed using a program abstraction called the Value Flow Graph (VFG). The VFG is an acyclic graph in which vertices and arcs are parametrically specified using F-relations. The distinctive features of our propagation algorithm are: (1) It propagates not only values carried by scalar variables, but also values carried by individual array elements. (2) We do not have to transform a program in order to use propagation results in program analysis. In this paper we focus on the use of the VFG and global value propagation in array dataflow analysis. F-relations are used to represent values produced by uninterpreted function symbols th...
Enhancing Array Dataflow Dependence Analysis with On-Demand Global Value Propagation
- In Proc. International Conference on Supercomputing
, 1995
As recent studies show, state-of-the-art parallelizing compilers produce no noticeable speedup for 9 out of 12 PERFECT benchmark codes, while the speedup that was reached by manually applying certain automatable constraint propagation techniques ranges from 10 to 50 times. In this paper we show how a subset of these much-desired techniques can be automated. We describe an algorithm that combines exact array dataflow dependence analysis with on-demand global value propagation. By propagating values to the references that make the dependence problem non-affine, the algorithm can in many cases affinize the dependence problem. Affine dependence problems result in exact dependence information and therefore lead to new opportunities in propagation. We also present three algorithms for global value propagation and discuss their merits and applications. The propagation is performed on the acyclic parametrized value flow graph of the program represented by F-relations (also in...
Memory Latency Reduction via Data Prefetching and Data Forwarding in Shared Memory Multiprocessors
, 1994
This dissertation considers the use of data prefetching and an alternative mechanism, data forwarding, for reducing memory latency due to interprocessor communication in cache-coherent, shared memory multiprocessors. The benefits of prefetching and forwarding are considered for large, numerical application codes with loop-level and vector parallelism. Data prefetching is applied to these applications using two different multiprocessor prefetching algorithms implemented within a parallelizing compiler. Data forwarding considers array references involved in communication-related accesses between successive parallel loops, rather than within a single loop nest. A hybrid prefetching and forwarding scheme and a compiler algorithm for data forwarding are also presented.

