Results 1 - 10 of 15
A data locality optimizing algorithm
, 1991
Cited by 790 (18 self)
1 Introduction As processor speed continues to increase faster than memory speed, optimizations to use the memory hierarchy efficiently become ever more important. Blocking [9] or tiling [18] is a well-known technique that improves the data locality of numerical algorithms [1, 6, 7, 12, 13]. Tiling can be used for different levels of memory hierarchy such as physical memory, caches and registers; multi-level tiling can be used to achieve locality in multiple levels of the memory hierarchy simultaneously. To illustrate the importance of tiling, consider the example of matrix multiplication: for I1 := 1 to n for ...
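The abstract above breaks off at its matrix multiplication example. As a rough illustration of the idea only (not the paper's own loop nest), the sketch below contrasts an untiled multiplication with one tiled over the k and j loops. The function names and the `tile` parameter are assumptions made for this example.

```python
def matmul_naive(a, b, n):
    """Plain triple loop: each row of c sweeps all of b before any reuse."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_tiled(a, b, n, tile=2):
    """Same computation, tiled over k and j (hypothetical tile size).

    For each (kk, jj) pair, a tile-by-tile block of b is reused across
    every row i before the next block is touched.
    """
    c = [[0.0] * n for _ in range(n)]
    for kk in range(0, n, tile):
        for jj in range(0, n, tile):
            for i in range(n):
                for k in range(kk, min(kk + tile, n)):
                    aik = a[i][k]
                    for j in range(jj, min(jj + tile, n)):
                        c[i][j] += aik * b[k][j]
    return c
```

In pure Python the interpreter overhead hides any cache effect; the point of the sketch is the loop structure, which a compiler would apply to the compiled loop nest.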
Automatic Translation of FORTRAN Programs to Vector Form
 ACM Transactions on Programming Languages and Systems
, 1987
Cited by 320 (34 self)
This paper discusses the theoretical concepts underlying a project at Rice University to develop an automatic translator, called PFC (for Parallel FORTRAN Converter), from FORTRAN to FORTRAN 8x. The Rice project, based initially upon the research of Kuck and others at the University of Illinois [6, 17-21, 24, 32, 36], is a continuation of work begun while on leave at IBM Research in Yorktown Heights, N.Y. Our first implementation was based on the Illinois PARAFRASE compiler [20, 36], but the current version is a completely new program (although it performs many of the same transformations as PARAFRASE). Other projects that have influenced our work are the Texas Instruments ASC compiler [9, 33], the Cray-1 FORTRAN compiler [15], and the Massachusetts Computer Associates Vectorizer [22, 25]. The paper is organized into seven sections. Section 2 introduces FORTRAN 8x and gives examples of its use. Section 3 presents an overview of the translation process along with an extended translation example. Section 4 develops the concept of interstatement dependence and shows how it can be applied to the problem of vectorization. Loop-carried dependence and loop-independent dependence are introduced in this section to extend dependence to multiple statements and multiple loops. Section 5 develops dependence-based algorithms for code generation and transformations for enhancing the parallelism of a statement. Section 6 describes a method for extending the power of data dependence to control statements by the process of IF conversion. Finally, Section 7 details the current state of PFC and our plans for its continued development.
Practical Dependence Testing
, 1991
Cited by 146 (16 self)
Precise and efficient dependence tests are essential to the effectiveness of a parallelizing compiler. This paper proposes a dependence testing scheme based on classifying pairs of subscripted variable references. Exact yet fast dependence tests are presented for certain classes of array references, as well as empirical results showing that these references dominate scientific Fortran codes. These dependence tests are being implemented at Rice University in both PFC, a parallelizing compiler, and ParaScope, a parallel programming environment.
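To make "exact yet fast" tests on a classified subscript pair concrete, here is a minimal sketch of one such test for the simple case where both references use the same loop index with the same coefficient. The abstract does not spell out the paper's tests, so the function name, argument layout, and return convention are assumptions for illustration.

```python
def same_coefficient_test(a, c1, c2, lower, upper):
    """Test X[a*i + c1] against X[a*i' + c2] for i, i' in [lower, upper].

    The two references touch the same element when a*i + c1 == a*i' + c2,
    that is, when i' - i == (c1 - c2) / a. A dependence exists only if that
    distance is an integer and fits inside the loop bounds.

    Returns the dependence distance, or None if the references are independent.
    """
    if a == 0:
        raise ValueError("coefficient must be nonzero for this test")
    diff = c1 - c2
    if diff % a != 0:
        return None  # distance is not an integer: no iteration pair collides
    distance = diff // a
    if abs(distance) > upper - lower:
        return None  # distance exceeds the iteration range
    return distance
```

For a loop over 1..10 containing `A[i+1] = A[i]`, the call `same_coefficient_test(1, 1, 0, 1, 10)` reports a distance of 1, while `A[2*i]` against `A[2*i+1]` is proven independent because the distance would be fractional.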
Array Expansion
 In ACM Int. Conf. on Supercomputing
, 1988
Cited by 98 (10 self)
A common problem in restructuring programs for vector or parallel execution is the suppression of false dependencies which originate in the reuse of the same memory cell for unrelated values. The method is simple and well understood in the case of scalars. This paper gives the general solution for the case of arrays. The expansion is done in two steps: first, modify all definitions of the offending array in order to obtain the single assignment property. Then, reconstruct the original data flow by adapting all uses of the array. This is done with the help of a new algorithm for solving parametric integer programs. The technique is quite general and may be used for other purposes, including program checking, collecting array predicates, etc... 1 Introduction 1.1 Motivation One of the most striking trends in today's computer architecture is the development of special purpose machines for numerical computations. The idea behind this effort is that by capitalizing on the pecul...
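The scalar case that the abstract calls simple and well understood can be sketched as follows. This illustrates scalar expansion only, not the paper's array algorithm or its parametric integer programming step; both function names are hypothetical.

```python
def swap_with_scalar(a, b):
    """Original loop: the single cell t is reused by every iteration.

    That reuse creates false (anti and output) dependences between
    iterations even though the values held in t are unrelated.
    """
    for i in range(len(a)):
        t = a[i]
        a[i] = b[i]
        b[i] = t


def swap_with_expanded_scalar(a, b):
    """After expansion: t becomes an array, one cell per iteration.

    With the false dependences gone, the loop can be distributed into
    three independent statements, each of which is a vector operation.
    """
    n = len(a)
    t = [None] * n
    for i in range(n):
        t[i] = a[i]
    for i in range(n):
        a[i] = b[i]
    for i in range(n):
        b[i] = t[i]
```

The cost of the transformation is the extra storage for the expanded temporary, which grows with the trip count of the loop.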
Symbolic Analysis: A Basis for Parallelization, Optimization, and Scheduling of Programs
 In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing
, 1993
Cited by 37 (0 self)
This paper presents an abstract interpretation framework for parallelizing compilers. Within this framework, symbolic analysis is used to solve various flow analysis problems in a unified way. Symbolic analysis also serves as a basis for code generation optimizations and a tool for derivation of computation cost estimates. A loop scheduling strategy that utilizes symbolic timing information is also presented. 1 Introduction Empirical results indicate that existing parallelizing compilers cause insignificant improvements on the performance of many real application programs [9, 5]. The speedups obtained by manual transformation of these applications [9] show the potential for significantly advancing parallelizing compiler technology. The poor performance of current restructuring compilers can be attributed to two causes: imprecise analysis and inappropriate performance-wise transformations. The causes are not completely independent; namely, imprecise information results in inappropriate...
Symbolic Program Analysis and Optimization for Parallelizing Compilers
 Presented at the 5th Annual Workshop on Languages and Compilers for Parallel Computing
, 1992
Cited by 35 (3 self)
A program flow analysis framework is proposed for parallelizing compilers. Within this framework, symbolic analysis is used as an abstract interpretation technique to solve many of the flow analysis problems in a unified way. Some of these problems are constant propagation, global forward substitution, detection of loop invariant computations, and induction variable substitution. The solution space of the above problems is much larger than that handled by existing compiler technology. It covers many of the cases in benchmark codes that other parallelizing compilers can not handle. Employing finite difference methods, the symbolic analyzer derives a functional representation of programs, which is used in dependence analysis. A systematic method for generalized strength reduction based on this representation is also presented. This results in an effective scheme for exploitation of parallelism and optimization of the code. Symbolic analysis also serves as a basis for other code generatio...
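Of the problems listed in the abstract above, induction variable substitution is the easiest to show in a few lines. The sketch below is a hand-written before and after pair, not output of the paper's analyzer; the function names and the `k0` and `step` parameters are assumptions, with `k0 >= 0` and `step >= 1`.

```python
def fill_with_induction_variable(n, k0=0, step=2):
    """Original form: k is updated each iteration, so iteration i
    appears to depend on the value of k left by iteration i - 1."""
    out = [0] * (k0 + step * n + 1)
    k = k0
    for i in range(n):
        k = k + step
        out[k] = i
    return out


def fill_with_closed_form(n, k0=0, step=2):
    """After substitution: k is replaced by its closed form
    k0 + step * (i + 1), so the subscript depends only on i and the
    iterations no longer share a scalar."""
    out = [0] * (k0 + step * n + 1)
    for i in range(n):
        out[k0 + step * (i + 1)] = i
    return out
```

Once the subscript is an explicit function of the loop index, a dependence test can reason about it directly, which is why this substitution usually runs before dependence analysis.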
Automatic Generation of DAG Parallelism
 Proceedings of the ACM SIGPLAN '89 Conference on Programming Language Design and Implementation
, 1989
Cited by 19 (2 self)
This paper extends the notion of shared and private
Automatic Parallelization Of Prolog Programs
, 1992
"... MACHINE; 1.6 DATAFLOW AND DEPENDENCES; 1.7 SOURCES OF PARALLELISM; 2 OR PARALLEL EXECUTION OF PROLOG; 2.1 T ..."
Cited by 6 (0 self)
Detecting Value-Based Scalar Dependence
 Computing. Cornell University
, 1994
Precise value-based data dependence analysis for scalars is useful for advanced compiler optimizations. The new method presented here for flow and output dependence uses Factored Use and Def chains (FUD chains), our interpretation and extension of Static Single Assignment. It is precise with respect to conditional control flow and dependence vectors. Our method detects dependences which are independent with respect to arbitrary loop nesting, as well as loop-carried dependences. A loop-carried dependence is further classified as being carried by the previous iteration, with distance 1, or by any previous iteration, with direction <. This precision cannot be achieved by traditional analysis, such as dominator information or reaching definitions. To compute anti-dependence, we use Factored Redef-Use chains, which are related to FUD chains. We are not aware of any prior work which explicitly deals with scalar data dependence utilizing a sparse graph representation. 1 Introduction...