Results 1–10 of 38
Maximizing Multiprocessor Performance with the SUIF Compiler
, 1996
Abstract

Cited by 249 (22 self)
This paper presents an overview of the SUIF compiler, which automatically parallelizes and optimizes sequential programs for shared-memory multiprocessors. We describe new technology in this system for locating coarse-grain parallelism and for optimizing multiprocessor memory behavior essential to obtaining good multiprocessor performance. These techniques have a significant impact on the performance of half of the NAS and SPECfp95 benchmark suites. In particular, we achieve the highest SPECfp95 ratio to date of 63.9 on an eight-processor 440 MHz Digital AlphaServer.

1 Introduction

Affordable shared-memory multiprocessors can potentially deliver supercomputer-like performance to the general public. Today, these machines are mainly used in a multiprogramming mode, increasing system throughput by running several independent applications in parallel. The multiple processors can also be used together to accelerate the execution of single applications. Automatic parallelization is a promis...
Symbolic Analysis for Parallelizing Compilers
, 1994
Abstract

Cited by 105 (4 self)
Symbolic Domain. The objects in our abstract symbolic domain are canonical symbolic expressions. A canonical symbolic expression is a lexicographically ordered sequence of symbolic terms. Each symbolic term is in turn a pair of an integer coefficient and a sequence of pairs of pointers to program variables in the program symbol table and their exponents. The latter sequence is also lexicographically ordered. For example, the abstract value of the symbolic expression 2ij+3jk in an environment where i is bound to (1, ((↑i, 1))), j is bound to (1, ((↑j, 1))), and k is bound to (1, ((↑k, 1))) is ((2, ((↑i, 1), (↑j, 1))), (3, ((↑j, 1), (↑k, 1)))). In our framework, an environment is the abstract analogue of the state concept; an environment is a function from program variables to abstract symbolic values. Each environment e associates a canonical symbolic value e(x) with each variable x ∈ V; we say that x is bound to e(x). An environment might be represented by...
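A toy model of the canonical representation described above (my own sketch, not the paper's implementation): a term is an integer coefficient paired with a lexicographically ordered tuple of (variable, exponent) pairs, and an expression is a tuple of terms sorted by their variable part, so that structurally equal expressions compare equal.

```python
def make_term(coeff, var_exps):
    # Order the (variable, exponent) pairs lexicographically.
    return (coeff, tuple(sorted(var_exps)))

def make_expr(*terms):
    # Merge terms with identical variable parts, drop zero coefficients,
    # and sort the surviving terms by their variable part.
    combined = {}
    for coeff, var_exps in terms:
        combined[var_exps] = combined.get(var_exps, 0) + coeff
    return tuple(sorted(((c, v) for v, c in combined.items() if c != 0),
                        key=lambda term: term[1]))

# 2ij + 3jk in canonical form:
expr = make_expr(make_term(2, [("i", 1), ("j", 1)]),
                 make_term(3, [("j", 1), ("k", 1)]))
```

Because both levels are kept sorted, rebuilding the same expression from terms supplied in any order yields an identical tuple, which is what makes the form canonical.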
Detecting Coarse-Grain Parallelism Using an Interprocedural Parallelizing Compiler
, 1995
Abstract

Cited by 91 (22 self)
This paper presents an extensive empirical evaluation of an interprocedural parallelizing compiler, developed as part of the Stanford SUIF compiler system. The system incorporates a comprehensive and integrated collection of analyses, including privatization and reduction recognition for both array and scalar variables, and symbolic analysis of array subscripts. The interprocedural analysis framework is designed to provide analysis results nearly as precise as full inlining but without its associated costs. Experimentation with this system shows that it is capable of detecting coarser granularity of parallelism than previously possible. Specifically, it can parallelize loops that span numerous procedures and hundreds of lines of code, frequently requiring modifications to array data structures such as privatization and reduction transformations. Measurements from several standard benchmark suites demonstrate that an integrated combination of interprocedural analyses can substantially ...
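An illustrative sketch (not SUIF's code) of the reduction transformation the abstract mentions: a sum reduction carries a loop dependence on its accumulator, so the compiler privatizes the accumulator per worker and combines the partials after the loop.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(a, nworkers=4):
    def worker(chunk):
        partial = 0              # privatized copy of the shared accumulator
        for x in chunk:
            partial += x
        return partial
    chunks = [a[i::nworkers] for i in range(nworkers)]
    with ThreadPoolExecutor(nworkers) as pool:
        return sum(pool.map(worker, chunks))   # combine privatized partials
```

The same shape applies to array reductions, where each worker gets a private copy of the array section being accumulated.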
Combining Analyses, Combining Optimizations
, 1995
Abstract

Cited by 74 (4 self)
This thesis presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. Also presented is a simple iterative algorithm for solving these frameworks. A framework is shown that combines Constant Propagation, Unreachable Code Elimination, Global Congruence Finding and Global Value Numbering. For these optimizations, the iterative algorithm runs in O(n^2) time.
This thesis then presents an O(n log n) algorithm for combining the same optimizations. This technique also finds many of the common subexpressions found by Partial Redundancy Elimination. However, it requires a global code motion pass, also presented here, to make the optimized code correct. The global code motion algorithm removes some Partially Dead Code as a side-effect. An implementation demonstrates that the algorithm has shorter compile times than repeated passes of the separate optimizations while producing runtime speedups of 4%–7%.
While global analyses are stronger, peephole analyses can be unexpectedly powerful. This thesis demonstrates parse-time peephole optimizations that find more than 95% of the constants and common subexpressions found by the best combined analysis. Finding constants and common subexpressions while parsing reduces peak intermediate representation size. This speeds up the later global analyses, reducing total compilation time by 10%. In conjunction with global code motion, these peephole optimizations generate excellent code very quickly, a useful feature for compilers that stress compilation speed over code quality.
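A toy sketch of why combining analyses helps (my own illustration, not the thesis's algorithm): when constant propagation and unreachable-code elimination run together, a branch whose condition folds to a known constant contributes only its taken arm, so the merge never sees values from unreachable code.

```python
UNDEF, NONCONST = "undef", "nonconst"

def meet(a, b):
    # Lattice meet: undef is the identity; differing constants go to nonconst.
    if a == UNDEF:
        return b
    if b == UNDEF:
        return a
    return a if a == b else NONCONST

def fold(env, stmts):
    # stmts: ("assign", var, const) or ("branch", var, const, then_stmts, else_stmts),
    # where the branch tests var == const.
    for s in stmts:
        if s[0] == "assign":
            env[s[1]] = s[2]
        else:
            _, var, const, then_stmts, else_stmts = s
            v = env.get(var, UNDEF)
            if v == const:                # condition provably true:
                fold(env, then_stmts)     # the unreachable else arm is ignored
            elif v in (UNDEF, NONCONST):  # condition unknown: merge both arms
                e1, e2 = dict(env), dict(env)
                fold(e1, then_stmts)
                fold(e2, else_stmts)
                for k in set(e1) | set(e2):
                    env[k] = meet(e1.get(k, UNDEF), e2.get(k, UNDEF))
            else:                         # condition provably false
                fold(env, else_stmts)
    return env
```

Run separately, constant propagation would merge the two arms of `if x == 1` and lose y; run together, knowing x = 1 prunes the dead arm and y stays constant.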
SUIF Explorer: an interactive and interprocedural parallelizer
, 1999
Abstract

Cited by 63 (5 self)
The SUIF Explorer is an interactive parallelization tool that is more effective than previous systems in minimizing the number of lines of code that require programmer assistance. First, the interprocedural analyses in the SUIF system are successful in parallelizing many coarse-grain loops, thus minimizing the number of spurious dependences requiring attention. Second, the system uses dynamic execution analyzers to identify those important loops that are likely to be parallelizable. Third, the SUIF Explorer is the first to apply program slicing to aid programmers in interactive parallelization. The system guides the programmer in the parallelization process using a set of sophisticated visualization techniques. This paper demonstrates the effectiveness of the SUIF Explorer with three case studies. The programmer was able to speed up all three programs by examining only a small fraction of the program and privatizing a few variables.

1. Introduction

Exploiting coarse-grain parallelism i...
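A hypothetical sketch of backward program slicing, the technique the abstract credits SUIF Explorer with applying: starting from a variable of interest, retain only the statements whose values can flow into it, so the programmer inspects a small fraction of the program.

```python
def backward_slice(stmts, target):
    # stmts: list of (defined_var, set_of_used_vars), in program order.
    needed, keep = {target}, []
    for i in range(len(stmts) - 1, -1, -1):
        lhs, uses = stmts[i]
        if lhs in needed:
            needed.discard(lhs)   # this definition satisfies the demand...
            needed |= uses        # ...and demands its own inputs
            keep.append(i)
    return sorted(keep)
```

For a = ...; b = f(a); c = ...; d = g(b), slicing on d keeps statements 0, 1, and 3 and drops the irrelevant definition of c.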
Symbolic Range Propagation
 Proceedings of the 9th International Parallel Processing Symposium
, 1994
Abstract

Cited by 56 (9 self)
Many analyses and transformations in a parallelizing compiler can benefit from the ability to compare arbitrary symbolic expressions. In this paper, we describe how one can compare expressions by using symbolic ranges of variables. A range is a lower and upper bound on a variable. We will also describe how these ranges can be efficiently computed from the program text. Symbolic range propagation has been implemented in Polaris, a parallelizing compiler being developed at the University of Illinois, and is used for symbolic dependence testing, detection of zero-trip loops, determining array sections possibly referenced by an access, and loop iteration-count estimation.
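A minimal sketch of the idea, with illustrative names (not Polaris's interfaces): a range is a lower and upper bound on a variable, ranges compose by interval arithmetic, and an expression comparison succeeds when the ranges do not overlap.

```python
class Range:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        # Interval arithmetic: bounds of a sum are sums of bounds.
        return Range(self.lo + other.lo, self.hi + other.hi)

    def definitely_less(self, other):
        # Provable only when every value in self is below every value in other.
        return self.hi < other.lo
```

With i in [1, 10] and j in [20, 30], `definitely_less` proves i < j, which a dependence test can use to show that accesses a[i] and a[j] never overlap; overlapping ranges simply fail to prove anything, which is the safe answer.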
Telescoping languages: A strategy for automatic generation of scientific problem-solving systems from annotated libraries. www.netlib.org/utk/people/JackDongarra/PAPERS/Telescope.pdf
, 2000
Abstract

Cited by 46 (7 self)
As machines and programs have become more complex, the process of programming applications that can exploit the power of high-performance systems has become more difficult and correspondingly more labor-intensive. This has substantially widened the software gap: the discrepancy between the need for new software and the aggregate capacity of the workforce to produce it. This problem has been compounded by the slow growth of programming productivity, especially for high-performance programs, over the past two decades. One way to bridge this gap is to make it possible for end users to develop programs in high-level domain-specific programming systems. In the past, a major impediment to the acceptance of such systems has been the poor performance of the resulting applications. To address this problem, we are developing a new compiler-based infrastructure, called
Interprocedural Constant Propagation: A Study of Jump Function Implementations
 Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation
, 1993
Abstract

Cited by 43 (5 self)
An implementation of interprocedural constant propagation must model the transmission of values through each procedure. In the framework proposed by Callahan, Cooper, Kennedy, and Torczon in 1986, this intraprocedural propagation is modeled with a jump function. While Callahan et al. propose several kinds of jump functions, they give no data to help choose between them. This paper reports on a comparative study of jump function implementations. It shows that different jump functions produce different numbers of useful constants; it suggests a particular function, called the pass-through parameter jump function, as the most cost-effective in practice.
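A toy sketch of the framework's flavor (names and data shapes are my own assumptions, not the paper's): each call site's jump function maps a callee formal either to a literal constant or to a caller formal whose value passes through unchanged, and propagation iterates to a fixpoint, meeting values across call sites.

```python
NONCONST = "nonconst"

def meet(a, b):
    return a if a == b else NONCONST

def propagate(calls, entry_env):
    # calls: {callee: [(caller, args), ...]}, where args maps each callee
    # formal to a literal int or to the name of a caller formal (the
    # pass-through parameter jump function).
    envs = {"main": dict(entry_env)}
    changed = True
    while changed:
        changed = False
        for callee, sites in calls.items():
            merged = {}
            for caller, args in sites:
                cenv = envs.get(caller, {})
                for formal, actual in args.items():
                    v = actual if isinstance(actual, int) else cenv.get(actual, NONCONST)
                    merged[formal] = v if formal not in merged else meet(merged[formal], v)
            if merged != envs.get(callee):
                envs[callee] = merged
                changed = True
    return envs
```

With main binding n = 10, a call f(n) and calls g(m) from f and g(10) from main, the fixpoint discovers that both f's m and g's k are the constant 10.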
Simplification of Array Access Patterns for Compiler Optimizations
, 1994
Abstract

Cited by 29 (6 self)
Existing array region representation techniques are sensitive to the complexity of array subscripts. In general, these techniques are very accurate and efficient for simple subscript expressions, but lose accuracy or require potentially expensive algorithms for complex subscripts. We found that in scientific applications, many access patterns are simple even when the subscript expressions are complex. In this work, we present a new, general array access representation and define operations for it. This allows us to aggregate and simplify the representation enough that precise region operations may be applied to enable compiler optimizations. Our experiments show that these techniques hold promise for speeding up applications.
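A toy sketch (not the paper's representation) of the aggregation idea: summarize the set of indices an access touches as a single (lower, upper, stride) triple, falling back to a dense stride-1 over-approximation when the pattern is irregular.

```python
def summarize(indices):
    xs = sorted(set(indices))
    if len(xs) == 1:
        return (xs[0], xs[0], 1)
    stride = xs[1] - xs[0]
    if any(b - a != stride for a, b in zip(xs, xs[1:])):
        return (xs[0], xs[-1], 1)    # irregular: over-approximate densely
    return (xs[0], xs[-1], stride)
```

The accesses of a[2*i] for i in 0..9 summarize to (0, 18, 2), precise enough for region intersection tests, while an irregular index set safely widens to stride 1.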
Efficient Symbolic Analysis for Parallelizing Compilers and Performance Estimators
, 1998
Abstract

Cited by 27 (7 self)
Symbolic analysis is of paramount importance for parallelizing compilers and performance estimators to examine symbolic expressions with program unknowns such as machine and problem sizes and to solve queries based on systems of constraints (equalities and inequalities). This paper describes novel techniques for counting the number of solutions to a system of constraints, simplifying systems of constraints, computing lower and upper bounds of symbolic expressions, and determining the relationship between symbolic expressions. All techniques target wide classes of linear and nonlinear symbolic expressions and systems of constraints. Our techniques have been implemented and are used as part of a parallelizing compiler and a performance estimator to support analysis and optimization of parallel programs. Various examples and experiments demonstrate the effectiveness of our symbolic analysis techniques.

Keywords: program analysis, symbolic expressions, comparing symbolic expressions, ...
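A small sketch of one task the abstract lists: computing lower and upper bounds of a linear symbolic expression c0 + Σ ci·xi from per-variable ranges, choosing each variable's extreme according to the sign of its coefficient. The function name and argument shapes are illustrative assumptions.

```python
def bounds(c0, terms, ranges):
    # terms: {var: coefficient}; ranges: {var: (lo, hi)}
    lo = hi = c0
    for var, c in terms.items():
        vlo, vhi = ranges[var]
        lo += c * (vlo if c >= 0 else vhi)   # positive coeff: low end lowers
        hi += c * (vhi if c >= 0 else vlo)   # positive coeff: high end raises
    return lo, hi
```

Relationships between two expressions follow from the same machinery: bound e1 − e2, and a nonnegative lower bound proves e1 ≥ e2.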