The Superblock: An effective technique for VLIW and superscalar compilation
- THE JOURNAL OF SUPERCOMPUTING
, 1993
Cited by 249 (26 self)
A compiler for VLIW and superscalar processors must expose sufficient instruction-level parallelism (ILP) to effectively utilize the parallel hardware. However, ILP within basic blocks is extremely limited for control-intensive programs. We have developed a set of techniques for exploiting ILP across basic block boundaries. These techniques are based on a novel structure called the superblock. The superblock enables the optimizer and scheduler to extract more ILP along the important execution paths by systematically removing constraints due to the unimportant paths. Superblock optimization and scheduling have been implemented in the IMPACT-I compiler. This implementation gives us a unique opportunity to fully understand the issues involved in incorporating these techniques into a real compiler. Superblock optimizations and scheduling are shown to be useful while taking into account a variety of architectural features.
IMPACT: An architectural framework for multiple-instruction-issue processors
- in Proceedings of the 18th International Symposium on Computer Architecture
, 1991
Cited by 203 (41 self)
The performance of multiple-instruction-issue processors can be severely limited by the compiler's ability to generate efficient code for concurrent hardware. In the IMPACT project, we have developed IMPACT-I, a highly optimizing C compiler to exploit instruction level concurrency. The optimization capabilities of the IMPACT-I C compiler are summarized in this paper. Using the IMPACT-I C compiler, we ran experiments to analyze the performance of multiple-instruction-issue processors executing some important non-numerical programs. The multiple-instruction-issue processors achieve solid speedup over high-performance single-instruction-issue processors. We ran experiments to characterize the following architectural design issues: code scheduling model, instruction issue rate, memory load latency, and function unit resource limitations. Based on the experimental results, we propose the IMPACT Architectural Framework, a set of architectural features that best support the IMPACT-I C compiler to generate efficient code for multiple-instruction-issue processors. By supporting these architectural features, multiple-instruction-issue implementations of existing and new architectures receive immediate compilation support from the IMPACT-I C compiler.
Using Profile Information to Assist Classic Code Optimizations
- SOFTWARE: PRACTICE AND EXPERIENCE
, 1991
Cited by 116 (13 self)
This paper describes the design and implementation of an optimizing compiler that automatically generates profile information to assist classic code optimizations. This compiler contains two new components, an execution profiler and a profile-based code optimizer, which are not commonly found in traditional optimizing compilers. The execution profiler inserts probes into the input program, executes the input program for several inputs, accumulates profile information and supplies this information to the optimizer. The profile-based code optimizer uses the profile information to expose new optimization opportunities that are not visible to traditional global optimization methods. Experimental results show that the profile-based code optimizer significantly improves the performance of production C programs that have already been optimized by a high-quality global code optimizer.
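The probe-and-accumulate idea in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the compiler described there inserts probes automatically into C programs, whereas here the probes are placed by hand, and every name (ExecutionProfile, probe, classify, collect_profile) is hypothetical.

```python
from collections import Counter


class ExecutionProfile:
    """Accumulates how often each probed program point is reached."""

    def __init__(self):
        self.counts = Counter()

    def probe(self, label):
        # A probe records one execution of the labeled point.
        self.counts[label] += 1


def classify(n, profile):
    """Toy 'input program' with hand-placed probes on each path."""
    profile.probe("entry")
    if n < 0:
        profile.probe("negative")
        return "negative"
    profile.probe("non_negative")
    return "non_negative"


def collect_profile(inputs):
    """Run the probed program on several inputs, sharing one profile."""
    profile = ExecutionProfile()
    for value in inputs:
        classify(value, profile)
    return profile


profile = collect_profile([3, 7, -1, 12, 5])

# A profile-based optimizer could favor the more frequently taken path.
hot_path = max(("negative", "non_negative"),
               key=lambda label: profile.counts[label])
```

With the sample inputs above, the non-negative path is taken more often, so an optimizer consuming this profile would treat it as the important path.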
Profile-guided automatic inline expansion for C programs
- SOFTWARE PRACTICE AND EXPERIENCE
, 1992
Cited by 109 (5 self)
This paper describes critical implementation issues that must be addressed to develop a fully automatic inliner. These issues are: integration into a compiler, program representation, hazard prevention, expansion sequence control, and program modification. An automatic inter-file inliner that uses profile information has been implemented and integrated into an optimizing C compiler. The experimental results show that this inliner achieves significant speedups for production C programs.
A framework for unrestricted whole-program optimization
- In ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation
, 2006
Cited by 24 (8 self)
Procedures have long been the basic units of compilation in conventional optimization frameworks. However, procedures are typically formed to serve software engineering rather than optimization goals, arbitrarily constraining code transformations. Techniques, such as aggressive inlining and interprocedural optimization, have been developed to alleviate this problem, but, due to code growth and compile time issues, these can be applied only sparingly. This paper introduces the Procedure Boundary Elimination (PBE) compilation framework, which allows unrestricted whole-program optimization. PBE allows all intra-procedural optimizations and analyses to operate on arbitrary subgraphs of the program, regardless of the original procedure boundaries and without resorting to inlining. In order to control compilation time, PBE also introduces novel extensions of region formation and encapsulation. PBE enables targeted code specialization, which recovers the specialization benefits of inlining while keeping code growth in check. This paper shows that PBE attains better performance than inlining with half the code growth.
Comparing Static And Dynamic Code Scheduling for Multiple-Instruction-Issue Processors
- In Proc. of the 24th International Symposium on Microarchitecture
, 1991
Cited by 18 (2 self)
This paper examines two alternative approaches to supporting code scheduling for multiple-instruction-issue processors. One is to provide a set of non-trapping instructions so that the compiler can perform aggressive static code scheduling. The application of this approach to existing commercial architectures typically requires extending the instruction set. The other approach is to support out-of-order execution in the microarchitecture so that the hardware can perform aggressive dynamic code scheduling. This approach usually does not require modifying the instruction set but requires complex hardware support. In this paper, we analyze the performance of the two alternative approaches using a set of important nonnumerical C benchmark programs. A distinguishing feature of the experiment is that the code for the dynamic approach has been optimized and scheduled as much as allowed by the architecture. The hardware is only responsible for the additional reordering that cannot be performed...
REGION-BASED COMPILATION
, 1996
Cited by 18 (1 self)
The increasing amount of instruction-level parallelism (ILP) required to fully utilize high issue-rate processors has forced the compiler to perform more aggressive analysis, optimization, parallelization and scheduling on the input programs. Yet, the compiler designer must scale back the use of aggressive transformations in order to contain compile time and memory usage. The root of the problem lies in the function-oriented framework assumed in conventional compilers. Traditionally the compilation process has been built using the function as a compilation unit, because the function provides a convenient partition of the program. However, the size and contents of a function may not provide the best environment for aggressive analysis and optimization. This dissertation presents a technique in which the compiler is allowed to repartition the program into more desirable compilation units, called regions. Placing the compiler in control of the size and contents of the compilation unit reduces the importance of the algorithmic complexity of the applied transformations, allowing more aggressive transformations to be applied while reducing compilation time. The region concept has been traditionally applied within an ILP compiler only in the context of code scheduling. This dissertation proposes extending the concept of region partitioning to
Profile-Assisted Instruction Scheduling
- International Journal of Parallel Programming
, 1994
Cited by 13 (0 self)
Instruction schedulers for superscalar and VLIW processors must expose sufficient instruction-level parallelism to the hardware in order to achieve high performance. Traditional compiler instruction scheduling techniques typically take into account the constraints imposed by all execution scenarios in the program. However, there are additional opportunities to increase instruction-level parallelism for the frequent execution scenarios at the expense of the less frequent ones. Profile information identifies these important execution scenarios in a program. In this paper, two major categories of profile information are studied: control-flow and memory-dependence. Profile-assisted code scheduling techniques have been incorporated into the IMPACT-I compiler. These techniques are acyclic global scheduling and software pipelining. This paper describes the scheduling algorithms, highlights the modifications required to use profile information, and explains the hardware and compiler support fo...

