Results 1 - 10
of
14
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract
-
Cited by 263 (2 self)
- Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Instruction-Level Parallel Processing: History, Overview and Perspective
, 1992
"... Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a muc ..."
Abstract
-
Cited by 166 (0 self)
- Add to MetaCart
Instruction-level Parallelism CILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Although ILP has appeared in the highest performance uniprocessors for the past 30 years, the 1980s saw it become a much more significant force in computer design. Several systems were built, and sold commercially, which pushed ILP far beyond where it had been before, both in terms of the amount of ILP offered and in the central role ILP played in the design of the system. By the end of the decade, advanced microprocessor design at all major CPU manufacturers had incorporated ILP, and new techniques for ILP have become a popular topic at academic conferences. This article provides an overview and historical perspective of the field of ILP and its development over the past three decades.
Lifetime-Sensitive Modulo Scheduling
- In Proc. of the ACM SIGPLAN '93 Conf. on Programming Language Design and Implementation
, 1993
"... This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results---when me ..."
Abstract
-
Cited by 129 (0 self)
- Add to MetaCart
This paper shows how to software pipeline a loop for minimal register pressure without sacrificing the loop's minimum execution time. This novel bidirectional slack-scheduling method has been implemented in a FORTRAN compiler and tested on many scientific benchmarks. The empirical results---when measured against an absolute lower bound on execution time, and against a novel schedule-independent absolute lower bound on register pressure---indicate nearoptimal performance. 1 Introduction Software pipelining increases a loop's throughput by overlapping the loop's iterations; that is, by initiating successive iterations before prior iterations complete. With sufficient overlap, a functional unit can be saturated, at which point the loop initiates iterations at the maximum possible rate. To find an overlapped schedule, a compiler must represent the complex resource constraints that can arise. Efficiently representing these constraints is especially difficult when adjacent iterations do n...
HPL-PD architecture specification: Version 1.1
, 2000
"... instruction-level parallelism, parametric architecture, EPIC, VLIW, superscalar, speculative execution, predicated execution, programmatic cache control, run-time memory disambiguation, branch architecture HPL-PD is a parametric processor architecture conceived for research in instruction-level para ..."
Abstract
-
Cited by 52 (6 self)
- Add to MetaCart
instruction-level parallelism, parametric architecture, EPIC, VLIW, superscalar, speculative execution, predicated execution, programmatic cache control, run-time memory disambiguation, branch architecture HPL-PD is a parametric processor architecture conceived for research in instruction-level parallelism (ILP). Its main purpose is to serve as a vehicle to investigate processor architectures having significant parallelism and to investigate the compiler technology needed to effectively exploit such architectures. The architecture is parametric in that it admits machines of different composition and scale, especially with respect to the nature and amount of parallelism offered. The architecture admits EPIC, VLIW and superscalar implementations so as to provide a basis for understanding the merits and demerits of these different styles of implementation. This report describes those parts of the architecture that are common to all machines in the family. It introduces the basic concepts such as the structure of an instruction, instruction execution semantics, the types of register files, etc. and describes the semantics of the operation repertoire.
Height Reduction of Control Recurrences for ILP Processors
, 1994
"... The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the progra ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. While height reduction techniques are not always helpful, their utility can be demonstrated in a broad range of important situations. This paper focuses
Path-Sensitive Value-Flow Analysis
- In Symposium on Principles of Programming Languages
, 1998
"... When analyzing programs for value recomputation, one faces the problem of naming the value that flows between equivalent computations with different lexical names. This paper presents a data-flow analysis framework that overcomes this problem by synthesizing a name space tailored for tracing the val ..."
Abstract
-
Cited by 23 (1 self)
- Add to MetaCart
When analyzing programs for value recomputation, one faces the problem of naming the value that flows between equivalent computations with different lexical names. This paper presents a data-flow analysis framework that overcomes this problem by synthesizing a name space tailored for tracing the values whose flow is of interest to a given data-flow problem. Furthermore, to exploit recomputation of a value with multiple, synonymous names, path-sensitive value numbering on the synthetic name space is developed. Optimizations that rely on value flow to detect redundant computations, such as partial redundancy elimination and constant propagation, become more powerful when phrased in our framework. The framework is built on a new program representation called Value Name Graph (VNG) which gains its power from integrating three orthogonal techniques: symbolic back-substitution, value numbering, and data-flow analysis. Our experiments with the implementation show that analysis on the VNG is p...
Machine-description driven compilers for EPIC and VLIW processors. Design Automation for Embedded Systems
, 1999
"... retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, EPIC compilers, VLIW compilers, code generation, scheduling, register allocation In the past, due to the restricted gate count available on an ..."
Abstract
-
Cited by 18 (9 self)
- Add to MetaCart
retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, EPIC compilers, VLIW compilers, code generation, scheduling, register allocation In the past, due to the restricted gate count available on an inexpensive chip, embedded DSPs have had limited parallelism, few registers and irregular, incomplete interconnectivity. More recently, with increasing levels of integration, embedded VLIW processors have started to appear. Such processors typically have higher levels of instruction-level parallelism, more registers, and a relatively regular interconnect between the registers and the functional units. The central challenges faced by a code generator for an EPIC (Explicitly Parallel Instruction Computing) or VLIW processor are quite different from those for the earlier DSPs and, consequently, so is the structure of a code generator that is designed to be easily retargetable. In this report, we explain the nature of the challenges faced by an EPIC or VLIW compiler and present a strategy for performing code generation in an incremental fashion that is best suited to generating high-quality code efficiently. We also describe the Operation Binding Lattice, a formal model for incrementally binding the opcodes and register assignments in an EPIC code generator. As we show, this reflects the phase structure of the EPIC code generator. It also defines the structure of the machine-description database, which is queried by the code generator for the information that it needs about the target processor. Lastly, we discuss general features of our implementation of these ideas and techniques in Elcor, our EPIC compiler research infrastructure.
Elcor’s Machine Description System: Version 3.0
, 1998
"... retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors, ..."
Abstract
-
Cited by 17 (1 self)
- Add to MetaCart
retargetable compilers, table-driven compilers, machine description, processor description, instruction-level parallelism, EPIC processors, VLIW processors,
Acceleration of First and Higher Order Recurrences on Processors with Instruction Level Parallelism
- In Sixth International Workshop on Languages and Compilers for Parallel Computing
, 1993
"... This report describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction level parallelism. We introduce a new technique, called blocked back-substitution, which has lower operation count and higher performance than previous methods. The blocked b ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
This report describes parallelization techniques for accelerating a broad class of recurrences on processors with instruction level parallelism. We introduce a new technique, called blocked back-substitution, which has lower operation count and higher performance than previous methods. The blocked back-substitution technique requires unrolling and non-symmetric optimization of innermost loop iterations. We present metrics to characterize the performance of software-pipelined loops and compare these metrics for a range of height reduction techniques and processor architectures.
Acceleration of Algebraic Recurrences on Processors with Instruction Level Parallelism
- In Proceedings of LCPC-6
, 1993
"... Architectures with instruction level parallelism such as VLIW and superscalar processors provide parallelism in the form of a limited number of pipelined functional units. For these architectures, recurrence height reduction techniques provide significant speedups when they are properly applied. Thi ..."
Abstract
-
Cited by 5 (4 self)
- Add to MetaCart
Architectures with instruction level parallelism such as VLIW and superscalar processors provide parallelism in the form of a limited number of pipelined functional units. For these architectures, recurrence height reduction techniques provide significant speedups when they are properly applied. This paper introduces a new technique, called blocked back-substitution,

