Results 1 -
6 of
6
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
"... Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characte ..."
Abstract
-
Cited by 263 (2 self)
- Add to MetaCart
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well the computational expense incurred.
Unrolling-Based Optimizations for Modulo Scheduling
- in Proceedings of the 28th International Symposium on Microarchitecture
, 1995
"... Modulo scheduling is a method for overlapping successive iterations of a loop in order to find sufficient instruction-level parallelism to fully utilize high-issue-rate processors. The achieved throughput modulo scheduled loop depends on the resource requirements, the dependence pattern, and the reg ..."
Abstract
-
Cited by 33 (3 self)
- Add to MetaCart
Modulo scheduling is a method for overlapping successive iterations of a loop in order to find sufficient instruction-level parallelism to fully utilize high-issue-rate processors. The achieved throughput modulo scheduled loop depends on the resource requirements, the dependence pattern, and the register requirements of the computation in the loop body. Traditionally, unrolling followed by acyclic scheduling of the unrolled body has been an alternative to modulo scheduling. However, there are benefits to unrolling even if the loop is to be modulo scheduled. Unrolling can improve the throughput by allowing a smaller non-integral effective initiation interval to be achieved. After unrolling, optimizations can be applied to the loop that reduce both the resource requirements and the height of the critical paths. Together, unrolling and unrolling-based optimizations can enable the completion of multiple iterations per cycle in some cases. This paper describes the benefits of unrolling and ...
Height Reduction of Control Recurrences for ILP Processors
, 1994
"... The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the progra ..."
Abstract
-
Cited by 29 (1 self)
- Add to MetaCart
The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. While height reduction techniques are not always helpful, their utility can be demonstrated in a broad range of important situations. This paper focuses
The Program Decision Logic Approach to Predicated Execution
- IN PROCEEDINGS OF THE 26TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE
, 1999
"... Modern compilers must expose sufficient amounts of Instruction-Level Parallelism (ILP) to achieve the promised performance increases of superscalar and VLIW processors. One of the major impediments to achieving this goal has been inefficient programmatic control flow. Historically, the compiler has ..."
Abstract
-
Cited by 14 (4 self)
- Add to MetaCart
Modern compilers must expose sufficient amounts of Instruction-Level Parallelism (ILP) to achieve the promised performance increases of superscalar and VLIW processors. One of the major impediments to achieving this goal has been inefficient programmatic control flow. Historically, the compiler has translated the programmer's original control structure directly into assembly code with conditional branch instructions. Eliminating inefficiencies in handling branch instructions and exploiting ILP has been the subject of much research. However, traditional branch handling techniques cannot significantly alter the program's inherent control structure. The advent of predication as a program control representation has enabled compilers to manipulate control in a form more closely related to the underlying program logic. This work takes full advantage of the predication paradigm by abstracting the program control flow into a logical form referred to as a program decision logic network. This network is modeled as a Boolean equation and minimized using modified versions of logic synthesis techniques. After minimization, the more efficient version of the program's original control flow is re-expressed in predicated code. Furthermore, this paper proposes extensions to the HPL PlayDoh predication model in support of more effective predicate decision logic network minimization. Finally, this paper shows the ability of the mechanisms presented to overcome limits on ILP previously imposed by rigid program control structure.
Parallelization of Control Recurrences for ILP Processors
- International Journal of Parallel Processing
, 1994
"... The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the progra ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
The performance of applications executing on processors with instruction level parallelism is often limited by control and data dependences. Performance bottlenecks caused by dependences can frequently be eliminated through transformations which reduce the height of critical paths through the program. While height reduction techniques are not always helpful, their utility can be demonstrated in an increasingly broad range of important situations.
Efficient Hardware Generation for Dynamic Programming Problems
"... Abstract—Optimization problems are known to be very hard problems requiring a lot of CPU time. Dynamic Programming (DP) is a powerful method, which is typically used to compute large number of discrete optimization problems. This paper presents an improved approach called RVEP (RVE with precomputati ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
Abstract—Optimization problems are known to be very hard problems requiring a lot of CPU time. Dynamic Programming (DP) is a powerful method, which is typically used to compute large number of discrete optimization problems. This paper presents an improved approach called RVEP (RVE with precomputation) that allows to design highly parallel hardware accelerators for wide range of DP problems. We applied our approach to three representative DP problems. We estimate speedups to 200 % compared to a pure dataflow approach and at least 25 % to previous RVE approach. I.

