Results 1 – 10 of 11
Iterative modulo scheduling: An algorithm for software pipelining loops
 In Proceedings of the 27th Annual International Symposium on Microarchitecture
, 1994
Abstract

Cited by 281 (3 self)
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
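The core idea of the abstract — pick an initiation interval (II) bounded below by resource and recurrence constraints, then iteratively place operations subject to a modulo resource conflict — can be sketched minimally. The machine model, latencies, and placement order below are assumed for illustration, not taken from the paper, and data-dependence placement is omitted:

```python
import math

# Hypothetical toy loop body: op -> latency, plus one loop-carried
# recurrence a -> b -> a with total latency 3 and dependence distance 1.
ops = {"a": 2, "b": 1, "c": 1, "d": 1}
num_fus = 2                               # two identical function units

# Lower bounds on the initiation interval (II):
res_mii = math.ceil(len(ops) / num_fus)           # resource-constrained MII
rec_mii = math.ceil((ops["a"] + ops["b"]) / 1)    # recurrence-constrained MII
ii = max(res_mii, rec_mii)

# Greedy placement: two ops may not share a function unit in the same
# slot modulo II; on conflict, retry the op at a later slot.
schedule, usage = {}, {}
for slot, op in enumerate(ops):           # fixed order stands in for priority
    t = slot
    while usage.get(t % ii, 0) >= num_fus:   # modulo resource conflict
        t += 1                                # iterative retry
    schedule[op] = t
    usage[t % ii] = usage.get(t % ii, 0) + 1

print("II =", ii, "schedule =", schedule)
```

Here the recurrence bound dominates (II = 3), so the four operations fit without conflict; a real implementation would also respect dependence-imposed earliest/latest start times and back out previously placed operations when no slot works.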
Iterative Modulo Scheduling
, 1995
Abstract

Cited by 84 (6 self)
Modulo scheduling is a framework within which algorithms for the software pipelining of innermost loops may be defined. The framework specifies a set of constraints that must be met in order to achieve a legal modulo schedule. A wide variety of algorithms and heuristics can be defined within this framework. Little work has been done to evaluate and compare alternative algorithms and heuristics for modulo scheduling from the viewpoints of schedule quality as well as computational complexity. This, along with a vague and unfounded perception that modulo scheduling is computationally expensive as well as difficult to implement, has inhibited its incorporation into product compilers. This report presents iterative modulo scheduling, a practical algorithm that is capable of dealing with realistic machine models. The report also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
A Linear Algebra Framework for Static HPF Code Distribution
, 1995
Abstract

Cited by 75 (7 self)
High Performance Fortran (HPF) was developed to support data parallel programming for SIMD and MIMD machines with distributed memory. The programmer is provided a familiar uniform logical address space and specifies the data distribution by directives. The compiler then exploits these directives to allocate arrays in the local memories, to assign computations to elementary processors, and to migrate data between processors when required. We show here that linear algebra is a powerful framework to encode HPF directives and to synthesize distributed code with space-efficient array allocation, tight loop bounds, and vectorized communications for INDEPENDENT loops. The generated code includes traditional optimizations such as guard elimination, message vectorization and aggregation, overlap analysis... The systematic use of an affine framework makes it possible to prove the compilation scheme correct. An early version of this paper was presented at the Fourth International Workshop on Comp...
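The affine flavor of HPF-style distribution directives can be illustrated with the simplest case, a 1-D BLOCK distribution. The sizes and function names below are assumptions for illustration, not the paper's framework:

```python
# Hypothetical sketch: an HPF-style BLOCK distribution of a 1-D array of
# N elements over P processors, expressed as two affine-style maps.
N, P = 16, 4
block = (N + P - 1) // P        # elements per processor

def owner(i):
    """Which processor stores global element A(i)."""
    return i // block

def local_index(i):
    """Position of A(i) in the owner's local array allocation."""
    return i % block

# Owner-computes rule: processor p executes exactly the iterations it owns,
# and its local array only needs `block` slots (space-efficient allocation).
for p in range(P):
    mine = [i for i in range(N) if owner(i) == p]
    assert all(local_index(i) < block for i in mine)
print("proc 0 owns globals", [i for i in range(N) if owner(i) == 0])
```

Because both maps are affine in the global index, a compiler can manipulate them symbolically to derive tight local loop bounds and the guard-free, vectorized communication sets the abstract mentions.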
Decomposed Software Pipelining: A New Approach to Exploit Instruction Level Parallelism for Loop Programs
 in IFIP
, 1993
Abstract

Cited by 27 (7 self)
This paper presents a new view of software pipelining, in which we consider software pipelining as an instruction-level transformation from a one-dimensional vector to a two-dimensional matrix. Thus, the software pipelining problem can be naturally decomposed into two subproblems: one is to determine the row numbers of operations in the matrix, and the other is to determine the column numbers. Using this viewpoint as a basis, we develop a new loop scheduling approach, called decomposed software pipelining.

Keywords: Loop Scheduling; Instruction Level Parallelism; Software Pipelining; Loop-carried Dependence; Problem Decomposition

1 Introduction

Since loop execution dominates the total execution time of almost all practical programs, the exploitation of instruction-level parallelism for loops is a major challenge in the design of optimizing compilers for high-performance computers such as VLIW, superscalar and pipelined processors [Alm89]. Software pipelining is an effective instruction ...
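The 1-D-to-2-D view in the abstract can be illustrated with a toy kernel. The row/column assignments below are invented for illustration, not computed by the paper's algorithm: the row number says how many iterations an operation is shifted by, and the column number gives its slot inside the steady-state kernel.

```python
# Toy illustration of the vector -> matrix decomposition (assignments
# assumed): row = iteration offset, column = slot within the kernel.
body = ["load", "mul", "add", "store"]
row = {"load": 0, "mul": 0, "add": 1, "store": 1}   # iteration offset
col = {"load": 0, "mul": 1, "add": 0, "store": 1}   # slot within the kernel

def kernel():
    """One steady-state kernel: ops from different source iterations overlap.

    Each slot lists (op, row[op]); an op with row r in kernel instance i
    belongs to source iteration i - r.
    """
    width = max(col.values()) + 1
    slots = [[] for _ in range(width)]
    for op in body:
        slots[col[op]].append((op, row[op]))
    return slots

print(kernel())
```

Choosing the rows amounts to resolving loop-carried dependences; choosing the columns is then an acyclic scheduling problem within one kernel, which is exactly the decomposition the paper exploits.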
Cyclic Scheduling on Parallel Processors: An Overview
, 1994
Abstract

Cited by 22 (1 self)
A recent research effort has been devoted to cyclic scheduling problems that arise in the design of compilers for parallel architectures as well as in manufacturing systems. This paper focuses on extensions of the basic cyclic scheduling problem (BCS), which seems to be one of the most suitable models for parallel processing applications. The properties of the earliest schedule of BCS are recalled and their most recent extensions are presented. Several generalizations of BCS that include resource constraints are then discussed. In particular, structural results and algorithms for periodic versions of job-shop and m-machine problems are reported.

4.1 Introduction

Up to now, cyclic scheduling problems have been studied from several points of view depending on the target application. A few theoretical studies have recently been devoted to these problems, in which basic results are often proved independently using different formalisms. We hope that this paper, without pretending to...
Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines
 IEEE Transactions on Parallel and Distributed Systems
, 1995
Abstract

Cited by 12 (1 self)
Many partitioned scientific programs can be modeled as the iterative execution of computational tasks, represented by iterative task graphs (ITGs). In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead, without searching the entire iteration space. An ITG may or may not have dependence cycles, and we propose heuristic algorithms for mapping cyclic and acyclic ITGs, which incorporate techniques of software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling, and load balancing. We provide an analysis for computing near-optimal unfolding factors and comparing the performance of the proposed heuristic algorithms with the optimal solutions. We also study the stability of run-time performance when weights are not estimated accurately at compile-time. Our experiments study the scheduling performance of solving several scientific computing problems and analyze the effectiveness of optimization techniques us...
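Graph unfolding, one of the techniques the abstract names, replicates the ITG a fixed number of times so that dependences landing inside the unfolded window become ordinary DAG edges. The graph, edge distances, and unfolding factor below are assumed for illustration:

```python
# Sketch of graph unfolding for an iterative task graph (ITG).
# Edge (u, v, d): task v in iteration i depends on u in iteration i - d.
edges = [("t1", "t2", 0), ("t2", "t3", 0), ("t3", "t1", 1)]  # one cycle

def unfold(edges, u_factor):
    """Replicate the ITG u_factor times.

    Dependences that stay inside the window become plain DAG edges
    (schedulable with DAG techniques); those crossing the window remain
    loop-carried between consecutive unfolded windows.
    """
    dag, carried = [], []
    for k in range(u_factor):
        for (u, v, d) in edges:
            if k - d >= 0:
                dag.append((f"{u}#{k - d}", f"{v}#{k}"))
            else:
                carried.append((f"{u}#{u_factor + k - d}", f"{v}#{k}"))
    return dag, carried

dag, carried = unfold(edges, 2)
print(len(dag), "DAG edges,", len(carried), "carried edges")
```

Larger unfolding factors expose more cross-iteration parallelism at the cost of code growth, which is why the paper analyzes near-optimal factors rather than unfolding arbitrarily.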
Mapping Iterative Task Graphs on Distributed Memory Machines
 Proc. 24th International Conference on Parallel Processing
, 1995
Abstract

Cited by 5 (0 self)
This paper addresses the problem of scheduling iterative task graphs on distributed memory architectures with nonzero communication overhead. The proposed algorithm incorporates techniques of software pipelining, graph unfolding, and directed acyclic graph scheduling. The goal of optimization is to minimize overall parallel time, which is achieved by balancing processor loads, exploring task parallelism within and across iterations, overlapping communication and computation, and eliminating unnecessary communication. This paper gives a method to execute static schedules, studies the sensitivity of run-time performance when weights are not estimated accurately at compile-time, and presents experimental results to demonstrate the effectiveness of this approach.

1 Introduction

Many scientific applications can be viewed as the repeated execution of a set of computational tasks and can be modeled by iterative task graphs (ITGs). Mapping weighted iterative task graphs onto message-passing archi...
The Complexity of a Cyclic Scheduling Problem With Identical Machines and Precedence Constraints
 Proceedings of the First Copper Mountain Conference on Iterative Methods
, 1990
Abstract

Cited by 4 (1 self)
We consider a set T of tasks with unit processing times. Each of them must be executed infinitely often. A uniform constraint is defined between two tasks and induces a set of precedence constraints on their successive executions. We limit our study to a subset of uniform constraints corresponding to two hypotheses often verified in practice: each execution of T must end with a special task f, and uniform constraints between executions from different iterations start from f. We have a fixed number of identical machines. The problem is to find a periodic schedule of T which maximizes the throughput. We prove that this problem is NP-hard and show that it is polynomial for two machines. We also present another nontrivial polynomial subcase which is a restriction of uniform precedence constraints.

Keywords: Cyclic Scheduling, Computational Analysis, Optimization.

1 Introduction

In a cyclic scheduling problem, a finite set T of generic tasks must be executed infinitely o...
Determining Asynchronous Pipeline Execution Times
 Proc. 9th Workshop on Languages and Compilers for Parallel Computing
, 1996
Abstract

Cited by 3 (1 self)
Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) as opposed to different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems. Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coefficient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure. We generalize previous work on det...
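The a + bn model in this abstract can be checked on a hypothetical two-task pipeline: task weights and the communication delay below are assumed, and a and b are recovered empirically as the intercept and slope of the finish-time function.

```python
# Checking the time(n) <= a + b*n model on an assumed two-task pipeline:
# task A feeds task B, each on its own processor, fixed communication delay.
wA, wB, comm = 3, 2, 1          # task times and communication time (assumed)

def pipeline_time(n):
    """Finish time of n iterations under asynchronous pipeline execution."""
    doneA = doneB = 0
    for _ in range(n):
        doneA = doneA + wA                 # A runs its iterations back to back
        ready = doneA + comm               # A's result reaches B's processor
        doneB = max(doneB, ready) + wB     # B waits for input and its own turn
    return doneB

# Iteration interval b = slope; startup time a = intercept.
b = pipeline_time(11) - pipeline_time(10)
a = pipeline_time(10) - 10 * b
print(f"time(n) = {a} + {b}*n")   # prints: time(n) = 3 + 3*n
```

In this toy instance the slower task A (weight 3) is the bottleneck, so the iteration interval b equals 3 regardless of the communication delay, which only affects the startup term a, matching the abstract's distinction between the primary (b) and secondary (a) performance measures.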
Determining Asynchronous Acyclic Pipeline Execution Times
 Proc. 10th International Parallel Processing Symposium
, 1996
Abstract

Cited by 3 (2 self)
Pipeline execution is a form of parallelism in which subcomputations of a repeated computation, such as statements in the body of a loop, are executed in parallel. A measure of the execution time of a pipeline is needed to determine if pipelining is an effective form of parallelism for a loop, and to evaluate alternative scheduling choices. We derive a formula for precisely determining the asynchronous pipeline execution time of a loop modeled as iterated execution of an acyclic task graph. The formula can be evaluated in time linear in the number of tasks and edges in the graph. We assume that computation and communication times are fixed and known, interprocessor communication and buffering capability are unbounded, and each task is assigned to a distinct processor.

1. Introduction

Pipelining is an "assembly line" form of parallelism in which subcomputations of a repeated computation are executed concurrently. Pipeline parallelism has a long history of application in specific domain...
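Under the stated model (fixed known times, unbounded buffering, one task per processor), the exact finish times obey a simple recurrence: a task's iteration i starts when both its own iteration i-1 has finished (same processor) and every predecessor's iteration i has arrived. The graph and weights below are assumed for illustration, and the direct iteration shown here is not the paper's linear-time closed formula:

```python
# Sketch of exact asynchronous pipeline timing for an iterated acyclic
# task graph (weights and communication times assumed).
tasks = {"a": 2, "b": 3, "c": 1}                       # task -> compute time
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 2}  # edge -> comm time
order = ["a", "b", "c"]                                 # a topological order

def finish_times(n):
    """finish[t] = completion time of task t's n-th iteration."""
    finish = {t: 0 for t in tasks}
    for _ in range(n):
        for t in order:                    # predecessors already updated
            start = finish[t]              # same-processor serialization
            for (u, v), c in edges.items():
                if v == t:
                    start = max(start, finish[u] + c)
            finish[t] = start + tasks[t]
    return finish

print(finish_times(4)["c"])   # makespan of 4 iterations at the sink task
```

For this instance the finish time of the sink grows as 6 + 3n, i.e. a = 6 and b = 3 (the heaviest task bounds the steady-state rate here); the paper's contribution is computing such times in a single linear-time pass rather than by iterating n times.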