Results 1 - 9 of 9
Iterative modulo scheduling: An algorithm for software pipelining loops
- In Proceedings of the 27th Annual International Symposium on Microarchitecture, 1994
Cited by 263 (2 self)
Modulo scheduling is a framework within which a wide variety of algorithms and heuristics may be defined for software pipelining innermost loops. This paper presents a practical algorithm, iterative modulo scheduling, that is capable of dealing with realistic machine models. This paper also characterizes the algorithm in terms of the quality of the generated schedules as well as the computational expense incurred.
Decomposed Software Pipelining: A New Approach to Exploit Instruction Level Parallelism for Loop Programs
- in IFIP, 1993
Cited by 28 (7 self)
This paper presents a new view of software pipelining, in which software pipelining is considered an instruction-level transformation from a one-dimensional vector to a two-dimensional matrix. The software pipelining problem can thus be naturally decomposed into two subproblems: determining the row numbers of operations in the matrix, and determining the column numbers. Using this viewpoint as a basis, we develop a new loop scheduling approach, called decomposed software pipelining.
Keywords: Loop Scheduling; Instruction Level Parallelism; Software Pipelining; Loop-carried Dependence; Problem Decomposition
1 Introduction
Since loop execution dominates total execution time of almost all practical programs, the exploitation of instruction level parallelism for loops is a major challenge in the design of optimizing compilers for high-performance computers such as VLIW, superscalar and pipelined processors [Alm89]. Software pipelining is an effective instruction ...
Cyclic Scheduling on Parallel Processors: An Overview
- 1994
Cited by 16 (0 self)
Recent research effort has been devoted to cyclic scheduling problems that arise in the design of compilers for parallel architectures as well as in manufacturing systems. This paper focuses on extensions of the basic cyclic scheduling problem (BCS), which seems to be one of the most suitable models for parallel processing applications. The properties of the earliest schedule of BCS are recalled and their most recent extensions are presented. Several generalizations of BCS that include resource constraints are then discussed. In particular, structural results and algorithms for periodic versions of job-shop and m-machine problems are reported.
4.1 Introduction
Up to now, cyclic scheduling problems have been studied from several points of view depending on the target application. A few theoretical studies have been recently devoted to these problems, in which basic results are often proved independently using different formalisms. We hope that this paper, without pretending to...
Heuristic Algorithms for Scheduling Iterative Task Computations on Distributed Memory Machines
- IEEE Transactions on Parallel and Distributed Systems, 1995
Cited by 11 (1 self)
Many partitioned scientific programs can be modeled as iterative execution of computational tasks, represented by iterative task graphs (ITGs). In this paper, we consider the symbolic scheduling of ITGs on distributed memory architectures with nonzero communication overhead without searching the entire iteration space. An ITG may or may not have dependence cycles and we propose heuristic algorithms for mapping cyclic and acyclic ITGs, which incorporate techniques of software pipelining, graph unfolding, directed acyclic graph (DAG) scheduling and load balancing. We provide an analysis for computing near-optimal unfolding factors and comparing the performance of the proposed heuristic algorithms with the optimal solutions. We also study the stability of run-time performance when weights are not estimated accurately at compile-time. Our experiments study the scheduling performance of solving several scientific computing problems and analyze the effectiveness of optimization techniques us...
Mapping Iterative Task Graphs on Distributed Memory Machines
- Proc. 24th International Conference on Parallel Processing, 1995
Cited by 5 (0 self)
This paper addresses the problem of scheduling iterative task graphs on distributed memory architectures with nonzero communication overhead. The proposed algorithm incorporates techniques of software pipelining, graph unfolding and directed acyclic graph scheduling. The goal of optimization is to minimize overall parallel time, which is achieved by balancing processor loads, exploring task parallelism within and across iterations, overlapping communication and computation, and eliminating unnecessary communication. This paper gives a method to execute static schedules, studies the sensitivity of run-time performance when weights are not estimated accurately at compile-time, and presents experimental results to demonstrate the effectiveness of this approach.
1 Introduction
Many scientific applications can be viewed as the repeated execution of a set of computational tasks and can be modeled by iterative task graphs (ITGs). Mapping weighted iterative task graphs on message-passing archi...
The Complexity of a Cyclic Scheduling Problem With Identical Machines and Precedence Constraints
- Proceedings of the First Copper Mountain Conference on Iterative Methods, 1990
Cited by 4 (1 self)
We consider a set T of tasks with unit processing times. Each of them must be executed infinitely often. A uniform constraint is defined between two tasks and induces a set of precedence constraints on their successive executions. We limit our study to a subset of uniform constraints corresponding to two hypotheses often verified in practice: each execution of T must end with a special task f, and uniform constraints between executions from different iterations start from f. We have a fixed number of identical machines. The problem is to find a periodic schedule of T which maximizes the throughput. We prove that this problem is NP-hard and show that it is polynomial for two machines. We also present another nontrivial polynomial subcase which is a restriction of uniform precedence constraints.
Keywords: Cyclic Scheduling, Computational Analysis, Optimization.
1 Introduction
In a cyclic scheduling problem, a finite set T of generic tasks must be executed infinitely o...
Determining Asynchronous Pipeline Execution Times
- Proc. 9th Workshop on Languages and Compilers for Parallel Computing, 1996
Cited by 3 (1 self)
Asynchronous pipelining is a form of parallelism in which processors execute different loop tasks (loop statements) as opposed to different loop iterations. An asynchronous pipeline schedule for a loop is an assignment of loop tasks to processors, plus an order on instances of tasks assigned to the same processor. This variant of pipelining is particularly relevant in distributed memory systems (since pipeline control may be distributed across processors), but may also be used in shared memory systems. Accurate estimation of the execution time of a pipeline schedule is needed to determine if pipelining is appropriate for a loop, and to compare alternative schedules. Pipeline execution of n iterations of a loop requires time at most a + bn, for some constants a and b. The coefficient b is the iteration interval of the pipeline schedule, and is the primary measure of the performance of a schedule. The startup time a is a secondary performance measure. We generalize previous work on det...
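The bound a + bn quoted in this abstract can be illustrated with a minimal sketch. The function name and the two example schedules below are hypothetical and chosen only for illustration; they are not taken from the paper. The sketch shows why the iteration interval b is the primary measure: for large n, the schedule with the smaller b has the smaller bound even if its startup time a is larger.

```python
def pipeline_time_bound(a: int, b: int, n: int) -> int:
    """Upper bound a + b*n on the time to execute n loop iterations.

    a is the startup time and b is the iteration interval of a
    pipeline schedule, as described in the abstract above.
    """
    if n < 0:
        raise ValueError("number of iterations must be non-negative")
    return a + b * n


# Two hypothetical schedules (values are assumptions for illustration):
# schedule_x has a long startup but a short iteration interval,
# schedule_y has a short startup but a longer iteration interval.
schedule_x = {"a": 10, "b": 2}
schedule_y = {"a": 2, "b": 3}

for n in (4, 8, 100):
    bound_x = pipeline_time_bound(schedule_x["a"], schedule_x["b"], n)
    bound_y = pipeline_time_bound(schedule_y["a"], schedule_y["b"], n)
    print(f"n={n}: bound_x={bound_x}, bound_y={bound_y}")
```

With these assumed values the two bounds coincide at n = 8; below that the shorter startup wins, and above it the shorter iteration interval wins. Since a + bn is an upper bound, this compares bounds rather than exact execution times.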
Determining Asynchronous Acyclic Pipeline Execution Times
- Proc. 10th International Parallel Processing Symposium, 1996
Cited by 3 (2 self)
Pipeline execution is a form of parallelism in which subcomputations of a repeated computation, such as statements in the body of a loop, are executed in parallel. A measure of the execution time of a pipeline is needed to determine if pipelining is an effective form of parallelism for a loop, and to evaluate alternative scheduling choices. We derive a formula for precisely determining the asynchronous pipeline execution time of a loop modeled as iterated execution of an acyclic task graph. The formula can be evaluated in time linear in the number of tasks and edges in the graph. We assume that computation and communication times are fixed and known, interprocessor communication and buffering capability are unbounded, and each task is assigned to a distinct processor.
1. Introduction
Pipelining is an "assembly line" form of parallelism in which subcomputations of a repeated computation are executed concurrently. Pipeline parallelism has a long history of application in specific domain...
Integrating Software Pipelining and Graph Scheduling for Iterative Scientific Computations
- Lecture Notes in Computer Science, Proc. of Irregular '95, 1995
Cited by 1 (1 self)
Graph scheduling has been shown effective for solving irregular problems represented as directed acyclic graphs (DAGs) on distributed memory systems. Many scientific applications can also be modeled as iterative task graphs (ITGs). In this paper, we model the SOR computation for solving sparse matrix systems in terms of ITGs and address the optimization issues for scheduling ITGs when communication overhead is not zero. We present an approach that incorporates techniques of software pipelining and graph scheduling. We demonstrate the effectiveness of our approach in mapping SOR computation and compare it with the multi-coloring method.
1 Introduction
Many irregular computations can be represented as Directed Acyclic Graphs (DAG) [3, 15] and graph scheduling has been shown effective for such computations. There is also a class of problems that can be viewed as the repeated execution of a set of computational tasks and can be modeled by iterative task graphs (ITGs). For example, in the ...

