Results 1–5 of 5
Task Graph Performance Bounds Through Comparison Methods, 2001
Abstract
Cited by 6 (1 self)
When a parallel computation is represented in a formalism that imposes series-parallel structure on its task graph, it becomes amenable to automated analysis and scheduling. Unfortunately, its execution time will usually also increase as precedence constraints are added to ensure series-parallel structure. Bounding the slowdown ratio would allow an informed trade-off between the benefits of a restrictive formalism and its cost in loss of performance. This dissertation deals with series-parallelising task graphs: adding precedence constraints to a task graph so that the resulting task graph is series-parallel. The weak bounded-slowdown conjecture for series-parallelising task graphs is introduced: it states that the slowdown is bounded if information about the workload can be used to guide the selection of which precedence constraints to add. A theory of best series-parallelisations is developed to investigate this conjecture. Partial evidence is presented that the weak slowdown bound is likely to be 4/3, and this bound is shown to be tight.
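The 4/3 figure can be reproduced on the smallest interesting case. The following is a minimal sketch (the "N"-shaped DAG, its task weights, and the added edge are illustrative choices, not taken from the dissertation) that computes the critical-path makespan of a task graph before and after adding one precedence constraint to make it series-parallel:

```python
from functools import lru_cache

# Hypothetical weights for the four tasks of an "N"-shaped DAG.
weights = {"a": 1, "b": 2, "c": 2, "d": 1}

def makespan(edges):
    """Critical-path length of a weighted task DAG (unlimited processors)."""
    preds = {t: [] for t in weights}
    for u, v in edges:
        preds[v].append(u)

    @lru_cache(maxsize=None)
    def finish(t):
        # A task finishes after its own weight plus its latest predecessor.
        return weights[t] + max((finish(p) for p in preds[t]), default=0)

    return max(finish(t) for t in weights)

# The "N" DAG: a -> c, a -> d, b -> d.
original = [("a", "c"), ("a", "d"), ("b", "d")]
# Adding b -> c serialises {a, b} before {c, d}, which is series-parallel.
sp = original + [("b", "c")]

print(makespan(original))  # 3
print(makespan(sp))        # 4, i.e. slowdown 4/3 -- the conjectured bound
```

With these weights the single added constraint stretches the critical path from 3 to 4, attaining the 4/3 ratio the dissertation shows to be tight.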
Parallelising Symbolic State-Space Generators, 2007
Abstract
Cited by 4 (2 self)
Symbolic state-space generators are notoriously hard to parallelise, largely due to the irregular nature of the task. Parallel languages such as Cilk, tailored to irregular problems, have been shown to offer efficient scheduling and load balancing. This paper explores whether Cilk can be used to efficiently parallelise a symbolic state-space generator on a shared-memory architecture. We parallelise the Saturation algorithm implemented in the SMART verification tool using Cilk, and compare it to a parallel implementation of the algorithm using a thread pool. Our experimental studies on a dual-processor, dual-core PC show that Cilk can improve the runtime efficiency of our parallel algorithm thanks to its load balancing and scheduling efficiency. We also demonstrate that this incurs a significant memory overhead due to Cilk's inability to support pipelining, and conclude by suggesting pipelining support as a future direction for parallel languages targeting irregular problems.
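The thread-pool alternative the paper compares against can be illustrated in miniature. The sketch below (in Python rather than C/Cilk; the toy tree workload and all names are hypothetical, and it bears no relation to the SMART implementation) shows workers pulling from a shared queue and pushing newly discovered subtasks back onto it, which is how a pool load-balances an irregular computation whose shape is only revealed as it runs:

```python
import queue
import threading

# Toy irregular workload: sum node values of an unbalanced tree whose
# children are only discovered when a node is processed.
# node id -> (value, child ids)
tree = {0: (1, [1, 2]), 1: (2, [3]), 2: (3, []), 3: (4, [])}

def pool_sum(n_workers=4):
    tasks = queue.Queue()
    tasks.put(0)
    lock = threading.Lock()
    total = [0]    # running sum, guarded by lock
    pending = [1]  # tasks queued or in flight, guarded by lock

    def worker():
        while True:
            with lock:
                if pending[0] == 0:
                    return  # all discovered work is done
            try:
                node = tasks.get(timeout=0.05)
            except queue.Empty:
                continue  # another worker may still discover new tasks
            value, children = tree[node]
            with lock:
                total[0] += value
                # this task is done (-1) but spawned len(children) new ones
                pending[0] += len(children) - 1
            for c in children:
                tasks.put(c)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

print(pool_sum())  # 10
```

Cilk's spawn/sync model achieves the same dynamic balancing with per-worker deques and work stealing, at the language level rather than through an explicit shared queue.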
Task Scheduling Using a Block Dependency DAG for Block-Oriented Sparse Cholesky Factorization, in: Proceedings of 14th ACM Symposium on Applied Computing, 2000
Abstract
Cited by 1 (0 self)
Block-oriented sparse Cholesky factorization decomposes a sparse matrix into rectangular sub-blocks; each block can then be handled as a computational unit in order to increase data reuse in a hierarchical memory system. The factorization method also increases the degree of concurrency while reducing communication volume, so that it performs more efficiently on a distributed-memory multiprocessor system than the customary column-oriented factorization method. Until now, however, the mapping of blocks to processors has been designed for load balance with restricted communication patterns. In this paper, we represent tasks using a block dependency DAG that shows the execution behavior of block sparse Cholesky factorization in a distributed-memory system. Since the characteristics of tasks for block Cholesky factorization differ from those of the conventional parallel task model, we propose a new task scheduling algorithm using a block dependency DAG. The proposed algorithm consi...
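To make the notion of a block dependency DAG concrete, here is a sketch that generates one for a right-looking *dense* blocked Cholesky factorization on an nb × nb block grid (task names POTRF/TRSM/UPDATE follow common blocked-factorization convention; the paper's DAG for the sparse case would additionally omit tasks on zero blocks, and this construction is not taken from the paper):

```python
def cholesky_dag(nb):
    """Dependency edges among block tasks of right-looking blocked Cholesky.

    POTRF(k)        factors diagonal block (k, k)
    TRSM(k, i)      solves panel block (i, k) against POTRF(k), i > k
    UPDATE(k, i, j) updates trailing block (i, j) at step k, k < j <= i
    """
    edges = set()
    for k in range(nb):
        for i in range(k + 1, nb):
            # The factored diagonal block feeds every panel solve below it.
            edges.add((("POTRF", k), ("TRSM", k, i)))
            for j in range(k + 1, i + 1):
                # A trailing update needs both panel blocks (i,k) and (j,k).
                edges.add((("TRSM", k, i), ("UPDATE", k, i, j)))
                edges.add((("TRSM", k, j), ("UPDATE", k, i, j)))
                # The update of block (i,j) at step k must finish before
                # step k+1 touches that same block.
                target = (("POTRF", k + 1) if (i, j) == (k + 1, k + 1)
                          else ("TRSM", k + 1, i) if j == k + 1
                          else ("UPDATE", k + 1, i, j))
                edges.add((("UPDATE", k, i, j), target))
    return edges

edges = cholesky_dag(3)
tasks = {t for e in edges for t in e}
print(len(tasks), len(edges))  # 10 tasks, 12 dependency edges on a 3x3 grid
```

Even this tiny instance shows why the task model differs from conventional DAG scheduling benchmarks: task granularities differ by kernel type, and the DAG's shape is dictated by the block grid rather than being arbitrary.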
To Parallelise or To Optimise?
Abstract
Model checking is a popular and successful technique for verifying complex digital systems. Carrying this technique — and its underlying state-space generation algorithms — beyond its current limitations presents a number of alternatives. Our focus is on parallelisation, made attractive by the current trend in hardware architectures towards multicore, multiprocessor systems. The main obstacle in this endeavour is that symbolic state-space generation algorithms, in particular, are notoriously hard to parallelise. In this article, we describe the process of taking a sequential symbolic state-space generation algorithm, namely a generic, symbolic BFS algorithm, through a sequence of optimisations that leads up to the Saturation algorithm, and follow the impact these sequential algorithms have on their parallel counterparts. In particular, we develop a parallel version of Saturation, discuss the challenges faced in its design, and conduct extensive experimental studies of its implementation. We employ rigorous analysis tools and techniques for measuring and evaluating parallel overheads and the quality of the parallelisation. The outcome of these studies is that the performance of a parallel symbolic state-space generation algorithm is almost impossible to predict and highly dependent on the model to which it is applied. In most situations, perceivable speedups are hard to achieve, but real-world applications where our technique produces significant improvements do exist. Nevertheless, it appears that time is better invested in optimising sequential symbolic model-checking algorithms than in parallelising them.
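The generic BFS algorithm the article starts from is easiest to see in its explicit-state form. The sketch below is that analogue only (the article's algorithms are *symbolic*, operating on sets of states encoded as decision diagrams rather than enumerating states one by one; the toy two-counter model is a made-up example):

```python
from collections import deque

def bfs_states(initial, transitions):
    """Enumerate all states reachable from `initial` by breadth-first search."""
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        s = frontier.popleft()
        for t in transitions(s):  # fire every enabled transition
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

# Toy model: one saturating 3-valued counter and one wrapping 3-valued counter.
def transitions(state):
    x, y = state
    succs = []
    if x < 2:
        succs.append((x + 1, y))   # increment x until it saturates
    succs.append((x, (y + 1) % 3)) # y always wraps modulo 3
    return succs

print(len(bfs_states((0, 0), transitions)))  # 9 reachable states
```

A symbolic BFS replaces the per-state loop with one image computation per iteration over the whole frontier set; Saturation goes further and abandons the strict breadth-first order entirely, which is precisely what makes it both faster sequentially and harder to parallelise.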
Author Retrospective for PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors
Abstract
Given a program with annotated task parallelism represented as a directed acyclic graph (DAG), the PYRROS project focused on fast DAG scheduling, code generation, and runtime execution on distributed-memory architectures. PYRROS scheduling goes through several processing stages, including clustering of tasks, cluster mapping, and task execution ordering. Since the publication of PYRROS, there have been new advances in DAG scheduling algorithms, in the use of DAG scheduling for irregular and large-scale computation, and in software systems with annotated task parallelism on modern parallel and cloud architectures. This retrospective describes our experience from the project and the follow-up work, and reviews representative papers related to DAG scheduling published in the last decade.
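The scheduling problem PYRROS's multi-stage pipeline addresses can be sketched with a single, much simpler list-scheduling pass (all names, weights, and edges below are hypothetical, and this collapses PYRROS's clustering, mapping, and ordering stages into one greedy step that ignores communication costs — modelling those costs is precisely why the clustering stage exists):

```python
import heapq

def list_schedule(weights, edges, n_procs):
    """Greedy list scheduling of a weighted DAG onto n_procs processors.

    Returns the makespan; communication costs are ignored.
    """
    preds = {t: set() for t in weights}
    succs = {t: [] for t in weights}
    for u, v in edges:
        preds[v].add(u)
        succs[u].append(v)

    finish = {}                                # task -> finish time
    procs = [(0, p) for p in range(n_procs)]   # (free-at time, proc id)
    heapq.heapify(procs)
    ready = [t for t in weights if not preds[t]]
    while ready:
        # Order ready tasks by when their last predecessor finished.
        t = min(ready, key=lambda r: max((finish[p] for p in preds[r]), default=0))
        ready.remove(t)
        free_at, p = heapq.heappop(procs)      # earliest-available processor
        start = max(free_at, max((finish[q] for q in preds[t]), default=0))
        finish[t] = start + weights[t]
        heapq.heappush(procs, (finish[t], p))
        for s in succs[t]:
            if all(q in finish for q in preds[s]):
                ready.append(s)                # all dependencies satisfied
    return max(finish.values())

w = {"a": 2, "b": 1, "c": 3, "d": 1}
e = [("a", "c"), ("b", "c"), ("c", "d")]
print(list_schedule(w, e, 2))  # 6: a and b run in parallel, then c, then d
```

Much of the DAG-scheduling literature the retrospective surveys consists of refinements of this basic idea: better priority functions, clustering to hide communication, and mappings aware of machine topology.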