Results 1–10 of 29
Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows
 Algorithmica
, 2007
Abstract

Cited by 19 (16 self)
Mapping applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline or fork graphs. Several antagonistic criteria should be optimized for workflow applications, such as throughput and latency (or a combination of both). In this paper, we consider a simplified model with no communication cost, and we provide an exhaustive list of complexity results for different problem instances. Pipeline or fork stages can be replicated in order to increase the throughput by sending consecutive data sets onto different processors. In some cases, stages can also be data-parallelized, i.e., the computation of one single data set is shared between several processors. This leads to a decrease of the latency and an increase of the throughput. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem. We provide polynomial algorithms for other problem instances. Altogether, we provide solid theoretical foundations for the study of mono-criterion or bi-criteria mapping optimization problems.
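The throughput/latency trade-off described in this abstract can be illustrated with a small sketch. The function below is a simplified round-robin model assumed for illustration, not the paper's code: a replicated stage divides the period by the number of replicas (consecutive data sets go to different processors) but does not reduce the per-data-set latency.

```python
# Illustrative model (names and round-robin assumption are ours, not the paper's):
# period = 1/throughput; replication helps the period, not the latency.

def period_and_latency(stage_work, assignment, speed):
    """stage_work[i]: computation weight of stage i.
    assignment[i]: list of processors replicating stage i.
    speed[p]: speed of processor p.
    Returns (period, latency) of the mapping under a no-communication model."""
    period = 0.0
    latency = 0.0
    for w, procs in zip(stage_work, assignment):
        slowest = min(speed[p] for p in procs)  # round-robin pace set by slowest replica
        t = w / slowest                         # time to process one data set
        period = max(period, t / len(procs))    # replication increases throughput
        latency += t                            # but one data set still takes t
    return period, latency
```

For example, replicating a heavy first stage on two unit-speed processors gives `period_and_latency([4.0, 2.0], [[0, 1], [2]], {0: 1.0, 1: 1.0, 2: 2.0})` a period of 2.0 but a latency of 5.0, matching the abstract's point that replication alone does not reduce latency.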
Broadcast trees for heterogeneous platforms
 19th International Parallel and Distributed Processing Symposium (IPDPS’05)
, 2005
Multicriteria scheduling of pipeline workflows
 In HeteroPar’07, the 6th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks
, 2007
Abstract

Cited by 12 (12 self)
INRIA research report (rapport de recherche) RR-6232, ISSN 0249-6399: Multicriteria scheduling of pipeline workflows.
Mapping Linear Workflows with Computation/Communication Overlap
Abstract

Cited by 10 (8 self)
This paper presents theoretical results related to mapping and scheduling linear workflows onto heterogeneous platforms. We use a realistic architectural model with bounded communication capabilities and full computation/communication overlap. This model is representative of current multi-threaded systems. In these workflow applications, the goal is often to maximize throughput or to minimize latency. We present several complexity results related to both of these criteria. To be precise, we prove that maximizing the throughput is NP-complete even for homogeneous platforms, and that minimizing the latency is NP-complete for heterogeneous platforms. Moreover, we present an approximation algorithm for throughput maximization for linear chain applications on homogeneous platforms, and an approximation algorithm for latency minimization for linear chain applications on all platforms where communication is homogeneous (the processor speeds can differ). In addition, we present algorithms for several important special cases for linear chain applications. Finally, we consider the implications of adding feedback loops to linear chain applications.
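The overlap model in this abstract has a simple bottleneck formulation that is worth sketching. The following is a simplification assumed for illustration (uniform bandwidth, one processor per interval), not the authors' exact model: with full computation/communication overlap, each processor's cycle time is the maximum of its compute time and its transfer times, and the period is the worst such cycle.

```python
# Hedged sketch of the period of an interval mapping under full
# computation/communication overlap (simplified model, not the paper's code).

def period_overlap(stage_work, comm, intervals, speed, bandwidth):
    """intervals: (first_stage, last_stage, proc) triples covering the chain.
    comm[i]: data volume between stage i and stage i+1.
    With overlap, compute and transfers run concurrently, so a processor
    contributes max(compute, incoming comm, outgoing comm) to the period."""
    period = 0.0
    last = len(stage_work) - 1
    for (a, b, p) in intervals:
        compute = sum(stage_work[a:b + 1]) / speed[p]
        comm_in = comm[a - 1] / bandwidth if a > 0 else 0.0
        comm_out = comm[b] / bandwidth if b < last else 0.0
        period = max(period, compute, comm_in, comm_out)
    return period
```

With stages `[2, 2, 3]`, inter-stage volumes `[1, 4]`, and intervals `[(0, 1, 0), (2, 2, 1)]` on unit-speed processors with bandwidth 2, the first interval's compute time (4.0) dominates its outgoing transfer (2.0), so the period is 4.0.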
Optimizing Latency and Reliability of Pipeline Workflow Applications
, 2008
Abstract

Cited by 10 (8 self)
Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the application. In this paper, we study the complexity of a bi-criteria mapping which aims at optimizing the latency (i.e., the response time) and the reliability (i.e., the probability that the computation will be successful) of the application. Latency is minimized by using faster processors, while reliability is increased by replicating computations on a set of processors. However, replication increases latency (additional communications, slower processors). A replicated computation fails only if all of its processors fail during execution. While simple polynomial algorithms can be found for fully homogeneous platforms, the problem becomes NP-hard when tackling heterogeneous platforms. This is yet another illustration of the additional complexity added by heterogeneity.
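The latency/reliability tension in this abstract reduces to two simple formulas, sketched below under an illustrative model of our own (independent failures, latency set by the slowest replica), not the paper's exact formulation.

```python
# Hedged illustration: replication multiplies failure probabilities
# (reliability goes up), but the slowest enrolled replica sets the latency.

def stage_reliability_latency(work, replicas, fail_prob, speed):
    """The stage fails only if every replica fails; its latency is the
    time of the slowest replica (adding slow processors hurts latency)."""
    failure = 1.0
    latency = 0.0
    for p in replicas:
        failure *= fail_prob[p]            # independent failures assumed
        latency = max(latency, work / speed[p])
    return 1.0 - failure, latency
```

For instance, replicating a stage of weight 4 on a fast processor (speed 2, failure 0.1) and a slow one (speed 1, failure 0.2) raises reliability to 0.98 but stretches the latency to 4.0, twice the fast processor's own time.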
Matrix Product on Heterogeneous Master-Worker Platforms
Abstract

Cited by 7 (6 self)
This paper focuses on designing efficient parallel matrix-product algorithms for heterogeneous master-worker platforms. While the matrix product is well understood for homogeneous 2D arrays of processors (e.g., Cannon's algorithm and the ScaLAPACK outer-product algorithm), three key hypotheses render our work original and innovative:

Centralized data. We assume that all matrix files originate from, and must be returned to, the master. The master distributes data and computations to the workers (whereas in ScaLAPACK, input and output matrices are supposed to be equally distributed among the participating resources beforehand). Typically, our approach is useful in the context of speeding up MATLAB or Scilab clients running on a server (which acts as the master and initial repository of files).

Heterogeneous star-shaped platforms. We target fully heterogeneous platforms, where computational resources have different computing powers. Also, the workers are connected to the master by links of different capacities. This framework is realistic when deploying the application from the server, which is responsible for enrolling authorized resources.

Limited memory. As we investigate the parallelization of large problems, we cannot assume that full matrix column blocks can be stored in the worker memories and reused for subsequent updates (as in ScaLAPACK).

We have devised efficient algorithms for resource selection (deciding which workers to enroll) and communication ordering (both for input and result messages), and we report a set of numerical experiments on a platform at our site. The experiments show that our matrix-product algorithm has smaller execution times than existing ones, while also using fewer resources.
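A toy sketch of the centralized setting described above may help fix intuition. Everything here is an illustrative assumption (elementwise tasks instead of blocks, a greedy earliest-free worker choice), not the authors' algorithm: the master owns all data, hands out tasks, and collects results.

```python
# Toy master-worker matrix product (illustrative only): the master assigns
# each output entry to the earliest-free worker, mimicking resource
# selection on a star platform; all data stays centralized at the master.

def master_worker_product(A, B, workers):
    """A: n x m, B: m x p (lists of lists). workers: {id: speed}.
    Returns C = A * B; the busy_until clock only models scheduling."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    busy_until = {w: 0.0 for w in workers}
    for i in range(n):
        for j in range(p):
            w = min(busy_until, key=lambda q: busy_until[q])  # earliest-free worker
            busy_until[w] += m / workers[w]                   # m multiply-adds at its speed
            C[i][j] = sum(A[i][k] * B[k][j] for k in range(m))
    return C
```

A faster worker finishes its tasks sooner and therefore gets picked more often, which is the essence of resource selection on a heterogeneous star.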
Highly Parallel Sparse Matrix-Matrix Multiplication
, 2010
Abstract

Cited by 6 (3 self)
Generalized sparse matrix-matrix multiplication is a key primitive for many high-performance graph algorithms, as well as for some linear solvers such as multigrid. We present the first parallel algorithms that achieve increasing speedups for an unbounded number of processors. Our algorithms are based on a two-dimensional block distribution of sparse matrices, where serial sections use a novel hypersparse kernel for scalability. We give a state-of-the-art MPI implementation of one of our algorithms. Our experiments show scaling up to thousands of processors on a variety of test scenarios.
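For readers unfamiliar with SpGEMM, the serial core of the problem can be sketched in a few lines. This is a minimal Gustavson-style row-by-row accumulation for intuition only; the paper's contribution (2D block distribution, hypersparse kernel, MPI) is not shown.

```python
# Minimal serial SpGEMM sketch: sparse matrices as {row: {col: value}} dicts.
# Only nonzero products are ever formed, the defining property of SpGEMM.

def spgemm(A, B):
    C = {}
    for i, row in A.items():
        acc = {}                              # accumulator for output row i
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C
```

The parallel algorithms in the paper distribute such rows (and columns) over a 2D processor grid so that each processor runs this kind of kernel on a hypersparse block.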
On the complexity of mapping linear chain applications onto heterogeneous platforms
 Parallel Processing Letters (PPL)
, 2009
Abstract

Cited by 5 (5 self)
In this paper, we explore the problem of mapping simple application patterns onto large-scale heterogeneous platforms. An important optimization criterion that should be considered in such a framework is the latency, or makespan, which measures the response time of the system to process one single data set entirely. We focus in this work on linear chain applications, which are representative of a broad class of real-life applications. For such applications, we can consider one-to-one mappings, in which each stage is mapped onto a single processor. However, in order to reduce the communication cost, it seems natural to group stages into intervals. The interval mapping problem can be solved in a straightforward way if the platform has homogeneous communications: the whole chain is grouped into a single interval, which in turn is mapped onto the fastest processor. But the problem becomes harder when considering a fully heterogeneous platform. Indeed, we prove the NP-completeness of this problem. Furthermore, we prove that neither the interval mapping problem nor the similar one-to-one mapping problem can be approximated within any constant factor (unless P = NP).
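The homogeneous-communication observation in this abstract is easy to verify numerically. The sketch below uses an additive latency model of our own for illustration (compute time per interval plus inter-interval transfers at a shared bandwidth), not the paper's formal definitions.

```python
# Hedged sketch: latency of an interval mapping with homogeneous
# communication. Grouping the whole chain on the fastest processor
# removes every transfer term, which is why that mapping is optimal here.

def interval_latency(stage_work, comm, intervals, speed, bw):
    """intervals: (first_stage, last_stage, proc) triples covering the chain.
    comm[i]: data volume between stage i and stage i+1; bw: shared bandwidth."""
    latency = 0.0
    for idx, (a, b, p) in enumerate(intervals):
        latency += sum(stage_work[a:b + 1]) / speed[p]
        if idx < len(intervals) - 1:          # transfer to the next interval
            latency += comm[b] / bw
    return latency
```

With stages `[3, 2]`, one unit of data between them, and processors of speeds 1 and 2: the whole chain on the fast processor gives latency 2.5, while splitting into two intervals gives 3 + 1 + 1 = 5.0, so grouping wins, as the abstract predicts.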