CPR: Mixed Task and Data Parallel Scheduling for Distributed Systems
- In Proceedings of the 15th International Parallel and Distributed Processing Symposium
, 2001
Cited by 14 (5 self)
It is well known that mixing task and data parallelism to solve large computational applications often yields better speedups than applying either pure task parallelism or pure data parallelism. Typically, the applications are modeled in terms of a dependence graph of coarse-grain data-parallel tasks, called a data-parallel task graph. In this paper we present a new compile-time heuristic, named Critical Path Reduction (CPR), for scheduling data-parallel task graphs. Experimental results based on graphs derived from real problems as well as synthetic graphs show that CPR achieves higher speedup than other well-known existing scheduling algorithms, at the expense of a somewhat higher scheduling cost. These results are also confirmed by performance measurements of two real applications (i.e., complex matrix multiplication and Strassen matrix multiplication) running on a cluster of workstations.
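To make the idea of shortening a critical path by widening tasks concrete, here is a minimal Python sketch. It is not the published CPR heuristic: it assumes ideal speedup (task time = work / processors), caps each task at a hypothetical `max_procs`, and does not model tasks competing for a finite set of processors. The names `critical_path`, `reduce_critical_path`, `work`, and `deps` are illustrative only.

```python
def critical_path(work, deps, alloc):
    """Return (length, path) of the longest path in the task graph.

    work:  task -> amount of work
    deps:  task -> list of predecessor tasks
    alloc: task -> processors assigned (ideal speedup assumed)
    """
    finish = {}
    pred_on_path = {}

    def visit(task):
        if task in finish:
            return finish[task]
        start, latest_pred = 0.0, None
        for pred in deps.get(task, ()):
            pred_finish = visit(pred)
            if pred_finish > start:
                start, latest_pred = pred_finish, pred
        finish[task] = start + work[task] / alloc[task]
        pred_on_path[task] = latest_pred
        return finish[task]

    for task in work:
        visit(task)

    # Walk back from the task that finishes last.
    node = max(finish, key=finish.get)
    length = finish[node]
    path = []
    while node is not None:
        path.append(node)
        node = pred_on_path[node]
    return length, path[::-1]


def reduce_critical_path(work, deps, max_procs):
    """Greedily give one more processor to a critical-path task
    while doing so shortens the critical path."""
    alloc = {task: 1 for task in work}
    length, path = critical_path(work, deps, alloc)
    while True:
        best = None
        for task in path:
            if alloc[task] >= max_procs:
                continue
            trial = dict(alloc)
            trial[task] += 1
            trial_length, _ = critical_path(work, deps, trial)
            if trial_length < length and (best is None or trial_length < best[0]):
                best = (trial_length, task)
        if best is None:
            return alloc, length
        alloc[best[1]] += 1
        length, path = critical_path(work, deps, alloc)
```

For example, with `work = {"a": 4, "b": 8, "c": 2}` and `deps = {"c": ["a", "b"]}`, the loop first widens `b` (the longest predecessor of `c`) and then `c`, stopping once no single extra processor shortens the path.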
Scheduling Strategies for Mixed Data and Task Parallelism on Heterogeneous Processor Grids
, 2002
Cited by 12 (7 self)
In this paper, we consider the execution of a complex application on a heterogeneous "grid" computing platform. The complex application consists of a suite of identical, independent problems to be solved. In turn, each problem consists of a set of tasks. There are dependences (precedence constraints) between these tasks. A typical example is the repeated execution of the same algorithm on several distinct data samples. We use a non-oriented graph to model...
A Coordination Language for Mixed Task and Data Parallel Programs
- In Proceedings of 3rd Annual ACM Symposium on Applied Computing (SAC'99)
, 1999
Cited by 8 (0 self)
We present a coordination model to derive efficient implementations using mixed task and data parallelism. The model provides a specification language in which the programmer defines the available degree of parallelism and a coordination language in which the programmer determines how the potential parallelism is exploited for a specific implementation. Specification programs depend only on the algorithm whereas coordination programs may be different for different target machines in order to obtain the best performance. The transformation of a specification program into a coordination program is performed in well-defined steps where each step selects a specific implementation detail. Therefore, the transformation can be automated, thus guaranteeing a correct target program. We demonstrate the usefulness of the model by applying it to solution methods for differential equations.
A Data and Task Parallel Image Processing Environment
- Parallel Computing
, 2001
Cited by 7 (1 self)
The paper presents a data and task parallel environment for parallelizing low-level image processing applications on distributed memory systems. Image processing operators are parallelized by data decomposition using algorithmic skeletons. At the application level we use task decomposition, based on the Image Application Task Graph.
COLT_HPF, a Run-Time Support for the High-Level Coordination of HPF Tasks
- Concurrency: Practice and Experience, Vol
, 1999
Cited by 7 (3 self)
ions (SDAs), using a syntax similar to that of HPF. Each instance of an SDA encapsulates distributed data and methods, where methods have exclusive access to encapsulated data. Data parallel tasks are thus started by creating instances of specific SDAs, while the inter-task co-operation takes place by means of remote synchronous (or asynchronous) method invocations. Note that SDA instances are started dynamically by a so-called coordination task, so that the run-time that implements inter-task communication has to control passing distributed data structures from one task to another, including any possible remapping that might be needed. The run-time accomplishes this through a handshaking protocol, which exchanges the distribution information about the actual argument (on the caller SDA) and the formal one (on the callee SDA) of a given method. Note that this handshaking protocol is very similar to the COLT_HPF protocol to create a channel between two tasks. Finally, even though in th...
Distributed Bucket Processing: a Paradigm embedded in a framework for the parallel processing of pixel sets
- Delft University of Technology
Cited by 4 (1 self)
Large datasets, such as pixels and voxels in 2D and 3D images, can usually be reduced during their processing to smaller subsets with fewer datapoints. Such subsets can be the objects in the image, features (edges or corners) or, more generally, regions of interest. For instance, the transformation from a set of datapoints representing an image to one or more subsets of datapoints representing objects in the image is due to a segmentation algorithm and may involve both the selection of datapoints and a change in data structure. The massive number of pixels in the original image points to a data parallel approach, whereas the processing of the various objects in the image is more suitable for task parallelism. In this paper we introduce a framework for parallel image processing and we focus on an array of buckets that can be distributed over a number of processors and that contains pointers to the data from the dataset. The benefit of this approach is that the processor activity remains focused on the datapoints that need processing and, moreover, that the load can be distributed over many processors, even in a heterogeneous computer architecture. Although the method is generally applicable in the processing of sets, in this paper we obtain our examples from the domain of image processing. As this method yields speed-ups that are data-dependent, we derived a run-time evaluation that is able to determine whether the use of distributed buckets is beneficial.
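The bucket idea above can be illustrated with a small Python sketch. This is a guess at the general shape, not the framework described in the paper: it assumes a flat list of per-pixel labels in which `0` means background, stores pixel indices (standing in for pointers) in one bucket per label, and spreads buckets over workers with a simple largest-first greedy rule. The names `build_buckets` and `distribute` are hypothetical.

```python
import heapq
from collections import defaultdict


def build_buckets(labels):
    """Group indices of pixels that need processing by their label.

    Assumption: label 0 marks background pixels that can be skipped.
    """
    buckets = defaultdict(list)
    for index, label in enumerate(labels):
        if label != 0:
            buckets[label].append(index)
    return dict(buckets)


def distribute(buckets, n_workers):
    """Assign each bucket to the currently least-loaded worker,
    taking the largest buckets first."""
    heap = [(0, worker) for worker in range(n_workers)]
    assignment = {worker: [] for worker in range(n_workers)}
    by_size = sorted(buckets.items(), key=lambda item: -len(item[1]))
    for label, indices in by_size:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(label)
        heapq.heappush(heap, (load + len(indices), worker))
    return assignment
```

Each worker then touches only the indices in its own buckets, which is the sense in which processor activity stays on the datapoints that need processing.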
Evaluation of a semi-static approach to mapping dynamic iterative tasks onto heterogeneous computing systems
- J. Parallel Distrib. Comput
, 1999
Cited by 4 (2 self)
To minimize the execution time of an iterative application in a heterogeneous parallel computing environment, an appropriate mapping scheme is needed for matching and scheduling the subtasks of the application onto the processors. When some of the characteristics of the application subtasks are unknown a priori and will change from iteration to iteration during execution, a semi-static methodology can be employed that starts with an initial mapping but dynamically decides whether to perform a remapping between iterations of the application, by observing the effects of these dynamic parameters on the application’s execution time. The objective of this study is to implement and evaluate such a semi-static methodology. To analyze the effectiveness of the proposed scheme, it is compared with two extreme approaches: a completely dynamic approach using a fast mapping heuristic, and an ideal approach that uses a genetic algorithm on-line but ignores the time for remapping. Experimental results indicate that the semi-static approach outperforms the dynamic approach and is reasonably close to the ideal but infeasible approach.
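One plausible form of the between-iteration decision is a break-even test, sketched below in Python. The rule is an assumption made for illustration and is not taken from the paper: remap only when the predicted saving over the remaining iterations exceeds the one-time remapping cost. The function name and all parameters are hypothetical.

```python
def should_remap(current_iter_time, candidate_iter_time,
                 remaining_iters, remap_cost):
    """Return True if switching mappings is predicted to pay off.

    current_iter_time:   observed time per iteration under the current mapping
    candidate_iter_time: predicted time per iteration under a new mapping
    remaining_iters:     iterations still to run
    remap_cost:          one-time cost of performing the remapping
    """
    predicted_saving = (current_iter_time - candidate_iter_time) * remaining_iters
    return predicted_saving > remap_cost
```

With 5 iterations left and a remapping cost of 6 time units, a candidate that saves 2 units per iteration would be adopted, while one that saves 0.5 units per iteration would not.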
Scheduling Mixed-Parallel Applications with Advance Reservations
, 2008
Cited by 4 (1 self)
This paper investigates the scheduling of mixed-parallel applications, which exhibit both task and data parallelism, in advance reservation settings. Both the problem of minimizing application turn-around time and that of meeting a deadline are studied. For each, several scheduling algorithms are proposed, some of which borrow ideas from previously published work in non-reservation settings. Algorithms are compared in simulation over a wide range of application and reservation scenarios. The main finding is that schedules computed using the previously published CPA algorithm can be adapted to advance reservation settings, notably resulting in low resource consumption and thus high efficiency.
Coordinating HPF programs to mix task and data parallelism
, 2000
Cited by 3 (1 self)
Experience in applicative fields, above all deriving from the development of multidisciplinary parallel applications, seems to suggest a model where an outer coordination level is provided to allow data parallel tasks to run concurrently and to cooperate with each other. The inner computational level of this coordination model can easily be expressed with HPF, a high-level data-parallel language. According to this model, we devised COLT_HPF, a coordination architectural layer that supports dynamic creation and concurrent execution of HPF tasks, and permits these tasks to cooperate through message passing. This paper proposes the exploitation of COLT_HPF by means of a simple skeleton-based coordination language and the associated source-to-source compiler. Differently from other related proposals, COLT_HPF is portable and can exploit commercial, standard-compliant HPF compilation systems. We used a physics application as a test case for our approach, and we present the results of several expe...
A Framework for Generating Task Parallel Programs
- In 7th Symposium on the Frontiers of Massively Parallel Computation - Frontiers '99
, 1999
Cited by 2 (2 self)
We consider the generation of mixed task and data parallel programs and discuss how a clear separation into a task and data parallel level can support the development of efficient programs. The program development starts with a specification of the maximum degree of task and data parallelism and proceeds by performing several derivation steps in which the degree of parallelism is adapted to a specific parallel machine. The separation between the task and data parallel level is preserved during the design and translation phases by clearly defined interfaces. We show how the final message-passing programs are generated from the data parallel and the task parallel specification and how the interaction between the two levels can be established. We demonstrate the usefulness of the approach by examples from numerical analysis which offer the potential of a mixed task and data parallel execution but for which it is not a priori clear how this potential should be used for an implementation o...

