Scheduling many-task workloads on supercomputers: Dealing with trailing tasks (2011)
Venue: | In Proc. Workshop on Many-Task Computing on Grids and Supercomputers |
Citations: | 14 - 7 self |
Citations
3421 | MapReduce: simplified data processing on large clusters, in: OSDI’04
- Dean, Ghemawat
Citation Context ...ress, it simply replicates the task on other machines, under the presumption that the long running time of the task is because of some factor affecting the current machine, but not all other machines =-=[9]-=-. For obvious reasons, this strategy is not at all effective if the long-running task simply involves more work than others. The paper does not consider the straggler problem; we assume that each task... |
364 | Scheduling algorithms
- Brucker
- 2007
Citation Context ...chedule tasks on that allocation of workers. If we do not allow preemption and have no constraints on task runtimes, then this problem is closely related to the bin-packing problem and is NP-complete =-=[12]-=-. If there are no precedence constraints on tasks, then the problem allows various practical polynomial scheduling policies that produce schedules with makespans that are, in the worst case, within a ... |
357 | Exploiting process lifetime distributions for dynamic load balancing
- Harchol-Balter, Downey
- 1997
Citation Context ...en studied. It has been shown that, in the context of a network of workstations, process migration is a valuable tool for load balancing when process lifetimes are part of a heavy-tailed distribution =-=[7]-=-. Both allocation and deallocation policies have been explored in the context of many-task computing using the Falkon task dispatch system [8]. A special case of the trailing task problem is referred ... |
215 | Using dual approximation algorithms for scheduling problems: theoretical and practical results
- Hochbaum, Shmoys
- 1987
Citation Context ...e constraints on tasks, then the problem allows various practical polynomial scheduling policies that produce schedules with makespans that are, in the worst case, within a fixed bound of the optimum =-=[13]-=-. Simple load-balancing approaches, where tasks are assigned to idle workers from a queue, can achieve results with makespans within fixed bounds of the optimum. Any arbitrary order of the queue will ... |
88 | Many-Task Computing for Grids and Supercomputers
- Raicu, Foster, et al.
- 2008
Citation Context ...; scheduling; high-performance computing; supercomputer systems. I. INTRODUCTION Many-task applications are characterized by two key features that, when combined, provide the motivation for this paper =-=[1]-=-. The first feature is that the application comprises many independent tasks coupled with explicit I/O dependencies. For high-throughput computing and many-task computing applications, each task typica... |
64 | A comparison of multiprocessor scheduling heuristics
- Khan, McCreary, et al.
- 1994
Citation Context ...larity of the application’s tasks and the resource provisioning on parallel supercomputers. A range of heuristics can, in some cases, increase the parallelism available at various points of execution =-=[4]-=-, [5]. A specific case that can occur in many-task applications is what we term the trailing task problem, where an increasing number of workers have no further work to do and are sitting idle, but a ... |
49 | Overview of the IBM Blue Gene/P project
- Blue Gene Team
Citation Context ...ute nodes are typical, so a machine node might run several tasks in parallel. For example, on the Blue Gene/P Intrepid at Argonne National Laboratory, each compute node has four single-threaded cores =-=[2]-=-, and on Blue Waters, each compute node has eight cores, with four virtual threads per core [3]. We call the unit that a task is allocated to a “worker.” The nature of a worker is both machine and app... |
33 | Many-task computing: bridging the gap between high-throughput computing and high-performance computing
- Raicu
- 2009
Citation Context ...process lifetimes are part of a heavy-tailed distribution [7]. Both allocation and deallocation policies have been explored in the context of many-task computing using the Falkon task dispatch system =-=[8]-=-. A special case of the trailing task problem is referred to as the “straggler” problem in the literature on data-intensive computing. In this problem tasks run for an excessively long time because of... |
32 | Toward loosely coupled programming on petascale systems
- Raicu, Zhang, et al.
Citation Context ...ent sizes. The random and sorted (long to short) load-balancing policies were both simulated. The task runtime data used in the simulation was collected from a large DOCK5 run on a Blue Gene/P system =-=[17]-=-. The relationships between worker count, time to solution, and utilization are shown for worker counts of 256 to 163,840 cores. The simulation is described in Section IX. consistently and robustly w... |
30 | A tool for prioritizing DAGMan jobs and its evaluation
- Malewicz, Foster, et al.
- 2007
Citation Context ...y of the application’s tasks and the resource provisioning on parallel supercomputers. A range of heuristics can, in some cases, increase the parallelism available at various points of execution [4], =-=[5]-=-. A specific case that can occur in many-task applications is what we term the trailing task problem, where an increasing number of workers have no further work to do and are sitting idle, but a tail ... |
30 | Development and validation of a modular, extensible docking program: DOCK 5
- Moustakas, Lang, et al.
Citation Context ...racteristics were used to test the effectiveness of different policies. All tasks in the workloads were single threaded and ran on a single core. Measurements of two workloads were obtained from DOCK =-=[18]-=- runs on Intrepid. Measurements for real935k, a large run of 934,710 DOCK5 tasks, were collected during previous work [17]. Measurements for the real15k workload of 15,000 DOCK6 tasks were collected on... |
24 | More scalability, less pain: A simple programming model and its implementation for extreme computing
- Lusk, Pieper, et al.
- 2010
Citation Context ...ncing has been implementing efficient and scalable algorithms to detect and fix load imbalances. An example is ADLB, an MPI-based load-balancing library, which has scaled to 131,072 cores on Intrepid =-=[6]-=-. In this paper we investigate the minimization of the completion time of a set of tasks, rather than simply the balancing of load between workers. However, an efficient load-balancing and task dispat... |
16 | Bi-criteria scheduling of scientific workflows for the grid
- Wieczorek, Podlipnig, et al.
- 2008
Citation Context ...bust to misbehaving hardware. Several bi-criteria scheduling problems with different objectives from the one in this paper have been studied, for independent jobs [10] and for computational workflows =-=[11]-=- on parallel machines. III. PROBLEM DESCRIPTION If we ignore unpredictable variations in task runtime, then our problem, at a high level, is as follows. The parallel computer is made up of some number... |
12 | Scheduling n Independent Jobs on m Uniform Machines with Both Flow Time and Makespan Objectives: a Parametric Analysis, Department of Industrial Engineering and Operations Research
- McCormick, Pinedo
- 1989
Citation Context ...t the application or middleware be robust to misbehaving hardware. Several bi-criteria scheduling problems with different objectives from the one in this paper have been studied, for independent jobs =-=[10]-=- and for computational workflows [11] on parallel machines. III. PROBLEM DESCRIPTION If we ignore unpredictable variations in task runtime, then our problem, at a high level, is as follows. The parall... |
3 | Improving resource availability by relaxing network allocation constraints
- Desai, Buntinas, et al.
- 2009
Citation Context ...4]. Additional complications exist on some systems. For example, on Intrepid, certain partition sizes are particularly wasteful of interconnect resources when a torus rather than mesh network is used =-=[15]-=-. As a result of these and other problems, the allocation policy on Intrepid is such that its 4-CPU compute nodes can be requested through the scheduler only in block sizes of powers of 2, from 512 to... |
2 | Job scheduling for the BlueGene/L system, in Job Scheduling Strategies for Parallel Processing, ser
- Krevat, Castaños, et al.
Citation Context ...as meshes or torus networks generally require that nodes for a job be allocated contiguously in rectangular-shaped blocks; this requirement has a strong tendency to cause fragmentation of the machine =-=[14]-=-. Additional complications exist on some systems. For example, on Intrepid, certain partition sizes are particularly wasteful of interconnect resources when a torus rather than mesh network is used [1... |
1 | computing system, [accessed 13-Sep-2010]
- Waters
Citation Context ...he Blue Gene/P Intrepid at Argonne National Laboratory, each compute node has four single-threaded cores [2], and on Blue Waters, each compute node has eight cores, with four virtual threads per core =-=[3]-=-. We call the unit that a task is allocated to a “worker.” The nature of a worker is both machine and application dependent: if tasks are multithreaded, it may be most efficient to treat each node as ... |