Workload characteristics of a multi-cluster supercomputer
, 2004
Cited by 40 (4 self)
This paper presents a comprehensive characterization of a multi-cluster supercomputer workload using twelve-month scientific research traces. Metrics that we characterize include system utilization, job arrival rate and interarrival time, job cancellation rate, job size (degree of parallelism), job run time, memory usage, and user/group behavior. Correlations between metrics (job runtime and memory usage, requested and actual runtime, etc.) are identified and extensively studied. Differences with previously reported workloads are recognized and statistical distributions are fitted for generating synthetic workloads with the same characteristics. This study provides a realistic basis for experiments in resource management and evaluations of different scheduling strategies in a multi-cluster research environment.
Paired Gang Scheduling
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 2003
Cited by 31 (10 self)
Conventional gang scheduling has the disadvantage that when processes perform I/O or blocking communication, their processors remain idle, because alternative processes cannot be run independently of their own gangs. To alleviate this problem we suggest a slight relaxation of this rule: match gangs that make heavy use of the CPU with gangs that make light use of the CPU (presumably due to I/O or communication activity), and schedule such pairs together, allowing the local scheduler on each node to select either of the two processes at any instant. As I/O-intensive gangs make light use of the CPU, this only causes a minor degradation in the service to compute-bound jobs. This degradation is more than offset by the overall improvement in system performance due to the better utilization of the resources.
The performance of bags-of-tasks in large-scale distributed systems
- IN: HPDC
, 2008
Cited by 24 (13 self)
Ever more scientists are employing large-scale distributed systems such as grids for their computational work, instead of tightly coupled high-performance computing systems. However, while these distributed systems are more cost-effective, their heterogeneity in terms of hardware, software, and systems administration, and the lack of accurate resource information leads to inefficient scheduling. In addition, and in contrast to the workloads of tightly coupled high-performance computing systems, a large part of the workloads submitted to these distributed systems consists of large sets (bags) of sequential tasks. Therefore, a realistic performance analysis of scheduling bags-of-tasks in large-scale distributed systems is important. Towards this end, we introduce in this paper a realistic workload model for bags-of-tasks, and we explore through trace-based simulations the design space of scheduling bags-of-tasks. Finally, we identify three new scheduling policies that use only inaccurate information when scheduling, and we compare them against known classes of proposed scheduling policies.
An early performance analysis of cloud computing services for scientific computing
- TU Delft, Tech. Rep., Dec 2008, [Online] Available
Cited by 22 (4 self)
Cloud computing is an emerging commercial infrastructure paradigm that promises to eliminate the need for maintaining expensive computing facilities by companies and institutes alike. Through the use of virtualization and resource time-sharing, clouds serve with a single set of physical resources a large user base with different needs. Thus, clouds have the potential to provide to their owners the benefits of an economy of scale and, at the same time, become an alternative for scientists to clusters, grids, and parallel production environments. However, the current commercial clouds have been built to support web and small database workloads, which are very different from typical scientific computing workloads. Moreover, the use of virtualization and resource time-sharing may introduce significant performance penalties for the demanding scientific computing workloads. In this work we analyze the performance of cloud computing services for scientific computing workloads. We quantify the presence in real scientific computing workloads of Many-Task Computing (MTC) users, that is, of users who employ loosely coupled applications comprising many tasks to achieve their scientific goals. Then, we perform an empirical evaluation of the performance of four commercial cloud computing services including Amazon EC2, which is currently the largest commercial cloud. Last, we compare through trace-based simulation the performance characteristics and cost models of clouds and other scientific computing platforms, for general and MTC-based scientific computing workloads. Our results indicate that the current clouds need an order of magnitude in performance improvement to be useful to the scientific community, and show which improvements should be considered first to address this discrepancy between offer and demand.
Benefits of global grid computing for job scheduling
- In Proceedings of the 5th IEEE/ACM International Workshop on Grid Computing
, 2004
Cited by 21 (3 self)
In addition to other advantages, computational Grids are considered to utilize the participating compute resources more efficiently as well as to improve the response time for user jobs. Due to the lack of common large scale global Grids and corresponding studies on Grid workloads this assumption is not yet verified. In this paper, the effect of geographical distribution of Grid resources on the machine utilization and the average response time is analyzed. To this end, simulations have been performed. The results show a significant benefit for the job scheduling quality due to the participation in a true global Grid. The average weighted response times of all submitted jobs decrease up to about 30%. The results have been verified using different workloads and Grid configurations.
Resource Availability in Enterprise Desktop Grids
, 2006
Cited by 20 (5 self)
Desktop grids, which use the idle cycles of many desktop PC’s, are currently the largest distributed systems in the world. Despite the popularity and success of many desktop grid projects, the heterogeneity and volatility of hosts within desktop grids has been poorly understood. Yet, host characterization is essential for accurate simulation and modelling of such platforms. In this paper, we present application-level traces of four real desktop grids that can be used for simulation and modelling purposes. In addition, we describe aggregate and per host statistics that reflect the heterogeneity and volatility of desktop grid resources.
The Grid Workloads Archive
, 2008
Cited by 20 (12 self)
While large grids are currently supporting the work of thousands of scientists, very little is known about their actual use. Because of strict organizational permissions, there are few or no traces of grid workloads available to the grid researcher and practitioner. To address this problem, in this work we present the Grid Workloads Archive (GWA), which is at the same time a workload data exchange and a meeting point for the grid community. We define the requirements for building a workloads archive, and describe the approach taken to meet these requirements with the GWA. We introduce a format for sharing grid workload information, and tools associated with this format. Using these tools, we collect and analyze data from nine well-known grid environments, with a total content of more than 2000 users submitting more than 7 million jobs over a period of over 13 operational years, and with working environments spanning over 130 sites comprising 10000 resources. We show evidence that grid workloads are very different from those encountered in other large-scale environments, and in particular from the workloads of parallel production environments: they comprise almost exclusively single-node jobs, and jobs arrive in "bags-of-tasks". Finally, we present the immediate applications of the GWA and of its content in several critical grid research and practical areas: research in grid resource management, and grid design, operation, and maintenance.
Parallel Computer Workload Modeling with Markov Chains
- Proc. of the 10th Job Scheduling Strategies for Parallel Processing (JSSPP), volume 3277 of Lecture Notes in Computer Science
, 2004
Cited by 20 (2 self)
In order to evaluate different scheduling strategies for parallel computers, simulations are often executed. As the scheduling quality highly depends on the workload that is served on the parallel machine, a representative workload model is required. Common approaches such as using a probability distribution model can capture the static feature of real workloads, but they do not consider the temporal relation in the traces. In this paper, a workload model is presented which uses Markov chains for modeling job parameters. In order to consider the interdependence of individual parameters without requiring large scale Markov chains, a novel method for transforming the states in different Markov chains is presented. The results show that the model yields closer results to the real workloads than other common approaches.
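As a rough illustration of the general idea of modeling a job parameter with a Markov chain, the sketch below fits a plain first-order chain to a sequence of job sizes and samples a synthetic sequence from it. It is not the model from the paper: the state-transformation method the abstract mentions is not reproduced, and the names `fit_chain` and `sample_chain` as well as the example sizes are hypothetical.

```python
import random
from collections import Counter, defaultdict

def fit_chain(states):
    """Estimate first-order transition probabilities from an observed sequence."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

def sample_chain(chain, start, length, rng):
    """Generate a synthetic sequence by walking the fitted chain."""
    out = [start]
    while len(out) < length:
        row = chain.get(out[-1])
        if not row:  # state never seen with a successor: stop early
            break
        out.append(rng.choices(list(row), weights=list(row.values()))[0])
    return out

# Hypothetical trace: number of processors requested by consecutive jobs.
sizes = [1, 1, 2, 1, 1, 2, 4, 1]
chain = fit_chain(sizes)
synthetic = sample_chain(chain, start=1, length=5, rng=random.Random(0))
```

Unlike sampling each job from an independent distribution, the walk keeps the order dependence between consecutive jobs, which is the temporal relation the abstract says distribution models miss.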
Scaling of workload traces
- In Ninth Workshop on Job Scheduling Strategies for Parallel Processing
, 2003
Cited by 18 (5 self)
The design and evaluation of job scheduling strategies often require simulations with workload data or models. Usually workload traces are the most realistic data source as they include all explicit and implicit job patterns which are not always considered in a model. In this paper, a method is presented to enlarge and/or duplicate jobs in a given workload. This allows the scaling of workloads for later use on parallel machine configurations with a different number of processors. As quality criteria the scheduling results by common algorithms have been examined. The results show high sensitivity of schedule attributes to modifications of the workload. To this end, different strategies of scaling number of job copies and/or job size have been examined. The best results had been achieved by adjusting the scaling factors to be higher than the precise relation between the new scaled machine size and the original source configuration.
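The two scaling operations named in this abstract, enlarging jobs and duplicating them, can be sketched as follows. This is a minimal illustration under assumed conventions, not the paper's procedure: the `Job` record, the `scale_trace` function, the integer factors, and the cap at the target machine size are all hypothetical choices.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Job:
    submit: int   # submission time in seconds
    runtime: int  # run time in seconds
    procs: int    # number of processors used

def scale_trace(jobs: List[Job], size_factor: int, copies: int,
                max_procs: int) -> List[Job]:
    """Enlarge each job by size_factor and emit `copies` duplicates of it.

    Job sizes are capped at max_procs (assumed target machine size).
    """
    scaled = []
    for job in jobs:
        procs = min(job.procs * size_factor, max_procs)
        for _ in range(copies):
            scaled.append(replace(job, procs=procs))
    return scaled

# Hypothetical two-job trace, scaled for a 64-processor machine.
trace = [Job(submit=0, runtime=3600, procs=4),
         Job(submit=60, runtime=600, procs=16)]
scaled = scale_trace(trace, size_factor=2, copies=2, max_procs=64)
work_before = sum(j.procs * j.runtime for j in trace)
work_after = sum(j.procs * j.runtime for j in scaled)
```

Here the total processor-seconds grow by `size_factor * copies` as long as no job hits the cap, so the two factors together determine how much load the scaled trace places on the larger machine.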

