Results 1 - 10 of 18
Managing Large-Scale Workflow Execution from Resource Provisioning to Provenance Tracking: The CyberShake Example
- In Proceedings of the Second IEEE International Conference on e-Science and Grid Computing, 2006
Cited by 22 (11 self)
This paper discusses the process of building an environment where large-scale, complex scientific analyses can be scheduled onto a heterogeneous collection of computational and storage resources. The example application is the Southern California Earthquake Center (SCEC) CyberShake project, an analysis designed to compute probabilistic seismic hazard curves for sites in the Los Angeles area. We explain which software tools were used to build the system and describe their functionality and interactions. We show the results of running the CyberShake analysis, which included over 250,000 jobs using resources available through SCEC and the TeraGrid.
Wings for Pegasus: Creating Large-Scale Scientific Representations of Computational Workflows
- In Proceedings of the 19th Annual Conference on Innovative Applications of Artificial Intelligence (IAAI), 2007
Cited by 21 (9 self)
Scientific workflows are being developed for many domains as a useful paradigm to manage complex scientific computations. In our work, we are challenged with efficiently generating and validating workflows that contain large numbers (hundreds to thousands) of individual computations to be executed over distributed environments. This paper describes a new approach to workflow creation that uses semantic representations to compactly describe complex scientific applications in a data-independent manner, then automatically generates workflows of computations for given data sets, and finally maps them to available computing resources. The semantic representations are used to automatically generate descriptions for each of the thousands of new data products. We interleave the creation of the workflow with its execution, which allows intermediate execution data products to influence the generation of the following portions of the workflow. We have implemented this approach in Wings, a workflow creation system that combines semantic representations with planning techniques. We have used Wings to create workflows of thousands of computations, which are submitted to the Pegasus mapping system for execution over distributed computing environments. We show results on an earthquake simulation workflow that was automatically created with a total of 24,135 jobs and that executed for a total of 1.9 CPU years.
A provisioning model and its comparison with best effort for performance-cost optimization in grids
- In Proceedings of the Sixteenth IEEE International Symposium on High-Performance Distributed Computing (HPDC07), 2007
Cited by 15 (2 self)
The resource availability in Grids is generally unpredictable due to the autonomous and shared nature of the Grid resources and stochastic nature of the workload resulting in a best effort quality of service. The resource providers optimize for throughput and utilization whereas the users optimize for application performance. We present a cost-based model where the providers advertise resource availability to the user community. We also present a multi-objective genetic algorithm formulation for selecting the set of resources to be provisioned that optimizes the application performance while minimizing the resource costs. We use trace-based simulations to compare the application performance and cost using the provisioned and the best effort approach with a number of artificially generated workflow-structured applications and a seismic hazard application from the earthquake science community. The provisioned approach shows promising results when the resources are under high utilization and/or the applications have significant resource requirements.
Scheduling data-intensive workflows onto storage-constrained distributed resources
- In Proceedings of the 7th IEEE Symposium on Cluster Computing and the Grid (CCGrid), 2007
Cited by 14 (3 self)
In this paper we examine the issue of optimizing disk usage and of scheduling large-scale scientific workflows onto distributed resources where the workflows are data-intensive, requiring large amounts of data storage, and where the resources have limited storage. Our approach is two-fold: we minimize the amount of space a workflow requires during execution by removing data files at runtime when they are no longer required, and we schedule the workflows in a way that ensures that the amount of data required and generated by the workflow fits onto the individual resources. For a workflow used by gravitational-wave physicists, we were able to reduce the amount of storage required by the workflow by up to 57%. We also designed an algorithm that can not only find feasible solutions for workflow task assignment to resources in disk-space-constrained environments, but can also improve the overall workflow performance.
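The runtime cleanup idea in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes tasks run one at a time in a fixed order, and the names `Task`, `peak_storage`, and the toy file sizes are hypothetical, chosen only for illustration. A file is deleted as soon as the last task that touches it has finished, unless it is a final product to keep.

```python
from typing import Dict, List, NamedTuple, Set


class Task(NamedTuple):
    name: str
    inputs: List[str]
    outputs: List[str]


def peak_storage(order: List[Task], sizes: Dict[str, int],
                 keep: Set[str], cleanup: bool) -> int:
    """Return the peak disk usage when tasks run sequentially in `order`."""
    # Index of the last task that reads or writes each file.
    last_use: Dict[str, int] = {}
    for i, task in enumerate(order):
        for f in task.inputs + task.outputs:
            last_use[f] = i

    on_disk: Set[str] = set()
    peak = 0
    for i, task in enumerate(order):
        # Inputs are staged in if not already present; outputs are written.
        on_disk.update(task.inputs)
        on_disk.update(task.outputs)
        peak = max(peak, sum(sizes[f] for f in on_disk))
        if cleanup:
            # Remove files whose last user has just finished,
            # except final products listed in `keep`.
            on_disk -= {f for f in on_disk
                        if last_use[f] == i and f not in keep}
    return peak


# Toy three-task chain (hypothetical data): a -> b -> c -> d, 10 units each.
chain = [
    Task("t1", ["a"], ["b"]),
    Task("t2", ["b"], ["c"]),
    Task("t3", ["c"], ["d"]),
]
sizes = {"a": 10, "b": 10, "c": 10, "d": 10}
without_cleanup = peak_storage(chain, sizes, keep={"d"}, cleanup=False)
with_cleanup = peak_storage(chain, sizes, keep={"d"}, cleanup=True)
```

In this toy chain the peak footprint is 40 units without cleanup and 20 with it, because at most one input and one output are on disk at any time. Real workflows run tasks in parallel across several sites, so the savings depend on the workflow structure and schedule.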
Data Management Challenges of Data-Intensive Scientific Workflows
Cited by 13 (1 self)
Scientific workflows play an important role in today’s science. Many disciplines rely on workflow technologies to orchestrate the execution of thousands of computational tasks. Much research to date focuses on efficient, scalable, and robust workflow execution, especially in distributed environments. However, many challenges remain in the area of data management related to workflow creation, execution, and result management. In this paper we examine some of these issues in the context of the entire workflow lifecycle.
Semantic Metadata Generation for Large Scientific Workflows
- In Proceedings of the Fifth International Semantic Web Conference, 2006
Cited by 12 (7 self)
In recent years, workflows have been increasingly used in scientific applications. This paper presents novel metadata reasoning capabilities that we have developed to support the creation of large workflows. They include 1) use of semantic web technologies in handling metadata constraints on file collections and nested file collections, 2) propagation and validation of metadata constraints from inputs to outputs in a workflow component, and through the links among components in a workflow, and 3) sub-workflows that generate metadata needed for workflow creation. We show how we used these capabilities to support the creation of large workflows in an earthquake science application.
Wings for Pegasus: A semantic approach to creating very large scientific workflows
- In the Eighteenth Conference on Innovative Applications of Artificial Intelligence, 2006
Cited by 6 (3 self)
Scientific workflows are being developed for many domains as a paradigm to manage complex scientific computations. In our work, we are challenged with efficiently generating and validating workflows that contain large amounts (hundreds to thousands) of individual computations to be executed over distributed environments. We describe a new approach to workflow creation and validation that uses semantic representations to describe complex scientific applications in a data-independent manner, then automatically generates workflows of computations for given data sets, and finally maps them to available computing resources. We have implemented this approach in Wings and used it to create workflows of thousands of computations, which are submitted to the Pegasus mapping system for execution over grid computing environments.
Adaptive Pricing for Resource Reservations in Shared Environments
Cited by 5 (1 self)
Application scheduling studies on large-scale shared resources have advocated the use of resource provisioning in the form of advance reservations for providing predictable and deterministic quality of service to applications. Resource scheduling studies, however, have shown the adverse impact of advance reservations in the form of reduced utilization and increased response time of the resources. Thus, resource providers either disallow reservations or impose restrictions such as minimum notice periods, which reduces the effectiveness of reservations as a means of allocating desired resources at a desired time. In this paper, we suggest adaptive pricing as an alternative for allowing reservation of resources. The prices charged for allowing a reservation are based directly on the impact these reservations have on the other users sharing the resources. Using trace-based simulations, we show that adaptive pricing allows users to make reservations at the desired time while making it uneconomical in comparison to the best-effort service. Thus, users are induced to make the correct choice between reservations and best-effort service based on their real needs. Moreover, this pricing scheme is more cost-effective and sensitive to the system load than a flat pricing scheme, and it encourages load balancing across resources.
Server-side Parallel Data Reduction and Analysis
Cited by 4 (3 self)
Geoscience analysis is currently limited by cumbersome access and manipulation of large datasets from remote sources. Due to their data-heavy and compute-light nature, these analysis workloads represent a class of applications unsuited to a computational grid optimized for compute-intensive applications. We present the Script Workflow Analysis for MultiProcessing (SWAMP) system, which relocates data-intensive workflows from scientists’ workstations to the hosting datacenters in order to reduce data transfer and exploit locality. Our colocation of computation and data leverages the typically reductive characteristics of these workflows, allowing SWAMP to complete workflows in a fraction of the time and with much less data transfer. We describe SWAMP’s implementation and interface, which is designed to leverage scientists’ existing script-based workflows. Tests with a production geoscience workflow show drastic improvements not only in overall execution time, but in computation time as well. SWAMP’s workflow analysis capability allows it to detect dependencies, optimize I/O, and dynamically parallelize execution. Benchmarks quantify the drastic reduction in transfer time, computation time, and end-to-end execution time.
Efficient Resource Management using Advance Reservations for Heterogeneous Grids
Cited by 4 (0 self)
[...] plays a key role in Grid resource management as it enables the system to meet user expectations with respect to time requirements and temporal dependence of applications, increases predictability of the system and enables coallocation of resources. Despite these attractive features, adoption of advance reservations is limited mainly due to the fact that related algorithms are typically complex and fail to scale to large and loaded systems. In this work we consider two aspects of advance reservations. First, we investigate the impact of heterogeneity on Grid resource management when advance reservations are supported. Second, we employ techniques from computational geometry to develop an efficient heterogeneity-aware scheduling algorithm. Our main finding is that Grids may benefit from high levels of resource heterogeneity, independently of the total system capacity. Our results show that our algorithm performs well across several user and system performance and overcome the lack of scalability and adaptability of existing mechanisms.

