Reliability in Grid Computing Systems
In recent years, grid technology has emerged as an important tool for solving compute-intensive problems within the scientific community and in industry. To further the development and adoption of this technology, researchers and practitioners from different disciplines have collaborated to produce standard specifications for implementing large-scale, interoperable grid systems. The focus of this activity has been the Open Grid Forum, but other standards development organizations have also produced specifications that are used in grid systems. To date, these specifications have provided the basis for a growing number of operational grid systems used in scientific and industrial applications. However, if the growth of grid technology is to continue, it will be important that grid systems also provide high reliability. In particular, it will be critical to ensure that grid systems are reliable as they continue to grow in scale, exhibit greater dynamism, and become more heterogeneous in composition. Ensuring grid system reliability in turn requires that the specifications used to build these systems fully support reliable grid services. This study surveys work on grid reliability that has been done in recent years and reviews progress made toward achieving these goals. The survey identifies important issues and problems that researchers are working
Resilient Workflows for Cooperative Design Application of Distributed High-Performance Scientific Computing
2011
Abstract—This paper describes an approach to extend process modeling for engineering design applications with fault-tolerance and resilience capabilities. It is based on the need for application-level error handling, which is a requirement for petascale and exascale scientific computing. This complements the traditional fault-tolerance management features provided by existing hardware and distributed systems, which are often based on data and operations duplication and migration, and on checkpoint-restart procedures. We show how they can be optimized for high-performance infrastructures. This approach is applied to a prototype tested against industrial test cases for optimization of engineering design artifacts.
Keywords: workflows; fault-tolerance; resilience; distributed systems; process modeling; high-performance computing; engineering design
GRID COMPUTING ENVIRONMENTS BY
Likewise, intra-cluster communication should take place as much as possible using high-performance cluster interconnects, resorting to lower-performance wide-area protocols only when necessary. This thesis examines the feasibility of deploying tightly-coupled parallel applications in Grid computing environments. A desired outcome of this work is the capability of delivering application performance in a Grid environment that is on par with the performance within a single cluster while simultaneously requiring few or no modifications to application software. To that end, the thesis explores techniques that can be deployed effectively at the runtime system level and applied to a variety of application decomposition styles.
Reliability in grid computing
Concurrency and Computation: Practice and Experience (Concurrency Computat.: Pract. Exper.), 2009. Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cpe.1410
In recent years, grid technology has emerged as an important tool for solving compute-intensive problems within the scientific community and in industry. To further the development and adoption of this technology, researchers and practitioners from different disciplines have collaborated to produce standard specifications for implementing large-scale, interoperable grid systems. The focus of this activity has been the Open Grid Forum, but other standards development organizations have also produced specifications that are used in grid systems. To date, these specifications have provided the basis for a growing number of operational grid systems used in scientific and industrial applications. However, if the growth of grid technology is to continue, it will be important that grid systems also provide high reliability. In particular, it will be critical to ensure that grid systems are reliable as they continue to grow in scale, exhibit greater dynamism, and become more heterogeneous in composition. Ensuring grid system reliability in turn requires that the specifications used to build these systems fully support reliable grid services. This study surveys work on grid reliability that has been done in recent years and reviews progress made toward achieving these goals. The survey identifies important issues and problems that researchers are working to overcome in order to develop reliability methods for large-scale, heterogeneous, dynamic environments. The survey also illuminates reliability issues relating to standard specifications used in grid systems, identifying existing specifications that may need to evolve and areas where new specifications are needed.

