Results 1 - 8 of 8
A Decentralized Fault Tolerant model for Grid Computing
Abstract
A current trend in high-performance computing is the use of large-scale computing grids. These platforms consist of geographically distributed cluster federations gathering thousands of nodes. At this scale, node and network failures are no longer exceptions but part of normal system behavior. Thus, grid applications must tolerate failures, and their evaluation should take their reaction to failures into account. Failures in distributed computing systems can be divided into three categories: node crashes, network failures and process faults. Fault tolerance is a significant and complex issue in grid computing systems, and various techniques have been investigated to detect and tolerate faults in distributed computing systems. In this paper, we propose a decentralized model of fault tolerance based on dynamic colored graphs and show, through experiments, the benefits of colored graphs for managing failures in grids.
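The abstract does not specify the coloring scheme, so the following is only a hypothetical illustration of the general idea of decentralized failure detection on a colored graph: each node monitors only its neighbours, colors a silent neighbour "orange" (suspect) after one timeout and "red" (failed) after two, and a fresh heartbeat turns a node "green" again. The class name, the three colors and the two-timeout rule are assumptions; the color map is kept in one object for brevity, whereas in a real grid each node would hold its own view.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical color scheme, not taken from the paper.
ALIVE, SUSPECT, FAILED = "green", "orange", "red"


class ColoredGrid:
    """Each node monitors only its graph neighbours and recolors them from
    heartbeat silence, so no central monitor is involved in detection."""

    def __init__(self, edges: Iterable[Tuple[str, str]], timeout: float) -> None:
        self.neighbours: Dict[str, Set[str]] = {}
        for a, b in edges:
            self.neighbours.setdefault(a, set()).add(b)
            self.neighbours.setdefault(b, set()).add(a)
        self.color = {n: ALIVE for n in self.neighbours}
        self.last_heartbeat = {n: 0.0 for n in self.neighbours}
        self.timeout = timeout

    def heartbeat(self, node: str, now: float) -> None:
        # A heartbeat also lets a repaired node rejoin, keeping the graph dynamic.
        self.last_heartbeat[node] = now
        self.color[node] = ALIVE

    def check(self, observer: str, now: float) -> List[str]:
        """One monitoring round at `observer`; returns neighbours newly marked failed."""
        if self.color[observer] == FAILED:
            return []
        newly_failed = []
        for n in sorted(self.neighbours[observer]):
            silence = now - self.last_heartbeat[n]
            if self.color[n] != FAILED and silence > 2 * self.timeout:
                self.color[n] = FAILED
                newly_failed.append(n)
            elif self.color[n] == ALIVE and silence > self.timeout:
                self.color[n] = SUSPECT
        return newly_failed
```

For example, with edges a-b and b-c and a timeout of 1.0, if only a and b send heartbeats at time 1.5, node b colors c orange at time 1.5 and red at time 2.5, while a stays green. Using an intermediate "suspect" color is one way to avoid declaring a node failed after a single late heartbeat.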
Identification of Critical Factors in Checkpointing Based Multiple Fault Tolerance for Distributed System
Abstract
The performance of checkpointing-based multiple fault tolerance is low, mainly because of the overhead associated with checkpointing. A checkpointing algorithm can be improved through a better storage strategy and better checkpoint scheduling, both of which reduce this overhead. Performance and efficiency are the most desirable features of checkpoint-based recovery. This paper discusses the critical issues involved in fast and efficient checkpoint-based recovery, the impact of each issue on recovery performance, and the relationships among the issues. Finally, it compares coordinated and uncoordinated checkpointing with respect to these issues.
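To make the overhead/scheduling trade-off concrete, here is a small sketch using Young's classic first-order approximation (not a method from this paper). With checkpoint cost C, mean time between failures M and checkpoint interval T, the fraction of time lost is roughly W(T) ≈ C/T + T/(2M): checkpointing often wastes time writing checkpoints, checkpointing rarely wastes time redoing lost work. Setting dW/dT = 0 gives T* = sqrt(2CM). The approximation assumes C is much smaller than M and ignores restart cost.

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """First-order optimal checkpoint interval T* = sqrt(2 * C * M) (Young's approximation)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost: checkpoint overhead C/T plus expected rework T/(2M)."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)
```

For a 60 s checkpoint and a 24 h MTBF (86,400 s), `young_interval(60, 86400)` is about 3,220 s (roughly 54 minutes), and the first-order waste at that interval is about 3.7%; halving or doubling the interval increases the waste.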
Specification of Important Features in Fault Tolerance for Distributed Systems
Abstract
The performance of checkpointing-based multiple fault tolerance is low, mainly because of the overhead associated with checkpointing. A checkpointing algorithm can be improved through a better storage strategy and better checkpoint scheduling, both of which reduce this overhead. Performance and efficiency are the most desirable features of checkpoint-based recovery. This paper discusses the critical issues involved in fast and efficient checkpoint-based recovery, the impact of each issue on recovery performance, and the relationships among the issues. Finally, it compares coordinated and uncoordinated checkpointing with respect to these issues.
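A key difference between the two approaches is how a consistent recovery line is found. With uncoordinated checkpointing, each process checkpoints independently, so recovery may have to roll processes back repeatedly to remove "orphan" messages (recorded as received but not as sent), possibly all the way to the initial state: the domino effect. Coordinated checkpointing avoids this by having processes agree on a consistent set of checkpoints. The sketch below is a simplified model, assuming global timestamps, message times greater than zero, and a checkpoint at time 0.0 standing for the initial state; it handles orphan messages only, not in-transit messages.

```python
from typing import Dict, List, Tuple

# A message: (sender, send_time, receiver, recv_time), using global timestamps.
Message = Tuple[str, float, str, float]


def recovery_line(checkpoints: Dict[str, List[float]],
                  messages: List[Message]) -> Dict[str, float]:
    """Roll back uncoordinated checkpoints until no orphan message remains."""
    # Start from each process's latest checkpoint; 0.0 is the initial state.
    line = {p: max([0.0] + ts) for p, ts in checkpoints.items()}
    changed = True
    while changed:
        changed = False
        for sender, sent, receiver, received in messages:
            # Orphan: the receiver's checkpoint includes the receipt,
            # but the sender's checkpoint predates the send.
            if received <= line[receiver] and sent > line[sender]:
                # Roll the receiver back to its latest checkpoint before the receipt.
                earlier = [t for t in [0.0] + checkpoints[receiver] if t < received]
                line[receiver] = max(earlier)
                changed = True
    return line
```

With checkpoints P at [4, 10] and Q at [6, 12], the single message ("P", 11, "Q", 11.5) only forces Q back to 6. Adding ("Q", 7, "P", 8), ("P", 5, "Q", 5.5) and ("Q", 3, "P", 3.5) makes each rollback expose another orphan, and both processes end at 0.0, which is the domino effect.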
GRENOBLE – RHÔNE-ALPES, 418
Abstract
In this report, we present X-Kaapi’s programming model. An X-Kaapi parallel program is a sequential C or C++ program annotated with #pragma compiler directives that create tasks. A dedicated source-to-source compiler translates the X-Kaapi directives into runtime calls. Keywords: parallel computing, data flow graph, scheduling, X-Kaapi
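The report's keywords point to a data-flow model: annotated tasks declare the data they access, and the runtime derives a dependency graph from those declarations. The sketch below is a hypothetical illustration of that idea only; `DataflowRuntime`, `spawn` and `sync` are invented names and are not the X-Kaapi API or the calls its compiler emits. A task depends on the last writer of anything it reads or writes (read-after-write, write-after-write) and on earlier readers of anything it writes (write-after-read).

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Set

Task = Callable[[Dict[str, int]], None]


class DataflowRuntime:
    """Hypothetical task runtime illustrating data-flow dependency inference."""

    def __init__(self) -> None:
        self.tasks: List[Task] = []
        self.deps: List[Set[int]] = []          # deps[i]: tasks that must finish before task i
        self.last_writer: Dict[str, int] = {}   # variable -> last task that writes it
        self.readers: Dict[str, List[int]] = defaultdict(list)  # readers since last write

    def spawn(self, fn: Task, reads: Sequence[str] = (), writes: Sequence[str] = ()) -> int:
        """Record a task and infer its dependencies from its declared accesses."""
        i = len(self.tasks)
        deps: Set[int] = set()
        for v in reads:                          # read-after-write
            if v in self.last_writer:
                deps.add(self.last_writer[v])
        for v in writes:                         # write-after-write and write-after-read
            if v in self.last_writer:
                deps.add(self.last_writer[v])
            deps.update(self.readers[v])
        for v in reads:
            self.readers[v].append(i)
        for v in writes:
            self.last_writer[v] = i
            self.readers[v] = []
        self.tasks.append(fn)
        self.deps.append(deps)
        return i

    def sync(self, store: Dict[str, int]) -> List[List[int]]:
        """Run all tasks in dependency order; returns the successive waves of ready tasks."""
        done: Set[int] = set()
        waves: List[List[int]] = []
        while len(done) < len(self.tasks):
            # Tasks in one wave have no mutual dependencies and could run in parallel;
            # this sketch runs them sequentially.
            ready = [i for i in range(len(self.tasks))
                     if i not in done and self.deps[i] <= done]
            for i in ready:
                self.tasks[i](store)
            done.update(ready)
            waves.append(ready)
        return waves


if __name__ == "__main__":
    def add_one(s): s["x"] = s["a"] + 1
    def double(s): s["y"] = s["b"] * 2
    def combine(s): s["z"] = s["x"] + s["y"]

    rt = DataflowRuntime()
    rt.spawn(add_one, reads=["a"], writes=["x"])
    rt.spawn(double, reads=["b"], writes=["y"])
    rt.spawn(combine, reads=["x", "y"], writes=["z"])
    store = {"a": 1, "b": 2}
    print(rt.sync(store))  # [[0, 1], [2]]
    print(store["z"])      # 6
```

In the usage example, the first two tasks touch disjoint data and form one wave, while `combine` must wait for both. The programmer only wrote sequential-looking calls; the parallelism comes from the declared accesses.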
IS, 2011
Abstract
HAL is a multidisciplinary open-access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
The Data Recovery File System for Hadoop Cluster - Review Paper
Abstract
In today’s world, data recovery is one of the most challenging aspects of internet and World Wide Web applications. Nowadays, even terabytes (TB) and petabytes (PB) of storage are not enough for large databases (DB), so the IT industry uses Hadoop in its applications. This approach has been adopted in cloud computing environments for unstructured data. Hadoop is an open-source, Java-based distributed computing framework that supports processing of large distributed data sets. HDFS (Hadoop Distributed File System) is popular for storing huge data sets and for streaming operations over them, and the availability of Hadoop in the cloud is an important factor. However, in HDFS a failure of the master NameNode affects the performance of the whole Hadoop cluster. In this paper, we examine the behaviour of the NameNode and the issues caused by NameNode failure. We present a scheme to overcome this failure: the NameNode is replicated on another DataNode, which increases the availability of the metadata and decreases data loss and delay.
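The replication scheme described above can be illustrated with a minimal sketch, assuming synchronous mirroring: every metadata update is applied to both the primary NameNode and a replica hosted on a DataNode, and lookups fail over to the replica when the primary is down. All class and method names are hypothetical, and this is not how HDFS itself implements high availability (HDFS uses its own active/standby NameNode mechanism).

```python
from typing import Dict, List


class MetadataNode:
    """Holds the file-system namespace: path -> list of block IDs."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.namespace: Dict[str, List[str]] = {}
        self.alive = True


class ReplicatedNameService:
    """Hypothetical sketch: the primary's metadata is mirrored to a replica
    hosted on a DataNode, and clients fail over to it."""

    def __init__(self, primary: MetadataNode, replica: MetadataNode) -> None:
        self.nodes = [primary, replica]

    def active(self) -> MetadataNode:
        for node in self.nodes:  # prefer the primary, fall back to the replica
            if node.alive:
                return node
        raise RuntimeError("no metadata node is available")

    def create(self, path: str, blocks: List[str]) -> None:
        self.active()  # fail fast if no metadata node is alive
        # Synchronous mirroring: every live copy is updated before returning,
        # so a later failover loses no acknowledged metadata.
        for node in self.nodes:
            if node.alive:
                node.namespace[path] = list(blocks)

    def lookup(self, path: str) -> List[str]:
        return self.active().namespace[path]
```

After `svc.create("/logs/day1", ["blk_1", "blk_2"])` and a primary failure (`primary.alive = False`), `svc.lookup("/logs/day1")` still returns the block list from the replica. Synchronous mirroring trades some write latency for zero metadata loss on failover; a primary that comes back would also need to be resynchronized, which this sketch omits.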
FAULT TOLERANT MECHANISMS FOR EFFICIENT DATA RECOVERY IN GRID ENVIRONMENT
Abstract
Large clusters, high-availability clusters and grid deployments often suffer from network, node or operating system faults and thus require fault-tolerant programming models. Distributed systems today are ubiquitous and enable many applications, including client-server systems, transaction processing, the World Wide Web, and scientific computing, among many others. The vast computing potential of these systems is often hampered by their susceptibility to failures. Therefore, many techniques have been developed to add reliability and high availability to distributed systems. This paper presents two such techniques, checkpointing-based rollback and log-based rollback, which allow efficient recovery in dynamic heterogeneous systems as well as in multithreaded applications.
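The difference between the two techniques can be sketched for a single process, assuming its state changes only through deterministic message handlers (the piecewise-determinism assumption behind log-based recovery). Checkpoint-based rollback restores the last checkpoint and loses later work; log-based rollback restores the checkpoint and then replays the messages logged since it. The class and method names are hypothetical, and the in-memory checkpoint and log stand in for stable storage.

```python
import copy
from typing import Dict, List


class RecoverableProcess:
    """Hypothetical process whose state changes only through received messages."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {"total": 0, "count": 0}
        self.checkpoint = copy.deepcopy(self.state)  # stand-in for stable storage
        self.log: List[int] = []                     # messages received since the checkpoint

    def _apply(self, msg: int) -> None:
        # Deterministic handler: same messages in the same order give the same state.
        self.state["total"] += msg
        self.state["count"] += 1

    def receive(self, msg: int) -> None:
        self.log.append(msg)  # log the message before delivering it (pessimistic logging)
        self._apply(msg)

    def take_checkpoint(self) -> None:
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()      # older messages are already reflected in the checkpoint

    def rollback_to_checkpoint(self) -> None:
        """Checkpoint-based rollback: messages after the checkpoint must be re-sent."""
        self.state = copy.deepcopy(self.checkpoint)

    def rollback_with_log(self) -> None:
        """Log-based rollback: restore the checkpoint, then replay logged messages."""
        self.state = copy.deepcopy(self.checkpoint)
        for msg in self.log:
            self._apply(msg)
```

If a process receives 5 and 7, takes a checkpoint, receives 3 and then crashes, `rollback_with_log()` rebuilds `{"total": 15, "count": 3}`, while `rollback_to_checkpoint()` yields `{"total": 12, "count": 2}` and depends on the sender re-sending 3. Logging costs a write per message but shortens the work lost at recovery.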