Results 1-7 of 7
DataSteward: Using Dedicated Compute Nodes for Scalable Data Management on Public Clouds
- in Proceedings of the 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, 2013
Abstract - Cited by 2 (1 self)
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
A layout-aware optimization strategy for collective I/O
- in Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, HPDC ’10, 2010
Abstract - Cited by 1 (0 self)
In this study, we propose an optimization strategy to promote better integration of the parallel I/O middleware and parallel file systems. We illustrate that a layout-aware optimization strategy can improve the performance of current collective I/O in parallel I/O systems. We present the motivation, prototype design and initial verification of the proposed layout-aware optimization strategy. The analytical and initial experimental results demonstrate that the proposed strategy has the potential to improve parallel I/O system performance.
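To make the layout-aware idea concrete, here is a minimal sketch, not the paper's actual algorithm: it assumes a round-robin striping layout (stripe size and server count are made-up parameters) and splits a collective request into per-server, stripe-aligned chunks, so that each aggregator can be assigned ranges residing on a single storage server.

```python
# Illustrative sketch of layout-aware request partitioning (assumptions:
# round-robin striping, hypothetical parameter values and function names).

STRIPE_SIZE = 1 << 20   # 1 MiB stripes (assumed)
NUM_SERVERS = 4         # file striped round-robin over 4 servers (assumed)

def server_of(offset):
    """Server holding the stripe that contains this byte offset."""
    return (offset // STRIPE_SIZE) % NUM_SERVERS

def layout_aware_partition(offset, length):
    """Split a request into per-server chunks aligned to stripe boundaries,
    so each aggregator talks to exactly one server."""
    chunks = {}  # server id -> list of (offset, length)
    end = offset + length
    while offset < end:
        stripe_end = (offset // STRIPE_SIZE + 1) * STRIPE_SIZE
        piece = min(end, stripe_end) - offset
        chunks.setdefault(server_of(offset), []).append((offset, piece))
        offset += piece
    return chunks
```

A layout-oblivious partition would instead split the request evenly among aggregators, forcing some of them to touch stripes on several servers.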
DataSteward: Using Dedicated Compute Nodes for Scalable Data Management on Public Clouds
- Author manuscript, published in International Symposium on Parallel and Distributed Processing with Applications (ISPA), 2013
Abstract
A large spectrum of scientific applications, some generating data volumes exceeding petabytes, are currently being ported to clouds to build on their inherent elasticity and scalability. One of the critical needs in order to deal with this "data deluge" is efficient, scalable and reliable storage. However, the storage services proposed by cloud providers suffer from high latencies, trading performance for availability. One alternative is to federate the local virtual disks on the compute nodes into a globally shared storage used for large intermediate or checkpoint data. This collocated storage supports a high throughput, but it can be very intrusive and subject to failures that can stop the host node and degrade the application performance. To deal with these limitations we propose DataSteward, a data management system that provides a higher degree of reliability while remaining non-intrusive through the use of dedicated compute nodes. DataSteward harnesses the storage space of a set of dedicated VMs, selected using a topology-aware clustering algorithm, and has a lifetime dependent on the deployment lifetime. To capitalize on this separation, we introduce a set of scientific data processing services on top of the storage layer that can overlap with the executing applications. We performed extensive experiments on hundreds of cores in the Azure cloud: compared to state-of-the-art node selection algorithms, we show up to 20% higher throughput, which improves the overall performance of a real-life scientific application by up to 45%.
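The abstract's key mechanism is selecting dedicated storage VMs by network topology. As a hedged sketch in that spirit (this greedy heuristic and all names are assumptions, not DataSteward's actual clustering algorithm), one could pick the k VMs with the highest aggregate measured throughput to the rest of the deployment:

```python
# Hypothetical topology-aware node selection: greedily choose the k VMs
# whose measured throughput to all other VMs is highest. DataSteward's
# real algorithm is a topology-aware clustering; this is only illustrative.

def select_dedicated(throughput, k):
    """throughput[i][j]: measured MB/s from VM i to VM j; returns k VM ids,
    best-connected first."""
    n = len(throughput)
    score = {i: sum(throughput[i][j] for j in range(n) if j != i)
             for i in range(n)}
    return sorted(score, key=score.get, reverse=True)[:k]
```

The point of any such scheme is the same as in the abstract: dedicated nodes should be well connected to the compute nodes they serve, otherwise the separated storage layer becomes the bottleneck.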
Performance, Reliability
Abstract
MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for distributed volunteer computing systems. However, unlike on dedicated resources, where MapReduce has mostly been deployed, such volunteer computing systems have significantly higher rates of node unavailability. Furthermore, nodes are not fully controlled by the MapReduce framework. Consequently, we found the data and task replication scheme adopted by existing MapReduce implementations woefully inadequate for resources with high unavailability. To address this, we propose MOON, short for MapReduce On Opportunistic eNvironments. MOON extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms in order to offer reliable MapReduce services on a hybrid resource architecture, where volunteer computing systems are supplemented by a small set of dedicated nodes. Our tests on an emulated volunteer computing system, which uses a 60-node cluster where each node possesses a similar hardware configuration to a typical computer in a student lab, demonstrate that MOON can deliver a three-fold performance improvement to Hadoop in volatile, volunteer computing environments.
Reliable MapReduce computing on opportunistic resources
- DOI: 10.1007/s10586-011-0158-7
, 2010
Abstract
MapReduce offers an ease-of-use programming paradigm for processing large data sets, making it an attractive model for opportunistic compute resources. However, unlike dedicated resources, where MapReduce has mostly been deployed, opportunistic resources have significantly higher rates of node volatility. As a consequence, the data and task replication scheme adopted by existing MapReduce implementations is woefully inadequate on such volatile resources. In this paper, we propose MOON, short for MapReduce On Opportunistic eNvironments, which is designed to offer reliable MapReduce service for opportunistic computing. MOON adopts a hybrid resource architecture by supplementing opportunistic compute resources with a small set of dedicated resources, and it extends Hadoop, an open-source implementation of MapReduce, with adaptive task and data scheduling algorithms to take advantage of the hybrid resource architecture. Our results on an emulated opportunistic computing system running atop a 60-node cluster demonstrate that MOON can deliver significant performance improvements to Hadoop on volatile compute resources and can even finish jobs that Hadoop is not able to complete.
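A back-of-the-envelope calculation shows why fixed replication fails on volatile nodes, which is the problem both MOON abstracts describe. Assuming independent node failures (an assumption, and not MOON's actual scheduler), data replicated r times on nodes with unavailability u is entirely unavailable with probability u**r, so the required replication factor grows quickly with volatility:

```python
# Illustrative availability arithmetic (independence assumed; not MOON's
# adaptive algorithm): smallest replica count meeting a target availability.

def min_replicas(unavailability, target_availability):
    """With per-node unavailability u, all r replicas are down with
    probability u**r; return the smallest r with 1 - u**r >= target."""
    r = 1
    while 1.0 - unavailability ** r < target_availability:
        r += 1
    return r
```

On dedicated clusters (u around 0.01) Hadoop's default of 3 replicas is ample, while at volunteer-computing unavailability levels (u of 0.4 or more) many more replicas would be needed, which motivates MOON's hybrid architecture of placing some replicas on a small set of dedicated nodes instead.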
, 2014
Abstract
The availability of end devices of peer-to-peer storage and backup systems has been shown to be critical. Such systems, despite offering virtually unlimited storage for backup [1,2], are still not appealing enough performance-wise, as e.g. retrieval times for saved data can be an order of magnitude higher than the time required for direct download [3]. A particularly illustrative example is the Wuala case: the Wuala company gained fame by proposing a peer-assisted (advertised as fully peer-to-peer) and practical storage service; nevertheless, this technical choice was abandoned. Other issues with peer-to-peer solutions include security or QoS, but these are out of the scope of this article. In this article, we propose a new architecture for peer-to-peer backup, where residential gateways are turned into a stable buffering layer between the peers and the Internet. The residential gateways are ideal to act as stable buffers: they lie at the edge of the network between the home network and the Internet, and are highly available since they remain powered on most of the time [7]. Our idea is to temporarily store data on gateways to ...
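The gateway-as-buffer idea above is essentially store-and-forward. A minimal sketch, assuming a toy in-memory model (the class and method names are illustrative, not from the paper): a peer hands its backup data to the always-on gateway and can then go offline, while the gateway delivers buffered data whenever destination peers come online.

```python
# Minimal store-and-forward sketch of gateway buffering (all names are
# hypothetical; real gateways would persist the buffer to local storage).

class Gateway:
    def __init__(self):
        self.buffer = []          # (destination, payload) awaiting upload

    def accept(self, destination, payload):
        """Peer offloads data to its gateway; the peer can now go offline."""
        self.buffer.append((destination, payload))

    def flush(self, online_peers):
        """Deliver buffered data to destinations currently online; keep the
        rest buffered for a later attempt."""
        delivered, pending = [], []
        for dest, payload in self.buffer:
            (delivered if dest in online_peers else pending).append((dest, payload))
        self.buffer = pending
        return delivered
```

The design choice this illustrates is the one the abstract argues for: decoupling the sender's uptime from the receiver's, so two peers never need to be online simultaneously for a backup transfer to complete.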