Chirp: A practical global file system for cluster and grid computing
- Journal of Grid Computing
Cited by 13 (9 self)
Abstract. Traditional distributed filesystem technologies designed for local and campus area networks do not adapt well to wide area grid computing environments. To address this problem, we have designed the Chirp distributed filesystem, which is designed from the ground up to meet the needs of grid computing. Chirp is easily deployed without special privileges, provides strong and flexible security mechanisms, tunable consistency semantics, and clustering to increase capacity and throughput. We demonstrate that many of these features also provide order-of-magnitude performance increases over wide area networks. We describe three applications in bioinformatics, biometrics, and gamma ray physics that each employ Chirp to attack large scale data intensive problems.
Coupling Prefix Caching and Collective Downloads for Remote Dataset Access
- In Proceedings of the 16th ACM International Conference on Supercomputing, 2006
Cited by 8 (7 self)
Scientific datasets are typically archived at mass storage systems or data centers close to supercomputers/instruments. End users of these datasets, however, usually perform parts of their workflows at their local computers. In such cases, client-side caching can offer significant gains by reducing the cost of wide-area data movement. Scientific data caches, however, traditionally cache entire datasets, which may not be necessary. In this paper, we propose a novel combination of prefix caching and collective download. Prefix caching allows the bootstrapping of dataset downloads by caching only a prefix of the dataset, while collective download facilitates efficient parallel patching of the missing suffix from an external data source. To estimate the optimal prefix size, we further present an analytical model that considers both the initial download overhead and the downloading speed. We implemented our proposed approach in the FreeLoader distributed cache prototype. Experimental results (using multiple scientific data repositories and data transfer tools, as well as a real-world scientific dataset access trace) demonstrate that prefix caching and collective download can be implemented efficiently, our model can select an appropriate prefix size, and the cache hit rate can be improved significantly without hurting the local access rate of cached datasets.
Functional Partitioning to Optimize End-to-End Performance on Many-core Architectures
Cited by 6 (5 self)
Scaling computations on emerging massive-core supercomputers is a daunting task, which, coupled with significantly lagging system I/O capabilities, hurts applications' end-to-end performance. The I/O bottleneck often negates potential performance benefits of assigning additional compute cores to an application. In this paper, we address this issue via a novel functional partitioning (FP) runtime environment that allocates cores to specific application tasks (checkpointing, de-duplication, and scientific data format transformation) so that the deluge of cores can be brought to bear on the entire gamut of application activities. The focus is on utilizing the extra cores to support HPC application I/O activities and also leverage solid-state disks in this context. For example, our evaluation shows that dedicating 1 core on an oct-core machine for checkpointing and its assist tasks using FP can improve overall execution time of a FLASH benchmark on 80 and 160 cores by 43.95% and 41.34%, respectively.
Storage exchange: A global trading platform for storage services
- In EUROPAR ’06: Proceedings of the 12th International European Parallel Computing Conference, 2006
Cited by 5 (3 self)
Abstract. The Storage Exchange (SX) is a new platform allowing storage to be treated as a tradeable resource. Organisations with varying storage requirements can use the SX platform to trade and exchange storage services. Organisations have the ability to federate their storage, be it dedicated or scavenged, and advertise it to a global storage market. In this paper we discuss the high level architecture employed by our platform and investigate a sealed Double Auction market model. We implement and experiment with the following clearing algorithms: maximise surplus, optimise utilisation and an efficient combination of both.
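As a rough illustration of what "maximise surplus" clearing can mean in a sealed double auction, the sketch below uses the textbook single-unit rule: pair the highest bids with the lowest asks for as long as the bid covers the ask. This is a generic sketch, not the SX platform's algorithm. The function name `clear_max_surplus` and the one-unit-per-order simplification are assumptions made for this example only.

```python
def clear_max_surplus(bids, asks):
    """Clear a sealed single-unit double auction (hypothetical helper).

    bids: prices buyers are willing to pay, one unit each.
    asks: prices sellers are willing to accept, one unit each.
    Returns (matches, surplus), where matches is a list of (bid, ask) pairs.
    """
    buyers = sorted(bids, reverse=True)   # most eager buyers first
    sellers = sorted(asks)                # cheapest sellers first
    matches = []
    for bid, ask in zip(buyers, sellers):
        if bid < ask:
            # Any further pairing would reduce total surplus, so stop.
            break
        matches.append((bid, ask))
    surplus = sum(bid - ask for bid, ask in matches)
    return matches, surplus


matches, surplus = clear_max_surplus([10, 8, 5, 2], [3, 6, 7, 9])
# matches == [(10, 3), (8, 6)], surplus == 9
```

A utilisation-oriented rule would instead aim to trade as much capacity as possible, which generally needs per-order quantities that this simplified sketch leaves out.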
Improving the availability of supercomputer job input data using temporal replication, submitted for publication
Cited by 4 (3 self)
Supercomputers are stepping into the Peta-scale and Exascale era, wherein handling hundreds of concurrent system failures is an urgent challenge. In particular, storage system failures have been identified as a major source of service interruptions in supercomputers. RAID solutions alone cannot provide sufficient storage protection as (1) average disk recovery time is projected to grow, making RAID groups increasingly vulnerable to additional failures during data reconstruction, and (2) disk-level data protection cannot mask higher-level faults, such as software/hardware failures of entire I/O nodes. This paper presents a complementary approach based on the observation that files in the supercomputer scratch space are typically accessed by batch jobs, whose execution can be anticipated. Therefore, we propose to transparently, selectively, and temporarily replicate "active" job input data, by coordinating the parallel file system with the batch job scheduler. We have implemented the temporal replication scheme in the popular Lustre parallel file system and evaluated it with both real-cluster experiments and trace-driven simulations. Our results show that temporal replication allows for fast online data reconstruction, with a reasonably low overall space and I/O bandwidth overhead.
Data-Driven Batch Scheduling
- 2005
Cited by 2 (0 self)
In this paper, we develop data-driven strategies for batch computing schedulers. Current CPU-centric batch schedulers ignore the data needs within workloads and execute them by linking them transparently and directly to their needed data. When scheduled on remote computational resources, this elegant solution of direct data access can incur an order of magnitude performance penalty for data-intensive workloads. Adding data-awareness to batch schedulers allows a careful coordination of data and CPU allocation thereby reducing the cost of remote execution. We offer here new techniques by which batch schedulers can become data-driven. Such systems can use our analytical predictive models to select one of the four data-driven scheduling policies that we have created. Through simulation, we demonstrate the accuracy of our predictive models and show how they can reduce time to completion for some workloads by as much as 80%.
GridBlocks DISK - Distributed Inexpensive Storage with K-availability
- In Proceedings of HPDC 2006, 2006
Cited by 1 (1 self)
This paper describes an architecture for an archival data storage system. The design enables aggregation of storage resources in a scalable fashion to achieve highly reliable data storage. Reliability is implemented by using erasure coding, which provides less storage overhead than full replication. Scalability of the architecture is achieved through the ability to work with multiple different storage locations and a scalable metadata management system. The integrity of the data is ensured through use of cryptographic naming. Feasibility of the proposed design is assessed with a real world implementation named GB-DISK.
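The claim that erasure coding needs less storage than full replication can be checked with simple arithmetic. The sketch below assumes a maximum-distance-separable code (such as Reed-Solomon) with `k` data fragments and `m` parity fragments. The abstract does not say which code GB-DISK uses, so the function names and parameters here are illustrative assumptions only.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping `copies` full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per user byte for an assumed (k data + m parity) MDS code."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


# Both configurations below survive the loss of any two storage locations:
# three full copies tolerate 2 losses, and a 4+2 MDS code tolerates 2 losses.
three_copies = replication_overhead(3)   # Fraction(3, 1): 3x raw storage
four_plus_two = erasure_overhead(4, 2)   # Fraction(3, 2): 1.5x raw storage
```

Under these assumptions the erasure-coded layout stores half as many raw bytes as triple replication for the same loss tolerance, at the cost of decoding work when fragments are missing.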
Design and Implementation of a Middleware for Data Storage in Opportunistic Grids
Cited by 1 (1 self)
Shared machines in opportunistic grids typically have large quantities of unused disk space. These resources could be used to store application and checkpointing data when the machines are idle, allowing those machines to share not only computational cycles, but also disk space. In this paper, we present the design and implementation of OppStore, a middleware that provides reliable distributed data storage using the free disk space from shared grid machines. The system utilizes a two-level peer-to-peer organization to connect grid machines in a scalable and fault-tolerant way. Finally, we use the concept of virtual ids to deal with resource heterogeneity, enabling heterogeneity-aware load-balancing selection of storage sites.
Accelerating Parallel Analysis of Scientific Simulation Data via Zazen
Cited by 1 (0 self)
As a new generation of parallel supercomputers enables researchers to conduct scientific simulations of unprecedented scale and resolution, terabyte-scale simulation output has become increasingly commonplace. Analysis of such massive data sets is typically I/O-bound: many parallel analysis programs spend most of their execution time reading data from disk rather than performing useful computation. To overcome this I/O bottleneck, we have developed a new data access method. Our main idea is to cache a copy of simulation output files on the local disks of an analysis cluster’s compute nodes, and to use a novel task-assignment protocol to co-locate data access with computation. We have implemented our methodology in a parallel disk cache system called Zazen. By avoiding the overhead associated with querying metadata servers and by reading data in parallel from local disks, Zazen is able to deliver a sustained read bandwidth of over 20 gigabytes per second on a commodity Linux cluster with 100 nodes, approaching the optimal aggregated I/O bandwidth attainable on these nodes. Compared with conventional NFS, PVFS2, and Hadoop/HDFS, respectively, Zazen is 75, 18, and 6 times faster for accessing large (1-GB) files, and 25, 13, and 85 times faster for accessing small (2-MB) files. We have deployed Zazen in conjunction with Anton, a special-purpose supercomputer that dramatically accelerates molecular dynamics (MD) simulations, and have been able to accelerate the parallel analysis of terabyte-scale MD trajectories by about an order of magnitude.

