Exploiting Inter-File Access Patterns Using Multi-Collective I/O
2002
Cited by 10 (0 self)
I/O (MCIO) that extends conventional collective I/O to optimize I/O accesses to multiple arrays simultaneously. In this approach, as in collective I/O, multiple processors coordinate to perform I/O on behalf of each other if doing so improves overall I/O time. However, unlike collective I/O, MCIO considers multiple arrays simultaneously; that is, it has a more global view of the overall I/O behavior exhibited by the application. This paper shows that determining the optimal MCIO access pattern is an NP-complete problem and proposes two different heuristics for the access pattern detection problem (also called the assignment problem).
DPFS: A Distributed Parallel File System
In Proceedings of the International Conference on Parallel Processing, 2001
Cited by 7 (0 self)
One of the challenges posed by large-scale scientific applications is how to avoid remote storage access by collectively using enough local storage resources to hold the huge amounts of data generated by a simulation, while still providing high-performance I/O. DPFS, a Distributed Parallel File System, is designed and implemented to address this problem. DPFS collects locally distributed unused storage resources as a supplement to the internal storage of parallel computing systems to satisfy the storage capacity requirements of large-scale applications. In addition, like parallel file systems, DPFS provides striping mechanisms that divide a file into small pieces and distribute them across multiple storage devices for parallel data access. The unique feature of DPFS is that it provides three file levels, each corresponding to a file striping method. In addition to the traditional linear striping method, DPFS provides a novel multidimensional striping method that can solve the performance problems of linear striping for many popular access patterns. Other issues such as load balancing and the user interface are also addressed in DPFS.
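The linear striping mentioned in this abstract can be illustrated with a minimal sketch. It assumes round-robin placement of fixed-size stripes across devices; the function names, the stripe size, and the 8x8 row-major array are hypothetical and are not taken from DPFS itself. The example only suggests why a column access on a linearly striped row-major array may degrade into many small requests, which is one plausible reading of the "performance problems" the abstract refers to.

```python
def linear_stripe_location(offset, stripe_size, num_devices):
    """Map a file byte offset to (device index, byte offset on that device),
    assuming stripes are placed round-robin across the devices."""
    stripe_index = offset // stripe_size
    device = stripe_index % num_devices
    local_stripe = stripe_index // num_devices
    local_offset = local_stripe * stripe_size + offset % stripe_size
    return device, local_offset


def stripes_touched(offsets, stripe_size):
    """Count the distinct stripes that a set of byte offsets falls into."""
    return len({off // stripe_size for off in offsets})


# Hypothetical layout: an 8x8 array of 1-byte elements stored row-major,
# with one row (8 bytes) per stripe.
ROWS, COLS, STRIPE = 8, 8, 8
row_offsets = [0 * COLS + c for c in range(COLS)]      # all of row 0
column_offsets = [r * COLS + 0 for r in range(ROWS)]   # all of column 0

row_stripes = stripes_touched(row_offsets, STRIPE)        # one large request
column_stripes = stripes_touched(column_offsets, STRIPE)  # many tiny requests
```

Under these assumptions a row read is served by a single stripe, while a column read touches every stripe for one byte each; a striping scheme aware of the array's dimensions could avoid that imbalance.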
A Distributed Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing
In Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing (HPDC’00), 2000
Cited by 3 (0 self)
I/O intensive applications have posed great challenges to computational scientists. A major problem of these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture, which can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. This architecture can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. It provides application users with high-performance storage access even when they do not have a single large local storage archive at their disposal. We also develop an Application Programming Interface (API) that provides transparent management of and access to various storage resources in our computing environment. Since I/O usually dominates the performance of I/O intensive applications, we establish an I/O performance prediction mechanism, which consists of a performance database and a prediction algorithm, to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate the performance data stored in the databases. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing. Keywords: multi-storage resource architecture, I/O performance prediction, data intensive computing
Optimized Management of Large-Scale Data Sets Stored on Tertiary Storage Systems
Cited by 1 (0 self)
A new system, optimized toward high-performance computing, extends the RasDaMan (Raster Data Management) database management system to allow flexible management of multidimensional spatiotemporal data and to reduce tertiary storage access time. Large-scale scientific experiments and supercomputing simulations often generate huge multidimensional data sets. Data volume can reach hundreds of terabytes (up to petabytes). An archival mass-storage (tertiary) system permanently stores these data sets as files on thousands of magnetic tapes, cartridges, or optical disks. Access and transfer times of such tertiary storage devices, even if robotically controlled, are relatively slow. Nevertheless, tertiary storage systems are the common state of the art today for storing large volumes of data because magnetic tapes are far cheaper than hard-disk devices. This trend will continue, even if hard disks' cost decreases and capacity increases, because magnetic tapes with more capacity (greater than 1 Tbyte) are already on the way. The development of new satellites, sensors, parameters, and so on will dramatically increase the amount of data that such devices must store, but magnetic tapes are well prepared for the task. For data access in high-performance computing (HPC), tertiary storage systems' main drawbacks are high access latency compared to hard-disk devices and no random access capability. A major problem for scientific applications is that these systems don't allow access to specific data subsets. Accessing a subset of a large data set requires transferring the entire file from the tertiary storage media. Considering the time required to load, search, read, rewind, and unload several cartridges, such retrieval can take many hours.
MS-I/O: A Distributed Multi-Storage I/O System
Cited by 1 (0 self)
More and more parallel applications are running in distributed environments to take advantage of easily available and inexpensive commodity resources. For data intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that not only manages the various distributed storage resources in the system effectively, but also provides novel high-performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations, such as collective I/O and asynchronous I/O, and a number of new techniques, such as data location, data replication, subfile, superfile, and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving performance.
I/O Optimization and Evaluation for Tertiary Storage Systems
Submitted to the International Conference on Parallel Processing, 2001
Large-scale parallel scientific applications are generating such huge amounts of data that tertiary storage systems have emerged as a popular place to hold them. SRB, a uniform interface to various storage systems, including tertiary storage systems such as HPSS and UniTree, has become an important and convenient way to access tertiary data across networks in a distributed environment. But SRB is not optimized for I/O performance: one SRB I/O call to a storage system must access a contiguous piece of data, as in UNIX I/O. For many access patterns, this results in numerous I/O calls, which are very expensive. In this paper, we present an Optimization Library (SRB-OL) that is built on top of the SRB low-level I/O functions and employs various state-of-the-art I/O optimizations found in secondary storage systems, such as collective I/O and data sieving. We also present a novel optimization scheme, superfile, which can efficiently handle large numbers of small files. We also incorpo...
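Data sieving, one of the optimizations named in this abstract, can be sketched in a few lines. The sketch below is an illustration only: it uses an in-memory `io.BytesIO` object as a stand-in for a remote file and does not use any SRB or SRB-OL function; the name `read_with_sieving` and the request format are hypothetical. The idea is to replace several small noncontiguous reads with one contiguous read that covers them all, then pick the wanted pieces out of the buffer in memory.

```python
import io


def read_with_sieving(f, requests):
    """Serve noncontiguous (offset, length) requests with one contiguous read.

    f is any seekable binary file object; requests is a list of
    (offset, length) pairs. Returns one bytes object per request.
    """
    if not requests:
        return []
    start = min(off for off, _ in requests)
    end = max(off + length for off, length in requests)
    f.seek(start)
    buf = f.read(end - start)  # the single contiguous I/O call
    # Extract each requested piece from the in-memory buffer.
    return [buf[off - start: off - start + length] for off, length in requests]


# Stand-in for a remote file: 100 bytes whose values equal their offsets.
remote = io.BytesIO(bytes(range(100)))
pieces = read_with_sieving(remote, [(10, 2), (50, 3)])
```

The trade-off is that the bytes between the requested pieces are transferred and discarded, so the approach tends to pay off when per-call cost dominates and the gaps are small relative to the data wanted.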
Remote I/O Optimization and Evaluation for Tertiary Storage Systems through Storage Resource Broker
Submitted to the International Conference on Parallel Processing, 2001
Large-scale parallel scientific applications are generating such huge amounts of data that tertiary storage systems have emerged as a popular place to hold them. SRB, a uniform interface to various storage systems, including tertiary storage systems such as HPSS and UniTree, has become an important and convenient way to access tertiary data across networks in a distributed environment. But SRB is not optimized for parallel data access: one SRB I/O call to a storage system must access a contiguous piece of data, just like UNIX I/O. For many access patterns, this results in numerous small I/O calls, which are very expensive.
USENIX Association, 1992
Modern storage environments are composed of a variety of devices with different performance characteristics. In this paper, we explore storage-aware caching algorithms, in which the file buffer replacement algorithm explicitly accounts for differences in performance across devices. We introduce a new family of storage-aware caching algorithms that partition the cache, with one partition per device. The algorithms set the partition sizes dynamically to balance work across the devices. Through simulation, we show that our storage-aware policies perform similarly to LANDLORD, a cost-aware algorithm previously shown to perform well in Web caching environments. We also demonstrate that partitions can be easily incorporated into the Clock replacement algorithm, thus increasing the likelihood of deploying cost-aware algorithms in modern operating systems.
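The partitioning idea in this abstract can be made concrete with a minimal sketch, under several assumptions: each partition here uses plain LRU replacement (the paper discusses Clock), the class and method names are hypothetical, and the policy that chooses partition sizes dynamically is deliberately left out because the abstract does not specify it. The sketch only shows the structural point that blocks from one device never evict blocks from another, and that a partition can be resized at run time.

```python
from collections import OrderedDict


class PartitionedCache:
    """Block cache with one LRU partition per device (illustrative only)."""

    def __init__(self, partition_sizes):
        # partition_sizes: dict mapping device name -> capacity in blocks.
        self.sizes = dict(partition_sizes)
        self.parts = {dev: OrderedDict() for dev in self.sizes}

    def access(self, device, block):
        """Return True on a hit, False on a miss (the block is then cached)."""
        part = self.parts[device]
        if block in part:
            part.move_to_end(block)  # mark as most recently used
            return True
        part[block] = True
        self._shrink(device)
        return False

    def resize(self, device, new_size):
        """Change one partition's capacity; a balancing policy would call this."""
        self.sizes[device] = new_size
        self._shrink(device)

    def _shrink(self, device):
        # Evict least recently used blocks until the partition fits.
        part = self.parts[device]
        while len(part) > self.sizes[device]:
            part.popitem(last=False)
```

A balancing policy in the spirit of the abstract would observe per-device work and call `resize` to give slower devices more cache; how that decision is made is not covered by this sketch.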

