Analysis of Striping Techniques in Robotic Storage Libraries
- In Proceedings of the Fourteenth IEEE Symposium on Mass Storage Systems, 1995
Cited by 41 (2 self)
In recent years advances in computational speed have been the main focus of research and development in high performance computing. In comparison, the improvement in I/O performance has been modest. Faster processing speeds have created a need for faster I/O as well as for storage and retrieval of vast amounts of data. The technology needed to develop these mass storage systems exists today. Robotic storage libraries are vital components of such systems; however, they normally exhibit high latency and long transmission times. In this paper we analyze the performance of robotic storage libraries and study striping as a technique for improving response time. We show that striping, which improves the effective bandwidth, introduces overhead into the usage of the library's resources, and hence conditions under which it is advantageous are highly dependent on the system's workload. This work was partially done while the author was a summer student at the Lawrence Livermore National Labora...
April: A Run-Time Library for Tape-Resident Data
- The 17th IEEE Symposium on Mass Storage Systems, 2000
Cited by 12 (9 self)
Over the last decade, processors have made enormous gains in speed, but the speed of secondary and tertiary storage devices has not kept pace. As a result, secondary and tertiary storage access times dominate the execution time of data-intensive computations. Therefore, in scientific computations, efficient access to data stored in secondary and tertiary storage is a must. In this paper, we give an overview of APRIL, a parallel runtime library that can be used in applications that process tape-resident data. We present its user interface and underlying optimization strategy. We also discuss the performance improvements provided by the library on the High Performance Storage System (HPSS). The preliminary results reveal that the optimizations can improve response times by up to 97.2%.
Data Management for Large-Scale Scientific Computations in High Performance Distributed Systems
- In Proc. of the Eighth IEEE Int’l Symposium on High Performance Distributed Computing, 1999
Cited by 11 (4 self)
With the increasing number of scientific applications manipulating huge amounts of data, effective high-level data management is an increasingly important problem. Unfortunately, so far the solutions to the high-level data management problem either require deep understanding of specific storage architectures and file layouts (as in high-performance file storage systems) or produce unsatisfactory I/O performance in exchange for ease-of-use and portability (as in relational DBMSs). In this paper we present a novel application development environment which is built around an active meta-data management system (MDMS) to handle high-level data in an effective manner. The key components of our three-tiered architecture are the user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease-of-use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified, performance-oriented directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques for the application at hand to the MDMS. We discuss the importance of an active MDMS and show how the three components of our environment, namely the application, the MDMS, and the HSS, fit together. We also report performance numbers from our ongoing implementation and illustrate that significant improvements are made possible without undue programming effort.
A Novel Application Development Environment for Large-Scale Scientific Computations
- 2000
Cited by 11 (8 self)
Effective high-level data management is becoming an important issue with more and more scientific applications manipulating huge amounts of secondary-storage and tertiary-storage data using parallel processors. A major problem facing the current solutions to this data management problem is that these solutions either require a deep understanding of specific data storage architectures and file layouts to obtain the best performance (as in high-performance storage management systems and parallel file systems) or they sacrifice significant performance in exchange for ease-of-use and portability (as in traditional database management systems). While the success of these approaches varies depending on the specific system and applications, the trend in scientific computing towards processing large-scale datasets demands both high-performance and ease-of-use.
Exploiting Inter-File Access Patterns Using Multi-Collective I/O
- 2002
Cited by 10 (0 self)
I/O (MCIO) that extends conventional collective I/O to optimize I/O accesses to multiple arrays simultaneously. In this approach, as in collective I/O, multiple processors coordinate to perform I/O on behalf of each other if doing so improves overall I/O time. However, unlike collective I/O, MCIO considers multiple arrays simultaneously; that is, it has a more global view of the overall I/O behavior exhibited by the application. This paper shows that determining the optimal MCIO access pattern is an NP-complete problem, and proposes two different heuristics for the access pattern detection problem (also called the assignment problem).
Introduction to multiprocessor I/O architecture
- Input/Output in Parallel and Distributed Computer Systems, chapter 4, 1996
A Distributed Multi-Storage Resource Architecture and I/O Performance Prediction for Scientific Computing
- In Proceedings of the Ninth IEEE International Symposium on High Performance Distributed Computing (HPDC’00), 2000
Cited by 3 (0 self)
I/O intensive applications have posed great challenges to computational scientists. A major problem of these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper, we present a distributed multi-storage resource architecture, which can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. This architecture can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. It provides application users with high-performance storage access even when they do not have a single large local storage archive at their disposal. We also develop an Application Programming Interface (API) that provides transparent management and access to various storage resources in our computing environment. Since I/O usually dominates the performance in I/O intensive applications, we establish an I/O performance prediction mechanism which consists of a performance database and a prediction algorithm to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate performance data stored in databases. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing. Keywords: multi-storage resource architecture, I/O performance prediction, data intensive computing
Performance of an MPI-IO implementation using third-party transfer
- In Proceedings of the 17th IEEE Symposium on Mass Storage Systems, 2000
Cited by 1 (0 self)
We present a unique new implementation of MPI-IO (as defined in the recent MPI-2 message passing standard) that is easy to use, fast, efficient, and complete. Our implementation is layered over the High-Performance Storage System, using HPSS's third-party transfers and parallel I/O descriptors.
Dealing with Massive Data: From Parallel I/O to Grid I/O
- 2003
Cited by 1 (0 self)
Acknowledgements: Many people have helped us find our way during the development of this thesis. Erich Schikuta, our supervisor, provided a motivating, enthusiastic, and critical atmosphere during our discussions. It was a great pleasure for us to conduct this thesis under his supervision. We also acknowledge Heinz and Kurt Stockinger who provided constructive comments. We would also like to thank everybody for providing us with feedback.
MS-I/O: A Distributed Multi-Storage I/O System
Cited by 1 (0 self)
More and more parallel applications are running in a distributed environment to take advantage of easily available and inexpensive commodity resources. For data intensive applications, employing multiple distributed storage resources has many advantages. In this paper, we present a Multi-Storage I/O System (MS-I/O) that can not only effectively manage various distributed storage resources in the system, but also provide novel high performance storage access schemes. MS-I/O employs many state-of-the-art I/O optimizations, such as collective I/O and asynchronous I/O, and a number of new techniques, such as data location, data replication, subfile, superfile, and data access history. In addition, many MS-I/O optimization schemes can work simultaneously within a single data access session, greatly improving the performance.

