Compiler-Directed I/O Optimization
- In Proceedings of the 16th International Symposium on Parallel and Distributed Processing (2002
Cited by 2 (1 self)
Despite continued innovations in design of I/O systems, I/O performance has not kept pace with the progress in processor and communication technology. This paper addresses this I/O problem from a compiler’s perspective, and presents an I/O optimization strategy based on access pattern and storage form (file layout) detection. The objective of our optimization strategy is to determine storage forms for array-based data sets taking into account future use of data (future access patterns). To tackle this problem, we present a three-step strategy: (i) determining all I/O access patterns to the array, and among them, selecting the most dominant (i.e., the most beneficial) access pattern; (ii) determining the most suitable storage form for the array taking into account the most dominant access pattern detected in the previous step; and (iii) optimizing the non-dominant access patterns using collective I/O, an optimization that allows each processor to do I/O on behalf of others if doing so improves overall performance.
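The three steps in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the `ACCESSES` list, the use of request counts as a stand-in for "most beneficial", and the `LAYOUT_FOR` mapping from pattern to storage form.

```python
from collections import Counter

# Hypothetical access patterns seen for one 2-D array: each entry is
# (pattern, estimated number of I/O requests issued with that pattern).
ACCESSES = [("row", 400), ("column", 1500), ("row", 300), ("block", 200)]

# Assumed mapping from an access pattern to the storage form that serves
# it with contiguous file accesses (illustrative, not from the paper).
LAYOUT_FOR = {"row": "row-major", "column": "column-major", "block": "blocked"}


def dominant_pattern(accesses):
    """Step (i): pick the pattern with the largest total weight."""
    totals = Counter()
    for pattern, weight in accesses:
        totals[pattern] += weight
    return totals.most_common(1)[0][0]


def plan(accesses):
    """Steps (ii) and (iii): choose the storage form for the dominant
    pattern and list the remaining patterns, which would be candidates
    for collective I/O."""
    dominant = dominant_pattern(accesses)
    others = sorted({p for p, _ in accesses} - {dominant})
    return LAYOUT_FOR[dominant], others


layout, collective_candidates = plan(ACCESSES)
print(layout, collective_candidates)
```

The sketch stops at naming the non-dominant patterns; actually servicing them with collective I/O would rely on a parallel I/O library and is outside its scope.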
Speeding Up Automatic Parallel I/O Performance Optimization In Panda
, 1997
Cited by 1 (1 self)
The large number of system components and their complex interactions in a parallel I/O system, together with dynamically changing I/O patterns in scientific applications, impose a great challenge in selecting optimal I/O plans for an anticipated I/O workload in a target execution environment. Previous research has shown that a model-based approach that uses a performance model of the parallel I/O system to predict the performance of the parallel I/O system for a given I/O plan, coupled with an effective search algorithm, i.e., simulated annealing, can identify a high quality I/O plan automatically. However, to be truly successful, such automatic strategies must not only be capable of selecting optimal I/O plans for a parallel I/O system, but also be able to select them quickly. In this paper, we study the cost of optimization when using the model-based approach. We identify the major performance factors that affect the optimization time, and present techniques used to speed up the ...
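The model-based approach described in this abstract (a performance model scored by simulated annealing) can be sketched roughly as follows. The plan parameters (`IO_NODES`, `BUFFER_KB`), the cost terms in `predicted_time`, and the annealing schedule are all hypothetical; they are not Panda's actual model or search settings.

```python
import math
import random

# Hypothetical search space: an I/O plan is (num_io_nodes, buffer_kb).
IO_NODES = [1, 2, 4, 8, 16]
BUFFER_KB = [64, 128, 256, 512, 1024]


def predicted_time(plan):
    """Toy performance model (an assumption, not Panda's model)."""
    nodes, buf = plan
    transfer = 1000.0 / nodes   # more I/O nodes give more parallel transfer
    overhead = 5.0 * nodes      # but add coordination cost
    requests = 2000.0 / buf     # small buffers mean many requests
    memory = 0.02 * buf         # large buffers add memory pressure
    return transfer + overhead + requests + memory


def neighbor(plan, rng):
    """Move one of the two parameters to an adjacent allowed value."""
    nodes_i = IO_NODES.index(plan[0])
    buf_i = BUFFER_KB.index(plan[1])
    step = rng.choice([-1, 1])
    if rng.random() < 0.5:
        nodes_i = min(max(nodes_i + step, 0), len(IO_NODES) - 1)
    else:
        buf_i = min(max(buf_i + step, 0), len(BUFFER_KB) - 1)
    return (IO_NODES[nodes_i], BUFFER_KB[buf_i])


def anneal(start, steps=2000, temp=50.0, cooling=0.995, seed=0):
    """Simulated annealing over plans, scored by the performance model."""
    rng = random.Random(seed)
    current = best = start
    for _ in range(steps):
        cand = neighbor(current, rng)
        delta = predicted_time(cand) - predicted_time(current)
        # Always accept improvements; accept a worse plan with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = cand
        if predicted_time(current) < predicted_time(best):
            best = current
        temp *= cooling
    return best


best_plan = anneal((1, 64))
print(best_plan, predicted_time(best_plan))
```

Each step costs one or two model evaluations, so the optimization time the abstract is concerned with scales with the number of steps and the cost of evaluating the model.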
A Blackboard Approach for the Automatic Optimization of Parallel I/O Operations
- I/O operations, PaCT'99 (St
, 1999
Cited by 1 (1 self)
The performance of parallel I/O operations is highly dependent on various parameters like disk transfer rates, speed of processor (network) interconnections, size of available memory for data buffers and so forth. Tuning of parallel I/O to achieve optimum performance is a very complex task for application programmers. This paper presents a method to perform I/O optimization automatically. The approach used is based on a combination of a blackboard system and an A* algorithm, which makes it possible to achieve (near) optimal performance in reasonable time. The architecture of the blackboard is described in detail and illustrated on an example based on a simple cost model. 1 Introduction Parallel I/O has been an important topic in high performance computing research in the last few years. Many parallel file systems and I/O libraries have been developed. These are either proprietary systems, which are sold with specific hardware (e.g. IBM's Vesta [2]) or portable multipurpose systems th...
Dealing with Massive Data: From Parallel I/O to Grid I/O
, 2003
Cited by 1 (0 self)
Acknowledgements Many people have helped us find our way during the development of this thesis. Erich Schikuta, our supervisor, provided a motivating, enthusiastic, and critical atmosphere during our discussions. It was a great pleasure for us to conduct this thesis under his supervision. We also acknowledge Heinz and Kurt Stockinger who provided constructive comments. We would also like to thank everybody for providing us with feedback.
Improving I/O Performance of Applications through Compiler-Directed Code Restructuring
- FAST'08
, 2008
Cited by 1 (0 self)
Ever-increasing complexity of large-scale applications and continuous increases in sizes of the data they process make the problem of maximizing performance of such applications a very challenging task. In particular, many challenging applications from the domains of astrophysics, medicine, biology, computational chemistry, and materials science are extremely data intensive. Such applications typically use a disk system to store and later retrieve their large data sets, and consequently, their disk performance is a critical concern. Unfortunately, while disk density has significantly improved over the last couple of decades, disk access latencies have not. As a result, I/O is increasingly becoming a bottleneck for data-intensive applications, and has to be addressed at the software level if we want to extract the maximum performance from modern computer architectures. This paper presents a compiler-directed code restructuring scheme for improving the I/O performance of data-intensive scientific applications. The proposed approach improves I/O performance by reducing the number of disk accesses through a new concept called disk reuse maximization. In this context, disk reuse refers to reusing the data in a given set of disks as much as possible before moving to other disks. Our compiler-based approach restructures application code, with the help of a polyhedral tool, such that disk reuse is maximized to the extent allowed by intrinsic data dependencies in the application code. The proposed optimization can be applied to each loop nest individually or to the entire application code. The experiments show that the average I/O improvements brought by the loop-nest-based version of our approach are 9.0% and 2.7%, over the original application codes and the codes optimized using conventional schemes, respectively. Further, the average improvements obtained when our approach is applied to the entire application code are 15.0% and 13.5%, over the original application codes and the codes optimized using
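The idea of disk reuse (finishing with the data on one disk before moving to the next) can be illustrated with a small sketch. It is not the paper's polyhedral restructuring: it assumes round-robin striping over a hypothetical `NUM_DISKS`, assumes the iterations carry no data dependencies so that any reordering is legal, and uses the number of disk switches as a rough proxy for how well disks are reused.

```python
NUM_DISKS = 4  # hypothetical number of disks


def disk_of(block):
    """Assumed round-robin striping of data blocks across disks."""
    return block % NUM_DISKS


def disk_switches(order):
    """Count how often two consecutive accesses land on different disks."""
    return sum(1 for a, b in zip(order, order[1:]) if disk_of(a) != disk_of(b))


# Original iteration order touches blocks 0, 1, 2, ... and therefore moves
# to a different disk on every access.
original = list(range(16))

# Restructured order groups accesses by disk. sorted() is stable, so
# blocks on the same disk keep their original relative order.
restructured = sorted(original, key=disk_of)

print(disk_switches(original), disk_switches(restructured))
```

In real code the reordering is only legal to the extent the data dependencies allow, which is the constraint the abstract attributes to the compiler analysis.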
Efficient Input and Output for Scientific Simulations
- In Proceedings of I/O in Parallel and Distributed Systems (IOPADS
, 1999
Large simulations which run for hundreds of hours on parallel computers often periodically generate snapshots of states, which are later post-processed to visualize the simulated physical phenomenon. For many applications, fast I/O during post-processing, which is dependent on an efficient organization of data on disk, is as important as minimizing computation-time I/O. In this paper we propose optimizations to support efficient parallel I/O for scientific simulations and subsequent visualizations. We present an ordering mechanism to linearize data on disk, a performance model to help to choose a proper stripe unit size, and a scheduling algorithm to minimize communication contention. Our experiments on an IBM SP show that the combination of these strategies provides a 20-25% performance boost. 1 Introduction Progress in developing high-performance processors and fast-speed networks has made large-scale scientific simulations possible, such as the simulations of nuclear devices that...
Performance Evaluation of Parallel I/O in Cluster Environments
Clusters have been increasingly widely used for scientific and commercial applications. In a cluster environment, scientific applications distribute their data across multiple computation nodes. In order to improve the performance of clusters, many issues in parallel I/O have to be judiciously investigated. These issues include: parallel file systems, access patterns, low-level I/O interface, scientific data libraries, and data management. In this paper, we address the bottleneck and performance factors of parallel I/O in a cluster environment. Our experiment shows that the network is one of the potential bottlenecks in cluster-based parallel I/O. Furthermore, the performance of the distributed RAID5, which is built on the network block device (NBD) installed on the clusters in our department, is evaluated and compared with single disk I/O. The experiment results confirm that, in most situations, the performance of distributed RAID is noticeably better than that of a single disk system. Lastly, the experiment results indicate that file size and block size have significant impact on the performance of both the single disk system and distributed RAID on clusters.

