Results 1 - 7 of 7
Parallel I/O for Scientific Applications on Heterogeneous Clusters: A Resource-utilization Approach
In: Proceedings of Supercomputing, 1999
Cited by 4 (0 self)
Clusters of workstations and PCs are gaining popularity as an economical resource for parallel computing. However, clusters are prone to be heterogeneous in I/O as well as in processing power. As a result, the performance of I/O-intensive parallel scientific applications in which processes are closely synchronized can be very poor, due to a bottleneck caused by placing one or more I/O servers without appropriate consideration of the I/O resources available. This paper presents heuristics to choose the number of I/O servers and place them on physical processors so as to better exploit disk I/O resources and minimize total I/O time in heterogeneous cluster environments. Our experimental results show that the heuristics improve overall parallel I/O performance by as much as a factor of 2 to 4 on a cluster of SMPs, compared with the simple placements typically used in runtime I/O libraries and applications. Our approach is also suitable for other parallel I/O libraries,...
Parallel I/O on Networks of Workstations: Performance Improvement by Careful Placement of I/O Servers
Cited by 2 (1 self)
Thanks to powerful processors, fast interconnects, and portable message passing libraries like PVM and MPI, networks of inexpensive workstations are becoming more popular as an economical way to run high-performance parallel scientific applications. On traditional massively parallel processors, performance of parallel I/O is most often limited by disk bandwidth, though the performance of other system components, especially the interconnect, can at times be a limiting factor. In this paper, we show that the performance of parallel I/O on commodity clusters is often significantly affected not only by disk speed but also by the interconnect network throughput, I/O bus capacity, and load imbalance caused by heterogeneity of nodes. Specifically, we present our experimental results from reading and writing large multidimensional arrays with the Panda I/O library on two significantly different clusters: HP workstations connected by FDDI and HP PCs connected by Myrinet. We also dis...
Design and Analysis of A Parallel File System for Distributed Shared Memory Systems
Journal of Systems Architecture, 1994
Cited by 1 (1 self)
File accesses are usually performed sequentially in existing Distributed Shared Memory (DSM) systems. These sequential file accesses result in the accumulation of input data at the node that handles file operations, generating a bottleneck at that node and a large amount of network traffic to move the input data to other nodes for execution. Although the file access time is often neglected in the performance evaluation of DSM systems, it is not ignored by programmers in real life. In this paper, we describe the design and analysis of a parallel file system for DSM systems. File accesses are carried out in parallel and a modified file access mechanism is provided to reduce network traffic. Our analysis shows that the overall performance of some I/O-intensive DSM applications such as Successive Over Relaxation can be significantly enhanced with our design in the best case. To approach this best case, we propose two implementations. Both implementations employ variable-dis...
Parallel I/O for Distributed Systems: Issues and Implementation
UPM -- DATSI, 1996
Cited by 1 (0 self)
Parallel and distributed computing have matured sufficiently for their adoption in production environments, consequently necessitating effective, robust, and efficient frameworks for input and output. A number of concurrent I/O initiatives have evolved in response to these needs, some system specific, others proposing an abstract framework and portable interface. These parallel and distributed I/O efforts focus either on characterizing file access patterns, language extensions, runtime libraries, or low-level primitives. Important systems issues in each approach, technical details of various strategies, and their effect on performance are analyzed. The PIOUS system, a transport-independent, scalable I/O framework for parallel and distributed systems, is presented, and experiences with its use are reported.
1 Introduction
It is well documented that the performance of many scientific and commercial applications, including the Grand Challenge problems [4], is often limited by the performa...
Improving the Performance of Distributed Shared Memory Systems via Parallel File Input/Output
File accesses in page-based software Distributed Shared Memory (DSM) systems are usually performed by a single node, which may lead to poor overall performance because a large amount of network traffic is generated to transfer data between this file-handling node and the other nodes. To reduce the file-related network traffic in DSM systems, we have designed a parallel file I/O system that is independent of the memory consistency model, for page-based software DSM systems built on a network of workstations. The two main features of our design are the adaptive data distribution scheme and the delayed file access mechanism. The former distributes file blocks among the nodes according to the access pattern of the application, while the latter ensures that the data are transferred to the consumer node instead of the request node by exploiting the memory mapping features of the virtual shared address space of the DSM system. Our first prototype is built on Cohesion, a page-base ...
Glossary on Parallel Input/Output
Device Interface for Portable Parallel-I/O) Since there is no standard API for parallel I/O, ADIO is supposed to provide a strategy for implementing APIs (not a standard) in a simple, portable and efficient way [40], in order to take the burden of choosing among several different APIs off the programmer. Furthermore, it makes existing applications portable across a wide range of different platforms. An API can be implemented in a portable fashion on top of ADIO, and becomes available on all file systems on which ADIO has been implemented.
ADOPT (A Dynamic scheme for Optimal PrefeTching in parallel file systems) ADOPT is a dynamic prefetching scheme that is applicable to any distributed system, but major performance benefits are obtained in distributed memory I/O systems in a parallel processing environment [35]. Efficient accesses and prefetching are supposed to be obtained by exploiting access patterns specified and generated from users or compilers. I/O nodes are assumed to maintain ...
Performance Evaluation of Parallel I/O in Cluster Environments
Clusters have been increasingly widely used for scientific and commercial applications. In a cluster environment, scientific applications distribute their data across multiple computation nodes. In order to improve the performance of the clusters, many issues in parallel I/O have to be judiciously investigated. These issues include parallel file systems, access patterns, low-level I/O interfaces, scientific data libraries, and data management. In this paper, we address the bottlenecks and performance factors of parallel I/O in a cluster environment. Our experiment shows that the network is one of the potential bottlenecks in cluster-based parallel I/O. Furthermore, the performance of distributed RAID5, which is built on the network block device (NBD) installed on the clusters in our department, is evaluated and compared with single-disk I/O. The experimental results confirm that, in most situations, the performance of distributed RAID is noticeably better than that of a single-disk system. Lastly, the experimental results indicate that file size and block size have a significant impact on the performance of both the single-disk system and distributed RAID on clusters.
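For readers unfamiliar with the RAID5 scheme mentioned in the last abstract, the following is a minimal sketch of how RAID5-style parity works in general. It is not taken from the paper: the function names (xor_blocks, parity_disk, write_stripe, reconstruct) and the particular parity rotation are assumptions chosen for illustration, and no network block device is involved. Each stripe stores the XOR of its data blocks as parity on a rotating disk, so any single lost block can be rebuilt from the survivors.

```python
def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equal-length byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(b) != length for b in blocks):
        raise ValueError("all blocks must have the same length")
    out = bytearray(length)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def parity_disk(stripe, n_disks):
    """Hypothetical rotation: parity moves one disk to the left per stripe."""
    return n_disks - 1 - (stripe % n_disks)


def write_stripe(data_blocks, stripe, n_disks):
    """Lay out n_disks - 1 data blocks plus one parity block for a stripe."""
    if len(data_blocks) != n_disks - 1:
        raise ValueError("a stripe holds exactly n_disks - 1 data blocks")
    layout = list(data_blocks)
    layout.insert(parity_disk(stripe, n_disks), xor_blocks(data_blocks))
    return layout


def reconstruct(layout, lost):
    """Rebuild the block on disk `lost` by XOR-ing all surviving blocks."""
    survivors = [b for i, b in enumerate(layout) if i != lost]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe_layout = write_stripe([b"abcd", b"efgh", b"ijkl"], stripe=0, n_disks=4)
    # Simulate losing each disk in turn and rebuilding its block.
    for lost_disk in range(4):
        assert reconstruct(stripe_layout, lost_disk) == stripe_layout[lost_disk]
```

Because the XOR of all blocks in a stripe (data plus parity) is zero, the same reconstruct routine recovers either a data block or the parity block.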

