Optimizing Threaded MPI Execution on SMP Clusters
- In Proc. of the 15th ACM International Conference on Supercomputing, 2001
Cited by 23 (1 self)
Our previous work has shown that using threads to execute MPI programs can yield large performance gains on multiprogrammed shared-memory machines. This paper investigates the design and implementation of a thread-based MPI system on SMP clusters. Our study indicates that with a proper design for threaded MPI execution, both point-to-point and collective communication performance can be improved substantially compared to a process-based MPI implementation in a cluster environment. Our contributions include a hierarchy-aware and adaptive communication scheme for threaded MPI execution and a thread-safe network device abstraction that uses event-driven synchronization and provides separate collective and point-to-point communication channels. This paper describes the implementation of our design and illustrates its performance advantage on a Linux SMP cluster.
WOW: Self-organizing Wide Area Overlay Networks of Virtual Workstations
- In Proc. of the 15th International Symposium on High-Performance Distributed Computing (HPDC-15), 2006
Cited by 16 (1 self)
This paper describes WOW, a distributed system that combines virtual machine, overlay networking, and peer-to-peer techniques to create scalable wide-area networks of virtual workstations for high-throughput computing. The system is architected to facilitate the addition of nodes to a pool of resources through the use of system virtual machines (VMs) and self-organizing virtual network links; to maintain IP connectivity even if VMs migrate across network domains; and to present to end users and applications an environment that is functionally identical to a local-area network or cluster of workstations. We describe a novel, extensible, user-level decentralized technique to discover, establish, and maintain overlay links that tunnel IP packets over different transports (including UDP and TCP) and across firewalls. We also report on several experiments conducted on a testbed WOW deployment with 118 P2P router nodes over PlanetLab and 33 VMware-based VM nodes distributed across six firewalled domains. Experiments show that the latency in joining a WOW network is on the order of seconds: in a set of 300 trials, 90% of the nodes self-configured P2P routes within 10 seconds, and more than 99% established direct connections to other nodes within 200 seconds. Experiments also show that the testbed delivers good performance for two unmodified, representative benchmarks drawn from the life-sciences domain. The testbed WOW achieves an overall throughput of 53 jobs/minute for PBS-scheduled executions of the MEME application (with an average single-job sequential running time of 24.1 s) and a parallel speedup of 13.5 for the PVM-based fastDNAml application. Experiments also demonstrate that the system is capable of seamlessly maintaining connectivity at the virtual IP layer for typical client/server applications (NFS, SSH, PBS) when VMs migrate across a WAN.
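As a rough reading of the MEME throughput figure in this abstract, the short Python sketch below applies Little's law (average jobs in progress = completion rate × time per job) to the two reported numbers. It assumes that each job takes roughly the reported sequential running time when scheduled on the testbed and that scheduling overhead is negligible; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def implied_concurrency(jobs_per_minute: float, seconds_per_job: float) -> float:
    """Average number of jobs in progress, by Little's law.

    Assumes (hypothetically) that every job takes `seconds_per_job`
    and that scheduling overhead is negligible.
    """
    jobs_per_second = jobs_per_minute / 60.0
    return jobs_per_second * seconds_per_job


# Figures quoted in the abstract: 53 jobs/minute, 24.1 s per job.
concurrency = implied_concurrency(53, 24.1)
print(f"about {concurrency:.1f} jobs in progress on average")
```

Under these assumptions the reported throughput corresponds to roughly 21 jobs running at any given time.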
Performance Monitoring in a Myrinet-Connected Shrimp Cluster
- 1998
Cited by 15 (2 self)
Performance monitoring is a crucial aspect of parallel programming. Extracting the best possible performance from the system is the main goal of parallel programming, and monitoring tools are often essential to achieving that goal. A common tradeoff arises in determining at which system level to monitor performance information and present results. High-level monitoring approaches can often gather data directly tied to the software programming model, but may abstract away crucial low-level hardware details. Low-level monitoring approaches can gather fairly complete performance information about the underlying system, but often at the expense of portability and flexibility. In this paper we discuss a compromise approach between the portability and flexibility of high-level monitoring and the detailed data awareness of low-level monitoring. We present a firmware-based performance monitor we designed for a Myrinet-connected Shrimp cluster. This monitor combines the portability and flexibility...
RepStore: A Self-Managing and Self-Tuning Storage Backend with Smart Bricks
Cited by 15 (2 self)
With the continuously improving price-performance ratio, building large, smart-brick-based distributed storage systems becomes increasingly attractive. The challenges, however, include not only reliability, an adequate cost-performance ratio, online upgrades, and so on, but also the system's ability to achieve these goals in as self-managing and self-adaptive a manner as possible. In this paper, we describe RepStore, a system that fulfills these goals. RepStore unites the self-organizing capability of a P2P DHT with a completely autonomous, per-brick tuning mechanism to derive a scalable and cost-effective architecture. RepStore employs replication for active, write-intensive data and erasure coding for the rest, strives to achieve the best cost-performance balance automatically and transparently to applications, and does so in a completely distributed manner. Our preliminary evaluations reveal that the system performs much as expected, achieving performance and reliability close to those of a 3-way fully replicated system at only 60% of the cost.
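To illustrate how mixing replication and erasure coding can lower cost relative to full replication, the Python sketch below computes raw storage per byte of user data for such a mix. All parameters are hypothetical and chosen only for illustration: the abstract does not give the replicated fraction or the erasure-code parameters, and its "cost" may cover more than raw storage.

```python
def storage_overhead(replicated_fraction: float, replicas: int,
                     data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes stored per byte of user data for a replication/erasure-coding mix.

    A fraction of the data is kept as `replicas` full copies; the rest is
    erasure-coded into `data_fragments` data plus `parity_fragments` parity
    fragments. All parameters here are illustrative assumptions.
    """
    erasure_overhead = (data_fragments + parity_fragments) / data_fragments
    return (replicated_fraction * replicas
            + (1.0 - replicated_fraction) * erasure_overhead)


# Hypothetical mix: 20% of data 3-way replicated, 80% in a (4 data + 2 parity) code.
mixed = storage_overhead(0.2, 3, 4, 2)
# Baseline: everything 3-way replicated.
full = storage_overhead(1.0, 3, 4, 2)
print(f"mixed: {mixed:.2f}x raw storage, {mixed / full:.0%} of full replication")
```

With these assumed parameters the mix needs about 1.8 bytes of raw storage per user byte, which is 60% of the 3 bytes needed by full 3-way replication; other parameter choices give different ratios.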
Fault-Tolerant Cluster Of Networking Elements
- 2001
The explosive growth of the Internet demands higher reliability and performance than the current networking infrastructure can provide. This dissertation explores novel architectures and protocols that provide a methodology for grouping together multiple networking elements, such as routers, gateways, and switches, to create a more reliable and performant distributed networking system. Clustering of networking elements is a novel concept that requires the invention of distributed computing protocols that facilitate efficient and robust support of networking protocols. We introduce the Raincore protocol architecture, which achieves these goals by bridging the fields of computer networks and distributed systems. In designing Raincore, we paid special attention to the unique requirements of the networking environment. First, networking clusters need to scale up networking throughput in addition to computing power. Second, task switching between the different services supported by a networking element has a major negative impact on performance. Third, fast fail-over time is critical for maintaining network connections in the event of failures. We discuss in depth the design of the Raincore Group Communication Manager, which addresses the foregoing requirements and provides group membership management and reliable multicast transport. It is based on a novel token-ring protocol. We prove that this protocol is formally correct, namely, that it satisfies the set of formal specifications that defines the Group Membership problem. The creation of Raincore has already made a substantial impact at Caltech, in the academic community, and in industry. The first application is SNOW, a scalable web server cluster that is part of RAIN, a collabo...

