Results 1-10 of 12
Effects of communication latency, overhead, and bandwidth in a cluster architecture
 In Proceedings of the 24th Annual International Symposium on Computer Architecture
, 1997
Abstract

Cited by 108 (6 self)
This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on a wide range of applications. Our results indicate that current efforts to improve cluster communication performance to that of tightly integrated parallel machines result in significantly improved application performance. We show that applications demonstrate strong sensitivity to overhead, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear dependence on both overhead and per-message bandwidth, indicating that further improvements in communication performance will continue to improve application performance.
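The latency/overhead/bandwidth parameters this study varies correspond to the LogP family of models. A minimal illustrative sketch of why overhead dominates (the function, its name, and all parameter values below are ours, not the paper's):

```python
def send_time_logp(n_msgs, L, o, g):
    """Time for one processor to send n_msgs fixed-size messages
    under a LogP-style model: the sender pays overhead o per message,
    successive injections are spaced by max(o, g) (CPU busy for o,
    network accepts a message every gap g), and the final message
    costs L (latency) plus the receiver's overhead o.

    All parameters are in the same time unit (e.g. microseconds)."""
    inject_last = (n_msgs - 1) * max(o, g)
    return inject_last + o + L + o  # last send + wire time + receive

# Illustrative numbers only: raising overhead swamps the other terms,
# since o is paid on every one of the n messages.
low  = send_time_logp(1000, L=10, o=3,   g=5)
high = send_time_logp(1000, L=10, o=103, g=5)
```

Because overhead is charged per message on the critical path of the CPU, increasing it scales total time roughly linearly with message count, matching the linear sensitivity the abstract reports.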
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
 ACM TRANSACTIONS ON COMPUTER SYSTEMS
, 1998
Abstract

Cited by 54 (2 self)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
Connected Components on Distributed Memory Machines
 Parallel Algorithms: 3rd DIMACS Implementation Challenge, October 17-19, 1994, volume 30 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science
, 1994
Abstract

Cited by 24 (1 self)
The efforts of the theory community to develop efficient PRAM algorithms often receive little attention from application programmers. Although there are PRAM algorithm implementations that perform reasonably on shared memory machines, they often perform poorly on distributed memory machines, where the cost of remote memory accesses is relatively high. We present a hybrid approach to solving the connected components problem, whereby a PRAM algorithm is merged with a sequential algorithm and then optimized to create an efficient distributed memory implementation. The sequential algorithm handles local work on each processor, and the PRAM algorithm handles interactions between processors. Our hybrid algorithm uses the Shiloach-Vishkin CRCW PRAM algorithm on a partition of the graph distributed over the processors and sequential breadth-first search within each local subgraph. The implementation uses the Split-C language developed at Berkeley, which provides a global address space and al...
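The two-phase structure described above can be sketched in a few lines. This is a single-process Python illustration, not the paper's Split-C code: "processors" are simulated by an `owner` map, local components are found by BFS, and a simple union-find pass over cut edges stands in for the Shiloach-Vishkin step:

```python
from collections import deque

def local_components(adj, owner, p):
    """Phase 1: 'processor' p labels its local subgraph by
    breadth-first search, ignoring edges that leave the partition."""
    label = {}
    for s in adj:
        if owner[s] != p or s in label:
            continue
        label[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if owner[v] == p and v not in label:
                    label[v] = s
                    q.append(v)
    return label

def hybrid_cc(adj, owner, nprocs):
    """Phase 2: merge local component labels across partitions with
    union-find over the cut edges (a stand-in for Shiloach-Vishkin)."""
    label = {}
    for p in range(nprocs):
        label.update(local_components(adj, owner, p))
    parent = {r: r for r in set(label.values())}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u in adj:
        for v in adj[u]:
            if owner[u] != owner[v]:       # edge crossing partitions
                ru, rv = find(label[u]), find(label[v])
                if ru != rv:
                    parent[ru] = rv
    return {v: find(label[v]) for v in adj}
```

The point of the split is that all intra-partition work is cheap sequential BFS, and only the (usually far fewer) cross-partition edges incur the expensive "remote" phase.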
Portable and Efficient Parallel Computing Using the BSP Model
, 1998
Abstract

Cited by 12 (0 self)
Our goal in this work is to experimentally examine the practical use of the BSP model on current parallel architectures. We describe the design and implementation of the Green BSP Library, a small library of functions that implement the BSP model, and of several applications that were written for this library. We then discuss the performance of the library and application programs on several parallel architectures. Our results are positive, in that we demonstrate efficiency and portability over a range of parallel architectures, and show that the BSP cost model is useful for predicting performance trends and estimating execution times. Index Terms: BSP, minimum spanning tree problem, models of parallel computation, N-body problem, parallel computing, parallel graph algorithms, shortest path problem.
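The BSP cost model referred to above charges each superstep for its computation, communication, and barrier. A minimal sketch of that prediction (function names and the g, l values are illustrative, not measurements from the paper):

```python
def superstep_cost(w, h, g, l):
    """BSP cost of one superstep: w is the maximum local computation
    at any processor, h the maximum number of words any processor
    sends or receives, g the per-word communication gap (inverse
    bandwidth), and l the barrier synchronization cost."""
    return w + h * g + l

def program_cost(supersteps, g, l):
    """Predicted total time: sum of per-superstep costs, where each
    superstep is given as a (w, h) pair."""
    return sum(superstep_cost(w, h, g, l) for (w, h) in supersteps)

# Illustrative: three supersteps on a machine with g=4, l=100.
cost = program_cost([(1000, 50), (200, 300), (500, 0)], g=4, l=100)
```

Because g and l are measured once per machine, the same (w, h) profile of a program yields a performance estimate on any BSP architecture, which is exactly the kind of trend prediction the abstract claims the model supports.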
Some Results on Ongoing Research on Parallel Implementation of Graph Algorithms
, 1997
Abstract

Cited by 2 (2 self)
In high performance computing, three points are widely recognized as important: usability, scalability, and portability. Until recently, no model seemed to satisfy all three; a few proposed models try to fulfill these goals. Among them, the BSP-like CGM model seemed to us well adapted to bridging the gap between algorithm design and real implementations. Many algorithms have been designed, but few implementations have been carried out to demonstrate the practical relevance of this model. In this article, we propose to test this model on an irregular problem. We present the results of implementations of permutation graph algorithms written in two different models: the PRAM model and the BSP-like CGM model. These implementations have been made on a CM-5 and a PC cluster. We compare the results of these implementations with the performance of sequential code for the same problem. With a classical problem in graph theory, we validate the BSP-like CGM model: it is possible to write portable code o...
Feasibility, Portability, . . . Grained Graph Algorithms
, 2000
Abstract
We study the relationship between the design and analysis of graph algorithms in the coarse-grained parallel models and the behavior of the resulting code on today's parallel machines and clusters. We conclude that the coarse-grained multicomputer model (CGM) is well suited to designing competitive algorithms, and that it is thereby now possible to aim to develop portable, predictable, and efficient parallel code for graph problems.
The Handling of Graphs on PC Clusters: A Coarse Grained Approach
, 2000
Abstract
We study the relationship between the design and analysis of graph algorithms in the coarse-grained parallel models and the behavior of the resulting code on clusters. We conclude that the coarse-grained multicomputer model (CGM) is well suited to designing competitive algorithms, and that it is thereby now possible to aim to develop portable, predictable, and efficient parallel code for graph problems on clusters.
Designing Stimulating Programming Assignments for an Algorithms Course: A Collection of Problems Based on Random Graphs
Abstract
The field of random graphs contains many surprising and interesting results. Here we demonstrate how some of these results can be used to develop stimulating, open-ended problems for courses in algorithms and data structures or graph theory. Specifically, we provide problems for algorithms that compute minimum spanning trees, connected components, maximum flows, and shortest paths.
1 Introduction
We have found in teaching courses on algorithms and data structures that having students program some of the standard algorithms can be a useful learning experience. It ensures that they understand how the algorithms function, it provides them with experience in turning theoretical results into usable tools, and it demonstrates how theoretical time bounds translate (or fail to translate) into actual running times. We therefore often include programming exercises in our assignments. How do we evaluate the exercises we develop? We have several goals in mind that the exercises should accomplish. Fi...
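An assignment in this spirit might ask students to generate G(n, p) random graphs and probe one of the classic results empirically, e.g. the connectivity threshold near p = ln(n)/n. A sketch of the scaffolding such an exercise could provide (ours, not code from the paper):

```python
import random

def gnp(n, p, seed=None):
    """Sample an Erdos-Renyi G(n, p) random graph as adjacency lists:
    each of the n*(n-1)/2 possible edges appears independently with
    probability p."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def num_components(adj):
    """Count connected components by iterative depth-first search."""
    seen, count = set(), 0
    for s in adj:
        if s in seen:
            continue
        count += 1
        stack = [s]
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            stack.extend(adj[u])
    return count

# Students could plot num_components(gnp(n, c/n)) for varying c and
# watch the giant component emerge around c = 1 and full connectivity
# appear around p = ln(n)/n.
```

The open-ended part of such a problem is the experiment design: students choose n, sweep p, and compare the observed transition against the theoretical threshold.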