Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines (1988)

by Charles Leiserson, Bruce M. Maggs
Venue: Algorithmica

Results 11 - 20 of 23

Portable and Efficient Parallel Computing Using the BSP Model

by Mark W. Goudreau, Kevin Lang, Satish B. Rao, Torsten Suel, Thanasis Tsantilas, 1998
Cited by 7 (0 self)
... ware on a variety of architectures. Our goal in this work is to experimentally examine the practical use of the BSP model on current parallel architectures. We describe the design and implementation of the Green BSP Library, a small library of functions that implement the BSP model, and of several applications that were written for this library. We then discuss the performance of the library and application programs on several parallel architectures. Our results are positive, in that we demonstrate efficiency and portability over a range of parallel architectures, and show that the BSP cost model is useful for predicting performance trends and estimating execution times.

Index Terms: BSP, minimum spanning tree problem, models of parallel computation, N-body problem, parallel computing, parallel graph algorithms, shortest path problem.

The Fat-Pyramid: A Robust Network for Parallel Computation

by Ronald Greenberg - Advanced Research in VLSI: Proceedings of the Sixth MIT Conference, 1990
Cited by 6 (2 self)
This paper shows that a fat-pyramid of area Θ(A) built from processors of size lg A requires only O(lg^2 A) slowdown in bit-times to simulate any network of area A under very general conditions. Specifically, there is no restriction on processor size (amount of attached memory) or number of processors in the competing network, nor is the assumption of unit wire delay required. This paper also derives upper bounds on the slowdown required by a fat-pyramid to simulate a network of larger area in the case of unit wire delay.

1 Introduction

This paper introduces the fat-pyramid network and shows that it is a good candidate as the basis for a general-purpose parallel computer. The flexibility of this network stems from its ability to efficiently simulate any other network of comparable physical size under general conditions. Especially notable is the capability of the fat-pyramid to contend with the issue of long wires. Previous work on universal networks has generally assumed that ...

A General-Purpose Model for Heterogeneous Computation

by Tiffani L. Williams (Major Professor: Rebecca Parsons), 2000
Cited by 5 (2 self)
Heterogeneous computing environments are becoming an increasingly popular platform for executing parallel applications. Such environments consist of a diverse set of machines and offer considerably more computational power at a lower cost than a parallel computer. Efficient heterogeneous parallel applications must account for the differences inherent in such an environment. For example, faster machines should possess more data items than their slower counterparts and communication should be minimized over slow network links. Current parallel applications are not designed with such heterogeneity in mind. Thus, a new approach is necessary for designing efficient heterogeneous parallel programs.

Parallel Priority Queue and List Contraction: The BSP Approach

by Alexandros V. Gerbessiotis, Alexandre Tiskin, Constantinos J. Siniolakis - In Proc. Euro-Par 97. LNCS, 1997
Cited by 5 (0 self)
In this paper we present efficient and practical extensions of the randomized Parallel Priority Queue (PPQ) algorithms of Ranade et al., and efficient randomized and deterministic algorithms for the problem of list contraction on the Bulk-Synchronous Parallel (BSP) model. We also present an experimental study of their performance. We show that our algorithms are communication efficient and achieve small multiplicative constant factors for a wide range of parallel machines.

1 Introduction

We present an architecture independent study of the computation and communication requirements of an efficient Parallel Priority Queue (PPQ) implementation and list contraction algorithms along with an experimental study. The computational model adopted is the Bulk-Synchronous Parallel (BSP) model, proposed by L. G. Valiant [20], which deals explicitly with the notion of communication and synchronization among computational threads. A detailed discussion of the BSP model appears in [20]. The first a...

Efficient Personalized Communication on Wormhole Networks

by Fabrizio Petrini, Marco Vanneschi - 1997 International Conference on Parallel Architectures and Compilation Techniques, PACT'97, 1997
Cited by 4 (4 self)
Bridging models, such as BSP, tend to abstract the characteristics of the interconnection networks using a small set of parameters, by dividing the computation into supersteps and organizing the communication in global patterns called h-relations. In this paper we evaluate, through experimental results conducted on a wormhole-routed bi-dimensional torus and a quaternary fat-tree with 256 processing nodes, the execution time of three families of h-relations with variable degree of imbalance. We also prove a strong result that links the communication performance of the fat-tree with the BSP abstraction of the interconnection network. Given a generic h-relation, we can provide a value of g that, in the worst case, slightly overestimates the completion time and is very close to optimality.
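For readers unfamiliar with the terms in this abstract, the sketch below illustrates the standard BSP cost estimate for one superstep, w + h*g + l. It is not taken from the paper: the function names and the sample numbers are hypothetical, and h is computed under the usual textbook convention (the largest number of words any single processor sends or receives).

```python
# Illustrative sketch of the textbook BSP superstep cost, w + h*g + l.
# Function names and sample values are assumptions, not from the cited paper.

def h_of_relation(sent, received):
    """Return h: the largest number of words any one processor sends or receives."""
    return max(max(sent), max(received))


def superstep_cost(work, sent, received, g, l):
    """Estimate one superstep's cost as w + h*g + l.

    work     -- local computation steps per processor
    sent     -- words sent per processor
    received -- words received per processor
    g        -- communication cost per word (bandwidth parameter)
    l        -- barrier synchronization cost
    """
    w = max(work)                      # slowest processor bounds computation
    h = h_of_relation(sent, received)  # most heavily loaded processor bounds communication
    return w + h * g + l


if __name__ == "__main__":
    # Four processors with an imbalanced communication pattern.
    cost = superstep_cost(
        work=[100, 120, 90, 110],
        sent=[4, 2, 8, 1],
        received=[3, 5, 2, 5],
        g=4,
        l=50,
    )
    print(cost)
```

In this reading, an imbalanced h-relation is one where a single processor's traffic dominates h, which is why the choice of g matters most for such patterns.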

WHAT GOOD ARE SHARED-MEMORY MODELS?

by Phillip B. Gibbons - INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, 1996
Cited by 3 (1 self)
Shared memory models have been criticized for years for failing to model essential realities of parallel machines. Given the current wave of popular message-passing and distributed memory models (e.g., BSP, LOGP), it is natural to ask whether shared memory models have outlived any usefulness they may have had. In this invited position paper, we discuss the continuing importance of shared memory models in the design and analysis of parallel algorithms. We describe a new model, the Queuing Shared Memory (QSM) model, that accounts for limited communication bandwidth while still providing a shared memory abstraction, and provide evidence of its practicality. Finally, we discuss important areas for future models research. We argue that the compelling need for parallel computing in large scale data analysis (e.g., decision support, data mining) implies that the most important modeling issue going forward concerns how best to model disk I/O.

The fat-stack and universal routing in interconnection networks

by Kevin F. Chen, Edwin H.-M. Sha - In Proceedings of the ISCA 17th International Conference on Parallel and Distributed Computing Systems, 2004
Cited by 2 (0 self)
This paper shows that a novel network called the fat-stack is universally efficient and is suitable for use as an interconnection network in parallel computers. A requirement for the fat-stack to be universal is that link capacities double up the levels of the network. The fat-stack resembles the fat-tree and the fat-pyramid in hardware structure, but it has unique strengths. It is a construct of an atomic subnetwork unit consisting of one ring and one or more upward links to an upper subnetwork. This simple structure entails easy wirability. The network also uses fewer wires. More importantly, it has the capability to scale up to represent a large-scale distributed network. We developed efficient routing algorithms specific to the fat-stack. Our universality proof shows that a fat-stack variant with increased links and of area Θ(A) can simulate any competing network of area A with O(log A) overhead independently of wire delay. The universality result implies that the augmented fat-stack of a given size is nearly the best routing network of that size. The augmented fat-stack is the minimal universal network for an O(log A) overhead in terms of hardware usage. Actual simulations show that the performance of the augmented fat-stack approaches that of the fat-pyramid and is far higher than that of the fat-tree.

Broadcast and Associative Operations on Fat-Trees

by G. Bilardi, B. Codenotti, G. Del Corso, C. Pinotti, G. Resta, 1996
Cited by 2 (0 self)
This paper analyzes the cost of performing broadcast, product and prefix computation on the ideal fat-tree, a model proposed here to capture distance and bandwidth properties common to a variety of fat-tree networks. Algorithms are developed and analyzed in terms of the capacity of channels at different levels of the fat-tree. Nontrivial lower bounds are derived establishing the optimality of our algorithms for a wide range of channel capacities.

1 Introduction

A number of networks have been introduced in the literature and referred to as fat-trees, e.g., the concentrator fat-tree, the pruned-butterfly fat-tree, and the sorting fat-tree. Loosely speaking, a fat-tree is a tree whose leaves act as input/output terminals, whose internal nodes are subnetworks with switching capability, and whose edges are channels of appropriate capacity. Proposed fat-trees differ in node structure and channel capacities. Fat-trees have interesting universality properties in VLSI and form the bas...

Network Performance under Physical Constraints

by Fabrizio Petrini, Marco Vanneschi - the International Conference on Parallel Processing 1997, ICPP'97, 1997
Cited by 1 (1 self)
The performance of an interconnection network in a massively parallel architecture is subject to physical constraints whose impact needs to be re-evaluated from time to time. Fat-trees and low-dimensional cubes have attracted great interest in the scientific community in the last few years and are emerging standards in the design of interconnection networks for massively parallel computers.

Power of fast VLSI models is insensitive to wires’ thinness

by Gene Itkis, Leonid A. Levin - Research Triangle Park, North Carolina
Cited by 1 (1 self)
VLSI f-models allow the switching time to decrease to f(D) when the length of all wires is restricted by D.