Results 1 - 8 of 8
Parallel Sorting With Limited Bandwidth
 in Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1995
Cited by 26 (5 self)
We study the problem of sorting on a parallel computer with limited communication bandwidth. By using the recently proposed PRAM(m) model, where p processors communicate through a small, globally shared memory consisting of m bits, we focus on the tradeoff between the amount of local computation and the amount of interprocessor communication required for parallel sorting algorithms. We prove a lower bound of $\Omega(n \log m / m)$ on the time to sort n numbers in an exclusive-read variant of the PRAM(m) model. We show that Leighton's Columnsort can be used to give an asymptotically matching upper bound in the case where m grows as a fractional power of n. The bounds are of a surprising form, in that they have little dependence on the parameter p. This implies that attempting to distribute the workload across more processors while holding the problem size and the size of the shared memory fixed will not improve the optimal running time of sorting in this model. We also show that bot...
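As a rough illustration of why the bound has little dependence on p, the growth term inside the lower bound can be evaluated directly. This is only a sketch: the function name `sorting_lower_bound_term` is hypothetical, the base-2 logarithm is an assumption, and the constant factor hidden by the Omega notation is ignored.

```python
import math


def sorting_lower_bound_term(n: int, m: int) -> float:
    """Return n * log2(m) / m, the growth term of the stated lower bound.

    Assumptions: log base 2, hidden constants dropped. The number of
    processors p is deliberately not a parameter, mirroring the abstract's
    remark that the bound has little dependence on p.
    """
    if n < 1 or m < 2:
        raise ValueError("need n >= 1 and m >= 2")
    return n * math.log2(m) / m


if __name__ == "__main__":
    n = 1_000_000
    # Growing the shared memory m lowers the bound; adding processors does not.
    for m in (16, 256, 4096):
        print(f"m={m:5d}  n*log2(m)/m = {sorting_lower_bound_term(n, m):.1f}")
```

Holding n and m fixed, the value is the same for any p, which is the point the abstract makes about adding processors.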
New Coding Techniques for Improved Bandwidth Utilization
 In Proc. 37th IEEE Symp. on Foundations of Computer Science
, 1998
Cited by 13 (4 self)
In this paper, we introduce a new coding technique for transmitting the XOR of carefully selected patterns of bits to be communicated, which greatly reduces bandwidth requirements in some settings. This technique has broader applications. For example, we demonstrate that the coding technique has a surprising application to a simple I/O (input/output) complexity problem related to finding the transpose of a matrix. Our main results are developed in the PRAM(m) model, a limited-bandwidth PRAM model where p processors communicate through a small globally shared memory of m bits. We provide new algorithms for the problems of sorting and permutation routing. For the concurrent-read PRAM(m), as p grows with m
The Design and Analysis of Bulk-Synchronous Parallel Algorithms
, 1998
Cited by 10 (1 self)
The model of bulk-synchronous parallel (BSP) computation is an emerging paradigm of general-purpose parallel computing. This thesis presents a systematic approach to the design and analysis of BSP algorithms. We introduce an extension of the BSP model, called BSPRAM, which reconciles shared-memory style programming with efficient exploitation of data locality. The BSPRAM model can be optimally simulated by a BSP computer for a broad range of algorithms possessing certain characteristic properties: obliviousness, slackness, granularity. We use BSPRAM to design BSP algorithms for problems from three large, partially overlapping domains: combinatorial computation, dense matrix computation, graph computation. Some of the presented algorithms are adapted from known BSP algorithms (butterfly dag computation, cube dag computation, matrix multiplication). Other algorithms are obtained by application of established non-BSP techniques (sorting, randomised list contraction, Gaussian elimination without pivoting and with column pivoting, algebraic path computation), or use original techniques specific to the BSP model (deterministic list contraction, Gaussian elimination with nested block pivoting, communication-efficient multiplication of Boolean matrices, synchronisation-efficient shortest paths computation). The asymptotic BSP cost of each algorithm is established, along with its BSPRAM characteristics. We conclude by outlining some directions for future research.
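For readers unfamiliar with how a BSP cost is usually stated, the sketch below applies the standard BSP cost formula W + H·g + S·l, where W is total local computation, H is total communication, S is the number of supersteps, g is the per-word communication cost, and l is the synchronisation cost. The class and function names are hypothetical, and the formula is the textbook BSP cost model rather than anything quoted from the abstract above.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Superstep:
    """One BSP superstep, described by its per-processor maxima."""
    work: float  # w: maximum local computation on any processor
    h: float     # h: maximum words sent or received by any processor


def bsp_cost(supersteps: Iterable[Superstep], g: float, l: float) -> float:
    """Return W + H*g + S*l for a sequence of supersteps.

    g and l are machine parameters (communication gap and barrier latency);
    the values used in any call are illustrative assumptions.
    """
    steps = list(supersteps)
    total_work = sum(s.work for s in steps)  # W
    total_comm = sum(s.h for s in steps)     # H
    return total_work + total_comm * g + len(steps) * l


if __name__ == "__main__":
    program = [Superstep(work=100, h=10), Superstep(work=50, h=5)]
    print(bsp_cost(program, g=4, l=20))
```

Separating W, H and S in this way is what lets communication-efficient and synchronisation-efficient algorithms be compared on the same machine parameters.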
A framework for measuring supercomputer productivity
 The International Journal of High Performance Computing Applications, 18(4), Winter
, 2004
Cited by 9 (1 self)
We propose a framework for measuring the productivity of high performance computing (HPC) systems, based on common economic definitions of productivity and on utility theory. We discuss how these definitions can capture essential aspects of HPC systems, such as the importance of time-to-solution and the tradeoff between programming time and execution time. Finally, we outline a research program that would lead to the definition of effective productivity metrics for HPC that fit within the proposed framework.
Some Models for Scheduling Parallel Programs with Communication Delays
 Discrete Applied Mathematics, special issue on scheduling
, 1997
Cited by 7 (4 self)
The aim of this paper is to present and analyze models for designing parallel programs. In the context of some extensions of the most popular execution models (precedence graphs, dataflow, PRAM), we describe scheduling techniques which take into account the communication delays. We illustrate all these models by two families of representative precedence graphs, namely, grids and complete trees.
Transgressing The Boundaries: Unified Scalable Parallel Programming
, 1996
Cited by 1 (1 self)
The diverse architectural features of parallel computers, and the lack of commonly accepted parallel-programming environments, have meant that software development for these systems has been significantly more difficult than the sequential case. Until better approaches are developed, the programming environment will remain a serious obstacle to mainstream scalable parallel computing. The work reported in this paper attempts to integrate architecture-independent scalable parallel programming in the Bulk Synchronous Parallel (BSP) model with shared-memory parallel programming using the theoretical PRAM model. We start with a discussion of problem parallelism, that is, the parallelism inherent to a problem instead of a specific algorithm, and the parallel-programming techniques that allow the capture of this notion. We then review the ubiquitous PRAM model in terms of the model's pragmatic limitations, where particular attention is paid to simulations on practical machines. The BSP model i...
paper evolved as the result of discussions that involved
We propose a framework for measuring the productivity of high performance computing (HPC) systems, based on common economic definitions of productivity and on utility theory. We discuss how these definitions can capture essential aspects of HPC systems, such as the importance of time-to-solution and the tradeoff between programming time and execution time. Finally, we outline a research program that would lead to the definition of effective productivity metrics for HPC that fit within the proposed framework.
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH
This paper is devoted to the study of tree-scheduling problems within the execution model described by Anderson, Beame and Ruzzo. We first prove the NP-completeness of the problem of minimizing the overhead for scheduling trees on m processors, and then we propose an algorithm that provides optimal schedules when complete trees are considered.