Results 1  10
of
91
LogP: Towards a Realistic Model of Parallel Computation
, 1993
"... A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding developme ..."
Abstract

Cited by 498 (14 self)
 Add to MetaCart
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM5.
LogGP: Incorporating Long Messages into the LogP Model  One step closer towards a realistic model for parallel computation
, 1995
"... We present a new model of parallel computationthe LogGP modeland use it to analyze a number of algorithms, most notably, the single node scatter (onetoall personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the comm ..."
Abstract

Cited by 235 (1 self)
 Add to MetaCart
We present a new model of parallel computationthe LogGP modeland use it to analyze a number of algorithms, most notably, the single node scatter (onetoall personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP + 93] which abstracts the communication of fixedsized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P ). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM5) [CKP + 93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP2, Paragon, Meiko CS2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
Dynamic Perfect Hashing: Upper and Lower Bounds
, 1990
"... The dynamic dictionary problem is considered: provide an algorithm for storing a dynamic set, allowing the operations insert, delete, and lookup. A dynamic perfect hashing strategy is given: a randomized algorithm for the dynamic dictionary problem that takes O(1) worstcase time for lookups and ..."
Abstract

Cited by 127 (13 self)
 Add to MetaCart
The dynamic dictionary problem is considered: provide an algorithm for storing a dynamic set, allowing the operations insert, delete, and lookup. A dynamic perfect hashing strategy is given: a randomized algorithm for the dynamic dictionary problem that takes O(1) worstcase time for lookups and O(1) amortized expected time for insertions and deletions; it uses space proportional to the size of the set stored. Furthermore, lower bounds for the time complexity of a class of deterministic algorithms for the dictionary problem are proved. This class encompasses realistic hashingbased schemes that use linear space. Such algorithms have amortized worstcase time complexity \Omega(log n) for a sequence of n insertions and
ChernoffHoeffding Bounds for Applications with Limited Independence
 SIAM J. Discrete Math
, 1993
"... ChernoffHoeffding bounds are fundamental tools used in bounding the tail probabilities of the sums of bounded and independent random variables. We present a simple technique which gives slightly better bounds than these, and which more importantly requires only limited independence among the rando ..."
Abstract

Cited by 103 (10 self)
 Add to MetaCart
ChernoffHoeffding bounds are fundamental tools used in bounding the tail probabilities of the sums of bounded and independent random variables. We present a simple technique which gives slightly better bounds than these, and which more importantly requires only limited independence among the random variables, thereby importing a variety of standard results to the case of limited independence for free. Additional methods are also presented, and the aggregate results are sharp and provide a better understanding of the proof techniques behind these bounds. They also yield improved bounds for various tail probability distributions and enable improved approximation algorithms for jobshop scheduling. The "limited independence" result implies that a reduced amount of randomness and weaker sources of randomness are sufficient for randomized algorithms whose analyses use the ChernoffHoeffding bounds, e.g., the analysis of randomized algorithms for random sampling and oblivious packet routi...
CommunicationEfficient Parallel Sorting
, 1996
"... We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sort ..."
Abstract

Cited by 64 (2 self)
 Add to MetaCart
We study the problem of sorting n numbers on a pprocessor bulksynchronous parallel (BSP) computer, which is a parallel multicomputer that allows for general processortoprocessor communication rounds provided each processor sends and receives at most h items in any round. We provide parallel sorting methods that use internal computation time that is O( n log n p ) and a number of communication rounds that is O( log n log(h+1) ) for h = \Theta(n=p). The internal computation bound is optimal for any comparisonbased sorting algorithm. Moreover, the number of communication rounds is bounded by a constant for the (practical) situations when p n 1\Gamma1=c for a constant c 1. In fact, we show that our bound on the number of communication rounds is asymptotically optimal for the full range of values for p, for we show that just computing the "or" of n bits distributed evenly to the first O(n=h) of an arbitrary number of processors in a BSP computer requires\Omega\Gammaqui n= log(h...
A cost calculus for parallel functional programming
 Journal of Parallel and Distributed Computing
, 1995
"... Abstract Building a cost calculus for a parallel program development environment is difficult because of the many degrees of freedom available in parallel implementations, and because of difficulties with compositionality. We present a strategy for building cost calculi for skeletonbased programmin ..."
Abstract

Cited by 58 (6 self)
 Add to MetaCart
Abstract Building a cost calculus for a parallel program development environment is difficult because of the many degrees of freedom available in parallel implementations, and because of difficulties with compositionality. We present a strategy for building cost calculi for skeletonbased programming languages which can be used for derivational software development and which deals in a pragmatic way with the difficulties of composition. The approach is illustrated for the BirdMeertens theory of lists, a parallel functional language with an associated equational transformation system. Keywords: functional programming, parallel programming, program transformation, cost calculus, equational theories, architecture independence, BirdMeertens formalism.
Models of Computation  Exploring the Power of Computing
"... Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s programming languages, language translators, and oper ..."
Abstract

Cited by 58 (6 self)
 Add to MetaCart
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s programming languages, language translators, and operating systems were under development and therefore became both the subject and basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many languagecentered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although
Optical Communication for Pointer Based Algorithms
, 1988
"... ) Abstract In this paper we study the Local Memory PRAM. This model allows unit cost communication but assumes that the shared memory is divided into modules. This model is motivated by a consideration of potential optical computers. We show that fundamental problems such as listranking and parall ..."
Abstract

Cited by 54 (1 self)
 Add to MetaCart
) Abstract In this paper we study the Local Memory PRAM. This model allows unit cost communication but assumes that the shared memory is divided into modules. This model is motivated by a consideration of potential optical computers. We show that fundamental problems such as listranking and parallel tree contraction can be implemented on this model in O(log n) time using n= log n processors. To solve the listranking problem we introduce a general asynchronous technique which has relevance to a number of problems. 1 Introduction We consider a model of parallel computation that is especially suited to pointer based computation. We motivate this model by showing that basic problems, like listranking and parallel tree contraction, can be performed in O(log n) time using only n= log n processors. We also show that any step on this model can be simulated in unit time on this model by a machine with an optical communication architecture. Thus we contend that the basic problem of listra...
On the Physical Design of PRAMs
, 1993
"... The Saarbrucken Parallel Random Access Machine (SBPRAM) is a scalable shared memory machine. At the gate level it is a reengineered version of the Fluent machine [A. G. Ranade, S. N. Bhatt and S. L. Johnson. The Fluent Abstract Machine. In Proc. 5th MIT Conference on Advanced Research in VLSI, pp. ..."
Abstract

Cited by 46 (13 self)
 Add to MetaCart
The Saarbrucken Parallel Random Access Machine (SBPRAM) is a scalable shared memory machine. At the gate level it is a reengineered version of the Fluent machine [A. G. Ranade, S. N. Bhatt and S. L. Johnson. The Fluent Abstract Machine. In Proc. 5th MIT Conference on Advanced Research in VLSI, pp. 7193 (1988)]. It uses hashing of adresses, combining and latency hiding. A prototype with 128 processors is presently being designed. In this paper we deal with several problems related to the physical design of this machine such as the total number of network chips, the geometrical arrangement of boards in the network and the VLSI realization of certain sorting arrays. We also present an extremely fast method to rehash addresses without use of external memory. Research was partially supported by DFG (SFB 124) and SIEMENS AG. A preliminary version of this paper appeared in [1]. 1 Introduction Parallel machines are nowadays classified as multicomputers and multiprocessors. In multi...