Results 1-10 of 57
Programming Parallel Algorithms
, 1996
Cited by 193 (9 self)
In the past 20 years there has been tremendous progress in developing and analyzing parallel algorithms. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some of these algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding of parallelism but in several cases has led to improvements in sequential algorithms. Unfortunately there has been less success in developing good languages for programming parallel algorithms, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Randomized routing and sorting on fixed-connection networks
 JOURNAL OF ALGORITHMS
, 1994
Cited by 89 (13 self)
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized online algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
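The congestion c in the bound above can be illustrated with a minimal sketch. It assumes the usual reading of congestion as the largest number of packet paths that share a single edge; the function names, the toy paths, and the decision to drop the constant hidden by the O-notation are all illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def congestion(paths):
    """Return the largest number of paths that share one directed edge."""
    edge_use = Counter()
    for path in paths:
        # Each consecutive pair of nodes on a path is one directed edge.
        for u, v in zip(path, path[1:]):
            edge_use[(u, v)] += 1
    return max(edge_use.values(), default=0)


def step_estimate(c, depth, n_packets):
    """Evaluate c + L + log2(N), ignoring the constant hidden by O(...)."""
    return c + depth + math.log2(n_packets)


# Three packets on a hypothetical 3-level network; two share edge (b0, c0).
paths = [["a0", "b0", "c0"], ["a1", "b0", "c0"], ["a1", "b1", "c1"]]
c = congestion(paths)
estimate = step_estimate(c, depth=2, n_packets=len(paths))
```

Here `c` is 2 because the edge from `b0` to `c0` carries two packets, so the estimate is dominated by whichever of c, L, and log N is largest.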
Models of Computation: Exploring the Power of Computing
Cited by 59 (5 self)
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s programming languages, language translators, and operating systems were under development and therefore became both the subject and basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many language-centered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although
Optical Communication for Pointer Based Algorithms
, 1988
Cited by 54 (1 self)
In this paper we study the Local Memory PRAM. This model allows unit cost communication but assumes that the shared memory is divided into modules. This model is motivated by a consideration of potential optical computers. We show that fundamental problems such as list ranking and parallel tree contraction can be implemented on this model in O(log n) time using n / log n processors. To solve the list-ranking problem we introduce a general asynchronous technique which has relevance to a number of problems. 1 Introduction We consider a model of parallel computation that is especially suited to pointer based computation. We motivate this model by showing that basic problems, like list ranking and parallel tree contraction, can be performed in O(log n) time using only n / log n processors. We also show that any step on this model can be simulated in unit time on this model by a machine with an optical communication architecture. Thus we contend that the basic problem of listra...
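For readers unfamiliar with list ranking, the following sketch shows the classic pointer-jumping approach, simulated sequentially. This is not the paper's algorithm: pointer jumping uses one processor per node and O(n log n) total work, whereas the abstract claims O(log n) time with only n / log n processors. The function name and the convention that the tail points to itself are assumptions made for illustration.

```python
def list_rank(succ):
    """Rank each node by its distance to the tail of a linked list.

    succ[i] is the successor of node i; the tail points to itself.
    Returns (ranks, number_of_rounds).
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    rounds = 0
    # Every round doubles the distance each pointer spans, so roughly
    # log2(n) rounds are enough for all pointers to reach the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # New arrays are built from the old ones to mimic one
        # synchronous parallel step across all nodes.
        new_rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        new_nxt = [nxt[nxt[i]] for i in range(n)]
        rank, nxt = new_rank, new_nxt
        rounds += 1
    return rank, rounds


# The list 0 -> 1 -> 2 -> 3, where node 3 is the tail.
ranks, rounds = list_rank([1, 2, 3, 3])
```

For the four-node list the loop runs two rounds and `ranks` holds each node's distance to the tail.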
On the Physical Design of PRAMs
, 1993
Cited by 46 (13 self)
The Saarbrücken Parallel Random Access Machine (SB-PRAM) is a scalable shared memory machine. At the gate level it is a reengineered version of the Fluent machine [A. G. Ranade, S. N. Bhatt and S. L. Johnson. The Fluent Abstract Machine. In Proc. 5th MIT Conference on Advanced Research in VLSI, pp. 71-93 (1988)]. It uses hashing of addresses, combining and latency hiding. A prototype with 128 processors is presently being designed. In this paper we deal with several problems related to the physical design of this machine such as the total number of network chips, the geometrical arrangement of boards in the network and the VLSI realization of certain sorting arrays. We also present an extremely fast method to rehash addresses without use of external memory. Research was partially supported by DFG (SFB 124) and SIEMENS AG. A preliminary version of this paper appeared in [1]. 1 Introduction Parallel machines are nowadays classified as multicomputers and multiprocessors. In multi...
Horizons of Parallel Computation
 JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1993
Cited by 39 (3 self)
This paper considers the ultimate impact of fundamental physical limitations, notably speed of light and device size, on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the consequences of the accomplished attainment of the physical bounds. The main result is that scalability holds only for neighborly interconnections, such as the square mesh, of bounded-size synchronous modules, presumably of the area-universal type. We also discuss the ultimate infeasibility of latency hiding, the violation of intuitive maximal speedups, and the emerging novel processor-time tradeoffs.
Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines
 Algorithmica
, 1988
Cited by 38 (2 self)
This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to "shortcut" pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing tree-walk numberings, finding the separator of a tree, and evaluating all subexpressions ...
An optical simulation of shared memory
, 1994
Cited by 35 (3 self)
We present a work-optimal randomized algorithm for simulating a shared memory machine (PRAM) on an optical communication parallel computer (OCPC). The OCPC model is motivated by the potential of optical communication for parallel computation. The memory of an OCPC is divided into modules, one module per processor. Each memory module only services a request on a timestep if it receives exactly one memory request. Our algorithm simulates each step of an n lg lg n-processor EREW PRAM on an n-processor OCPC in O(lg lg n) expected delay. (The probability that the delay is longer than this is at most n^(-α) for any constant α.) The best previous simulation, due to Valiant, required Θ(lg n) expected delay.
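The service rule stated in the abstract (a module answers only when exactly one request arrives in a timestep) can be sketched as follows. The function name and the dictionary representation of requests are illustrative assumptions; the paper's actual simulation algorithm, which handles retries and achieves the stated delay, is not reproduced here.

```python
from collections import defaultdict


def ocpc_step(requests):
    """Simulate one OCPC timestep.

    requests maps a processor id to the memory module it targets.
    Returns the set of processors whose requests are serviced.
    """
    by_module = defaultdict(list)
    for proc, module in requests.items():
        by_module[module].append(proc)
    # A module services a request only if exactly one request arrived;
    # modules that receive two or more requests service none of them.
    return {procs[0] for procs in by_module.values() if len(procs) == 1}


# Processors 0 and 1 collide on module 5; processor 2 is alone on module 7.
served = ocpc_step({0: 5, 1: 5, 2: 7})
```

Only processor 2 is served in this step; processors 0 and 1 would have to retry in a later timestep.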
Accounting for memory bank contention and delay in high-bandwidth multiprocessors
 In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1997
Cited by 32 (5 self)
For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant’s bulk-synchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.
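The effect of the two parameters can be illustrated with a rough sketch. The abstract does not state the model's cost formula, so everything below is an assumption made for illustration: addresses are mapped to banks by a simple modulo, x is an integer, and the time attributed to the banks is d multiplied by the largest number of requests that land on one bank.

```python
from collections import Counter


def bank_time(addresses, p, d, x):
    """Estimate bank service time for one batch of memory requests.

    p: number of processors, d: bank delay, x: banks per processor.
    Assumes a hypothetical modulo mapping from address to bank.
    """
    banks = x * p
    load = Counter(addr % banks for addr in addresses)
    # The busiest bank serializes its requests, each taking d time units.
    return d * max(load.values(), default=0)


# 4 processors, 8 banks, delay 6: three requests collide on bank 0.
contended = bank_time([0, 8, 16, 1], p=4, d=6, x=2)
spread = bank_time([0, 1, 2, 3], p=4, d=6, x=2)
```

Under these assumptions the contended pattern costs three bank delays while the spread pattern costs one, which is the kind of gap irregular access patterns can produce.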
Parallel Algorithmic Techniques for Combinatorial Computation
 Ann. Rev. Comput. Sci
, 1988
Cited by 29 (3 self)
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.