Results 1 - 10
of
46
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract
-
Cited by 164 (7 self)
- Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
Randomized routing and sorting on fixed-connection networks
- Journal of Algorithms
, 1994
"... This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized on-line algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps ..."
Abstract
-
Cited by 84 (13 self)
- Add to MetaCart
This paper presents a general paradigm for the design of packet routing algorithms for fixed-connection networks. Its basis is a randomized on-line algorithm for scheduling any set of N packets whose paths have congestion c on any bounded-degree leveled network with depth L in O(c + L + log N) steps, using constant-size queues. In this paradigm, the design of a routing algorithm is broken into three parts: (1) showing that the underlying network can emulate a leveled network, (2) designing a path selection strategy for the leveled network, and (3) applying the scheduling algorithm. This strategy yields randomized algorithms for routing and sorting in time proportional to the diameter for meshes, butterflies, shuffle-exchange graphs, multidimensional arrays, and hypercubes. It also leads to the construction of an area-universal network: an N-node network with area Θ(N) that can simulate any other network of area O(N) with slowdown O(log N).
Optical Communication for Pointer Based Algorithms
, 1988
"... ) Abstract In this paper we study the Local Memory P-RAM. This model allows unit cost communication but assumes that the shared memory is divided into modules. This model is motivated by a consideration of potential optical computers. We show that fundamental problems such as list-ranking and parall ..."
Abstract
-
Cited by 51 (1 self)
- Add to MetaCart
) Abstract In this paper we study the Local Memory P-RAM. This model allows unit cost communication but assumes that the shared memory is divided into modules. This model is motivated by a consideration of potential optical computers. We show that fundamental problems such as list-ranking and parallel tree contraction can be implemented on this model in O(log n) time using n= log n processors. To solve the list-ranking problem we introduce a general asynchronous technique which has relevance to a number of problems. 1 Introduction We consider a model of parallel computation that is especially suited to pointer based computation. We motivate this model by showing that basic problems, like list-ranking and parallel tree contraction, can be performed in O(log n) time using only n= log n processors. We also show that any step on this model can be simulated in unit time on this model by a machine with an optical communication architecture. Thus we contend that the basic problem of list-ra...
On the Physical Design of PRAMs
, 1993
"... The Saarbrucken Parallel Random Access Machine (SB-PRAM) is a scalable shared memory machine. At the gate level it is a re-engineered version of the Fluent machine [A. G. Ranade, S. N. Bhatt and S. L. Johnson. The Fluent Abstract Machine. In Proc. 5th MIT Conference on Advanced Research in VLSI, pp. ..."
Abstract
-
Cited by 46 (13 self)
- Add to MetaCart
The Saarbrucken Parallel Random Access Machine (SB-PRAM) is a scalable shared memory machine. At the gate level it is a re-engineered version of the Fluent machine [A. G. Ranade, S. N. Bhatt and S. L. Johnson. The Fluent Abstract Machine. In Proc. 5th MIT Conference on Advanced Research in VLSI, pp. 71--93 (1988)]. It uses hashing of adresses, combining and latency hiding. A prototype with 128 processors is presently being designed. In this paper we deal with several problems related to the physical design of this machine such as the total number of network chips, the geometrical arrangement of boards in the network and the VLSI realization of certain sorting arrays. We also present an extremely fast method to rehash addresses without use of external memory. Research was partially supported by DFG (SFB 124) and SIEMENS AG. A preliminary version of this paper appeared in [1]. 1 Introduction Parallel machines are nowadays classified as multi-computers and multi-processors. In multi-...
Models of Computation -- Exploring the Power of Computing
"... Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s programming languages, language translators, and oper ..."
Abstract
-
Cited by 46 (3 self)
- Add to MetaCart
Theoretical computer science treats any computational subject for which a good model can be created. Research on formal models of computation was initiated in the 1930s and 1940s by Turing, Post, Kleene, Church, and others. In the 1950s and 1960s programming languages, language translators, and operating systems were under development and therefore became both the subject and basis for a great deal of theoretical work. The power of computers of this period was limited by slow processors and small amounts of memory, and thus theories (models, algorithms, and analysis) were developed to explore the efficient use of computers as well as the inherent complexity of problems. The former subject is known today as algorithms and data structures, the latter computational complexity. The focus of theoretical computer scientists in the 1960s on languages is reflected in the first textbook on the subject, Formal Languages and Their Relation to Automata by John Hopcroft and Jeffrey Ullman. This influential book led to the creation of many languagecentered theoretical computer science courses; many introductory theory courses today continue to reflect the content of this book and the interests of theoreticians of the 1960s and early 1970s. Although
Horizons of Parallel Computation
- JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
, 1993
"... This paper considers the ultimate impact of fundamental physical limitations---notably, speed of light and device size---on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the ..."
Abstract
-
Cited by 36 (3 self)
- Add to MetaCart
This paper considers the ultimate impact of fundamental physical limitations---notably, speed of light and device size---on parallel computing machines. Although we fully expect an innovative and very gradual evolution to the limiting situation, we take here the provocative view of exploring the consequences of the accomplished attainment of the physical bounds. The main result is that scalability holds only for neighborly interconnections, such as the square mesh, of bounded-size synchronous modules, presumably of the area-universal type. We also discuss the ultimate infeasibility of latencyhiding, the violation of intuitive maximal speedups, and the emerging novel processor-time tradeoffs.
Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines
- Algorithmica
, 1988
"... This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages ..."
Abstract
-
Cited by 34 (1 self)
- Add to MetaCart
This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to "shortcut" pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n- node trees, such as computing treewalk numberings, finding the separator of a tree, and evaluating all subexpressions ...
Accounting for memory bank contention and delay in high-bandwidth multiprocessors
- In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1997
"... Abstract—For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object ..."
Abstract
-
Cited by 29 (5 self)
- Add to MetaCart
Abstract—For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant’s bulk-synchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.
Parallel Algorithmic Techniques for Combinatorial Computation
- Ann. Rev. Comput. Sci
, 1988
"... this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-85-11713, CCR-86-05353, and CCR-88-14977, and by DARPA contract N00039-84-C-0165. ..."
Abstract
-
Cited by 29 (3 self)
- Add to MetaCart
this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-85-11713, CCR-86-05353, and CCR-88-14977, and by DARPA contract N00039-84-C-0165.
On Parallel Hashing and Integer Sorting
, 1991
"... The problem of sorting n integers from a restricted range [1::m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n polylog(n) we have an O(n log log n) algorithm.) The al ..."
Abstract
-
Cited by 24 (9 self)
- Add to MetaCart
The problem of sorting n integers from a restricted range [1::m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speed up. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables drastic reduction of space requirements for the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm, and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and non standard mo...

