Results 1–10 of 191
Programming Parallel Algorithms
, 1996
Abstract

Cited by 198 (8 self)
In the past 20 years there has been tremendous progress in developing and analyzing parallel algorithms. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some of these algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding of parallelism but in several cases has led to improvements in sequential algorithms. Unfortunately there has been less success in developing good languages for programming parallel algorithms, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages ...
Decentralized Information Processing in the Theory of Organizations
, 1999
Abstract

Cited by 46 (2 self)
Bounded rationality has been an important theme throughout the history of the theory of organizations, because it explains the sharing of information processing tasks and the existence of administrative staffs that coordinate large organizations. This article broadly surveys the theories of organizations that model such bounded rationality and decentralized information processing.
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
Abstract

Cited by 44 (12 self)
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easy-to-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios ...
GPU-ABiSort: Optimal parallel sorting on stream architectures
In Proceedings of the 20th IEEE International Parallel and Distributed Processing Symposium (IPDPS ’06)
, 2006
Abstract

Cited by 36 (0 self)
In this paper, we present a novel approach for parallel sorting on stream processing architectures. It is based on adaptive bitonic sorting. For sorting n values utilizing p stream processor units, this approach achieves the optimal time complexity O((n log n)/p). This makes our approach competitive with common sequential sorting algorithms not only from a theoretical viewpoint; it is also very fast from a practical viewpoint. This is achieved by using efficient linear stream memory accesses and by combining the optimal time approach with algorithms optimized for small input sequences. We present an implementation on modern programmable graphics hardware (GPUs). On recent GPUs, our optimal parallel sorting approach has been shown to be remarkably faster than sequential sorting on the CPU, and it is also faster than previous non-optimal sorting approaches on the GPU for sufficiently large input sequences. Because of the excellent scalability of our algorithm with the number of stream processor units p (up to n / log² n or even n / log n units, depending on the stream architecture), our approach profits heavily from the trend of an increasing number of fragment processor units on GPUs, so we can expect further speed improvements with upcoming GPU generations.
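GPU-ABiSort's adaptive scheme builds on the classic bitonic sorting network. As a rough illustration of the underlying compare-exchange structure only (a plain sequential Python sketch for power-of-two input sizes, not the paper's adaptive GPU-stream variant):

```python
def bitonic_sort(a, ascending=True):
    """Recursive bitonic sort; len(a) must be a power of two.
    Sort one half ascending and the other descending, then merge
    the resulting bitonic sequence."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    first = bitonic_sort(a[:half], True)
    second = bitonic_sort(a[half:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(a, ascending):
    """Merge a bitonic sequence into sorted order via compare-exchanges.
    On a parallel machine the loop below is one round of independent
    compare-exchange operations."""
    if len(a) <= 1:
        return list(a)
    half = len(a) // 2
    lo, hi = a[:half], a[half:]
    for i in range(half):
        if (lo[i] > hi[i]) == ascending:
            lo[i], hi[i] = hi[i], lo[i]
    return bitonic_merge(lo, ascending) + bitonic_merge(hi, ascending)

print(bitonic_sort([7, 3, 0, 5, 4, 6, 2, 1]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The network performs O(n log² n) work overall; the "adaptive" part of adaptive bitonic sorting is what brings this down to the optimal O(n log n), which this sketch does not attempt.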
Efficient Low-Contention Parallel Algorithms
, 1996
Abstract

Cited by 32 (13 self)
The queue-read, queue-write (QRQW) parallel random access machine (PRAM) model permits concurrent reading and writing to shared memory locations, but at a cost proportional to the number of readers/writers to any one memory location in a given step. The QRQW PRAM model reflects the contention properties of most commercially available parallel machines more accurately than either the well-studied CRCW PRAM or EREW PRAM models, and can be efficiently emulated with only logarithmic slowdown on hypercube-type non-combining networks. This paper describes fast, low-contention, work-optimal, randomized QRQW PRAM algorithms for the fundamental problems of load balancing, multiple compaction, generating a random permutation, parallel hashing, and distributive sorting. These logarithmic or sublogarithmic time algorithms considerably improve upon the best known EREW PRAM algorithms for these problems, while avoiding the high-contention steps typical of CRCW PRAM algorithms. An illustrative experiment demonstrates the performance advantage of a new QRQW random permutation algorithm when compared with the popular EREW algorithm. Finally, this paper presents new randomized algorithms for integer sorting and general sorting.
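The QRQW cost rule can be illustrated with a toy example: the time of one step is the longest queue at any single memory cell, so the model interpolates between EREW (which forbids contention outright) and CRCW (which charges unit time regardless of contention). The access lists below are hypothetical illustrations, not data from the paper:

```python
from collections import Counter

def qrqw_step_cost(accesses):
    """Cost of one QRQW PRAM step: the maximum number of processors
    reading or writing any single shared-memory location in that step
    (i.e. the longest queue). `accesses` lists one address per processor."""
    if not accesses:
        return 0
    return max(Counter(accesses).values())

# 8 processors touching distinct cells: exclusive access, cost 1.
print(qrqw_step_cost([0, 1, 2, 3, 4, 5, 6, 7]))  # 1
# All 8 hitting one cell: CRCW would charge 1; QRQW charges 8.
print(qrqw_step_cost([0] * 8))                   # 8
# Moderate contention: the worst queue holds 3 requests.
print(qrqw_step_cost([0, 0, 1, 2, 2, 2, 3, 4]))  # 3
```

This is why "low-contention" algorithms matter in the model: an algorithm that keeps every per-location queue short pays close to EREW cost while still being allowed occasional concurrent access.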
Faster Algorithms for String Matching Problems: Matching the Convolution Bound
 In Proceedings of the 39th Symposium on Foundations of Computer Science
, 1998
Abstract

Cited by 32 (5 self)
In this paper we give a randomized O(n log n)-time algorithm for the string matching with don't-cares problem. This improves the Fischer-Paterson bound [10] from 1974 and answers the open problem posed (among others) by Weiner [30] and Galil [11]. Using the same technique, we give an O(n log n)-time algorithm for other problems, including subset matching and tree pattern matching [15, 21, 9, 7, 17] and (general) approximate threshold matching [28, 17]. As this bound essentially matches the complexity of computing the Fast Fourier Transform, which is the only known technique for solving problems of this type, it is likely that the algorithms are in fact optimal. Additionally, the technique used for the threshold matching problem can be applied to the online version of this problem, in which we are allowed to preprocess the text and are required to process the pattern in time sublinear in the text length. This result involves an interesting variant of the Karp-Rabin fingerprint m...
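A standard convolution-based formulation of matching with don't-cares (not necessarily the paper's exact construction) encodes the wildcard as 0, so that the pattern matches at position i exactly when the sum of p_j · t_{i+j} · (p_j − t_{i+j})² over j is zero: each term vanishes precisely when the symbols agree or either is a wildcard. Expanding the cube gives three convolutions, each computable with an FFT in O(n log n); the dependency-free sketch below evaluates the sum directly in O(nm) to show only the identity:

```python
def wildcard_match(text, pattern, wild="*"):
    """Return the positions where `pattern` matches `text`, treating
    `wild` as a don't-care in either string. Wildcards are encoded as 0,
    so p_j * t_{i+j} * (p_j - t_{i+j})**2 == 0 iff the symbols are
    compatible at offset j. Summed directly here (O(n*m)); the three
    convolutions hidden in the sum give O(n log n) via the FFT."""
    enc = lambda c: 0 if c == wild else ord(c)
    t = [enc(c) for c in text]
    p = [enc(c) for c in pattern]
    n, m = len(t), len(p)
    return [i for i in range(n - m + 1)
            if sum(p[j] * t[i + j] * (p[j] - t[i + j]) ** 2
                   for j in range(m)) == 0]

print(wildcard_match("abcabd", "a*c"))  # [0]
print(wildcard_match("aaaa", "a*"))     # [0, 1, 2]
```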
Error-resistant Implementation of DNA Computations
 In Second Annual Meeting on DNA Based Computers
Abstract

Cited by 31 (5 self)
This paper introduces a new model of computation that employs the tools of molecular biology and whose in vitro implementation is far more error-resistant than extant proposals. We describe an abstraction of the model which lends itself to natural algorithmic description, particularly for problems in the complexity class NP. In addition we describe a number of linear-time algorithms within our model, particularly for NP-complete problems. We describe an in vitro realisation of the model and conclude with a discussion of future work.

1 Introduction

The idea that living cells and molecular complexes can be viewed as potential machine components dates back to the late 1950s, when Richard Feynman delivered his famous paper [4] describing "submicroscopic" computers. More recently, several papers [1, 10, 16] (also see [7, 13]) have advocated the realisation of massively parallel computation using the techniques and chemistry of molecular biology. Adleman describes how a computational...
Accounting for memory bank contention and delay in high-bandwidth multiprocessors
 In Proc. 7th ACM Symp. on Parallel Algorithms and Architectures
, 1997
Abstract

Cited by 30 (4 self)
For years, the computation rate of processors has been much faster than the access rate of memory banks, and this divergence in speeds has been constantly increasing in recent years. As a result, several shared-memory multiprocessors consist of more memory banks than processors. The object of this paper is to provide a simple model (with only a few parameters) for the design and analysis of irregular parallel algorithms that will give a reasonable characterization of performance on such machines. For this purpose, we extend Valiant’s bulk-synchronous parallel (BSP) model with two parameters: a parameter for memory bank delay, the minimum time for servicing requests at a bank, and a parameter for memory bank expansion, the ratio of the number of banks to the number of processors. We call this model the (d, x)-BSP. We show experimentally that the (d, x)-BSP captures the impact of bank contention and delay on the CRAY C90 and J90 for irregular access patterns, without modeling machine-specific details of these machines. The model has clarified the performance characteristics of several unstructured algorithms on the CRAY C90 and J90, and allowed us to explore tradeoffs and optimizations for these algorithms. In addition to modeling individual algorithms directly, we also consider the use of the (d, x)-BSP as a bridging model for emulating a very high-level abstract model, the Parallel Random Access Machine (PRAM). We provide matching upper and lower bounds for emulating the EREW and QRQW PRAMs on the (d, x)-BSP.
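One plausible reading of such a BSP extension is a max-of-terms superstep cost: local work, the gap-weighted per-processor message load, and the delay-weighted per-bank queue. The function and parameter names below are ours, built from the abstract's description of d (bank delay) and x (bank expansion), not the paper's exact charging rule:

```python
def dx_bsp_superstep_cost(work, proc_msgs, bank_reqs, g, d):
    """Illustrative cost of one superstep in a (d, x)-BSP-style model:
    the maximum of local computation, gap g times the heaviest
    per-processor message load, and bank delay d times the longest
    per-bank request queue. With x banks per processor, bank_reqs has
    x * len(proc_msgs) entries, and contention typically falls as x grows.
    This max-of-terms shape is our reading of the abstract, not the
    paper's formula."""
    return max(work,
               g * max(proc_msgs, default=0),
               d * max(bank_reqs, default=0))

# 4 processors, 8 banks (x = 2): traffic spread evenly, computation dominates.
print(dx_bsp_superstep_cost(100, [10, 12, 9, 11],
                            [6, 5, 6, 6, 5, 6, 5, 5], g=4, d=8))   # 100
# Same traffic piled onto one bank: bank delay dominates the superstep.
print(dx_bsp_superstep_cost(100, [10, 12, 9, 11],
                            [44, 0, 0, 0, 0, 0, 0, 0], g=4, d=8))  # 352
```

The two calls show why irregular access patterns need the d term: the communication volume is identical, but a skewed bank queue multiplies the superstep time.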
Parallel Algorithms for Hierarchical Clustering and Applications to Split Decomposition and Parity Graph Recognition
 JOURNAL OF ALGORITHMS
, 1998
Abstract

Cited by 29 (1 self)
We present efficient (parallel) algorithms for two hierarchical clustering heuristics. We point out that these heuristics can also be applied to solve some algorithmic problems in graphs. This includes split decomposition. We show that efficient parallel split decomposition induces an efficient parallel parity graph recognition algorithm. This is a consequence of the result of [7] that parity graphs are exactly those graphs that can be split decomposed into cliques and bipartite graphs.