Results 1  10
of
14
Programming Parallel Algorithms
, 1996
"... In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a th ..."
Abstract

Cited by 193 (9 self)
 Add to MetaCart
In the past 20 years there has been treftlendous progress in developing and analyzing parallel algorithftls. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some ofthese algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding ofparallelism but in several cases has led to improvements in sequential algorithms. Unf:ortunately there has been less success in developing good languages f:or prograftlftling parallel algorithftls, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages
NESL: A nested dataparallel language (version 2.6
, 1993
"... The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U. S. Government. Keywords: Dataparallel, parallel algorithms, supe ..."
Abstract

Cited by 95 (7 self)
 Add to MetaCart
The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Wright Laboratory or the U. S. Government. Keywords: Dataparallel, parallel algorithms, supercomputers, nested parallelism, This report describes Nesl, a stronglytyped, applicative, dataparallel language. Nesl is intended to be used as a portable interface for programming a variety of parallel and vector computers, and as a basis for teaching parallel algorithms. Parallelism is supplied through a simple set of dataparallel constructs based on sequences, including a mechanism for applying any function over the elements of a sequence in parallel and a rich set of parallel functions that manipulate sequences. Nesl fully supports nested sequences and nested parallelismâ€”the ability to take a parallel function and apply it over multiple instances in parallel. Nested parallelism is important for implementing algorithms with irregular nested loops (where the inner loop lengths depend on the outer iteration) and for divideandconquer algorithms. Nesl also provides a performance model for calculating the asymptotic performance of a program on
A Fast and Simple Algorithm for the Maximum Flow Problem
 OPERATIONS RESEARCH
, 1989
"... We present a simple sequential algorithm for the maximum flow problem on a network with n nodes, m arcs, and integer arc capacities bounded by U. Under the practical assumption that U is polynomially bounded in n, our algorithm runs in time O(nm + n 2 log n). This result improves the previous best b ..."
Abstract

Cited by 34 (5 self)
 Add to MetaCart
We present a simple sequential algorithm for the maximum flow problem on a network with n nodes, m arcs, and integer arc capacities bounded by U. Under the practical assumption that U is polynomially bounded in n, our algorithm runs in time O(nm + n 2 log n). This result improves the previous best bound of O(nm log(n 2 /m)), obtained by Goldberg and Taran, by a factor of log n for networks that are both nonsparse and nondense without using any complex data structures. We also describe a parallel implementation of the algorithm that runs in O(n'log U log p) time in the PRAM model with EREW and uses only p processors where p = [m/n
SublinearTime Parallel Algorithms for Matching and Related Problems
, 1988
"... This paper presents the first sublineartime deterministic parallel algorithms for bipartite matching and several related problems, including maximal nodedisjoint paths, depthfirst search, and flows in zeroone networks. Our results are based on a better understanding of the combinatorial struc ..."
Abstract

Cited by 33 (6 self)
 Add to MetaCart
This paper presents the first sublineartime deterministic parallel algorithms for bipartite matching and several related problems, including maximal nodedisjoint paths, depthfirst search, and flows in zeroone networks. Our results are based on a better understanding of the combinatorial structure of the above problems, which leads to new algorithmic techniques. In particular, we show how to use maximal matching to extend, in parallel, a current set of nodedisjoint paths and how to take advantage of the parallelism that arises when a large number of nodes are "active" during an execution of a pushrelabel network flow algorithm. We also show how to apply our techniques to design parallel algorithms for the weighted versions of the above problems. In particular, we present sublineartime deterministic parallel algorithms for finding a minimumweight bipartite matching and for finding a minimumcost flow in a network with zeroone capacities, if the weights are polynomially ...
On Parallel Hashing and Integer Sorting
, 1991
"... The problem of sorting n integers from a restricted range [1::m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n polylog(n) we have an O(n log log n) algorithm.) The al ..."
Abstract

Cited by 25 (9 self)
 Add to MetaCart
The problem of sorting n integers from a restricted range [1::m], where m is superpolynomial in n, is considered. An o(n log n) randomized algorithm is given. Our algorithm takes O(n log log m) expected time and O(n) space. (Thus, for m = n polylog(n) we have an O(n log log n) algorithm.) The algorithm is parallelizable. The resulting parallel algorithm achieves optimal speed up. Some features of the algorithm make us believe that it is relevant for practical applications. A result of independent interest is a parallel hashing technique. The expected construction time is logarithmic using an optimal number of processors, and searching for a value takes O(1) time in the worst case. This technique enables drastic reduction of space requirements for the price of using randomness. Applicability of the technique is demonstrated for the parallel sorting algorithm, and for some parallel string matching algorithms. The parallel sorting algorithm is designed for a strong and non standard mo...
Tradeoffs Between Communication Throughput and Parallel Time
, 1994
"... We study the effect of limited communication throughput on parallel computation in a setting where the number of processors is much smaller than the length of the input. Our model has p processors that communicate through a shared memory of size m. The input has size n, and can be read directly by a ..."
Abstract

Cited by 18 (1 self)
 Add to MetaCart
We study the effect of limited communication throughput on parallel computation in a setting where the number of processors is much smaller than the length of the input. Our model has p processors that communicate through a shared memory of size m. The input has size n, and can be read directly by all the processors. We will be primarily interested in studying cases where n AE p AE m. As a test case we study the list reversal problem. For this problem we prove a time lower bound of \Omega\Gamma n p mp ). (A similar lower bound holds also for the problems of sorting, finding all unique elements, convolution, and universal hashing.) This result shows that limiting the communication (i.e., small m) has significant effect on parallel computation. We show an almost matching upper bound of O( n p mp log O(1) n). The upper bound requires the development of a few interesting techniques which can alleviate the limited communication in some
Lower bounds in a parallel model without bit operations
 TO APPEAR IN THE SIAM JOURNAL ON COMPUTING
, 1997
"... ..."
On Implementing Graph Cuts on CUDA
"... has enabled graphics processors to be explicitly programmed as generalpurpose sharedmemory multicore processors with a high level of parallelism. In this paper, we present our preliminary results of implementing the Graph Cuts algorithm on CUDA. Our primary focus is on implementing Graph Cuts on ..."
Abstract

Cited by 12 (1 self)
 Add to MetaCart
has enabled graphics processors to be explicitly programmed as generalpurpose sharedmemory multicore processors with a high level of parallelism. In this paper, we present our preliminary results of implementing the Graph Cuts algorithm on CUDA. Our primary focus is on implementing Graph Cuts on grid graphs, which are extensively used in imaging applications. We first explain our implementation of breadth first search (BFS) graph traversal on CUDA, which is extensively used in our Graph Cuts implementation. We then present a basic implementation of Graph Cuts that succeeds to achieve absolute and relative speedups when used for foregroundbackground segmentation on synthesized images. Finally, we introduce two optimizations that utilize the special structure of grid graphs. The first one is lockstep BFS, which is used to reduce the overhead of BFS traversals. The second is cache emulation, which is a general technique to regularize memory access patterns and hence enhance memory access throughput. We experimentally show how each of the two optimizations can enhance the performance of the basic implementation on the image segmentation application. I.
Structural Parallel Algorithmics
, 1991
"... The first half of the paper is a general introduction which emphasizes the central role that the PRAM model of parallel computation plays in algorithmic studies for parallel computers. Some of the collective knowledgebase on nonnumerical parallel algorithms can be characterized in a structural way ..."
Abstract

Cited by 11 (4 self)
 Add to MetaCart
The first half of the paper is a general introduction which emphasizes the central role that the PRAM model of parallel computation plays in algorithmic studies for parallel computers. Some of the collective knowledgebase on nonnumerical parallel algorithms can be characterized in a structural way. Each structure relates a few problems and technique to one another from the basic to the more involved. The second half of the paper provides a bird'seye view of such structures for: (1) list, tree and graph parallel algorithms; (2) very fast deterministic parallel algorithms; and (3) very fast randomized parallel algorithms. 1 Introduction Parallelism is a concern that is missing from "traditional" algorithmic design. Unfortunately, it turns out that most efficient serial algorithms become rather inefficient parallel algorithms. The experience is that the design of parallel algorithms requires new paradigms and techniques, offering an exciting intellectual challenge. We note that it had...
The Maximum Flow Problem: A RealTime Approach
 Proceedings of the Thirteenth Conference on Parallel and Distributed Computing and Systems
, 2001
"... The dynamic version of the maximum flow problem allows the graph underlying the flow network to change over time. The graph receives corrections to its structure or capacities and consequently the value of the maximum flow is modified. These corrections arrive in real time. In this paper, parallel a ..."
Abstract

Cited by 11 (5 self)
 Add to MetaCart
The dynamic version of the maximum flow problem allows the graph underlying the flow network to change over time. The graph receives corrections to its structure or capacities and consequently the value of the maximum flow is modified. These corrections arrive in real time. In this paper, parallel and sequential solutions to the realtime maximum flow problem are developed on the Reconfigurable Multiple Bus Machine (RMBM) model and on the Random Access Machine (RAM) model, respectively. The parallel solution successfully meets the deadlines imposed in real time, while the sequential one fails to do so. The two solutions are then applied to a realtime process scheduler, an extension of Stone's static twoprocessor allocation problem. The scheduler allows processes to be created and destroyed, the amount of communication between two processes to change with time, and so on. The parallel algorithm is always able to compute the optimal schedule, while the solution obtained sequentially is only an approximation. The improvement provided by the parallel approach over the sequential one is superlinear in the number of processors used by the parallel model. Key words and phrases: maximum flow, parallelism, realtime computation, module allocation. 1