Programming Parallel Algorithms
1996
Cited by 193 (9 self)
In the past 20 years there has been tremendous progress in developing and analyzing parallel algorithms. Researchers have developed efficient parallel algorithms to solve most problems for which efficient sequential solutions are known. Although some of these algorithms are efficient only in a theoretical framework, many are quite efficient in practice or have key ideas that have been used in efficient implementations. This research on parallel algorithms has not only improved our general understanding of parallelism but in several cases has led to improvements in sequential algorithms. Unfortunately there has been less success in developing good languages for programming parallel algorithms, particularly languages that are well suited for teaching and prototyping algorithms. There has been a large gap between languages ...
Optical Communication for Pointer Based Algorithms
1988
Cited by 54 (1 self)
In this paper we study the Local Memory PRAM. This model allows unit-cost communication but assumes that the shared memory is divided into modules. This model is motivated by a consideration of potential optical computers. We show that fundamental problems such as list ranking and parallel tree contraction can be implemented on this model in O(log n) time using n/log n processors. To solve the list-ranking problem we introduce a general asynchronous technique which has relevance to a number of problems.
1 Introduction
We consider a model of parallel computation that is especially suited to pointer-based computation. We motivate this model by showing that basic problems, like list ranking and parallel tree contraction, can be performed in O(log n) time using only n/log n processors. We also show that any step on this model can be simulated in unit time by a machine with an optical communication architecture. Thus we contend that the basic problem of list ra...
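List ranking asks for each node's distance to the end of a linked list. As a point of reference, the minimal Python sketch below uses classic pointer jumping (Wyllie's technique), simulating each synchronous PRAM round with list comprehensions that read only the previous round's values. It is not the asynchronous, work-efficient technique the paper introduces: pointer jumping finishes in about log2 n rounds but does O(n log n) total work, whereas the result above uses only n/log n processors. The function name and the `succ` encoding (the tail points to itself) are our own assumptions.

```python
def list_rank(succ):
    """Pointer-jumping list ranking (Wyllie's technique), simulated sequentially.

    succ[i] is the index of the node after i; the tail points to itself.
    Returns (rank, rounds), where rank[i] is the number of links from i to the
    tail and rounds is the number of synchronous rounds used.
    """
    n = len(succ)
    nxt = list(succ)
    # Each node initially knows only its distance to its successor (0 for the tail).
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    rounds = 0
    # Continue while some node's pointer has not yet reached the tail.
    while any(nxt[i] != nxt[nxt[i]] for i in range(n)):
        # One synchronous round: both updates read only the previous round's
        # values, as all processors would on a PRAM.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
        rounds += 1
    return rank, rounds


# The list 3 -> 0 -> 2 -> 1 (node 1 is the tail).
print(list_rank([2, 1, 1, 0]))  # ([2, 0, 1, 3], 2)
```

Each round doubles the distance a pointer spans, which is why an 8-node list needs 3 rounds.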
Are Wait-Free Algorithms Fast?
1991
Cited by 40 (11 self)
The time complexity of wait-free algorithms in "normal" executions, where no failures occur and processes operate at approximately the same speed, is considered. A lower bound of log n on the time complexity of any wait-free algorithm that achieves approximate agreement among n processes is proved. In contrast, there exists a non-wait-free algorithm that solves this problem in constant time. This implies an Ω(log n) time separation between the wait-free and non-wait-free computation models. On the positive side, we present an O(log n) time wait-free approximate agreement algorithm; the complexity of this algorithm is within a small constant of the lower bound.
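To make the approximate agreement problem concrete: each process starts with a real value and must output a value within the range of the inputs, with all outputs close to one another. A common building block is a midpoint round, sketched below under our own assumptions (hypothetical names, with views supplied by hand rather than produced by any shared-memory protocol); it is not the wait-free algorithm from the paper. If every process's view contains some common value v, each midpoint lies between (lo + v)/2 and (v + hi)/2, so the spread of the outputs is at most half the spread of the inputs.

```python
def midpoint_round(views):
    """One round: each process outputs the midpoint of the values it has seen.

    views[i] is the (hypothetical) list of input values visible to process i,
    always including its own input.
    """
    return [(min(view) + max(view)) / 2 for view in views]


def spread(values):
    """Width of the interval spanned by the values."""
    return max(values) - min(values)


# Three processes with inputs 0, 4 and 8; every view contains the value 4.
inputs = [0.0, 4.0, 8.0]
views = [[0.0, 4.0], [4.0], [4.0, 8.0]]
outputs = midpoint_round(views)
print(outputs)  # [2.0, 4.0, 6.0]

# Validity: every output lies within the range of the inputs.
assert all(min(inputs) <= x <= max(inputs) for x in outputs)
# Contraction: the spread at least halves (8.0 -> 4.0 here).
assert spread(outputs) <= spread(inputs) / 2
```

Repeating such rounds shrinks the spread geometrically; how views are gathered under failures and asynchrony is exactly what a real protocol must handle.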
Communication-Efficient Parallel Algorithms for Distributed Random-Access Machines
Algorithmica, 1988
Cited by 38 (2 self)
This paper introduces a model for parallel computation, called the distributed random-access machine (DRAM), in which the communication requirements of parallel algorithms can be evaluated. A DRAM is an abstraction of a parallel computer in which memory accesses are implemented by routing messages through a communication network. A DRAM explicitly models the congestion of messages across cuts of the network. We introduce the notion of a conservative algorithm as one whose communication requirements at each step can be bounded by the congestion of pointers of the input data structure across cuts of a DRAM. We give a simple lemma that shows how to "shortcut" pointers in a data structure so that remote processors can communicate without causing undue congestion. We give O(lg n)-step, linear-processor, linear-space, conservative algorithms for a variety of problems on n-node trees, such as computing tree-walk numberings, finding the separator of a tree, and evaluating all subexpressions ...
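The notion of congestion across a cut can be illustrated with a toy count. The sketch below is our own construction, not taken from the paper: node i of a 16-node linked list is assumed to live on processor i, the cut separates processors 0 through 7 from 8 through 15, and we count list pointers whose endpoints lie on opposite sides. Naive pointer jumping doubles that count every round even though the input list crosses the cut only once, which illustrates why shortcutting pointers indiscriminately need not be conservative in the sense defined above.

```python
def cut_congestion(ptr, side):
    """Count pointers i -> ptr[i] whose two endpoints lie on opposite sides of the cut."""
    return sum(1 for i, j in enumerate(ptr) if i != j and side[i] != side[j])


def jump(ptr):
    """One round of naive pointer jumping: every node replaces ptr[i] by ptr[ptr[i]]."""
    return [ptr[ptr[i]] for i in range(len(ptr))]


n = 16
ptr = [min(i + 1, n - 1) for i in range(n)]  # list 0 -> 1 -> ... -> 15; tail points to itself
side = [i < n // 2 for i in range(n)]        # assumed placement: node i on processor i

history = []
for _ in range(4):
    history.append(cut_congestion(ptr, side))
    ptr = jump(ptr)
print(history)  # [1, 2, 4, 8]
```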
A New Parallel Algorithm For The Maximal Independent Set Problem
1989
Cited by 35 (2 self)
A new parallel algorithm for the maximal independent set problem is constructed. It runs in O(log^4 n) time when implemented on a linear number of EREW processors. This is the first deterministic algorithm for the maximal independent set problem (MIS) whose running time is polylogarithmic and whose processor-time product is optimal up to a polylogarithmic factor.
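For contrast with the deterministic algorithm described above, the following Python sketch simulates a random-priority variant of Luby's randomized parallel MIS algorithm, which is a different technique from the paper's. In each round every live vertex draws a random priority, vertices that beat all of their live neighbors join the set, and those vertices are removed together with their neighbors. The graph encoding (a dict mapping each vertex to a set of neighbors, no self-loops) and the function names are our own assumptions; a small checker verifies independence and maximality.

```python
import random


def luby_mis(adj, seed=0):
    """Random-priority variant of Luby's parallel MIS, simulated round by round.

    adj maps each comparable vertex to the set of its neighbors (simple graph).
    """
    rng = random.Random(seed)
    live = set(adj)
    mis = set()
    while live:
        # Every live vertex draws a priority; ties are broken by vertex id.
        prio = {v: (rng.random(), v) for v in live}
        # A vertex joins if it beats all of its live neighbors. The vertex with
        # the globally smallest priority always qualifies, so each round progresses.
        winners = {v for v in live
                   if all(prio[v] < prio[u] for u in adj[v] if u in live)}
        mis |= winners
        # Remove the winners and all of their neighbors from the graph.
        removed = set(winners)
        for v in winners:
            removed |= adj[v]
        live -= removed
    return mis


def is_maximal_independent(adj, s):
    """True if no two vertices of s are adjacent and every vertex is in s or next to s."""
    independent = all(u not in s for v in s for u in adj[v])
    maximal = all(v in s or any(u in s for u in adj[v]) for v in adj)
    return independent and maximal


# Example: a 6-cycle.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
mis = luby_mis(cycle, seed=1)
print(is_maximal_independent(cycle, mis))  # True
```

Luby's analysis shows that an expected constant fraction of edges disappears per round, giving O(log n) expected rounds; the paper's contribution is achieving a polylogarithmic bound without randomness.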
An Approach to Scalability Study of Shared Memory Parallel Systems
1994
Cited by 33 (18 self)
The overheads in a parallel system that limit its scalability need to be identified and separated in order to enable parallel algorithm design and the development of parallel machines. Such overheads may be broadly classified into two components. The first one is intrinsic to the algorithm and arises due to factors such as work imbalance and the serial fraction. The second one is due to the interaction between the algorithm and the architecture and arises due to latency and contention in the network. A top-down approach to scalability study of shared memory parallel systems is proposed in this research. We define the notion of overhead functions associated with the different algorithmic and architectural characteristics to quantify the scalability of parallel systems; we isolate the algorithmic overhead and the overheads due to network latency and contention from the overall execution time of an application; we design and implement an execution-driven simulation platform that incorporates these methods for quantifying the overhead functions; and we use this simulator to study the scalability characteristics of five applications on shared memory platforms with different communication topologies.
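The separation of overheads described above can be expressed as differences between execution times. The Python sketch below assumes, as our own reading rather than the paper's exact definitions, that an application is timed on three simulated configurations: a network with no latency or contention, a network with latency only, and the full network, with the linear-speedup time T_serial/p as the baseline. The function name, parameters and timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Overheads:
    algorithmic: float  # work imbalance, serial fraction, etc.
    latency: float      # extra time attributed to network latency
    contention: float   # extra time attributed to contention in the network


def separate_overheads(t_serial, p, t_ideal_net, t_latency_only, t_actual):
    """Split the gap between linear speedup and measured time into three parts.

    t_ideal_net:    time on a simulated network with no latency and no contention
    t_latency_only: time on a simulated network with latency but no contention
    t_actual:       time on the full simulated network
    """
    t_linear = t_serial / p  # time under perfect linear speedup
    return Overheads(
        algorithmic=t_ideal_net - t_linear,
        latency=t_latency_only - t_ideal_net,
        contention=t_actual - t_latency_only,
    )


# Hypothetical timings (arbitrary units) for p = 8 processors.
print(separate_overheads(800.0, 8, 120.0, 150.0, 170.0))
# Overheads(algorithmic=20.0, latency=30.0, contention=20.0)
```

By construction the three parts sum to t_actual - t_serial/p, so each overhead can be tracked separately as p grows.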
Parallel Algorithmic Techniques for Combinatorial Computation
Ann. Rev. Comput. Sci., 1988
Cited by 29 (3 self)
... this paper and supplied many helpful comments. This research was supported in part by NSF grants DCR-8511713, CCR-8605353, and CCR-8814977, and by DARPA contract N00039-84-C-0165.
Oblivious algorithms for multicores and network of processors
2009
Cited by 22 (6 self)
We address the design of parallel algorithms that are oblivious to machine parameters for two dominant machine configurations: the chip multiprocessor (or multicore) and the network of processors. First, and of independent interest, we propose HM, a hierarchical multilevel caching model for multicores, and we propose a multicore-oblivious approach to algorithms and schedulers for HM. We instantiate this approach with provably efficient multicore-oblivious algorithms for matrix and prefix sum computations, FFT, the Gaussian Elimination paradigm (which represents an important class of computations including Floyd-Warshall's all-pairs shortest paths, Gaussian Elimination and LU decomposition without pivoting), sorting, list ranking, Euler tours and connected components. We then use the network-oblivious framework proposed earlier as an oblivious framework for a network of processors, and we present provably efficient network-oblivious algorithms for sorting, the Gaussian Elimination paradigm, list ranking, Euler tours and connected components. Many of these network-oblivious algorithms also perform efficiently when executed on the Decomposable-BSP.
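Prefix sums appear in the list of problems above. For readers unfamiliar with the computation, the sketch below shows a standard work-efficient exclusive scan (the up-sweep/down-sweep scheme associated with Blelloch), simulated sequentially in Python. It does not model the HM cache hierarchy or the paper's multicore-oblivious scheduling; each inner loop is one tree level whose iterations are independent and could run in parallel. The function name is our own.

```python
def exclusive_scan(xs):
    """Work-efficient exclusive prefix sum (up-sweep / down-sweep), run sequentially."""
    n = 1
    while n < len(xs):
        n *= 2
    a = list(xs) + [0] * (n - len(xs))  # pad to a power of two with the identity 0

    # Up-sweep: build partial sums of a balanced binary tree in place.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):  # independent iterations at this level
            a[i] += a[i - d]
        d *= 2

    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):  # independent iterations at this level
            left = a[i - d]
            a[i - d] = a[i]
            a[i] += left
        d //= 2
    return a[:len(xs)]


print(exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

Both sweeps perform O(n) additions over O(log n) levels, which is what makes this scheme work-efficient compared with a naive log-step scan.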
A Simulation-based Scalability Study of Parallel Systems
Journal of Parallel and Distributed Computing, 1993
Cited by 21 (15 self)
Scalability studies of parallel architectures have used scalar metrics to evaluate their performance. Very often, it is difficult to glean the sources of inefficiency resulting from the mismatch between the algorithmic and architectural requirements using such scalar metrics. Low-level performance studies of the hardware are also inadequate for predicting the scalability of the machine on real applications. We propose a top-down approach to scalability study that alleviates some of these problems. We characterize applications in terms of the frequently occurring kernels, and their interaction with the architecture in terms of overheads in the parallel system. An overhead function is associated with the algorithmic characteristics as well as their interaction with the architectural features. We present a simulation platform called SPASM (Simulator for Parallel Architectural Scalability Measurements) that quantifies these overhead functions. SPASM separates the algorithmic overhead into ...