Results 1 to 10 of 41
Models and Languages for Parallel Computation
 ACM COMPUTING SURVEYS
, 1998
Abstract

Cited by 134 (4 self)
We survey parallel programming models and languages using six criteria: a model should be easy to program, have a software development methodology, be architecture-independent, be easy to understand, guarantee performance, and provide information about the cost of programs. ... We consider programming models in six categories, depending on the level of abstraction they provide.
Fast Parallel Algorithm for the Maximal Independent Set Problem
 Proc. 16th Annual ACM Symposium on Theory Of Computing
, 1984
Abstract

Cited by 78 (1 self)
Abstract. A parallel algorithm is presented that accepts as input a graph G and produces a maximal independent set of vertices in G. On a PRAM without the concurrent-write or concurrent-read features, the algorithm executes in O((log n)^4) time and uses O((n/(log n))^3) processors, where n is the number of vertices in G. The algorithm has several novel features that may find other applications. These include the use of balanced incomplete block designs to replace random sampling by deterministic sampling, and the use of a "dynamic pigeonhole principle" that generalizes the conventional pigeonhole principle.
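To make the object being computed concrete, here is a minimal sequential sketch of a maximal independent set: a set of pairwise non-adjacent vertices to which no further vertex can be added. This greedy loop is purely illustrative and is not the paper's parallel PRAM algorithm.

```python
def maximal_independent_set(vertices, edges):
    """Greedy sequential construction of a maximal independent set.

    Illustrative only: the paper's contribution is doing this in
    parallel on a PRAM, not this trivial sequential scan.
    """
    # Build an adjacency map for O(1) neighbour lookups.
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)

    independent = set()
    blocked = set()  # vertices adjacent to something already chosen
    for v in vertices:
        if v not in blocked:
            independent.add(v)
            blocked.add(v)
            blocked |= adj[v]  # neighbours can no longer be added
    return independent

# On the 4-cycle 0-1-2-3-0 the greedy scan picks {0, 2}.
print(maximal_independent_set([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

Maximality holds because every vertex not chosen was blocked by a chosen neighbour, so nothing can be added without breaking independence.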
An overview of computational complexity
 Communications of the ACM
, 1983
Abstract

Cited by 17 (0 self)
foremost recognition of technical contributions to the computing community. The citation of Cook's achievements noted that "Dr. Cook has advanced our understanding of the complexity of computation in a significant and profound way. His seminal paper, The Complexity of Theorem Proving Procedures, presented at the 1971 ACM SIGACT Symposium on the Theory of Computing, laid the foundations for the theory of NP-completeness. The ensuing exploration of the boundaries and nature of the NP-complete class of problems has been one of the most active and important research activities in computer science for the last decade. Cook is well known for his influential results in fundamental areas of computer science. He has made significant contributions to complexity theory, to time-space tradeoffs in computation, and to logics for programming languages. His work is characterized by elegance and insights and has illuminated the very nature of computation." During 1970-1979, Cook did extensive work under grants from the
Efficient Deterministic and Probabilistic Simulations of PRAMs on Linear Arrays with Reconfigurable Pipelined Bus Systems
 Journal of Supercomputing
, 2000
Abstract

Cited by 15 (11 self)
In this paper, we present deterministic and probabilistic methods for simulating PRAM computations on linear arrays with reconfigurable pipelined bus systems (LARPBS). The following results are established in this paper. (1) Each step of a p-processor PRAM with m = O(p) shared memory cells can be simulated by a p-processor LARPBS in O(log p) time, where the constant in the big-O notation is small. (2) Each step of a p-processor PRAM with m = Ω(p) shared memory cells can be simulated by a p-processor LARPBS in O(log m) time. (3) Each step of a p-processor PRAM can be simulated by a p-processor LARPBS in O(log p) time with probability larger than 1 - 1/p^c for all c > 0. (4) As an interesting byproduct, we show that a p-processor LARPBS can sort p items in O(log p) time, with a small constant hidden in the big-O notation. Our results indicate that an LARPBS can simulate a PRAM very efficiently. Keywords: Concurrent read, concurrent write, deterministic simulation, linear array...
Program Development and Performance Prediction on BSP Machines Using Opal
, 1994
Abstract

Cited by 11 (0 self)
Machine. This uses combining networks on a butterfly topology with a hashed address space to try to hide the network latency. [Abolhassan et al., 1991] analyses Ranade's approach in a quantitative way by giving cost models for implementing various parts of the PRAM machine. This is then used to demonstrate an improvement on Ranade's Fluent machine using multiple butterflies and parallel slackness. It is then shown that the proposed improved Fluent machine would have a price/performance ratio similar to that of conventional distributed memory architectures. Other attempts at realising the PRAM model involve its simulation on conventional distributed memory architectures. This method usually involves hashing the address space of the PRAM across the distributed memory of the machine and replication of variables [Mehlhorn and Vishkin, 1984], or using multiple hash functions [Abolhassan et al., 1991]. 2.2 BSP A Bulk Synchronous Parallel machine consists of a number of processor memo...
Concurrent Heaps on the BSP Model
, 1996
Abstract

Cited by 11 (11 self)
In this paper we present a new randomized selection algorithm on the Bulk-Synchronous Parallel (BSP) model of computation, along with an application of this algorithm to dynamic data structures, namely Parallel Priority Queues (PPQs). We show that our algorithms improve upon previous results in both the communication requirements and the amount of parallel slack required to achieve optimal performance. We also establish that optimality to within small multiplicative constant factors can be achieved for a wide range of parallel machines. While these algorithms are fairly simple themselves, descriptions of their performance in terms of the BSP parameters are somewhat involved. The main reward of quantifying these complications is that it allows transportable software to be written for parallel machines that fit the model. We also present experimental results for the selection algorithm that reinforce our claims.
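For readers unfamiliar with randomized selection, the sequential core of the idea is quickselect: partition around a random pivot and recurse into the side containing the answer. This sketch is a sequential analogue only; the paper's contribution is distributing this work across BSP processors and supersteps.

```python
import random

def select(items, k):
    """Return the k-th smallest element of items (k is 1-based),
    using randomized pivoting (expected linear time).

    Sequential illustration only, not the paper's BSP algorithm.
    """
    assert 1 <= k <= len(items)
    pivot = random.choice(items)
    lower = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    if k <= len(lower):
        return select(lower, k)            # answer lies below the pivot
    if k <= len(lower) + len(equal):
        return pivot                       # the pivot itself is the answer
    upper = [x for x in items if x > pivot]
    return select(upper, k - len(lower) - len(equal))

print(select([7, 2, 9, 4, 1], 3))  # 3rd smallest of {1,2,4,7,9} -> 4
```

The result is deterministic even though the pivot is random; only the running time varies with the pivot choices.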
Clumps: A Candidate Model Of Efficient, General Purpose Parallel Computation
, 1994
Abstract

Cited by 10 (6 self)
A new model of parallel computation is proposed, CLUMPS (Campbell's Lenient, Unified Model of Parallel Systems). This is composed of an abstract machine with an associated cost model, and aims to be more portable, more reflective of costs, more expressive, and more encouraging of efficient implementations of algorithms than other existing models. It is shown that each basic parallel architecture class can congruently perform each other's computations, but the congruent simulation of each other's communication is not generally possible (where for a simulation to be congruent, the simulation costs on the target architecture must be asymptotically equivalent to the implementation costs on the native architecture). This is reflected in the CLUMPS abstract machine through its flexibility in terms of program control and memory access. The congruence requirement is relaxed so that though strict congruence may not be achieved according to the above definition, communication costs are reflectively accounted ...
Towards a Scalable Parallel Object Database  The Bulk Synchronous Parallel Approach
, 1996
Abstract

Cited by 8 (2 self)
Parallel computers have been successfully deployed in many scientific and numerical application areas, although their use in non-numerical and database applications has been scarce. In this report, we first survey the architectural advancements beginning to make general-purpose parallel computing cost-effective, the requirements for non-numerical (or symbolic) applications, and the previous attempts to develop parallel databases. The central theme of the Bulk Synchronous Parallel model is to provide a high-level abstraction of parallel computing hardware whilst providing a realisation of a parallel programming model that enables architecture-independent programs to deliver scalable performance on diverse hardware platforms. Therefore, the primary objective of this report is to investigate the feasibility of developing a portable, scalable, parallel object database based on the Bulk Synchronous Parallel model of computation. In particular, we devise a way of providing high-level abstra...
Optimum Binary Search Trees On The Hierarchical Memory Model
, 2001
Abstract

Cited by 8 (1 self)
The Hierarchical Memory Model (HMM) of computation is similar to the standard Random Access Machine (RAM) model except that the HMM has a non-uniform memory organized in a hierarchy of levels numbered 1 through h. The cost of accessing a memory location increases with the level number, and accesses to memory locations belonging to the same level cost the same. Formally, the cost of a single access to the memory location at address a is given by μ(a), where μ : N → N is the memory cost function, and the h distinct values of μ model the different levels of the memory hierarchy. We study the problem of constructing and storing a binary search tree (BST) of minimum cost, over a set of keys with probabilities for successful and unsuccessful searches, on the HMM with an arbitrary number of memory levels, and for the special case h = 2. While the problem of constructing optimum binary search trees has been well studied for the standard RAM model, the additional parameter for the HMM inc...
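The cost function μ(a) is easy to make concrete. The sketch below, under a hypothetical two-level hierarchy (h = 2) with invented cutoff and cost values, totals the cost of an access sequence; it illustrates why the layout of a BST in memory (which nodes land at cheap addresses) matters on the HMM.

```python
def access_cost(addresses, mu):
    """Total cost of a sequence of memory accesses on the HMM,
    where mu(a) is the cost of touching address a."""
    return sum(mu(a) for a in addresses)

# Hypothetical two-level cost function (h = 2): the first 4 cells
# are "fast" (cost 1), everything beyond is "slow" (cost 10).
# The cutoff 4 and costs 1/10 are illustrative, not from the paper.
def mu(a):
    return 1 if a < 4 else 10

# A root-to-leaf search path: two nodes stored at low (fast)
# addresses, two spilled to high (slow) addresses.
print(access_cost([0, 1, 5, 9], mu))  # 1 + 1 + 10 + 10 = 22
```

Placing the frequently searched nodes of the tree at low addresses minimizes exactly this sum, which is the optimization the paper studies.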
Automatic Methods for Hiding Latency in High Bandwidth Networks (Extended Abstract)
, 1996
Abstract

Cited by 8 (2 self)
In this paper we describe methods for mitigating the degradation in performance caused by high latencies in parallel and distributed networks. Our approach is similar in spirit to the "complementary slackness" technique for latency hiding but has the advantage that the slackness does not need to be provided by the programmer and that large slowdowns are not needed in order to hide the latency. For example, given any algorithm that runs in T steps on an n-node ring with unit link delays, we show how to run the algorithm in O(T) steps on any n-node bounded-degree connected network with average link delay O(1). This is a significant improvement over prior approaches to latency hiding, which require slowdowns proportional to the maximum link delay (which can be quite large in comparison to the average delay). In the case when the network has average link delay d_ave, our simulation runs in O(√(d_ave T)) ...