Results 1 – 9 of 9
On Universal Classes of Extremely Random Constant-Time Hash Functions and Their Time-Space Tradeoff
Abstract

Cited by 39 (0 self)
A family of functions F that map [0, n] → [0, n] is said to be h-wise independent if any h points in [0, n] have an image, for randomly selected f ∈ F, that is uniformly distributed. This paper gives both probabilistic and explicit randomized constructions of n^ε-wise independent functions, ε < 1, that can be evaluated in constant time for the standard random access model of computation. Simple extensions give comparable behavior for larger domains. As a consequence, many probabilistic algorithms can for the first time be shown to achieve their expected asymptotic performance for a feasible model of computation. This paper also establishes a tight tradeoff in the number of random seeds that must be precomputed for a random function that runs in time T and is h-wise independent. Categories and Subject Descriptors: E.2 [Data Storage Representation]: Hash-table representation; F.1.2 [Modes of Computation]: Probabilistic computation; F.2.3 [Tradeoffs among Computational Measures]...
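As a point of contrast for the abstract above, the textbook h-wise independent family is a random polynomial of degree h-1 over a prime field; evaluating it costs Θ(h) field operations, which is exactly the dependence on h that constant-time constructions aim to remove. A minimal sketch (function and parameter names are illustrative, not from the paper):

```python
import random

def make_hwise_hash(h, p=2_147_483_647):
    """Sample a member of the classic h-wise independent family: a degree-(h-1)
    polynomial with uniform random coefficients over GF(p). Note evaluation
    touches all h coefficients, so it is NOT constant time in h."""
    coeffs = [random.randrange(p) for _ in range(h)]

    def f(x):
        acc = 0
        for c in coeffs:          # Horner's rule over GF(p)
            acc = (acc * x + c) % p
        return acc

    return f
```

With h = 1 the polynomial is a constant, so the function maps every input to the same value; larger h buys independence at a linear evaluation cost.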
An optical simulation of shared memory
, 1994
Abstract

Cited by 35 (3 self)
We present a work-optimal randomized algorithm for simulating a shared-memory machine (PRAM) on an optical communication parallel computer (OCPC). The OCPC model is motivated by the potential of optical communication for parallel computation. The memory of an OCPC is divided into modules, one module per processor. Each memory module only services a request on a time step if it receives exactly one memory request. Our algorithm simulates each step of an (n lg lg n)-processor EREW PRAM on an n-processor OCPC in O(lg lg n) expected delay. (The probability that the delay is longer than this is at most n^{-α}, for any constant α.) The best previous simulation, due to Valiant, required Θ(lg n) expected delay.
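The OCPC contention rule described above is easy to state operationally: a module serves a request only when it is the unique target that step. A small sketch of one such step (the dict encoding of requests is an assumption for illustration):

```python
from collections import Counter

def ocpc_step(requests):
    """One OCPC time step. `requests` maps processor id -> target memory
    module. A module services a request only if exactly one processor
    targets it; the returned set holds the processors that succeeded."""
    load = Counter(requests.values())
    return {p for p, m in requests.items() if load[m] == 1}
```

Processors left out of the returned set must retry on a later step; bounding how many retries are needed is precisely what the simulation's O(lg lg n) delay analysis is about.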
Deterministic PRAM Simulation with Constant Redundancy * (Preliminary Version)
Abstract
Abstract: In this paper, we show that distributing the memory of a parallel computer and, thereby, decreasing its granularity allows a reduction in the redundancy required to achieve polylog simulation time for each PRAM step. Previously, realistic models of parallel computation assigned one memory module to each processor and, as a result, insisted on relatively coarse-grain memory. We propose, on the other hand, a more flexible but equally valid model of computation: the distributed-memory, bounded-degree network (DMBDN) model. This model allows the use of fine-grain memory while maintaining the realism of a bounded-degree interconnection network. We describe a PRAM simulation scheme, admitted under the DMBDN model, that exploits the increased memory bandwidth provided by a two-dimensional mesh of trees (2DMOT) network to achieve an overhead in memory redundancy lower than that required by other fast, deterministic PRAM simulations. Specifically, for a deterministic simulation of an n-processor PRAM on a bounded-degree network, we are able to reduce the number of copies of each variable from O(log n / log log n) to Θ(1) and still simulate each PRAM step in polylog time.
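To get a feel for the redundancy saving claimed above, one can tabulate O(log n / log log n) copies per variable against the constant redundancy of the DMBDN scheme. The constant factor of 1 below is an assumption purely for illustration; the asymptotics are what matter:

```python
import math

def copies_previous(n):
    """Copies per variable in earlier fast deterministic PRAM simulations:
    O(log n / log log n), with an assumed constant factor of 1."""
    return max(1, round(math.log2(n) / math.log2(math.log2(n))))

COPIES_DMBDN = 1  # stand-in for the Theta(1) redundancy of the DMBDN scheme

# For n = 2^10, 2^20, 2^30 the old schemes need roughly 3, 5, and 6 copies
# of every shared variable, versus a constant number under the new model.
table = {n: (copies_previous(n), COPIES_DMBDN) for n in (2**10, 2**20, 2**30)}
```

The gap grows only slowly with n, but since every copy must be kept consistent on every write, even small constant-factor reductions in redundancy translate directly into memory-bandwidth savings.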
Interactive Animation of Fault Tolerant Parallel Algorithms, by Scott W. Apgar
, 1992
Abstract
Animation of algorithms makes understanding them intuitively easier. This paper describes the software tool Raft (Robust Animator of Fault Tolerant Algorithms). The Raft system allows the user to animate a number of parallel algorithms which achieve fault-tolerant execution. In particular, we use it to illustrate the key Write-All problem. It has an extensive user interface which allows a choice of the number of processors, the number of elements in the Write-All array, and the adversary to control the processor failures. The novelty of the system is that the interface allows the user to create new online adversaries as the algorithm executes. Submitted in partial fulfillment of the requirements for the
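The Write-All problem mentioned above asks n fault-prone processors to set all n cells of an array to 1 despite an adversary killing processors. A deliberately naive sketch, with an i.i.d. random adversary standing in for the user-defined adversaries Raft supports (real Write-All algorithms, and Raft's own animations, are far more sophisticated):

```python
import random

def write_all(n_procs, n_cells, fail_prob, rng):
    """Naive Write-All sketch: on each step the adversary fails each live
    processor with probability fail_prob, then each survivor writes one of
    the leftmost unwritten cells. Returns the array and the step count."""
    array = [0] * n_cells
    alive = list(range(n_procs))
    steps = 0
    while 0 in array and alive:
        alive = [p for p in alive if rng.random() > fail_prob]  # adversary acts
        todo = [i for i, v in enumerate(array) if v == 0]
        for _, cell in zip(alive, todo):  # one write per surviving processor
            array[cell] = 1
        steps += 1
    return array, steps
```

With no failures, 4 processors finish an 8-cell array in 2 steps; an adversary that kills everything leaves the array unfinished, which is exactly the behavior an animation makes vivid.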
Parallel Programming Platforms
Abstract
The traditional logical view of a sequential computer consists of a memory connected to a processor via a datapath. All three components – processor, memory, and datapath – present bottlenecks to the overall processing rate of a computer system. A number of architectural innovations over the years have addressed these bottlenecks. One of the most important innovations is multiplicity – in processing units, datapaths, and memory units. This multiplicity is either entirely hidden from the programmer, as in the case of implicit parallelism, or exposed to the programmer in different forms. In this chapter, we present an overview of important architectural concepts as they relate to parallel processing. The objective is to provide sufficient detail for programmers to be able to write efficient code on a variety of platforms. We develop cost models and abstractions for quantifying the performance of various parallel algorithms, and identify bottlenecks resulting from various programming constructs. We start our discussion of parallel platforms with an overview of serial and implicitly parallel architectures. This is necessitated by the fact that it is often possible to reengineer codes to achieve significant speedups (2× to 5× over unoptimized speed) using simple program transformations. Parallelizing suboptimal serial codes often has the undesirable effects of unreliable speedups and misleading runtimes. For this reason, we advocate optimizing serial performance of codes before attempting parallelization. As we shall demonstrate through this chapter, the tasks of serial and parallel optimization often have very similar characteristics. After discussing serial and implicitly parallel architectures, we devote the rest of this chapter to the organization of parallel platforms, underlying cost models for algorithms, and platform abstractions for portable algorithm design. Readers wishing to delve directly into parallel architectures may choose to skip Sections 1.1 and 1.2.
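The chapter's advice to tune serial code first can be made concrete with the simplest of the cost models it alludes to. Amdahl's-law arithmetic (this is a standard model, used here for illustration; it is not claimed to be the chapter's own) shows why a 2×–5× serial optimization can rival the entire parallel gain on a modest machine:

```python
def amdahl_speedup(serial_fraction, p):
    """Amdahl's-law speedup bound on p processors when `serial_fraction`
    of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

# With 10% of the work serial, 8 processors yield under 5x speedup --
# comparable to a hypothetical 3x-5x gain from serial tuning alone.
parallel_gain = amdahl_speedup(0.1, 8)
```

This is also why parallelizing a suboptimal serial code gives misleading numbers: the reported "speedup" is measured against an inflated baseline.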
Efficient Interconnection Schemes for VLSI and Parallel Computation
, 1989
Abstract
This thesis is primarily concerned with two problems of interconnecting components in VLSI technologies. In the first case, the goal is to construct efficient interconnection networks for general-purpose parallel computers. The second problem is a more specialized problem in the design of VLSI chips, namely multilayer channel routing. In addition, a final part of this thesis provides lower bounds on the area required for VLSI implementations of finite-state machines. This thesis shows that networks based on Leiserson's fat-tree architecture are nearly as good as any network built in a comparable amount of physical space. It shows that these "universal" networks can efficiently simulate competing networks by means of an appropriate correspondence between network components and efficient algorithms for routing messages on the universal network. In particular, a universal network of area A can simulate competing networks with O(lg^3 A) slowdown (in bit-times), using a very simple rando...
AND
Abstract
Abstract. The power of shared memory in models of parallel computation is studied, and a novel distributed data structure that eliminates the need for shared memory without significantly increasing the run time of the parallel computation is described. More specifically, it is shown how a complete network of processors can deterministically simulate one PRAM step in O(log n (log log n)^2) time when both models use n processors and the size of the PRAM's shared memory is polynomial in n. (The best previously known upper bound was the trivial O(n).) It is established that this upper bound is nearly optimal, and it is proved that an online simulation of T PRAM steps by a complete network of processors requires Ω(T log n / log log n) time. A simple consequence of the upper bound is that an Ultracomputer (the currently feasible general-purpose parallel machine) can simulate one step of a PRAM (the most convenient parallel model to program) in Θ((log n)^2 log log n) steps.
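A recurring idea behind deterministic simulations of this kind is to keep each shared variable in several timestamped copies on distinct modules, so that updating a majority suffices and any read that sees a majority finds the freshest value. A minimal sketch of that majority-copy idea (illustrative only; the paper's distributed data structure is considerably more involved, and the placement by `hash` is a stand-in):

```python
class ReplicatedMemory:
    """Each variable lives in c timestamped copies on distinct modules.
    A write updates a majority of copies; since any two majorities
    intersect, the copy with the largest timestamp is always current."""

    def __init__(self, n_modules, c=3):
        self.modules = [dict() for _ in range(n_modules)]
        self.n, self.c, self.time = n_modules, c, 0

    def _homes(self, var):
        # Hypothetical placement: c consecutive modules from hash(var).
        start = hash(var) % self.n
        return [(start + i) % self.n for i in range(self.c)]

    def write(self, var, value):
        self.time += 1
        for m in self._homes(var)[: self.c // 2 + 1]:  # a majority is enough
            self.modules[m][var] = (self.time, value)

    def read(self, var):
        # For simplicity we read all c copies; any majority would do, since
        # it must intersect every write majority.
        copies = [self.modules[m].get(var) for m in self._homes(var)]
        copies = [c for c in copies if c is not None]
        return max(copies)[1] if copies else None
```

The simulation-time question the abstract answers is then: how fast can n processors perform such majority reads and writes for n simultaneous PRAM requests without overloading any single module.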