Space-Efficient Scheduling of Multithreaded Computations
SIAM Journal on Computing, 1993
Cited by 81 (14 self)
This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Utilizing a new graph-theoretic model of multithreaded computation, execution efficiency is quantified by three important measures: $T_1$ is the time required for executing the computation on 1 processor, $T_\infty$ is the time required by an infinite number of processors, and $S_1$ is the space required to execute the computation on 1 processor. A computation executed on $P$ processors is time-efficient if the time is $O(T_1/P + T_\infty)$, that is, it achieves linear speedup when $P = O(T_1/T_\infty)$, and it is space-efficient if it uses $O(S_1 P)$ total space, that is, the space per processor is within a constant factor of that required for a 1-processor execution. The first result derived from this model shows that there exist multithreaded computations such that no execution schedule can simultan...
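The time-efficiency target $O(T_1/P + T_\infty)$ is the bound that any greedy schedule achieves for time alone. The sketch below is a minimal illustration, assuming unit-time tasks given as a DAG: it measures $T_1$ (total work) and $T_\infty$ (longest path) and runs a hypothetical greedy scheduler that executes up to $P$ ready tasks per step. It ignores space entirely, so it is not the space-efficient scheduler developed in the paper; the names `work_and_span` and `greedy_schedule` are illustrative.

```python
from collections import defaultdict


def _build(n, edges):
    """Successor lists and in-degrees for tasks 0..n-1; edge (u, v) means u runs before v."""
    succ = defaultdict(list)
    indeg = [0] * n
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    return succ, indeg


def work_and_span(n, edges):
    """Return (T_1, T_inf) for n unit-time tasks: total work and longest path length."""
    succ, indeg = _build(n, edges)
    depth = [1] * n                      # tasks on the longest path ending at each task
    order = [t for t in range(n) if indeg[t] == 0]
    for t in order:                      # Kahn's topological sort; the list grows as we go
        for s in succ[t]:
            depth[s] = max(depth[s], depth[t] + 1)
            indeg[s] -= 1
            if indeg[s] == 0:
                order.append(s)
    if len(order) != n:
        raise ValueError("dependency graph has a cycle")
    return n, max(depth)


def greedy_schedule(n, edges, p):
    """Steps used by a greedy scheduler that runs up to p ready tasks per step."""
    succ, indeg = _build(n, edges)
    ready = [t for t in range(n) if indeg[t] == 0]
    steps = done = 0
    while ready:
        running, ready = ready[:p], ready[p:]  # never idle while work is ready
        steps += 1
        for t in running:
            done += 1
            for s in succ[t]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    ready.append(s)        # runnable from the next step on
    if done != n:
        raise ValueError("dependency graph has a cycle")
    return steps


if __name__ == "__main__":
    # Example DAG: a complete binary "spawn" tree with 1023 tasks (depth 10).
    n = 2 ** 10 - 1
    edges = [(i, c) for i in range(n) for c in (2 * i + 1, 2 * i + 2) if c < n]
    t1, tinf = work_and_span(n, edges)
    for p in (1, 4, 16, 64):
        print(p, greedy_schedule(n, edges, p), "<=", t1 / p + tinf)
```

Each step either runs $P$ tasks (at most $T_1/P$ such steps) or runs every ready task, which shortens the remaining critical path by one (at most $T_\infty$ such steps); that is where the bound comes from.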

Towards Optimal Locality in Mesh-Indexings
1997
Cited by 31 (4 self)
The efficiency of many data structures and algorithms relies on "locality-preserving" indexing schemes for meshes. We concentrate on the case in which the maximal distance between two mesh nodes indexed $i$ and $j$ shall be a slow-growing function of $|i - j|$. We present a new 2D indexing scheme we call H-indexing, which has superior (possibly optimal) locality in comparison with the well-known Hilbert indexings. H-indexings form a Hamiltonian cycle, and we prove that they are optimally locality-preserving among all cyclic indexings. We provide fairly tight lower bounds for indexings without any restriction. Finally, illustrated by investigations concerning 2D and 3D Hilbert indexings, we present a framework for mechanizing upper-bound proofs for locality.
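The quantity studied here, the largest Manhattan distance between nodes indexed $i$ and $j$ as a function of $|i - j|$, is easy to measure for any concrete indexing. The abstract does not give the construction of H-indexing, so the sketch below uses two simple stand-in indexings (row-major and snake order, chosen only for illustration) to show how such a locality profile exposes poor locality; `locality_profile`, `row_major`, and `snake` are hypothetical helper names.

```python
def locality_profile(coords, max_diff):
    """For d = 1..max_diff, the largest Manhattan distance between the mesh
    nodes whose indices differ by exactly d. coords[i] is the node with index i."""
    profile = {}
    for d in range(1, max_diff + 1):
        worst = 0
        for i in range(len(coords) - d):
            (x1, y1), (x2, y2) = coords[i], coords[i + d]
            worst = max(worst, abs(x1 - x2) + abs(y1 - y2))
        profile[d] = worst
    return profile


def row_major(n):
    """Index i sits at column i % n, row i // n."""
    return [(x, y) for y in range(n) for x in range(n)]


def snake(n):
    """Boustrophedon order: even rows left to right, odd rows right to left."""
    return [(x if y % 2 == 0 else n - 1 - x, y) for y in range(n) for x in range(n)]


if __name__ == "__main__":
    n = 8
    for name, order in (("row-major", row_major(n)), ("snake", snake(n))):
        prof = locality_profile(order, n)
        print(name, [prof[d] for d in (1, 2, 4, 8)])
```

On an 8 × 8 mesh, row-major order puts consecutive indices 8 apart at row boundaries, while snake order keeps consecutive indices adjacent but still separates some indices that differ by 8 by a distance of 8. Curve-based indexings such as the Hilbert indexings aim to keep this growth proportional to $\sqrt{|i-j|}$.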

The E-BSP Model: Incorporating General Locality and Unbalanced Communication into the BSP Model
In Proc. Euro-Par'96, 1996
Cited by 14 (0 self)
The BSP model was proposed as a step towards general-purpose parallel computing. This paper introduces the E-BSP model, which extends the BSP model in two ways. First, it provides a way to deal with unbalanced communication patterns, i.e., communication patterns in which the amount of data sent or received differs from processor to processor. Second, it adds a notion of general locality to the BSP model, in which the delay of a remote memory access depends on the relative location of the processors in the interconnection network. We use our model to develop several algorithms that improve upon algorithms derived under the BSP model.
1 Introduction
It has been stressed by many authors that the emergence of one or a few computational models is essential to the progress of parallel computing [9, 14], because it enables the programmer to write architecture-independent software. Such a model should strike a balance between simplicity of usage and reflectivity of existing parallel architectures. D...
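For context on the first extension: the standard BSP model charges a superstep $w + g\,h + l$, where $w$ is the maximum local work, $g$ the per-word communication gap, $l$ the synchronization latency, and $h$ the maximum over processors of the words sent or received. A skewed pattern therefore pays for its busiest processor even when total traffic is unchanged. The sketch below computes only this standard BSP charge; the E-BSP cost function is not given in the abstract and is not reproduced, and the parameter values are made up for illustration.

```python
def h_relation(sent, received):
    """Smallest h such that no processor sends or receives more than h words."""
    return max(max(s, r) for s, r in zip(sent, received))


def bsp_superstep_cost(work, sent, received, g, latency):
    """Standard BSP charge for one superstep: w + g*h + l, where w is the
    maximum local work, g the per-word gap, and latency plays the role of l."""
    return max(work) + g * h_relation(sent, received) + latency


if __name__ == "__main__":
    work = [10, 10, 10, 10]
    # Same total traffic (12 words), spread evenly vs. sent by a single processor.
    balanced = bsp_superstep_cost(work, [3, 3, 3, 3], [3, 3, 3, 3], g=2, latency=5)
    skewed = bsp_superstep_cost(work, [12, 0, 0, 0], [0, 4, 4, 4], g=2, latency=5)
    print(balanced, skewed)  # 21 39
```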

Implementing the Hierarchical PRAM on the 2D Mesh: Analyses and Experiments
1995
Cited by 12 (2 self)
We investigate aspects of the performance of the EREW instance of the Hierarchical PRAM (H-PRAM) model, a recursively partitionable PRAM, on the 2D mesh architecture via analysis and simulation experiments. Since one of the ideas behind the H-PRAM is to systematically exploit locality in order to negate the need for expensive communication hardware and thus promote cost-effective scalability, our design decisions are based on minimizing implementation costs. The Peano indexing scheme is used as a simple and natural means of allowing the dynamic, recursive partitioning of the mesh into arbitrarily sized submeshes, as required by the H-PRAM. We show that for any submesh the ratio of the largest Manhattan distance between two nodes of the submesh to that of the square mesh with an identical number of processors is at most 3/2, thereby demonstrating the locality-preserving properties of the Peano scheme for arbitrary partitions of the mesh. We provide matching analytical and experimenta...
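Peano indexing partitions a $3^k \times 3^k$ mesh recursively into $3 \times 3$ blocks, so every aligned run of $9^j$ consecutive indices covers a $3^j \times 3^j$ submesh. The sketch below assumes Peano's classic base-3 construction, in which the digits of the index alternate between the two coordinates and a digit $a$ is reflected to $2 - a$ when the preceding digits of the other coordinate sum to an odd number. The paper's exact variant of the scheme may differ, and `peano_d2xy` is a hypothetical name.

```python
def peano_d2xy(k, d):
    """Map index d in [0, 9**k) to (x, y) on a 3**k x 3**k mesh using Peano's
    base-3 construction. Digits of d alternate between x and y; a digit a is
    reflected to 2 - a when the preceding digits of the other coordinate sum
    to an odd number."""
    if not 0 <= d < 9 ** k:
        raise ValueError("index out of range")
    digits = []
    for _ in range(2 * k):
        digits.append(d % 3)
        d //= 3
    digits.reverse()               # a1 a2 ... a_2k, most significant first
    x = y = 0
    sum_x_digits = 0               # a1 + a3 + ... processed so far
    sum_y_digits = 0               # a2 + a4 + ... processed so far
    for j in range(k):
        ax, ay = digits[2 * j], digits[2 * j + 1]
        x = 3 * x + (ax if sum_y_digits % 2 == 0 else 2 - ax)
        sum_x_digits += ax         # includes ax, which precedes ay
        y = 3 * y + (ay if sum_x_digits % 2 == 0 else 2 - ay)
        sum_y_digits += ay
    return x, y


if __name__ == "__main__":
    grid = [[0] * 3 for _ in range(3)]
    for d in range(9):
        x, y = peano_d2xy(1, d)
        grid[y][x] = d
    for row in reversed(grid):     # print with y increasing upwards
        print(row)
    # [2, 3, 8]
    # [1, 4, 7]
    # [0, 5, 6]
```

Because each run of nine consecutive indices stays inside one $3 \times 3$ block at every level of the recursion, contiguous index ranges map to compact submeshes, which is the property the H-PRAM partitioning relies on.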

A Survey of Parallel Search Algorithms for Discrete Optimization Problems
ORSA Journal on Computing, 1993
Cited by 7 (0 self)
Discrete optimization problems (DOPs) arise in various applications such as planning, scheduling, computer-aided design, robotics, game playing, and constraint-directed reasoning. Often, a DOP is formulated in terms of finding a (minimum-cost) solution path in a graph from an initial node to a goal node and is solved by graph/tree search methods. The availability of parallel computers has created substantial interest in exploring parallel formulations of these graph and tree search methods. This article provides a survey of various parallel search algorithms such as Backtracking, IDA*, A*, Branch-and-Bound techniques, and Dynamic Programming. It addresses issues related to load balancing, communication costs, scalability, and the phenomenon of speedup anomalies in parallel search.
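The path-finding formulation can be made concrete with depth-first branch-and-bound, one of the techniques the survey covers. The sketch below assumes non-negative edge costs and a small adjacency-list graph invented for illustration; it explores simple paths depth-first and prunes any partial path whose cost already matches or exceeds the best solution found so far. `dfbb_min_path` is a hypothetical name.

```python
import math


def dfbb_min_path(graph, start, goal):
    """Depth-first branch-and-bound for the cheapest start->goal path.
    graph maps node -> list of (neighbor, cost); costs must be non-negative."""
    best_cost = math.inf
    best_path = None

    def dfs(node, cost, path):
        nonlocal best_cost, best_path
        if cost >= best_cost:        # bound: this partial path cannot improve
            return
        if node == goal:
            best_cost, best_path = cost, list(path)
            return
        for nxt, w in graph.get(node, []):
            if nxt not in path:      # simple paths only, so cycles are skipped
                path.append(nxt)
                dfs(nxt, cost + w, path)
                path.pop()

    dfs(start, 0, [start])
    return best_cost, best_path


if __name__ == "__main__":
    graph = {
        "s": [("a", 1), ("b", 4)],
        "a": [("b", 2), ("t", 6)],
        "b": [("t", 1)],
    }
    print(dfbb_min_path(graph, "s", "t"))  # (4, ['s', 'a', 'b', 't'])
```

In a parallel formulation, processors search different subtrees and the pruning bound depends on which solutions happen to be found first, which is one way the speedup anomalies mentioned in the survey can arise.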

Asymptotically Optimal Randomized Tree Embedding in Static Networks
1998
Cited by 4 (2 self)
The problem of dynamic tree embedding in static networks is studied in this paper. We provide a unified framework for studying the performance of randomized tree embedding algorithms that allow a newly created tree node to take a random walk of a short distance to reach a nearby processor. In particular, we propose simple randomized algorithms on several of the most common and important static networks, including d-dimensional meshes, d-dimensional tori, and hypercubes. It is shown that these algorithms, which have a small constant dilation, are asymptotically optimal. Our analysis technique is based on random walks on static networks; hence, analytical expressions for the expected load on all the processors are available. Keywords: Asymptotic performance, dynamic load distribution, hypercube, mesh, randomized tree embedding, static network, torus.
1 Introduction
Many parallel computations are tree structured. Examples are divide-and-conquer algorithms, backtrack search algorithms, branch-an...
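The random-walk placement can be simulated directly. The sketch below assumes a `side` × `side` torus, a hypothetical tree-growth model in which each new node attaches to a uniformly random existing node, and a fixed walk length: every new node starts on its parent's processor and takes `walk_len` random steps, so each tree edge has dilation at most `walk_len`. The growth model, walk length, and names such as `embed_by_random_walk` are illustrative, not the paper's algorithms or parameters.

```python
import random


def torus_distance(a, b, side):
    """Shortest-path (wrap-around Manhattan) distance on a side x side torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return min(dx, side - dx) + min(dy, side - dy)


def embed_by_random_walk(num_nodes, side, walk_len, seed=0):
    """Grow a random tree and place each new node by a short random walk that
    starts at its parent's processor. Returns (parent, position, load)."""
    rng = random.Random(seed)
    moves = ((1, 0), (-1, 0), (0, 1), (0, -1))
    parent = [None]                # the root has no parent
    position = [(0, 0)]            # the root is created on processor (0, 0)
    load = [[0] * side for _ in range(side)]
    load[0][0] = 1
    for _ in range(1, num_nodes):
        par = rng.randrange(len(position))   # hypothetical growth: uniform random parent
        x, y = position[par]
        for _ in range(walk_len):            # walk_len steps, wrapping around the torus
            dx, dy = rng.choice(moves)
            x, y = (x + dx) % side, (y + dy) % side
        parent.append(par)
        position.append((x, y))
        load[x][y] += 1
    return parent, position, load


if __name__ == "__main__":
    side = 8
    parent, position, load = embed_by_random_walk(4000, side, walk_len=3)
    flat = [c for row in load for c in row]
    print("max load", max(flat), "average", sum(flat) / len(flat))
    dilation = max(torus_distance(position[i], position[parent[i]], side)
                   for i in range(1, len(parent)))
    print("dilation", dilation)
```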

Optimal Pattern Matching on Meshes
Proc. 11th Symposium on Theoretical Aspects of Computer Science, 1993
Cited by 4 (2 self)
Parallel pattern matching on a mesh-connected array of processors is considered. The problem is to find all occurrences of a pattern in a text. The input text is a string of $n$ symbols placed in a $\sqrt{n} \times \sqrt{n}$ mesh, each processor storing one symbol. The pattern is stored similarly in a contiguous portion of the mesh. An algorithm solving the problem in time $O(\sqrt{n})$ is presented. It applies a novel technique for designing parallel pattern-matching algorithms based on the notion of a pseudo-period.
1 Introduction
The problem of pattern matching is to find all occurrences of a given pattern in a given text. A parallel algorithm solving this problem on a mesh-connected computer is presented. This parallel computer is a $\sqrt{n} \times \sqrt{n}$ array of $n$ processors interconnected according to a grid pattern. Suppose a text $t$ is a string of $n$ symbols taken from some alphabet, and a pattern $p$ is a string of $m$ symbols, $m \le n$, from the same alphabet (no restrictions on the size of alphabets are...
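As background on string periodicity, the classical notion that the name pseudo-period suggests, the sketch below computes the ordinary smallest period of a pattern from the Knuth-Morris-Pratt failure function and reuses that table to list all occurrences sequentially. It is not the paper's mesh algorithm, and the pseudo-period itself is not reproduced; the function names are illustrative. A fact this makes visible: any two occurrences of $p$ in $t$ start at least `period(p)` positions apart, because two overlapping occurrences at distance $q < m$ imply that $q$ is a period of $p$.

```python
def failure_function(p):
    """fail[i] = length of the longest proper border of p[:i]; fail[0] = -1."""
    fail = [0] * (len(p) + 1)
    fail[0] = -1
    k = -1
    for i, c in enumerate(p):
        while k >= 0 and p[k] != c:
            k = fail[k]
        k += 1
        fail[i + 1] = k
    return fail


def period(p):
    """Smallest period of a non-empty string: len(p) minus its longest border."""
    return len(p) - failure_function(p)[len(p)]


def occurrences(t, p):
    """Start positions of all occurrences of a non-empty pattern p in text t (KMP)."""
    if not p:
        raise ValueError("pattern must be non-empty")
    fail = failure_function(p)
    found, k = [], 0
    for i, c in enumerate(t):
        while k >= 0 and p[k] != c:
            k = fail[k]
        k += 1
        if k == len(p):
            found.append(i - len(p) + 1)
            k = fail[k]            # continue from the longest border to catch overlaps
    return found


if __name__ == "__main__":
    print(occurrences("abababa", "aba"))   # [0, 2, 4]
    print(period("aba"), period("abaab"))  # 2 3
```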

Better Algorithms for Parallel Backtracking
In Workshop on Algorithms for Irregularly Structured Problems, number 980 in LNCS, 1995
Cited by 4 (3 self)
Many algorithms in operations research and artificial intelligence are based on the backtracking principle, i.e., depth-first search in implicitly defined trees. For parallelizing these algorithms, an efficient load balancing scheme is of central importance. Previously known load balancing algorithms either require sending a message for each tree node or work efficiently only for large search trees. This paper introduces new randomized dynamic load balancing algorithms for tree-structured computations, a generalization of backtrack search. These algorithms communicate only when necessary and have asymptotically optimal scalability for hypercubes, butterflies, and related networks, and improved scalability for meshes and hierarchical networks like fat trees. Keywords: Analysis of randomized algorithms, depth-first search, distributed memory, divide and conquer, load balancing, parallel backtracking.
1 Introduction
Load balancing is one of the central issues in paral...
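A round-based simulation helps make "communicate only when necessary" concrete. The sketch below is a generic random-polling scheme, not the paper's algorithms: busy processors expand one node per step depth-first from their own stack, and only idle processors send a request, to one uniformly random peer, which hands over the bottom half of its stack (the nodes closest to the root, which tend to root the largest remaining subtrees). The irregular tree, in which a node has two children with probability `branch_prob` up to `max_depth`, is a hypothetical stand-in for a backtrack search tree; it is a fixed function of the node, so every processor count explores the same tree.

```python
import random


def has_children(node, depth, max_depth, branch_prob, seed):
    """Hypothetical irregular tree: the branching decision is a fixed
    pseudo-random function of the node id, independent of the schedule."""
    if depth >= max_depth:
        return False
    return random.Random(seed * 1_000_003 + node).random() < branch_prob


def simulate_random_polling(p, max_depth=14, branch_prob=0.85, seed=0):
    """Round-based depth-first tree search on p processors with receiver-initiated
    random polling. Returns (steps, expanded, created, requests)."""
    rng = random.Random(seed)              # randomness used for polling only
    stacks = [[] for _ in range(p)]
    stacks[0].append((0, 0))               # (node id, depth); the root starts on processor 0
    steps = expanded = requests = 0
    created = 1
    while any(stacks):
        steps += 1
        # Busy processors expand one node each, taken from the top of their stack.
        for stack in stacks:
            if stack:
                node, depth = stack.pop()
                expanded += 1
                if has_children(node, depth, max_depth, branch_prob, seed):
                    stack.append((2 * node + 2, depth + 1))
                    stack.append((2 * node + 1, depth + 1))
                    created += 2
        # Only idle processors communicate: each polls one random peer, which
        # donates the bottom half of its stack if it holds at least two nodes.
        for i, stack in enumerate(stacks):
            if not stack:
                requests += 1
                j = rng.randrange(p)
                donor = stacks[j]
                if j != i and len(donor) >= 2:
                    half = len(donor) // 2
                    stack.extend(donor[:half])
                    del donor[:half]
    return steps, expanded, created, requests


if __name__ == "__main__":
    for p in (1, 4, 16, 64):
        steps, expanded, _, requests = simulate_random_polling(p)
        print(f"p={p:3d} steps={steps:5d} nodes={expanded} requests={requests}")
```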

On the Manhattan-Distance Between Points on Space-Filling Mesh-Indexings
1996
Cited by 4 (0 self)
Indexing schemes based on space-filling curves like the Hilbert curve are a powerful tool for building efficient parallel algorithms on mesh-connected computers. The main reason is that they are locality-preserving, i.e., the Manhattan-distance between processors grows only slowly with increasing index differences. We present a simple and easy-to-verify proof that the Manhattan-distance of any indices $i$ and $j$ is bounded by $3\sqrt{|i-j|} - 2$ for the 2D-Hilbert curve. The technique used for the proof is then generalized to a large class of self-similar curves. We use this result to show a (quite tight) bound of $4.73458\sqrt[3]{|i-j|} - 3$ for a 3D-Hilbert curve.
1 Introduction
It has become increasingly clear that mesh-connected processor arrays, grids for short, are among the most realistic models of parallel computation [1, 4, 14, 18]. The indexing of the processors is an important aspect in the design of mesh algorithms. Several indexing schemes are well-known. Mos...
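The 2D bound can be checked numerically on small meshes. The sketch below uses the standard iterative index-to-coordinate conversion for the Hilbert curve on an $n \times n$ mesh ($n$ a power of two) and computes the smallest slack $3\sqrt{|i-j|} - 2 - \mathrm{dist}(i, j)$ over all index pairs; a non-negative result is consistent with the stated bound. An exhaustive check on a small mesh is no substitute for the proof, and `min_slack` is a hypothetical helper.

```python
import math


def hilbert_d2xy(n, d):
    """Map index d in [0, n*n) to (x, y) on an n x n mesh, n a power of two,
    with the standard iterative Hilbert-curve construction."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                        # rotate/reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def min_slack(n):
    """Minimum over index pairs i < j of (3*sqrt(j - i) - 2) - manhattan(i, j)."""
    pts = [hilbert_d2xy(n, d) for d in range(n * n)]
    slack = math.inf
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            dist = abs(pts[i][0] - pts[j][0]) + abs(pts[i][1] - pts[j][1])
            slack = min(slack, 3 * math.sqrt(j - i) - 2 - dist)
    return slack


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, min_slack(n))
```

Adjacent indices ($|i-j| = 1$) are always at distance 1, so the slack can never exceed 0; the check confirms it also never drops below 0 on these meshes.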

Dynamic Randomized Simulation of Hierarchical PRAMs on Meshes
1995
Cited by 3 (0 self)
The Hierarchical PRAM (H-PRAM) [5] model is a dynamically partitionable PRAM that charges for communication and synchronization and allows parallel algorithms to abstractly represent general locality. In this paper we show that the H-PRAM can be implemented efficiently on a two-dimensional mesh. We use the Peano indexing scheme to hierarchically partition the mesh. Multiple sub-PRAMs of the H-PRAM are simulated on irregular submeshes. For an H-PRAM program of cost $T$, the overall CRCW H-PRAM simulation runs in time constant in $T$ with high probability. The simulation is dynamic, i.e., it does not depend on prior knowledge of a program's specific hierarchical configuration, which may be data dependent.
1 Introduction
In parallel computing, a model of computation has a difficult task. It must mediate between the conflicting requirements of abstraction (ease of use) for program design/analysis and the cost/resource details of realistic architectures. In other words, it must abstract away...