An Overview of the Pablo Performance Analysis Environment, 1992
Cited by 86 (6 self)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosities; they are crucial to achieving substantial fractions of peak performance for scientific application codes. By recording dynamic activity, either at the application or system software level, one can identify and remove performance bottlenecks. Pablo is a performance analysis environment designed to provide performance data capture, analysis, and presentation across a wide variety of scalable parallel systems. The Pablo environment includes software performance instrumentation, graphical performance data reduction and analysis, and support for mapping performance data to both graphics and sound. Current research directions include complete performance data immersion via head-mounted displays and the integration of Pablo with data parallel Fortran compilers based on the emerging High ...
Asynchronous Analysis of Parallel Dynamic Programming Algorithms, 1994
Cited by 8 (0 self)
We examine a very simple asynchronous model of parallel computation that assumes the time to compute a task is random, following some probability distribution. The goal of this model is to capture the effects of unpredictable delays on processors, due to communication delays or cache misses, for example. Using techniques from queueing theory and occupancy problems, we use this model to analyze two parallel dynamic programming algorithms. We show that this model is simple to analyze and correctly predicts which algorithm will perform better in practice. The algorithms we consider are a pipeline algorithm, where each processor i computes in order the entries of rows i, i + p and so on, where p is the number of processors; and a diagonal algorithm, where entries along each diagonal extending from the left to the top of the table are computed in turn. It is likely that the techniques used here can be useful in the analysis of other algorithms that use barriers or pipelining techniques. Ind...
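The two schedules described in this abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' model or code: the abstract does not specify the table's dependency pattern, the task-time distribution, or how a diagonal's entries are split among processors. Here each entry is assumed to depend on its left and upper neighbours, task times are assumed exponential with mean 1, and the diagonal schedule is assumed to assign entries round-robin with a barrier after each diagonal. All function names are hypothetical.

```python
import random


def sample_times(n, rng):
    """Draw an n x n table of random task times (assumed exponential, mean 1)."""
    return [[rng.expovariate(1.0) for _ in range(n)] for _ in range(n)]


def pipeline_makespan(times, p):
    """Processor i computes rows i, i + p, ... left to right.

    An entry starts once its left neighbour (same processor) and its upper
    neighbour (assumed dependency) are done; the first entry of row r also
    waits until the same processor has finished row r - p.
    """
    n = len(times)
    finish = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            left = finish[r][c - 1] if c > 0 else 0.0
            up = finish[r - 1][c] if r > 0 else 0.0
            prev_row = finish[r - p][n - 1] if (c == 0 and r >= p) else 0.0
            finish[r][c] = max(left, up, prev_row) + times[r][c]
    return finish[n - 1][n - 1]


def diagonal_makespan(times, p):
    """Compute anti-diagonals in turn, with a barrier after each one.

    Entries of a diagonal are dealt round-robin to the p processors
    (an assumption); a diagonal takes as long as its busiest processor.
    """
    n = len(times)
    total = 0.0
    for d in range(2 * n - 1):
        rows = range(max(0, d - n + 1), min(d, n - 1) + 1)
        load = [0.0] * p
        for k, r in enumerate(rows):
            load[k % p] += times[r][d - r]
        total += max(load)
    return total


def compare(n, p, trials, seed=0):
    """Return mean (pipeline, diagonal) makespans over random tables."""
    rng = random.Random(seed)
    pipe = diag = 0.0
    for _ in range(trials):
        times = sample_times(n, rng)
        pipe += pipeline_makespan(times, p)
        diag += diagonal_makespan(times, p)
    return pipe / trials, diag / trials


if __name__ == "__main__":
    print(compare(n=64, p=8, trials=20))
```

Running `compare` with different values of `n` and `p` gives a rough feel for how the barrier in the diagonal schedule and the row-to-row waiting in the pipeline schedule respond to random delays; the numbers depend entirely on the assumptions listed above.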
Realistic Analysis of Parallel Dynamic Programming Algorithms
Computer Sciences Dept., Univ. of Wisconsin-Madison, 1992
Cited by 2 (1 self)
We examine a very simple asynchronous model of parallel computation that assumes the time to compute a task is random, following some probability distribution. The goal of this model is to capture the effects of unexpected delays on processors. Using techniques from queueing theory and occupancy problems, we use this model to analyze two parallel dynamic programming algorithms. We show that this model is both simple to analyze and realistic in the sense that the analysis corresponds to experimental results on a shared memory parallel machine. The algorithms we consider are a pipeline algorithm, where each processor i computes in order the entries of rows i, i + p and so on, where p is the number of processors; and a diagonal algorithm, where entries along each diagonal extending from the left to the top of the table are computed in turn. It is likely that the techniques used here can be used in the analysis of other algorithms that use barriers or pipelining techniques.