Results 1-3 of 3
The Uniform Memory Hierarchy Model of Computation
 Algorithmica
, 1992
Cited by 112 (9 self)
The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies. A sequential computer's memory is modelled as a sequence ⟨M_0, M_1, ...⟩ of increasingly large memory modules. Computation takes place in M_0. Thus, M_0 might model a computer's central processor, while M_1 might be cache memory, M_2 main memory, and so on. For each module M_u, a bus B_u connects it with the next larger module M_{u+1}. All buses may be active simultaneously. Data is transferred along a bus in fixed-sized blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parameterized by the rate at which the block sizes i...
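The module sequence described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the paper's definition: the abstract says just that block size, block transfer time, and block count grow with distance from the processor. The geometric growth, the names `rho` and `alpha`, and the one-item-per-time-unit transfer cost are all assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    level: int          # u: distance from the processor (M_0 is closest)
    block_size: int     # items per block moved on the bus to the next module
    num_blocks: int     # number of blocks that fit in this module
    transfer_time: int  # time units to move one block along the bus

    @property
    def capacity(self) -> int:
        """Total items the module can hold."""
        return self.block_size * self.num_blocks


def build_hierarchy(levels: int, rho: int = 2, alpha: int = 4) -> list[Module]:
    """Build M_0 .. M_{levels-1} with geometrically growing parameters.

    rho and alpha are hypothetical growth parameters chosen for this
    sketch; they are assumptions, not values taken from the abstract.
    """
    if levels < 1 or rho < 2 or alpha < 1:
        raise ValueError("need levels >= 1, rho >= 2, alpha >= 1")
    return [
        Module(
            level=u,
            block_size=rho**u,
            num_blocks=alpha * rho**u,
            transfer_time=rho**u,  # assumed: one item per time unit
        )
        for u in range(levels)
    ]


if __name__ == "__main__":
    for m in build_hierarchy(4):
        print(m.level, m.block_size, m.num_blocks, m.transfer_time, m.capacity)
```

With these assumed parameters, every quantity named in the abstract grows strictly from one module to the next, which is the only property the sketch is meant to show.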
Modeling Parallel Computers as Memory Hierarchies
 In Proc. Programming Models for Massively Parallel Computers
, 1993
Cited by 43 (6 self)
A parameterized generic model that captures the features of diverse computer architectures would facilitate the development of portable programs. Specific models appropriate to particular computers are obtained by specifying parameters of the generic model. A generic model should be simple, and for each machine that it is intended to represent, it should have a reasonably accurate specific model. The Parallel Memory Hierarchy (PMH) model of computation uses a single mechanism to model the costs of both interprocessor communication and memory hierarchy traffic. A computer is modeled as a tree of memory modules with processors at the leaves. All data movement takes the form of block transfers between children and their parents. This paper assesses the strengths and weaknesses of the PMH model as a generic model.
1 Introduction
The raw computing power of multiprocessor computers is exploding. The challenge is to create software that can take advantage of this computing power. The diversit...
Towards an Optimal Bit-Reversal Permutation Program
 In Proceedings of IEEE Foundations of Computer Science
, 1998
Cited by 11 (2 self)
The speed of many computations is limited not by the number of arithmetic operations but by the time it takes to move and rearrange data in the increasingly complicated memory hierarchies of modern computers. Array transpose and the bit-reversal permutation, trivial operations on a RAM, present non-trivial problems when designing highly tuned scientific library functions, particularly for the Fast Fourier Transform. We prove a precise bound for RoCol, a simple pebble-type game that is relevant to implementing these permutations. We use RoCol to give lower bounds on the amount of memory traffic in a computer with four levels of memory (registers, cache, TLB, and memory), taking into account such "messy" features as block moves and set-associative caches. The insights from this analysis lead to a bit-reversal algorithm whose performance is close to the theoretical minimum. Experiments show it performs significantly better than every program in a comprehensive study of 30 published algo...
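For readers unfamiliar with the operation, the bit-reversal permutation itself can be stated in a few lines of Python. This is the naive textbook form, given only to show what is being computed; it is not the tuned, hierarchy-aware algorithm the abstract refers to, and the function names are chosen for this sketch.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Return i with its lowest `bits` bits in reversed order."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the result left, append the low bit of i
        i >>= 1
    return r


def bit_reversal_permutation(data: list) -> list:
    """Return a new list whose element at index i is data[bit_reverse(i)].

    The length must be a power of two, as in radix-2 FFT input ordering.
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    return [data[bit_reverse(i, bits)] for i in range(n)]


if __name__ == "__main__":
    print(bit_reversal_permutation(list(range(8))))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Each output element is read from an index whose bits are reversed, so consecutive outputs touch widely separated inputs. That scattered access pattern is what makes the permutation cheap on an idealized RAM yet costly on a machine with caches and a TLB.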