KOAN: a Shared Virtual Memory for the iPSC/2 hypercube
In CONPAR/VAPP92, 1991
Cited by 37 (20 self)
In this paper, we describe the salient features of an implementation of a shared virtual memory, named KOAN, running on an iPSC/2 hypercube. We then discuss its performance on a non-numerical algorithm like ray-tracing as well as a numerical one: the Modified Gram-Schmidt algorithm.
1 Introduction
Programming distributed memory parallel computers (DMPC) using a Shared Virtual Memory (SVM) seems to be in fashion. However, there are few such systems available for DMPCs. Most of the research in this area has been done for a network of workstations, such as IVY [18], Clouds [9, 21], Munin [6, 5], Memnet [10], Mach [27] and Chorus [1, 24]. These implementations concern only high latency networks and do not allow the comparison of the efficiency of different strategies for parallelizing algorithms. Distributed memory parallel computers with a hypercube or 2D-mesh topology have been commonly used for designing and testing parallel algorithms. Several results are now available. Hence, it woul...
A taxonomy-based comparison of several distributed shared memory systems
ACM Operating Systems Review, 1990
Cited by 37 (4 self)
Two possible modes of Input/Output (I/O) are "sequential" and "random-access", and there is an extremely strong conceptual link between I/O and communication. Sequential communication, typified in the I/O setting by magnetic tape, is typified in the communication setting by a stream, e.g., a UNIX pipe. Random-access communication, typified in the I/O setting by a drum or disk device, is typified in the communication setting by shared memory. In this paper, we study and survey the extension of the random-access model to distributed computer systems. A Distributed Shared Memory (DSM) is a memory area shared by processes running on computers connected by a network. DSM provides direct system support of the shared memory programming model. When assisted by hardware, it can also provide a low-overhead interprocess communication (IPC) mechanism to software. Shared pages are migrated on demand between the hosts. Since computer network latency is typically much larger than that of a shared bus, caching in DSM is necessary for performance. We use caching and issues such as address space structure and page replacement schemes to define a taxonomy. Based on the taxonomy we examine three DSM efforts in detail, namely: IVY, Clouds and MemNet.
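The on-demand page migration mentioned in this abstract can be illustrated with a toy model. The sketch below is not taken from IVY, Clouds or MemNet: the class name ToyDSM, its methods, and the rule that every page starts on host 0 are all hypothetical, and each page is assumed to have a single copy with no caching or replication.

```python
class ToyDSM:
    """Toy model (hypothetical): one copy per page, moved to whoever touches it."""

    def __init__(self, num_pages, num_hosts):
        self.num_hosts = num_hosts
        # Assumed initial placement: every page starts on host 0.
        self.owner = [0] * num_pages
        self.pages = [dict() for _ in range(num_pages)]  # page contents
        self.migrations = 0  # number of page transfers over the "network"

    def _fault_in(self, host, page):
        """Migrate the page to the accessing host if it lives elsewhere."""
        if not 0 <= host < self.num_hosts:
            raise ValueError("unknown host")
        if self.owner[page] != host:
            self.owner[page] = host
            self.migrations += 1

    def read(self, host, page, key):
        self._fault_in(host, page)
        return self.pages[page].get(key)

    def write(self, host, page, key, value):
        self._fault_in(host, page)
        self.pages[page][key] = value


dsm = ToyDSM(num_pages=2, num_hosts=3)
dsm.write(1, 0, "x", 42)      # page 0 migrates from host 0 to host 1
value = dsm.read(2, 0, "x")   # page 0 migrates from host 1 to host 2
```

Every access from a new host costs one transfer in this model, which is why the abstract stresses caching: a real DSM would keep read copies on several hosts rather than bouncing the single copy back and forth.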
Scheduling and Resource Management Techniques for Multiprocessors
1990
Cited by 32 (1 self)
and related areas. Application requirements motivated the major research areas, processor scheduling and non-uniform memory management, as these areas contain the most important problems raised by the changing design and use of multiprocessors.
Randomized Algorithms For Multiprocessor Page Migration
SIAM Journal on Computing
Cited by 27 (2 self)
The page migration problem is to manage a globally addressed shared memory in a multiprocessor system. Each physical page of memory is located at a given processor, and memory references to that page by other processors incur a cost proportional to the network distance. At times the page may migrate between processors at cost proportional to the distance times D, a page size factor. The problem is to schedule movements on-line so that the total cost of memory references is within a constant factor c of the best off-line schedule. An algorithm that does so is called c-competitive. Black and Sleator gave 3-competitive deterministic on-line algorithms for uniform networks (complete graphs with unit edge lengths) and for trees with arbitrary edge lengths. No good deterministic algorithm is known for general networks with arbitrary edge lengths. We present randomized algorithms for the migration problem that are both simple and better than 3-competitive against an oblivious adversary. We ...
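The cost model in this abstract can be written out as a small function. This is only a sketch under stated assumptions: the name schedule_cost and its parameters are hypothetical, the page is assumed to migrate after a reference has been served, and the proportionality constants are taken to be 1. It evaluates a given schedule and does not implement any of the paper's algorithms.

```python
def schedule_cost(dist, D, start, requests, moves):
    """Total cost of serving `requests` under a fixed migration schedule.

    dist[u][v] : network distance between processors u and v
    D          : page size factor (a migration costs D * distance)
    start      : processor holding the page initially
    requests   : processors issuing memory references, in order
    moves      : moves[t] is the processor holding the page after request t
    """
    page, total = start, 0
    for req, dest in zip(requests, moves):
        total += dist[req][page]        # reference cost: distance to the page
        total += D * dist[page][dest]   # migration cost (zero if the page stays)
        page = dest
    return total


# Uniform network on three processors (unit edge lengths), D = 2.
uniform = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
requests = [1, 1, 1, 1]
never_move = schedule_cost(uniform, 2, 0, requests, [0, 0, 0, 0])
move_at_once = schedule_cost(uniform, 2, 0, requests, [1, 1, 1, 1])
```

On this request sequence, never moving the page costs 4 while migrating after the first reference costs 3. An on-line algorithm is c-competitive when its cost stays within a factor c of the best such schedule on every sequence.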
On Page Migration and Other Relaxed Task Systems
1997
Cited by 24 (4 self)
This paper is concerned with the page migration (or file migration) problem [BS89] as part of a large class of on-line problems. The page migration problem deals with the management of pages residing in a network of processors. In the classical problem there is only one copy of each page, which is accessed by different processors over time. The page is allowed to be migrated between processors. However, a migration incurs a higher communication cost than an access (proportional to the page size). The problem is that of deciding when and where to migrate the page in order to lower access costs. A more general setting is the k-page migration problem, where we wish to maintain k copies of the page. The page migration problems are concerned with a dilemma common to many on-line problems: determining when it is beneficial to make configuration changes. We deal with the relaxed task systems model, which captures a large class of problems of this type, that can be described as the generalizati...
New On-Line Algorithms for the Page Replication Problem
1994
Cited by 19 (4 self)
We present improved competitive on-line algorithms for the page replication problem and concentrate on important network topologies for which algorithms with a constant competitive ratio can be given. We develop an optimal randomized on-line replication algorithm for trees and uniform networks; its competitive ratio is approximately 1.58. This performance holds against oblivious adversaries. We also give a randomized memoryless replication algorithm for trees and uniform networks that is 2-competitive against adaptive on-line adversaries. Furthermore we consider on-line replication algorithms for rings and present general techniques that transform c-competitive algorithms for trees into 2c-competitive algorithms for rings. As a result we obtain a randomized on-line algorithm for rings that is 3.16-competitive. We also derive two 4-competitive on-line algorithms for rings which are either deterministic or randomized and memoryless. Again, the randomized results hold against oblivious ad...
Exploiting Operating System Support for Dynamic Page Placement on a NUMA Shared Memory Multiprocessor
ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 1991
Cited by 19 (5 self)
Shared memory multiprocessors are attractive because they are programmed in a manner similar to uniprocessors. The UMA class of shared memory multiprocessors is the most attractive, from the programmer's point of view, since the programmer need not be concerned with the placement of code and data in the physical memory hierarchy. Scalable shared memory multiprocessors, on the other hand, tend to present at least some degree of non-uniformity of memory access to the programmer, making the NUMA class an important one to consider. In this paper, we investigate the role that DUnX, an operating system supporting dynamic page placement on a BBN GP1000, might play in simplifying the memory model presented to the applications programmer. We consider a case study of psolu, a real scientific application originally targeted for a NUMA architecture. We find that dynamic page placement can dramatically improve the performance of a simpler implementation of psolu targeted for a UMA memory architec...
The Effectiveness of SRAM Network Caches in Clustered DSMs
1998
Cited by 17 (0 self)
The frequency of accesses to remote data is a key factor affecting the performance of all Distributed Shared Memory (DSM) systems. Remote data caching is one of the most effective and general techniques to fight processor stalls due to remote capacity misses in the processor caches. The design space of remote data caches (RDC) has many dimensions and one essential performance trade-off: hit ratio versus speed. Some recent commercial systems have opted for large and slow (S)DRAM network caches (NC), but others completely avoid them because of their damaging effects on the remote/local latency ratio. In this paper we will explore small and fast SRAM network caches as a means to reduce the remote stalls and capacity traffic of multiprocessor clusters. The major appeal of SRAM NCs is that they add less penalty on the latency of NC hits and remote accesses. Their small capacity can handle conflict misses and a limited amount of capacity misses. However, they can be coupled with main memory...
Page Migration Algorithms Using Work Functions
In Proc. of the 4th Int. Symp. on Algorithms and Computation (ISAAC, 1994
Cited by 13 (1 self)
The page migration problem occurs in managing a globally addressed shared memory in a multiprocessor system. Each physical page of memory is located at a given processor, and memory references to that page by other processors are charged a cost equal to the network distance. At times the page may migrate between processors, at a cost equal to the distance times a page size factor, D. The problem is to schedule movements on-line so as to minimize the total cost of memory references. Page migration can also be viewed as a restriction of the 1-server with excursions problem. This paper presents a collection of algorithms and lower bounds for the page migration problem in various settings. Competitive analysis is used. The competitiveness of an on-line algorithm is the worst-case ratio of its cost to the optimum cost on any sequence of requests. Randomized (2 + 1/(2D))-competitive on-line algorithms are given for trees and products of trees, including the mesh and the hypercube, and for un...
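The optimum cost that competitive analysis compares against can be computed off-line by dynamic programming over work-function values, where w_t(v) is the cheapest way to serve the first t requests and finish with the page at processor v. The sketch below is an illustration only, not the paper's algorithms: the function names are hypothetical, and it assumes the page may migrate only after a request has been served.

```python
import math


def work_functions(dist, D, start, requests):
    """Return [w_0, w_1, ..., w_n], one list of per-processor values per step.

    Assumed recurrence (serve request r from u, then migrate u -> v):
        w_t(v) = min over u of  w_{t-1}(u) + dist[u][r] + D * dist[u][v]
    """
    n = len(dist)
    # Before any request the page can only be at its starting processor.
    w = [0 if v == start else math.inf for v in range(n)]
    history = [w]
    for r in requests:
        w = [
            min(w[u] + dist[u][r] + D * dist[u][v] for u in range(n))
            for v in range(n)
        ]
        history.append(w)
    return history


def offline_optimum(dist, D, start, requests):
    """Cheapest total cost of any migration schedule for `requests`."""
    return min(work_functions(dist, D, start, requests)[-1])


# Uniform network on three processors, D = 2, page initially at processor 0.
uniform = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
opt = offline_optimum(uniform, 2, 0, [1, 1, 1, 1])
```

Here the optimum is 3 (serve the first reference remotely, then migrate), so a strategy that never moves the page and pays 4 has ratio 4/3 on this particular sequence. The competitiveness defined in the abstract is the worst such ratio over all request sequences.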
Visualising sharing behaviour in relation to shared memory management
Proceedings of International Conference on Parallel and Distributed Systems, 1992
Cited by 6 (5 self)
Accesses to the shared memory remain a major performance limitation in shared memory multiprocessors. Scalable multiprocessors with distributed memory also pose the problem of keeping the memory coherent. A large number of shared memory coherence mechanisms have been proposed to solve this problem. Their relative performance is, however, determined by the sharing behaviour of the workloads. This paper presents a methodology to capture and visualise the sharing behaviour of a parallel program with respect to the choice of coherence mechanisms. We identify four conceptual workload parameters: Spatial Granularity, Degree of Sharing, Access Mode, and Temporal Granularity. To demonstrate the effectiveness of the methodology, we

