The goal of the Olden project is to build a system that provides parallelism for general-purpose C programs with minimal programmer annotations. We focus on programs that use dynamic structures such as trees, lists, and DAGs. We demonstrate that providing both software caching and computation migration can improve the performance of these programs, and we provide a compile-time heuristic that selects between the two mechanisms for each pointer dereference. We have implemented a prototype system on the Thinking Machines CM-5. We describe our implementation and report on experiments with ten benchmarks.

1 Introduction

Olden is a continuing project whose goal is to build a compiler and runtime system for C programs on distributed-memory SPMD machines that, as far as possible, detects parallelism and inserts communication automatically. We specifically focus on programs that use recursive data structures. To date, little work has been done to address the problem of supporting these programs. Although work has...