Memory Management with Explicit Regions
1998
Cited by 115 (4 self)
Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been well studied. In a region-based system each allocation specifies a region, and memory is reclaimed by destroying a region, freeing all the storage allocated therein. We show that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster. We also show that regions support safe memory management with low overhead. Experience with our benchmarks suggests that modifying many existing programs to use regions is not difficult.
1 Introduction
The two most popular memory management techniques are explicit allocation and deallocation, as in C's malloc/free, and various forms of garbage collection [Wil92]. Both have well-known advantages and disadvantages, discussed further below. A t...
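To make the region discipline concrete, here is a minimal Python model, not the paper's C interface; `Region`, `alloc`, `retain`, `release`, and `delete` are hypothetical names. Objects are bump-allocated from one buffer and reclaimed together when the region is deleted. The external-reference counter is an assumption standing in for one way of making deletion safe (refusing to delete a region that something outside it still points into); the abstract says only that regions support safe memory management with low overhead.

```python
class Region:
    """Toy region (arena): objects are bump-allocated from one buffer and
    the whole region is reclaimed at once by delete()."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._top = 0             # bump pointer: next free offset in the buffer
        self._external_refs = 0   # assumed safety check: pointers held outside the region
        self._live = True

    def alloc(self, size: int, align: int = 8) -> memoryview:
        if not self._live:
            raise RuntimeError("allocation in a deleted region")
        start = (self._top + align - 1) // align * align  # round up to the alignment
        end = start + size
        if end > len(self._buf):
            raise MemoryError("region exhausted")
        self._top = end
        return memoryview(self._buf)[start:end]

    def retain(self) -> None:
        """Record that some pointer outside the region refers into it."""
        self._external_refs += 1

    def release(self) -> None:
        """Record that one such outside pointer has gone away."""
        self._external_refs -= 1

    def delete(self) -> None:
        """Free every object in the region in one step; no per-object free()."""
        if self._external_refs:
            raise RuntimeError("unsafe delete: region is still referenced")
        self._buf = bytearray(0)
        self._top = 0
        self._live = False

    @property
    def used(self) -> int:
        return self._top
```

With this model, allocating 10 bytes and then 4 bytes in a fresh region leaves `used` at 20, because the second object is rounded up to offset 16. Reclamation is a single `delete()` call rather than one `free()` per object, which is one plausible source of the speedups over malloc/free.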
Titanium performance and potential: an NPB experimental study
In Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2005
Cited by 20 (11 self)
Titanium is an explicitly parallel dialect of Java™ designed for high-performance scientific programming. It offers object orientation, strong typing, and safe memory management in the context of a language that supports high performance and scalable parallelism. We present an overview of the language features and demonstrate their use in the context of the NAS Parallel Benchmarks, a standard benchmark suite of kernels that are common across many scientific applications. We argue that parallel languages like Titanium provide greater expressive power than conventional approaches, enabling much more concise and expressive code and minimizing time to solution without sacrificing parallel performance. Empirical results demonstrate that our Titanium implementations of three of the NAS Parallel Benchmarks can match or even exceed the performance of the standard MPI/Fortran implementations at realistic problem sizes and processor scales, while still using far cleaner, shorter, and more maintainable code.
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations
In 2nd Workshop on Hardware/Software Support for High Performance Scientific and Engineering Computing (SHPSEC-03), 2003
Cited by 19 (3 self)
MPI support is nearly ubiquitous on high-performance systems today, and is generally highly tuned for performance. It would thus seem to offer a convenient "portable network assembly language" to developers of parallel programming languages who wish to target different network architectures. Unfortunately, neither the traditional MPI 1.1 API nor the newer MPI 2.0 extensions for one-sided communication provide an adequate compilation target for global address space languages, and this is likely to be the case for many other parallel languages as well. Simulating one-sided communication under the MPI 1.1 API is too expensive, while the MPI 2.0 one-sided API imposes a number of restrictions that would need to be incorporated at the language level, as it is unlikely that a compiler could effectively hide them.
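The cost of simulating one-sided communication over MPI 1.1 can be seen in a small model. The sketch below is plain Python, not MPI code, and `Rank`, `TwoSidedNetwork`, `start_get`, and `poll` are hypothetical names: a remote get becomes a request message plus a reply message, and it cannot complete until the target process itself calls a progress routine. It illustrates the general send/receive emulation technique, not the implementation studied in the paper.

```python
from collections import deque


class Rank:
    """A process that owns some memory and a two-sided message inbox."""

    def __init__(self, rank_id, memory):
        self.id = rank_id
        self.memory = dict(memory)
        self.inbox = deque()


class TwoSidedNetwork:
    """Emulates a one-sided get using only send/receive-style messages."""

    def __init__(self, ranks):
        self.ranks = {r.id: r for r in ranks}
        self.messages_sent = 0

    def send(self, dst, msg):
        self.ranks[dst].inbox.append(msg)
        self.messages_sent += 1

    def start_get(self, src, dst, addr):
        """Rank src asks rank dst for memory[addr]; returns a completion record.
        (The record travels with the messages, standing in for a request tag.)"""
        handle = {"done": False, "value": None}
        self.send(dst, ("GET", src, addr, handle))
        return handle

    def poll(self, rank_id):
        """Progress step: a rank must call this to service its inbox."""
        me = self.ranks[rank_id]
        while me.inbox:
            kind, *payload = me.inbox.popleft()
            if kind == "GET":
                # The target must actively answer; there is no remote memory access.
                src, addr, handle = payload
                self.send(src, ("REPLY", me.memory[addr], handle))
            elif kind == "REPLY":
                value, handle = payload
                handle["value"] = value
                handle["done"] = True
```

In this model a get issued by rank 0 is still incomplete after rank 0 polls, still incomplete after rank 1 services the request, and completes only when rank 0 polls again to receive the reply: two messages and cooperation from both processes for a single remote read.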
Hierarchical Pointer Analysis for Distributed Programs
The 14th International Static Analysis Symposium (SAS 2007, Kongens Lyngby), 2007
Cited by 4 (4 self)
Abstract. We present a new pointer analysis for use in shared memory programs running on hierarchical parallel machines. The analysis is motivated by the partitioned global address space languages, in which programmers have control over data layout and threads and can directly read and write to memory associated with other threads. Titanium, UPC, Co-Array Fortran, X10, Chapel, and Fortress are all examples of such languages. The novelty of our analysis comes from the hierarchical machine model used, which captures the increasingly hierarchical nature of modern parallel machines. For example, the analysis can distinguish between pointers that can reference values within a thread, within a shared memory multiprocessor, or within a network of processors. The analysis is presented with a formal type system and operational semantics, articulating the various ways in which pointers can be used within a hierarchical machine model. The hierarchical analysis has several applications, including race detection, sequential consistency enforcement, and software caching. We present results of an implementation of the analysis, applying it to data race detection, and show that the hierarchical analysis is very effective at reducing the number of false races detected.
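A minimal Python sketch of the hierarchical idea, under simplifying assumptions: a fixed three-level machine (thread, shared-memory node, network), locations written as hypothetical `(node, thread)` pairs, and each pointer abstracted to the widest level among its possible targets. The `max` rule for loading a pointer through a pointer and the `may_conflict` filter are illustrative choices, not the paper's type system.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed machine hierarchy; a smaller value means a closer target."""
    THREAD = 0    # same thread
    NODE = 1      # same shared-memory node
    NETWORK = 2   # anywhere in the network of processors


def level_between(a, b):
    """Smallest level containing both locations; a location is (node, thread)."""
    if a == b:
        return Level.THREAD
    if a[0] == b[0]:
        return Level.NODE
    return Level.NETWORK


def pointer_level(holder, possible_targets):
    """Abstract a pointer held by `holder` as the widest level over its targets."""
    return max((level_between(holder, t) for t in possible_targets),
               default=Level.THREAD)


def deref_level(p_level, field_level):
    """Loading a pointer field through p (field level is relative to the
    object's owner): soundly bounded by the wider of the two hops."""
    return max(p_level, field_level)


def may_conflict(level_a, level_b):
    """Accesses by two different threads through thread-level pointers
    touch disjoint, thread-private memory, so they cannot conflict."""
    return not (level_a == Level.THREAD and level_b == Level.THREAD)
```

For the race-detection client, a pair of accesses by different threads through thread-level pointers can be discarded, while any pair involving a wider pointer is kept as a possible race; the finer the levels, the fewer false races survive.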
BulkCompiler: High-Performance Sequential Consistency through Cooperative Compiler and Hardware Support
Cited by 3 (0 self)
A platform that supported Sequential Consistency (SC) for all codes (not only the well-synchronized ones) would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is not enough for the hardware to support it; the compiler must support SC as well. This paper presents the hardware-compiler interface and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation and to transform code to exploit the groups. Our simulation results show that BulkCompiler enables not only a whole-system SC environment, but one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.
Categories and Subject Descriptors: C.1.2 [Processor Architectures]: Multiple Data Stream Architectures
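Why group commit gives the compiler room to optimize can be shown with a toy model of the classic store-buffering (Dekker) test. In this Python sketch, which does not reflect BulkCompiler's actual ISA primitives or algorithms, each thread's code is a list of chunks, and a chunk executes atomically with no other thread's work interleaved inside it. Reordering a load above a store inside a chunk leaves the set of outcomes unchanged, while the same reordering with one instruction per chunk exposes the non-SC result `(0, 0)`. The helper names and the instruction encoding are hypothetical.

```python
def run_chunk(chunk, mem, regs):
    """Execute one chunk; the caller never interleaves other work inside it."""
    for op, var, arg in chunk:
        if op == "st":
            mem[var] = arg            # store constant arg to shared variable var
        else:
            regs[arg] = mem[var]      # "ld": load shared variable var into register arg


def interleavings(a, b):
    """Every merge of two chunk lists that keeps each thread's chunk order."""
    if not a:
        yield list(b)
        return
    if not b:
        yield list(a)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(t0, t1):
    """Set of (r0, r1) results over all chunk-level schedules."""
    results = set()
    for schedule in interleavings(t0, t1):
        mem, regs = {"x": 0, "y": 0}, {}
        for chunk in schedule:
            run_chunk(chunk, mem, regs)
        results.add((regs["r0"], regs["r1"]))
    return results


def split(thread):
    """Degrade to one instruction per chunk (no group atomicity)."""
    return [[op] for chunk in thread for op in chunk]


# Store-buffering test; under SC, (r0, r1) == (0, 0) is impossible.
T0 = [[("st", "x", 1), ("ld", "y", "r0")]]
T1 = [[("st", "y", 1), ("ld", "x", "r1")]]
T0_reordered = [[("ld", "y", "r0"), ("st", "x", 1)]]  # load hoisted inside the chunk
```

Here `outcomes(T0_reordered, T1)` equals `outcomes(T0, T1)`, namely `{(0, 1), (1, 0)}`, whereas `outcomes(split(T0_reordered), split(T1))` also contains `(0, 0)`. Atomic groups are what make the intra-chunk reordering invisible to other threads.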
Analysis of Partitioned Global Address Space Programs
2006
Cited by 2 (2 self)
The introduction of multi-core processors by the major microprocessor vendors has brought parallel programming into the mainstream. Analysis of parallel languages is critical both for safety and optimization purposes. In this report, we consider the specific case of languages with barrier synchronization and global address space abstractions. Two of the fundamental problems in the analysis of parallel programs are to determine when two statements in a program can execute concurrently, and what data can be referenced by each memory location. We present an efficient interprocedural analysis algorithm that conservatively computes the set of all concurrent statements, and improve its precision by using context-free language reachability to ignore infeasible program paths. In addition, we describe a pointer analysis using a hierarchical machine model, which distinguishes between pointers that can reference values within a thread, within a shared memory multiprocessor, or within a network of processors. We then apply the analyses to two clients, data race detection and memory model enforcement. Using a set of five benchmarks, we show that both clients benefit significantly from the analyses.
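The barrier-based part of the concurrency analysis can be illustrated in its simplest form. This sketch assumes a single straight-line SPMD procedure in which every thread executes the same sequence of barriers; the report's interprocedural algorithm and its context-free language reachability refinement are not modeled. Each statement gets a phase (the number of barriers before it), and two statements may run concurrently only if they share a phase. The statement labels are hypothetical.

```python
def barrier_phases(program):
    """Map each statement label to its phase: the number of barriers before it."""
    phases, phase = {}, 0
    for stmt in program:
        if stmt == "barrier":
            phase += 1
        else:
            phases[stmt] = phase
    return phases


def may_run_concurrently(program, s1, s2):
    """With aligned barriers, statements in different phases are ordered by a barrier."""
    phases = barrier_phases(program)
    return phases[s1] == phases[s2]


def concurrent_pairs(program):
    """All unordered pairs (including self-pairs) that may execute concurrently."""
    phases = barrier_phases(program)
    stmts = sorted(phases)
    return {(a, b) for i, a in enumerate(stmts) for b in stmts[i:]
            if phases[a] == phases[b]}


# Hypothetical SPMD program: every thread runs these statements in order.
PROGRAM = ["write_A", "barrier", "read_A", "write_B", "barrier", "read_B"]
```

In `PROGRAM`, `write_A` and `read_A` are separated by a barrier, so a race detector could drop that pair, while `read_A` and `write_B` remain candidates. Self-pairs such as `("read_A", "read_A")` are kept because different threads execute the same statement at the same time.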
A Team Analysis Proposal for Recursive Single Program, Multiple Data Programs
2012
Data Sharing Analysis for Titanium
2001
Parallel programs share data in ways that may not be obvious at the source level. Understanding a program's data sharing behavior is critical to understanding the program as a whole, and is also a necessary component for numerous program analysis, optimization, and run-time clients. We report on the design of and experience with an implementation of a data sharing analysis for the Titanium programming language.
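One simple way to approximate data sharing is reachability over the heap. This sketch assumes a points-to graph given as a dictionary and a set of hypothetical "escaping roots" (for example, values published to other threads, as Titanium's `broadcast` can do); it is a generic reachability analysis, not the design reported for Titanium. Every object reachable from an escaping root is treated as possibly shared, and all others stay private to their thread.

```python
from collections import deque


def shared_objects(points_to, escaping_roots):
    """Objects reachable from any escaping root may be shared; all others stay private."""
    shared, work = set(escaping_roots), deque(escaping_roots)
    while work:
        obj = work.popleft()
        for succ in points_to.get(obj, ()):
            if succ not in shared:      # visit each object once, so cycles terminate
                shared.add(succ)
                work.append(succ)
    return shared


# Hypothetical heap: object -> objects it points to.
HEAP = {"grid": ["row0", "row1"], "row0": ["cell"], "tmp": ["scratch"]}
```

On `HEAP` with `grid` as the only escaping root, `grid`, both rows, and `cell` are classified as possibly shared, while `tmp` and `scratch` remain thread-private; an optimization client could, for instance, treat accesses to the private objects as purely local.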

