Results 1 - 10 of 21
Memory Management with Explicit Regions, 1998
"... Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been wellstudied. In a region-based system each allo ..."
Abstract
-
Cited by 133 (6 self)
- Add to MetaCart
(Show Context)
Much research has been devoted to studies of and algorithms for memory management based on garbage collection or explicit allocation and deallocation. An alternative approach, region-based memory management, has been known for decades, but has not been well-studied. In a region-based system each allocation specifies a region, and memory is reclaimed by destroying a region, freeing all the storage allocated therein. We show that on a suite of allocation-intensive C programs, regions are competitive with malloc/free and sometimes substantially faster. We also show that regions support safe memory management with low overhead. Experience with our benchmarks suggests that modifying many existing programs to use regions is not difficult. 1 Introduction The two most popular memory management techniques are explicit allocation and deallocation, as in C's malloc/free, and various forms of garbage collection [Wil92]. Both have well-known advantages and disadvantages, discussed further below. A t...
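To make the region interface concrete, here is a minimal arena-style sketch in C. The names (region_create, ralloc, region_destroy) are illustrative, not the paper's actual library, and a production version would bump-allocate out of large chunks rather than calling malloc per object.

```c
#include <stdlib.h>

/* Minimal region (arena) sketch: every allocation names a region,
 * and the only way to free memory is to destroy the whole region. */
typedef struct block { struct block *next; } block;
typedef struct { block *blocks; } region;

region *region_create(void) {
    return calloc(1, sizeof(region));
}

/* Allocate n bytes in region r; there is no per-object free. */
void *ralloc(region *r, size_t n) {
    block *b = malloc(sizeof(block) + n);
    b->next = r->blocks;
    r->blocks = b;
    return b + 1;               /* payload follows the header */
}

/* Destroying the region reclaims all storage allocated in it. */
void region_destroy(region *r) {
    for (block *b = r->blocks; b != NULL; ) {
        block *next = b->next;
        free(b);
        b = next;
    }
    free(r);
}
```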
Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations
In 2nd Workshop on Hardware/Software Support for High Performance Scientific and Engineering Computing (SHPSEC-03), 2003
"... MPI support is nearly ubiquitous on high performance sytems today, and is generally highly tuned for performance. It would thus seem to offer a convenient "portable network assembly language" to developers of parallel programming languages who wish to target different network architectures ..."
Abstract
-
Cited by 36 (5 self)
- Add to MetaCart
(Show Context)
MPI support is nearly ubiquitous on high performance systems today, and is generally highly tuned for performance. It would thus seem to offer a convenient "portable network assembly language" to developers of parallel programming languages who wish to target different network architectures. Unfortunately, neither the traditional MPI 1.1 API nor the newer MPI 2.0 extensions for one-sided communication provide an adequate compilation target for global address space languages, and this is likely to be the case for many other parallel languages as well. Simulating one-sided communication under the MPI 1.1 API is too expensive, while the MPI 2.0 one-sided API imposes a number of restrictions that would need to be incorporated at the language level, as it is unlikely that a compiler could effectively hide them.
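The restriction the paper points to is visible in a minimal MPI-2 one-sided example (standard MPI calls; run with at least two processes): the MPI_Win_fence epochs are collective, so even a process that is only a passive target must execute them, which is exactly the kind of constraint a global-address-space compiler would have to surface at the language level.

```c
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf = 0.0;
    MPI_Win win;
    /* Every process exposes one double through the window. */
    MPI_Win_create(&buf, sizeof buf, sizeof buf,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Active-target synchronization: all processes, including
     * ones that neither put nor get, must reach this fence. */
    MPI_Win_fence(0, win);
    if (rank == 0) {
        double x = 3.14;
        MPI_Put(&x, 1, MPI_DOUBLE, /*target rank*/ 1,
                /*target disp*/ 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);      /* the put completes only here */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```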
Titanium performance and potential: an NPB experimental study
In Proceedings of the 18th International Workshop on Languages and Compilers for Parallel Computing (LCPC), 2005
"... Titanium is an explicitly parallel dialect of Java TM designed for high-performance scientific programming. It offers objectorientation, strong typing, and safe memory management in the context of a language that supports high performance and scalable parallelism. We present an overview of the langu ..."
Abstract
-
Cited by 34 (15 self)
- Add to MetaCart
(Show Context)
Titanium is an explicitly parallel dialect of Java™ designed for high-performance scientific programming. It offers object-orientation, strong typing, and safe memory management in the context of a language that supports high performance and scalable parallelism. We present an overview of the language features and demonstrate their use in the context of the NAS Parallel Benchmarks, a standard benchmark suite of kernels that are common across many scientific applications. We argue that parallel languages like Titanium provide greater expressive power than conventional approaches, enabling much more concise and expressive code and minimizing time to solution without sacrificing parallel performance. Empirical results demonstrate that our Titanium implementations of three of the NAS Parallel Benchmarks can match or even exceed the performance of the standard MPI/Fortran implementations at realistic problem sizes and processor scales, while still using far cleaner, shorter, and more maintainable code.
DRFx: A simple and efficient memory model for concurrent programming languages, 2009
"... Abstract The most intuitive memory model for shared-memory multithreaded programming is sequential consistency (SC), but it disallows the use of many compiler and hardware optimizations thereby impacting performance. Data-race-free (DRF) models, such as the proposed C++0x memory model, guarantee SC ..."
Abstract
-
Cited by 29 (5 self)
- Add to MetaCart
(Show Context)
The most intuitive memory model for shared-memory multithreaded programming is sequential consistency (SC), but it disallows the use of many compiler and hardware optimizations, thereby impacting performance. Data-race-free (DRF) models, such as the proposed C++0x memory model, guarantee SC execution for data-race-free programs. But these models provide no guarantee at all for racy programs, compromising the safety and debuggability of such programs. To address the safety issue, the Java memory model, which is also based on the DRF model, provides a weak semantics for racy executions. However, this semantics is subtle and complex, making it difficult for programmers to reason about their programs and for compiler writers to ensure the correctness of compiler optimizations. We present the DRFx memory model, which is simple for programmers to understand and use while still supporting many common optimizations. We introduce a memory model (MM) exception which can be signaled to halt execution. If a program executes without throwing this exception, then DRFx guarantees that the execution is SC. If a program throws an MM exception during an execution, then DRFx guarantees that the program has a data race. We observe that SC violations can be detected in hardware through a lightweight form of conflict detection. Furthermore, our model safely allows aggressive compiler and hardware optimizations within compiler-designated program regions. We formalize our memory model, prove several properties about this model, describe a compiler and hardware design suitable for DRFx, and evaluate the performance overhead due to our compiler and hardware requirements.
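The SC violations at issue are easy to exhibit. In the classic store-buffering race below (plain C with pthreads, illustrating the problem rather than DRFx's detection machinery), SC forbids the outcome r1 == 0 && r2 == 0, yet compiler or hardware reordering of the racy plain accesses can produce it. DRF models give such a program no semantics at all; DRFx instead promises either an SC execution or an MM exception proving a race exists.

```c
#include <pthread.h>
#include <stdio.h>

int x = 0, y = 0;      /* racy shared variables */
int r1, r2;

void *t1(void *arg) { x = 1; r1 = y; return NULL; }
void *t2(void *arg) { y = 1; r2 = x; return NULL; }

int main(void) {
    pthread_t a, b;
    pthread_create(&a, NULL, t1, NULL);
    pthread_create(&b, NULL, t2, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* Under SC at least one thread's store precedes the other's
     * load, so r1 = 0 and r2 = 0 together are impossible; with
     * reordering of the racy accesses they can both be 0. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}
```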
BulkCompiler: High-Performance Sequential Consistency through Cooperative Compiler and Hardware Support
"... A platform that supported Sequential Consistency (SC) for all codes — not only the well-synchronized ones — would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, fo ..."
Abstract
-
Cited by 16 (3 self)
- Add to MetaCart
(Show Context)
A platform that supported Sequential Consistency (SC) for all codes (not only the well-synchronized ones) would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well. This paper presents the hardware-compiler interface and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system, high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.
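A sketch of the idea, with hypothetical chunk_begin/chunk_end markers standing in for the paper's ISA primitives (the actual primitives and their names differ): once a group of instructions commits atomically, reordering shared accesses inside it is invisible to other threads, so the compiler can optimize within the group without breaking the whole-system SC guarantee.

```c
/* Hypothetical no-op stand-ins for the group-forming ISA
 * primitives; the paper's real primitives differ. */
static void chunk_begin(void) {}
static void chunk_end(void)   {}

int a, b;                  /* shared with other threads */

void update(void) {
    chunk_begin();
    /* Within a group that commits atomically, no other thread can
     * observe an intermediate state, so the compiler may reorder
     * or combine these shared stores while the system as a whole
     * still appears sequentially consistent. */
    a = 1;
    b = 2;
    chunk_end();
}

int main(void) { update(); return 0; }
```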
Hierarchical Pointer Analysis for Distributed Programs
In The 14th International Static Analysis Symposium (SAS 2007), Kongens Lyngby, 2007
"... Abstract. We present a new pointer analysis for use in shared memory programs running on hierarchical parallel machines. The analysis is motivated by the partitioned global address space languages, in which programmers have control over data layout and threads and can directly read and write to memo ..."
Abstract
-
Cited by 8 (7 self)
- Add to MetaCart
(Show Context)
We present a new pointer analysis for use in shared memory programs running on hierarchical parallel machines. The analysis is motivated by the partitioned global address space languages, in which programmers have control over data layout and threads and can directly read and write to memory associated with other threads. Titanium, UPC, Co-Array Fortran, X10, Chapel, and Fortress are all examples of such languages. The novelty of our analysis comes from the hierarchical machine model used, which captures the increasingly hierarchical nature of modern parallel machines. For example, the analysis can distinguish between pointers that can reference values within a thread, within a shared memory multiprocessor, or within a network of processors. The analysis is presented with a formal type system and operational semantics, articulating the various ways in which pointers can be used within a hierarchical machine model. The hierarchical analysis has several applications, including race detection, sequential consistency enforcement, and software caching. We present results of an implementation of the analysis, applying it to data race detection, and show that the hierarchical analysis is very effective at reducing the number of false races detected.
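As a toy model of the hierarchy (the names and lattice here are illustrative, not the paper's formal type system): abstract each pointer by the widest machine level it may cross, and join facts conservatively toward the wider level.

```c
#include <stdio.h>

/* Illustrative abstract domain: the narrowest machine scope a
 * pointer may escape. Wider values are less precise. */
typedef enum { THREAD = 0, NODE = 1, GLOBAL = 2 } level;

/* Joining two facts keeps the wider (more conservative) level. */
static level join(level a, level b) { return a > b ? a : b; }

int main(void) {
    level p = THREAD;      /* proven to reference thread-local data */
    level q = NODE;        /* may reference SMP-shared data         */
    level r = join(p, q);  /* r = p or q  =>  at most node-wide     */
    /* A race detector can skip pairs proven THREAD-local, which is
     * how the hierarchy cuts down false races. */
    printf("r is %s\n", r == GLOBAL ? "global"
                        : r == NODE ? "node-wide" : "thread-local");
    return 0;
}
```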
Analysis of Partitioned Global Address Space Programs, 2006
"... The introduction of multi-core processors by the major microprocessor vendors has brought parallel programming into the mainstream. Analysis of parallel languages is critical both for safety and optimization purposes. In this report, we consider the specific case of languages with barrier synchroniz ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
(Show Context)
The introduction of multi-core processors by the major microprocessor vendors has brought parallel programming into the mainstream. Analysis of parallel languages is critical both for safety and optimization purposes. In this report, we consider the specific case of languages with barrier synchronization and global address space abstractions. Two of the fundamental problems in the analysis of parallel programs are to determine when two statements in a program can execute concurrently, and what data can be referenced by each pointer. We present an efficient interprocedural analysis algorithm that conservatively computes the set of all concurrent statements, and improve its precision by using context-free language reachability to ignore infeasible program paths. In addition, we describe a pointer analysis using a hierarchical machine model, which distinguishes between pointers that can reference values within a thread, within a shared memory multiprocessor, or within a network of processors. We then apply the analyses to two clients, data race detection and memory model enforcement. Using a set of five benchmarks, we show that both clients benefit significantly from the analyses.
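The barrier-based core of the concurrency analysis can be caricatured in a few lines (illustrative only; the real algorithm is interprocedural and uses context-free language reachability): statements separated by a barrier fall in different phases and therefore cannot execute concurrently.

```c
#include <stdio.h>

/* Toy phase-based approximation: in an SPMD program with textually
 * aligned barriers, statements in different phases cannot overlap. */
typedef struct { int phase; } stmt;

/* Conservative may-happen-in-parallel check. */
static int mhp(stmt a, stmt b) { return a.phase == b.phase; }

int main(void) {
    stmt s1 = { .phase = 0 };   /* before the barrier */
    stmt s2 = { .phase = 0 };   /* before the barrier */
    stmt s3 = { .phase = 1 };   /* after the barrier  */
    printf("s1 || s2: %d\n", mhp(s1, s2));  /* 1: may race        */
    printf("s1 || s3: %d\n", mhp(s1, s3));  /* 0: barrier-ordered */
    return 0;
}
```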
Defining and enforcing referential security
In submission, 2012
"... Abstract. Referential integrity, which guarantees that named resources can be accessed when referenced, is an important property for reliability and security. In distributed systems, however, the attempt to provide referential integrity can itself lead to security vulnerabilities that are not curren ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
(Show Context)
Referential integrity, which guarantees that named resources can be accessed when referenced, is an important property for reliability and security. In distributed systems, however, the attempt to provide referential integrity can itself lead to security vulnerabilities that are not currently well understood. This paper identifies three kinds of referential security vulnerabilities related to the referential integrity of distributed, persistent information. Security conditions corresponding to the absence of these vulnerabilities are formalized. A language model is used to capture the key aspects of programming distributed systems with named, persistent resources in the presence of an adversary. The referential security of distributed systems is proved to be enforced by a new type system.
A Team Analysis Proposal for Recursive Single Program, Multiple Data Programs, 2012