Threads cannot be implemented as a library
- In PLDI, 2005
Cited by 51 (3 self)
Keywords: threads, library, register promotion, compiler optimization, garbage collection
In many environments, multi-threaded code is written in a language that was originally designed without thread support (e.g. C), to which a library of threading primitives was subsequently added. There appears to be a general understanding that this is not the right approach. We provide specific arguments that a pure library approach, in which the compiler is designed independently of threading issues, cannot guarantee correctness of the resulting code. We first review why the approach almost works, and then examine some of the surprising behavior it may entail. We further illustrate that there are very simple cases in which a pure library-based approach seems incapable of expressing an efficient parallel algorithm. Our discussion takes place in the context of C with Pthreads, since it is commonly used, reasonably well specified, and does not attempt to ensure type-safety, which would entail even stronger constraints. The issues we raise are not specific to that context.
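The register promotion hazard named in the keywords can be sketched without real threads. The Python model below is an illustration only, not code from the paper. All names (`plain_loop`, `promoted_loop`, `other_thread`, `run`) are hypothetical, and locks are omitted for brevity. It assumes a thread-unaware compiler that keeps a shared value in a local "register" for the whole loop and writes it back at the end. A callback stands in for a second thread that updates the shared value mid-loop.

```python
# Illustration only: a deterministic, single-threaded model of the hazard.
# All names here are hypothetical; locks are left out to keep the sketch short.

def plain_loop(shared, n, interleave):
    """Each iteration reads and writes the shared location directly."""
    for i in range(n):
        shared["x"] = shared["x"] + 1
        interleave(i, shared)      # point where another thread could run


def promoted_loop(shared, n, interleave):
    """The same loop after 'register promotion': x lives in a local."""
    r = shared["x"]                # single load before the loop
    for i in range(n):
        r = r + 1
        interleave(i, shared)      # the other thread's update lands here
    shared["x"] = r                # write-back overwrites that update


def other_thread(i, shared):
    """Stand-in for a second thread that adds 100 once, after iteration 0."""
    if i == 0:
        shared["x"] += 100


def no_other_thread(i, shared):
    """Stand-in for a purely sequential run."""
    pass


def run(loop, interleave):
    """Run one loop variant for 3 iterations and return the final value of x."""
    shared = {"x": 0}
    loop(shared, 3, interleave)
    return shared["x"]
```

With `no_other_thread` both variants return 3, which is why a compiler reasoning only about sequential behavior may treat them as interchangeable. With `other_thread`, `plain_loop` returns 103 while `promoted_loop` returns 3: the concurrent update is lost.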
Load-reuse analysis: Design and evaluation
- In Proceedings of the ACM SIGPLAN ’99 Conference on Programming Language Design and Implementation, 1999
Spatial Computation
- In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004
Cited by 37 (10 self)
This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system which uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a consequence, ASH hardware is fast and extremely power efficient.
Partial Redundancy Elimination for Access Path Expressions
- In CC, 2001
Cited by 15 (7 self)
Pointer traversals pose significant overhead to the execution of object-oriented programs, since every access to an object's state requires a pointer dereference. Eliminating redundant pointer traversals reduces both instructions executed as well as redundant memory accesses to relieve pressure on the memory subsystem. We describe an approach to elimination of redundant access expressions that combines partial redundancy elimination (PRE) with type-based alias analysis (TBAA). To explore the potential of this approach we have implemented an optimization framework for Java class files incorporating TBAA-based PRE over pointer access expressions. The framework is implemented as a class-file-to-class-file transformer; optimized classes can then be run in any standard Java execution environment. Our experiments demonstrate improvements in the execution of optimized code for several Java benchmarks running in diverse execution environments: the standard interpreted JDK virtual machine, a virtual machine using "just-in-time" compilation, and native binaries compiled off-line ("way-ahead-of-time"). Overall, however, our experience is of mixed success with the optimizations, mainly because of the isolation between our optimizer and the underlying execution environments which prevents more effective cooperation between them. We isolate the impact of access path PRE using TBAA, and demonstrate that Java's requirement of precise exceptions can noticeably impact code-motion optimizations like PRE.
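What "partially redundant" means for an access path such as `a.b.c` can be shown with a small sketch. The Python code below is a hand-written illustration, not the paper's class-file transformer. The helper `get`, the counter `DEREFS`, and both functions are hypothetical names. It assumes nothing stores to `a.b` or `b.c` between the two reads, which is the kind of fact an alias analysis would have to establish.

```python
# Illustration only: hypothetical helpers that make pointer dereferences countable.
DEREFS = 0


def get(obj, field):
    """Read obj.field and count it as one pointer dereference."""
    global DEREFS
    DEREFS += 1
    return obj[field]


def original(a, cond):
    if cond:
        x = get(get(a, "b"), "c")
    else:
        x = 0
    y = get(get(a, "b"), "c")      # redundant only on the path where cond is true
    return x + y


def after_pre(a, cond):
    if cond:
        t = get(get(a, "b"), "c")
        x = t
    else:
        x = 0
        t = get(get(a, "b"), "c")  # inserted so that t is available on both paths
    y = t                          # the second traversal is replaced by the temporary
    return x + y


def count(fn, a, cond):
    """Return (result, number of dereferences) for one call."""
    global DEREFS
    DEREFS = 0
    result = fn(a, cond)
    return result, DEREFS
```

For `a = {"b": {"c": 5}}`, `count(original, a, True)` gives `(10, 4)` and `count(after_pre, a, True)` gives `(10, 2)`. On the `cond == False` path both versions perform two dereferences, so the rewrite never adds work on any path.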
Impact of economics on compiler optimization
- In Proc. of the Joint ACM Java Grande/ISCOPE 2001 Conf., 2001
Cited by 13 (0 self)
Compile-time program optimizations are similar to poetry: more are written than are actually published in commercial compilers. Hard economic reality is that many interesting optimizations have too narrow an audience to justify their cost in a general-purpose compiler, and custom compilers are too expensive to write. An alternative is to allow programmers to define their own compile-time optimizations. This has already happened accidentally for C++, albeit imperfectly, in the form of template metaprogramming. This paper surveys the problems, the accidental success, and what directions future research might take to circumvent current economic limitations of monolithic compilers.
An Efficient Static Analysis Algorithm to Detect Redundant Memory Operations
- 2002
Cited by 8 (1 self)
As memory system performance becomes an increasingly dominant factor in overall system performance, it is important to optimize programs for memory related operations. This paper concerns static analysis to detect redundant memory operations and enable other compiler transformations to remove such redundant operations. We present an
Array SSA for Explicitly Parallel Programs
- In Proc. 5th Intl. Euro-Par Conf., LNCS vol. 1685, 1998
Cited by 6 (0 self)
The usefulness and applicability of the Static Single Assignment (SSA) framework is undisputed. SSA was originally crafted for sequential programs manipulating scalars, but it has been separately extended to parallel programs on the one hand, and to sequential programs with arrays on the other. In an Array SSA framework, arrays are precisely handled on an element-per-element basis. This paper proposes an Array SSA form for parallel programs with either weak or strong memory consistency, with event-based synchronization or mutual exclusion, with parallel sections or indexed parallel constructs.
1 Introduction and Related Work
The usefulness and applicability of the Static Single Assignment (SSA) framework [5] is undisputed, and still is the subject of active research (e.g., [22]). SSA was originally crafted for sequential programs manipulating scalars, but it has been extended recently to sequential programs with arrays [11]. In an Array SSA framework, arrays are precisely handled on a...
Reducing Loads and Stores in Stack Architectures
- 2000
Cited by 6 (0 self)
The stack model of execution uses a stack to hold temporary results during evaluation of a program. Implementations of the stack model, such as Java virtual machines for execution of stack-based Java bytecodes, can often access the stack more efficiently than local variables. Thus, converting local variable accesses into stack accesses can improve the performance of stack-based programs. We formulate a generic family of transformations on Java-like bytecodes that replaces loads and stores of local variables with equivalent operation sequences that avoid the loads and stores in favor of stack manipulation operations. We prove the correctness of these transformations, and argue that the resulting sequences are likely to yield improved performance by reducing memory traffic. We have implemented and evaluated several instances of the generic transformations for their effectiveness on code produced by Sun's javac Java-to-bytecode compiler. Our results demonstrate significant reduct...
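Replacing local-variable traffic with stack manipulation can be made concrete with a toy stack machine. The Python sketch below is an illustration under stated assumptions, not the paper's transformation family. The opcode names, the `run` interpreter, and both programs are hypothetical. The rewrite shown is valid only if the local `t` is not read again later.

```python
# Illustration only: a toy stack machine with hypothetical opcodes.

def run(code):
    """Execute the bytecode; return (top of stack, local loads, local stores)."""
    stack, local_vars = [], {}
    loads = stores = 0
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        elif op == "store":            # pop the top of stack into a local variable
            local_vars[args[0]] = stack.pop()
            stores += 1
        elif op == "load":             # push a local variable onto the stack
            stack.append(local_vars[args[0]])
            loads += 1
        elif op == "dup":              # duplicate the top of stack
            stack.append(stack[-1])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1], loads, stores


# t = 2 + 3; result = t * t, going through the local variable t
ORIGINAL = [
    ("push", 2), ("push", 3), ("add",),
    ("store", "t"), ("load", "t"), ("load", "t"),
    ("mul",),
]

# The same computation with the store/load/load sequence replaced by one dup
# (assumes t is dead afterwards)
REWRITTEN = [
    ("push", 2), ("push", 3), ("add",),
    ("dup",),
    ("mul",),
]
```

Here `run(ORIGINAL)` returns `(25, 2, 1)` and `run(REWRITTEN)` returns `(25, 0, 0)`: the result is unchanged while all local-variable loads and stores disappear.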
Memory Redundancy Elimination to Improve Application Energy Efficiency
- In Proceedings of the 16th International Workshop on Languages and Compilers for Parallel Computing (LCPC'03), 2003
Cited by 2 (1 self)
Application energy consumption has become an increasingly important issue for both high-end microprocessors and mobile and embedded devices. A multitude of circuit and architecture-level techniques have been developed to improve application energy efficiency. However, relatively less work studies the effects of compiler transformations in terms of application energy efficiency. In this paper, we use energy-estimation tools to profile the execution of benchmark applications. The results show that energy consumption due to memory instructions accounts for a large share of total energy. An effective compiler technique that can improve energy efficiency is memory redundancy elimination.

