Results 1 - 10
of
21
Dynamic Feedback: An Effective Technique for Adaptive Computing
, 1997
"... This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated co ..."
Abstract
-
Cited by 51 (3 self)
- Add to MetaCart
This paper presents dynamic feedback, a technique that enables computations to adapt dynamically to different execution environments. A compiler that uses dynamic feedback produces several different versions of the same source code; each version uses a different optimization policy. The generated code alternately performs sampling phases and production phases. Each sampling phase measures the overhead of each version in the current environment. Each production phase uses the version with the least overhead in the previous sampling phase. The computation periodically resamples to adjust dynamically to changes in the environment.
Type Directed Cloning for Object-Oriented Programs
- In Proceedings of the Workshop for Languages and Compilers for Parallel Computing
, 1995
"... . Object-oriented programming encourages the use of small functions, dynamic dispatch (virtual functions), and inheritance for code reuse. As a result, such programs typically suffer from inferior performance. The problem is that polymorphic functions do not know the exact types of the data they ope ..."
Abstract
-
Cited by 31 (8 self)
- Add to MetaCart
. Object-oriented programming encourages the use of small functions, dynamic dispatch (virtual functions), and inheritance for code reuse. As a result, such programs typically suffer from inferior performance. The problem is that polymorphic functions do not know the exact types of the data they operate on, and hence must use indirection to operate on them. However, most polymorphism is parametric (e.g. templates in C++) which is amenable to elimination through code replication. We present a cloning algorithm which eliminates parametric polymorphism while minimizing code duplication. The effectiveness of this algorithm is demonstrated on a number of concurrent object-oriented programs. Finally, since functions and data structures can be parameterized over properties other than type, this algorithm is applicable to general forward data flow problems. 1 Introduction Object-oriented (OOP) and concurrent object-oriented (COOP) programming languages have gained popularity because they prov...
Runtime Mechanisms for Efficient Dynamic Multithreading
, 1996
"... High performance on distributed memory machines for programming models with dynamic thread creation and multithreading requires efficient thread management and communication. Traditional multithreading runtimes, consisting of few general-purpose, bundled mechanisms that assume minimal compiler and h ..."
Abstract
-
Cited by 22 (9 self)
- Add to MetaCart
High performance on distributed memory machines for programming models with dynamic thread creation and multithreading requires efficient thread management and communication. Traditional multithreading runtimes, consisting of few general-purpose, bundled mechanisms that assume minimal compiler and hardware support, are suitable for computations involving coarse-grained threads but provide low efficiency in the presence of small granularity threads and irregular communication behavior. We describe two mechanisms of the Illinois Concert runtime system which address this shortcoming. The first, hybrid stack-heap execution, exploits close coupling with the compiler to dynamically form coarse-grained execution units; threads are lazily created as required by runtime situations. The second, pull messaging, exploits hardware support to implement a distributed message queue with receiverinitiated data transfer, delivering robust performance across a wide range of dynamic communication charact...
Fine-grain Multithreading with Minimal Compiler Support -- A Cost Effective Approach to Implementing Efficient Multithreading Languages
- PLDI'97
, 1997
"... It is difficult to map the execution model of multithread-ing languages (languages which support fine-grain dynamic thread creation) onto the single stack execution model of C. Consequently, previous work on efficient multithreading uses elaborate frame formats and allocation strategy, with com-pile ..."
Abstract
-
Cited by 20 (5 self)
- Add to MetaCart
It is difficult to map the execution model of multithread-ing languages (languages which support fine-grain dynamic thread creation) onto the single stack execution model of C. Consequently, previous work on efficient multithreading uses elaborate frame formats and allocation strategy, with com-pilers customized for them. This paper presents an alterna-tive cost-effective implementation strategy for multithread-ing languages which can maximally exploit current sequen-tial C compilers. We identify a set of primitives whereby ef-ficient dynamic thread creation and switch can be achieved and clarify implementation issues and solutions which work under the stack frame layout and calling conventions of cur-rent C compilers. The primitives are implemented as a C library and named StackThreads. In StackThreads, a thread creation is done just by a C procedure call, max-imizing thread creation performance. When a procedure suspends an execution, the context of the procedure, which is roughly a stack frame of the procedure, is saved into heap and resumed later. With StackThreads, the compiler writer can straightforwardly translate sequential constructs of the source language into corresponding C statements or expres-sions, while using StackThreads primitives as a blackbox mechanism which switches execution between C procedures.
Whole-Program Optimization for Time and Space Efficient Threads
- in Proceedings of the Seventh International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS
, 1996
"... Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and simplify program structure. Multitasking operating systemsuse threads to mask communication latency, either with hardwares devices or users. ..."
Abstract
-
Cited by 16 (0 self)
- Add to MetaCart
Modern languages and operating systems often encourage programmers to use threads, or independent control streams, to mask the overhead of some operations and simplify program structure. Multitasking operating systemsuse threads to mask communication latency, either with hardwares devices or users. Client-server applications typically use threads to simplify the complex controlflow that arises when multiple clients are used. Recently, the scientific computing community has started using threads to mask network communication latency in massively parallel architectures, allowing computation and communication to be overlapped. Lastly, some architectures implement threads in hardware, using those threads to tolerate memory latency. In general, it would be desirable if threaded programs could be written to expose the largest degree of parallelism possible, or to simplify the program design. However, threads incur time and space overheads, and programmers often compromise simple designs for ...
MORPH: A System Architecture for Robust High Performance Using Customization (An NSF 100 TeraOps Point Design Study)
, 1996
"... ..."
Supporting High Level Programming with High Performance: The Illinois Concert System
- In Proceedings of the Second International Workshop on High-level Parallel Programming Models and Supportive Environments
, 1997
"... Programmers of concurrent applications are faced with a complex performance space in which data distribution and concurrency management exacerbate the difficulty of building large, complex applications. To address these challenges, the Illinois Concert system provides a global namespace, implicit co ..."
Abstract
-
Cited by 11 (4 self)
- Add to MetaCart
Programmers of concurrent applications are faced with a complex performance space in which data distribution and concurrency management exacerbate the difficulty of building large, complex applications. To address these challenges, the Illinois Concert system provides a global namespace, implicit concurrency control and granularity management, implicit storage management, and object-oriented programming features. These features are embodied in a language ICC++ (derived from C++) which has been used to build a number of kernels and applications. As high level features can potentially incur overhead, the Concert system employs a range of compiler and runtime optimization techniques to efficiently support the high level programming model. The compiler techniques include type inference, inlining and specialization; and the runtime techniques include caching, prefetching and hybrid stack/heap multithreading. The effectiveness of these techniques permits the construction of complex parallel ...
Performance Evaluation of OpenMP Applications with Nested Parallelism
, 2000
"... Many existing OpenMP systems do not su ciently implement nested parallelism. This is supposedly because nested parallelism is believed to require a significant implementation effort, incur a large overhead, or lack applications. This paper demonstrates Omni/ST, a simple and efficient implementation ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Many existing OpenMP systems do not su ciently implement nested parallelism. This is supposedly because nested parallelism is believed to require a significant implementation effort, incur a large overhead, or lack applications. This paper demonstrates Omni/ST, a simple and efficient implementation of OpenMP nested parallelism using StackThreads/MP, which is a fine-grain thread library. Thanks to StackThreads/MP, OpenMP parallel constructs are simply mapped onto thread creation primitives of StackThreads/MP, yet they are efficiently managed with a fixed number of threads in the underlying thread package (e.g., Pthreads). Experimental results on Sun Ultra Enterprise 10000 with up to 60 processors show that overhead imposed by nested parallelism is very small (1-3 % in ve out of six applications, and 8 % for the other), and there is a significant scalability benefit for applications with nested parallelism.
An Efficient Implementation of Nested Data Parallelism for Irregular Divide-and-Conquer Algorithms
- FIRST INTERNATIONAL WORKSHOP ON HIGH-LEVEL PROGRAMMING MODELS AND SUPPORTIVE ENVIRONMENTS, APRIL 1996.
, 1996
"... This paper presents work in progress on a new method of implementing irregular divide-and-conquer algorithms in a nested data-parallel language model on distributedmemory multiprocessors. The main features discussed are the recursive subdivision of asynchronous processor groups to match the change f ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
This paper presents work in progress on a new method of implementing irregular divide-and-conquer algorithms in a nested data-parallel language model on distributedmemory multiprocessors. The main features discussed are the recursive subdivision of asynchronous processor groups to match the change from data-parallel to control-parallel behavior over the lifetime of an algorithm, switching from parallel code to serial code when the group size is one (with the opportunityto use a more efficient serial algorithm) , and a simple manager-based run-time load-balancing system. Sample algorithms translated from the high-level nested data-parallel language NESL into C and MPI using this method are significantly faster than the current NESL system, and show the potential for further speedup.
Threads for Interoperable Parallel Programming
- In Proc. 9th Conference on Languages and Compilers for Parallel Computers
, 1996
"... . Many thread packages are freely available on the Internet. Yet, most parallel language design groups seem to have rejected all existing packages and implemented their own. This is unsurprising. Existing thread packages were designed for sequential computers, not parallel machines, and do not fit w ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
. Many thread packages are freely available on the Internet. Yet, most parallel language design groups seem to have rejected all existing packages and implemented their own. This is unsurprising. Existing thread packages were designed for sequential computers, not parallel machines, and do not fit well in a parallel environment. Also importantly, existing thread packages try to impose a number of design decisions, especially in regard to scheduling and preemption. Designers of parallel languages are simply not willing to have scheduling methods decided for them, nor are they willing to allow the threads package to decide how concurrency control will work. In this paper, we explore the special issues raised when threads packages are used on parallel machines, particularly as parts of new parallel languages and systems. We describe the Converse threads subsystem, whose goals are to support the special needs of parallel programs, and to support interoperability among parallel languages. W...

