A library implementation of POSIX threads under UNIX
In Proceedings of the USENIX Conference, 1993
Cited by 118 (15 self)
Recently, there has been an effort to specify an IEEE standard for portable operating systems for open systems, called POSIX. One part of it, the POSIX 1003.4a threads extension (Pthreads for short) [12], describes the interface for light-weight threads that rely on shared memory and have a smaller context frame than processes. This paper describes and evaluates the design and implementation of a library of Pthreads calls that is based solely on UNIX. It shows that a library implementation is feasible and can achieve good performance. This work can also serve as a baseline for comparing the performance of other implementations, or as a prototyping, testing, and debugging system in the regular UNIX environment. Finally, some problems with the Pthreads standard are identified.
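For readers unfamiliar with the interface, the following minimal sketch shows the kind of calls such a library must provide: thread creation, joining, and mutex-protected access to shared memory. It uses the names as later standardized in POSIX.1c (IEEE 1003.1c-1995); the draft 1003.4a interface the paper targets may differ in detail. The function `run_counter_demo` and its parameters are illustrative assumptions, not taken from the paper.

```c
#include <pthread.h>
#include <stdio.h>

#define MAX_THREADS 64

/* Shared counter and the mutex that protects it. */
static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: increments the shared counter *arg times, one locked update at a time. */
static void *worker(void *arg)
{
    int niters = *(const int *)arg;
    for (int i = 0; i < niters; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical demo: starts nthreads workers, joins them all, and returns the
   final counter value, or -1 if the arguments are invalid or a thread could
   not be created. */
long run_counter_demo(int nthreads, int niters)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    counter = 0; /* reset before any worker exists, so no race here */

    for (; started < nthreads; started++) {
        /* niters lives on this stack frame until every join below completes. */
        int err = pthread_create(&tids[started], NULL, worker, &niters);
        if (err != 0) {
            fprintf(stderr, "pthread_create failed with error %d\n", err);
            break;
        }
    }
    /* Always join the threads that did start, so none is left running. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? counter : -1;
}
```

Calling `run_counter_demo(4, 100000)` from `main` should return 400000, since the mutex serializes every increment; removing the lock/unlock pair would make the result nondeterministic.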
The ADAPTIVE Communication Environment: An Object-Oriented Network Programming Toolkit for Developing Communication Software
1993
Cited by 104 (5 self)
The ADAPTIVE Communication Environment (ACE) is an object-oriented toolkit that implements strategic and tactical design patterns to simplify the development of concurrent, event-driven communication software. ACE provides a rich set of reusable C++ wrappers, class categories, and frameworks that perform common communication software tasks across a range of operating system platforms. The communication software tasks provided by ACE include event demultiplexing, event handler dispatching, connection establishment, interprocess communication, shared memory management, message routing, dynamic (re)configuration of network services, multi-threading, and concurrency control. ACE is targeted for developers of high-performance concurrent network applications and services. The primary goal of ACE is to simplify the development of concurrent OO communication software that utilizes interprocess communication, event demultiplexing, explicit dynamic linking, and concurrency. In addition, ACE auto...
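Event demultiplexing and event handler dispatching, two of the tasks listed above, are commonly structured as a Reactor: handlers are registered per I/O handle, and a single loop waits for readiness and calls the matching handler. The sketch below is a deliberately small C rendering of that idea on top of POSIX `poll()`; it is not ACE's C++ API, and every name in it (`struct reactor`, `reactor_register`, `reactor_handle_events`, `count_bytes_handler`) is hypothetical.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_HANDLERS 16

/* Callback invoked when its file descriptor becomes readable. */
typedef void (*event_handler_fn)(int fd, void *ctx);

/* Hypothetical reactor: parallel arrays of pollfd entries, handlers, contexts. */
struct reactor {
    struct pollfd fds[MAX_HANDLERS];
    event_handler_fn handlers[MAX_HANDLERS];
    void *contexts[MAX_HANDLERS];
    nfds_t count;
};

void reactor_init(struct reactor *r) { r->count = 0; }

/* Associates a read handler with fd; returns 0, or -1 if the table is full. */
int reactor_register(struct reactor *r, int fd, event_handler_fn fn, void *ctx)
{
    if (r->count >= MAX_HANDLERS)
        return -1;
    r->fds[r->count].fd = fd;
    r->fds[r->count].events = POLLIN;
    r->fds[r->count].revents = 0;
    r->handlers[r->count] = fn;
    r->contexts[r->count] = ctx;
    r->count++;
    return 0;
}

/* Waits up to timeout_ms for readiness (demultiplexing), then calls the handler
   of every ready descriptor (dispatching). Returns the number of handlers
   called, 0 on timeout, or -1 if poll() fails. */
int reactor_handle_events(struct reactor *r, int timeout_ms)
{
    int ready = poll(r->fds, r->count, timeout_ms);
    if (ready <= 0)
        return ready;

    int dispatched = 0;
    for (nfds_t i = 0; i < r->count; i++) {
        if (r->fds[i].revents & (POLLIN | POLLHUP)) {
            r->handlers[i](r->fds[i].fd, r->contexts[i]);
            dispatched++;
        }
    }
    return dispatched;
}

/* Example handler: reads what is available and adds the byte count to *(size_t *)ctx. */
void count_bytes_handler(int fd, void *ctx)
{
    char buf[256];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        *(size_t *)ctx += (size_t)n;
}
```

A server would register its listening socket and each accepted connection, then call `reactor_handle_events` in a loop; handlers should not block, because they all run on the single dispatching thread.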
On the Design of Chant: A Talking Threads Package
Proc. Supercomputing '94, pp. 350-359, Washington, D.C., 1994
Cited by 71 (9 self)
Lightweight threads are becoming increasingly useful in supporting parallelism and asynchronous control structures in applications and language implementations. However, lightweight thread packages traditionally support only shared memory synchronization and communication primitives, limiting their use in distributed memory environments. We introduce the design of a runtime interface, called Chant, that supports lightweight threads with the capability of communication using both point-to-point and remote service request primitives, built from standard message passing libraries. This is accomplished by extending the POSIX pthreads interface with global thread identifiers, global thread operations, and message passing primitives. This paper introduces the Chant interface and describes the runtime issues in providing an efficient, portable implementation of such an interface. In particular, we present performance results of the initial portion of our runtime system: point-to-point message passing among threads. We examine the issue of thread scheduling in the presence of polling for messages, and measure the overhead incurred when using this interface as opposed to using the underlying communication layer directly. We show that our design can accommodate various polling methods, depending on the level of support present in the underlying thread system, and imposes little overhead in point-to-point message passing over the existing communication layer.
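The abstract says Chant adds global thread identifiers on top of the Pthreads interface so that a thread in one address space can name a thread in another. One plausible representation, assumed here purely for illustration, pairs the rank of the address space (as assigned by the message-passing layer) with a thread index local to that address space, and packs the pair into 64 bits so it can travel in a message header. The type `global_tid_t` and the `gtid_*` helpers are hypothetical and do not reproduce Chant's actual interface.

```c
#include <stdint.h>

/* Hypothetical global thread identifier: the rank of the owning address space
   plus a thread index that is unique only within that address space. */
typedef struct {
    uint32_t rank;
    uint32_t local_id;
} global_tid_t;

/* Packs the identifier into one 64-bit word, rank in the high half,
   e.g. for inclusion in a message header. */
static inline uint64_t gtid_pack(global_tid_t g)
{
    return ((uint64_t)g.rank << 32) | (uint64_t)g.local_id;
}

/* Inverse of gtid_pack. */
static inline global_tid_t gtid_unpack(uint64_t word)
{
    global_tid_t g;
    g.rank = (uint32_t)(word >> 32);
    g.local_id = (uint32_t)(word & 0xFFFFFFFFu);
    return g;
}

static inline int gtid_equal(global_tid_t a, global_tid_t b)
{
    return a.rank == b.rank && a.local_id == b.local_id;
}

/* True if the target thread lives in this address space, in which case a
   runtime could deliver a message without going through the network layer. */
static inline int gtid_is_local(global_tid_t g, uint32_t my_rank)
{
    return g.rank == my_rank;
}
```

A separate identifier of this kind is needed at all because `pthread_t` values are opaque and meaningful only inside one process; scoping the local index to its rank also means no global coordination is required to create a thread.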
Applying Patterns to Develop Extensible ORB Middleware
1998
Cited by 67 (28 self)
Distributed object computing forms the basis for next-generation application middleware. At the heart of distributed object computing are Object Request Brokers (ORBs), which automate many tedious and error-prone distributed programming tasks. This article presents a case study of key design patterns needed to develop ORBs that can be dynamically configured and evolved for specific application requirements and system characteristics.
The Nexus Task-parallel Runtime System
In Proc. 1st Intl. Workshop on Parallel Processing, 1994
Cited by 43 (5 self)
A runtime system provides a parallel language compiler with an interface to the low-level facilities required to support interaction between concurrently executing program components. Nexus is a portable runtime system for task-parallel programming languages. Distinguishing features of Nexus include its support for multiple threads of control, dynamic processor acquisition, dynamic address space creation, a global memory model via interprocessor references, and asynchronous events. In addition, it supports heterogeneity at multiple levels, allowing a single computation to utilize different programming languages, executables, processors, and network protocols. Nexus is currently being used as a compiler target for two task-parallel languages: Fortran M and Compositional C++. In this paper, we present the Nexus design, outline techniques used to implement Nexus on parallel computers, show how it is used in compilers, and compare its performance with that of another runtime system.
OPUS: A Coordination Language for Multidisciplinary Applications
Scientific Programming, 1997
Cited by 33 (14 self)
Data parallel languages, such as High Performance Fortran, can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are multidisciplinary and heterogeneous in nature, and thus do not fit well into the data parallel paradigm. In this paper we present Opus, a language designed to fill this gap. The central concept of Opus is a mechanism called Shared Data Abstractions (SDAs). An SDA can be used as a computation server, i.e., a locus of computational activity, or as a data repository for sharing data between asynchronous tasks. SDAs can be internally data parallel, providing support for the integration of data and task parallelism as well as nested task parallelism. They can thus be used to express multidisciplinary applications in a natural and efficient way. In this paper we describe the features of the language through a series of examples and give an overview of the runtime support required to implement these concepts in
Nexus: Runtime Support for Task-Parallel Programming Languages
Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, 1994
Cited by 31 (3 self)
A runtime system provides a parallel language compiler with an interface to the low-level facilities required to support interaction between concurrently executing program components. Nexus is a portable runtime system for task-parallel programming languages. Distinguishing features of Nexus include its support for multiple threads of control, dynamic processor acquisition, dynamic address space creation, a global memory model via interprocessor references, and asynchronous events. In addition, it supports heterogeneity at multiple levels, allowing a single computation to utilize different programming languages, executables, processors, and network protocols. Nexus is currently being used as a compiler target for two task-parallel languages: Fortran M and Compositional C++. In this paper, we present the Nexus design, outline techniques used to implement Nexus on parallel computers, show how it is used in compilers, and compare its performance with that of another runtime system...
Space-Efficient Scheduling of Nested Parallelism
ACM Transactions on Programming Languages and Systems, 1999
Cited by 28 (5 self)
This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth $D$ and serial space requirement $S_1$, the algorithm generates a schedule that requires at most $S_1 + O(K \cdot D \cdot p)$ space (including scheduler space) on $p$ processors. Here, $K$ is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of $K$ provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
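The role of the parameter K can be illustrated with per-thread memory accounting. In the hypothetical sketch below, each allocation adds to a thread's net allocation, each free subtracts from it, and the accounting call reports when the net amount exceeds K so the runtime can preempt the thread; the budget then restarts when the thread is rescheduled. The names (`thread_mem_account`, `account_alloc`, and so on) and the reset-on-resume policy are assumptions, not details of the paper's runtime system.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread accounting record (not the paper's runtime). */
typedef struct {
    long net_allocated; /* bytes allocated minus bytes freed since last resumed */
    long quota_k;       /* the user-adjustable parameter K, in bytes */
} thread_mem_account;

void account_init(thread_mem_account *a, long k)
{
    a->net_allocated = 0;
    a->quota_k = k;
}

/* Records an allocation. Returns true once the thread's net allocation
   exceeds K, signalling that it should be preempted by the scheduler. */
bool account_alloc(thread_mem_account *a, size_t bytes)
{
    a->net_allocated += (long)bytes;
    return a->net_allocated > a->quota_k;
}

/* Frees reduce the net amount, so a thread that reuses memory keeps running. */
void account_free(thread_mem_account *a, size_t bytes)
{
    a->net_allocated -= (long)bytes;
}

/* Assumed policy: the budget starts over each time the thread is resumed. */
void account_reset(thread_mem_account *a)
{
    a->net_allocated = 0;
}
```

A smaller K triggers preemption more often, which tends to hold memory closer to the serial requirement at the cost of more scheduling work; a larger K does the opposite, which is the running-time versus memory trade-off described in the abstract.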
Taming the Memory Hogs: Using Compiler-Inserted Releases to Manage Physical Memory Intelligently
In Proceedings of the 4th Symposium on Operating Systems Design and Implementation (OSDI-00), 2000
Cited by 26 (5 self)
Out-of-core applications consume physical resources at a rapid rate, causing interactive applications sharing the same machine to exhibit poor response times. This behavior is the result of default resource management strategies in the OS that are inappropriate for memory-intensive applications. Using an approach that integrates compiler analysis with simple OS support and a runtime layer that adapts to dynamic conditions, we have shown that the impact of out-of-core applications on interactive ones can be greatly mitigated. A combination of prefetching pages that will soon be needed, and releasing pages no longer in use results in good throughput for the out-of-core task and good response time for the interactive one. Each class of application performs well according to the metric most important to it. In addition, the OS does not need to attempt to identify these application classes, or modify its default resource management policies in any way. We also observe that when an out-of-core application releases pages, it both improves the response time of interactive tasks, and also improves its own performance through better replacement decisions and reduced memory management overhead.
Comparative Evaluation of Latency Tolerance Techniques for Software Distributed Shared Memory
In Proceedings of the 4th IEEE Symposium on High-Performance Computer Architecture, 1998
Cited by 24 (2 self)
A key challenge in achieving high performance on software DSMs is overcoming their relatively large communication latencies. In this paper, we consider two techniques that address this problem: prefetching and multithreading. While previous studies have examined each of these techniques in isolation, this paper is the first to evaluate both techniques using a consistent hardware platform and set of applications, thereby allowing direct comparisons. In addition, this is the first study to consider combining prefetching and multithreading in a software DSM. We performed our experiments on real hardware using a full implementation of both techniques. Our experimental results demonstrate that both prefetching and multithreading result in significant performance improvements when applied individually. In addition, we observe that prefetching and multithreading can potentially complement each other by using prefetching to hide memory latency and multithreading to hide synchronization latency...

