Efficient Software-Based Fault Isolation
1993
Cited by 627 (11 self)
One way to provide fault isolation among cooperating software modules is to place each in its own address space. However, for tightly-coupled modules, this solution incurs prohibitive context switch overhead. In this paper, we present a software approach to implementing fault isolation within a single address space. Our approach has two parts. First, we load the code and data for a distrusted module into its own fault domain, a logically separate portion of the application's address space. Second, we modify the object code of a distrusted module to prevent it from writing or jumping to an address outside its fault domain. Both these software operations are portable and programming language independent. Our approach poses a tradeoff relative to hardware fault isolation: substantially faster communication between fault domains, at a cost of slightly increased execution time for distrusted modules. We demonstrate that for frequently communicating modules, implementing fault isolation in software rather than hardware can substantially improve end-to-end application performance.
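The abstract does not say how the rewritten object code confines addresses. One plausible way is to force the upper bits of every store or jump target to the fault domain's segment identifier before the instruction executes. The Python sketch below illustrates only that idea; the bit widths and the names `sandbox_address` and `in_fault_domain` are assumptions made for illustration and do not come from the abstract.

```python
# Assumed layout: a 32-bit address whose top 8 bits name the fault domain
# (segment) and whose low 24 bits are the offset inside it.
OFFSET_BITS = 24
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def sandbox_address(addr: int, segment_id: int) -> int:
    """Force addr into the given segment by overwriting its upper bits.

    An address already inside the segment is unchanged; any other address
    is redirected to some location inside the segment rather than trapping.
    """
    return (segment_id << OFFSET_BITS) | (addr & OFFSET_MASK)


def in_fault_domain(addr: int, segment_id: int) -> bool:
    """Return True if addr's upper bits match the segment identifier."""
    return (addr >> OFFSET_BITS) == segment_id
```

Under these assumptions, a store to an address outside the domain is not detected; it is silently confined to the domain, which costs only a couple of arithmetic operations per store or jump.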
The Paradyn Parallel Performance Measurement Tools
IEEE Computer, 1995
Cited by 353 (28 self)
Paradyn is a performance measurement tool for parallel and distributed programs. Paradyn uses several novel technologies so that it scales to long-running programs (hours or days) and large (thousand-node) systems, and automates much of the search for performance bottlenecks. It can provide precise performance data down to the procedure and statement level. Paradyn is based on a dynamic notion of performance instrumentation and measurement. Unmodified executable files are placed into execution and then performance instrumentation is inserted into the application program and modified during execution. The instrumentation is controlled by the Performance Consultant module, which automatically directs the placement of instrumentation. The Performance Consultant has a well-defined notion of performance bottlenecks and program structure, so that it can associate bottlenecks with specific causes and specific parts of a program. Paradyn controls its instrumentation overhead by monitoring the cost of its data collection, limiting its instrumentation to a (user-controllable) threshold. The instrumentation in Paradyn can easily be configured to accept new operating system, hardware, and application-specific performance data. It also provides an open interface for performance visualization, and a simple programming library to allow these visualizations to interface to Paradyn. Paradyn can gather and present performance data in terms of high-level parallel languages (such as data parallel Fortran) and can measure programs on massively parallel computers, workstation clusters, and heterogeneous combinations of these systems.
Shade: A Fast Instruction-Set Simulator for Execution Profiling
1994
Cited by 315 (2 self)
Tracing tools are used widely to help analyze, design, and tune both hardware and software systems. This paper describes a tool called Shade which combines efficient instruction-set simulation with a flexible, extensible trace generation capability. Efficiency is achieved by dynamically compiling and caching code to simulate and trace the application program. The user may control the extent of tracing in a variety of ways; arbitrarily detailed application state information may be collected during the simulation, but tracing less translates directly into greater efficiency. Current Shade implementations run on SPARC systems and simulate the SPARC (Versions 8 and 9) and MIPS I instruction sets. This paper describes the capabilities, design, implementation, and performance of Shade, and discusses instruction set emulation in general.
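The idea of "dynamically compiling and caching code" can be pictured with a toy simulator that translates each simulated instruction once and reuses the translation afterwards. The sketch below is an illustration only, not Shade's design: the two-instruction toy instruction set (`add`, `jnz`), the `make_simulator` helper, and the per-instruction granularity are all assumptions.

```python
def make_simulator(program):
    """Build a simulator for a toy program: dict of pc -> (op, arg).

    Assumed toy instructions:
      ("add", n)  adds n to state["acc"] and falls through
      ("jnz", t)  decrements state["count"]; jumps to t while it stays > 0
    """
    cache = {}                    # pc -> translated Python function
    stats = {"translations": 0}   # how many times translate() ran

    def translate(pc):
        op, arg = program[pc]
        stats["translations"] += 1
        if op == "add":
            def run(state):
                state["acc"] += arg
                return pc + 1
        elif op == "jnz":
            def run(state):
                state["count"] -= 1
                return arg if state["count"] > 0 else pc + 1
        else:
            raise ValueError(f"unknown op: {op}")
        return run

    def simulate(state):
        pc = 0
        while pc in program:      # stop when control leaves the program
            fn = cache.get(pc)
            if fn is None:        # translate on first use, then reuse
                fn = cache[pc] = translate(pc)
            pc = fn(state)
        return state

    return simulate, stats


simulate, stats = make_simulator({0: ("add", 2), 1: ("jnz", 0)})
result = simulate({"acc": 0, "count": 3})
```

In this run the loop body executes three times but each instruction is translated only once, which is the source of the efficiency the abstract refers to.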
Optimally Profiling and Tracing Programs
ACM Transactions on Programming Languages and Systems, 1994
Cited by 255 (17 self)
Fine-grain Access Control for Distributed Shared Memory
In Proceedings of the Sixth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS VI), 1994
Cited by 160 (26 self)
This paper discusses implementations of fine-grain memory access control, which selectively restricts reads and writes to cache-block-sized memory regions. Fine-grain access control forms the basis of efficient cache-coherent shared memory. This paper focuses on low-cost implementations that require little or no additional hardware. These techniques permit efficient implementation of shared memory on a wide range of parallel systems, thereby providing shared-memory codes with a portability previously limited to message passing. This paper categorizes techniques based on where access control is enforced and where access conflicts are handled. We incorporated three techniques that require no additional hardware into Blizzard, a system that supports distributed shared memory on the CM-5. The first adds a software lookup before each shared-memory reference by modifying the program's executable. The second uses the memory's error correcting code (ECC) as cache-block valid bits. The third is...
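The first technique, a software lookup before each shared-memory reference, can be sketched as a per-block state table consulted on every load and store. The Python model below is a hedged illustration, not Blizzard's implementation: the block size, the three access states, the `CheckedMemory` class, and the fault handler (which merely records the fault and upgrades the block) are all assumptions.

```python
from enum import Enum

BLOCK_SIZE = 32  # assumed cache-block size in bytes


class Access(Enum):
    INVALID = 0
    READ_ONLY = 1
    READ_WRITE = 2


class CheckedMemory:
    """Memory whose loads and stores first consult a per-block state table."""

    def __init__(self, size):
        self.data = bytearray(size)
        n_blocks = (size + BLOCK_SIZE - 1) // BLOCK_SIZE
        self.state = [Access.INVALID] * n_blocks
        self.faults = []  # (block, requested access) pairs, for inspection

    def _handle_fault(self, block, want):
        # Hypothetical handler: a real system would run a coherence
        # protocol here; this sketch only records the fault and upgrades.
        self.faults.append((block, want))
        self.state[block] = want

    def load(self, addr):
        block = addr // BLOCK_SIZE
        if self.state[block] is Access.INVALID:          # lookup before read
            self._handle_fault(block, Access.READ_ONLY)
        return self.data[addr]

    def store(self, addr, value):
        block = addr // BLOCK_SIZE
        if self.state[block] is not Access.READ_WRITE:   # lookup before write
            self._handle_fault(block, Access.READ_WRITE)
        self.data[addr] = value
```

The check is a table index and a comparison on the common path, so accesses to blocks that already hold the right permission proceed without invoking the handler.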
Fine-grained dynamic instrumentation of commodity operating system kernels
1999
Cited by 107 (5 self)
We have developed a technology, fine-grained dynamic instrumentation of commodity kernels, which can splice (insert) dynamically generated code before almost any machine code instruction of a completely unmodified running commodity operating system kernel. This technology is well-suited to performance profiling, debugging, code coverage, security auditing, runtime code optimizations, and kernel extensions. We have designed and implemented a tool called KernInst that performs dynamic instrumentation on a stock production Solaris kernel running on an UltraSPARC. On top of KernInst, we have implemented a kernel performance profiling tool, and used it to understand kernel and application performance under a Web proxy server workload. We used this information to make two changes (one to the kernel, one to the proxy) that cumulatively reduce the percentage of elapsed time that the proxy spends opening disk cache files from 40% to 7%.
The concept of dynamic analysis
In ESEC / SIGSOFT FSE, 1999
Cited by 95 (0 self)
Dynamic analysis is the analysis of the properties of a running program. In this paper, we explore two new dynamic analyses based on program profiling:
- Frequency Spectrum Analysis. We show how analyzing the frequencies of program entities in a single execution can help programmers to decompose a program, identify related computations, and find computations related to specific input and output characteristics of a program.
- Coverage Concept Analysis. Concept analysis of test coverage data computes dynamic analogs to static control flow relationships such as domination, postdomination, and regions. Comparison of these dynamically computed relationships to their static counterparts can point to areas of code requiring more testing and can aid programmers in understanding how a program and its test sets relate to one another.
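As a rough illustration of the frequency-spectrum idea, one can group program entities by how often they executed in a single run, on the assumption that entities sharing a frequency are candidates for being related. The function name `frequency_spectrum` and the sample profile are hypothetical; the paper's actual analysis may be more elaborate.

```python
from collections import defaultdict


def frequency_spectrum(counts):
    """Group entity names by execution count.

    counts: dict mapping an entity name (e.g. a procedure) to the number
    of times it executed in one run. Returns a dict mapping each observed
    frequency to the sorted list of entities that ran that many times.
    """
    clusters = defaultdict(list)
    for entity, freq in counts.items():
        clusters[freq].append(entity)
    return {freq: sorted(names) for freq, names in sorted(clusters.items())}


# Hypothetical profile of one execution.
profile = {"parse": 1, "init": 1, "step": 100, "emit": 100, "log": 7}
spectrum = frequency_spectrum(profile)
```

Here `init` and `parse` fall in one cluster and `emit` and `step` in another, which hints at a setup phase and a per-iteration phase without reading the source.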
Studies of Windows NT Performance using Dynamic Execution Traces
In Proceedings of the Second Symposium on Operating System Design and Implementation, 1996
Cited by 87 (0 self)
We studied two aspects of the performance of Windows NT: processor bandwidth requirements for memory accesses in a uniprocessor system running commercial and benchmark applications, and locking behavior of a commercial database on a small-scale multiprocessor. Our studies are based on full dynamic execution traces of the systems, which include all instructions executed by the operating system and applications over periods of a few seconds (enough time to allow for significant computation). The traces were obtained on Alpha PCs, using a new software tool called PatchWrx that takes advantage of the Alpha architecture's PAL-code layer to implement efficient, comprehensive system tracing. Because the Alpha version of Windows NT uses substantially the same code base as other versions, and therefore executes nearly the same sequence of calls, basic blocks, and data structure accesses, we believe our conclusions are relevant for non-Alpha systems as well. This paper describes our performance studies and interesting aspects of PatchWrx. We conclude
JRes: A Resource Accounting Interface for Java
1998
Cited by 85 (4 self)
With the spread of the Internet, the computing model on server systems is undergoing several important changes. Recent research ideas concerning dynamic operating system extensibility are finding their way into the commercial domain, resulting in designs of extensible databases and Web servers. In addition, both ordinary users and service providers must deal with untrusted downloadable executable code of unknown origin and intentions. Across the board, Java has emerged as the language of choice for Internet-oriented software. We argue that, in order to realize its full potential in applications dealing with untrusted code, Java needs a flexible resource accounting interface. The design and prototype implementation of such an interface, JRes, is presented in this paper. The interface makes it possible to account for heap memory, CPU time, and network resources consumed by individual threads or groups of threads. JRes allows limits to be set on resources available to threads and it can invoke...
Application-Specific Protocols for User-Level Shared Memory
In Proceedings of Supercomputing '94, 1994
Cited by 84 (24 self)
Recent distributed shared memory (DSM) systems and proposed shared-memory machines have implemented some or all of their cache coherence protocols in software. One way to exploit the flexibility of this software is to tailor a coherence protocol to match an application's communication patterns and memory semantics. This paper presents evidence that this approach can lead to large performance improvements. It shows that application-specific protocols substantially improved the performance of three application programs (appbt, em3d, and barnes) over carefully tuned transparent shared memory implementations. The speed-ups were obtained on Blizzard, a fine-grained DSM system running on a 32-node Thinking Machines CM-5.

