Synthesis: An Efficient Implementation of Fundamental Operating System Services, 1992
Cited by 79 (1 self)
This dissertation shows that operating systems can provide fundamental services an order of magnitude more efficiently than traditional implementations. It describes the implementation of a new operating system kernel, Synthesis, that achieves this level of performance. The Synthesis kernel combines several new techniques to provide high performance without sacrificing the expressive power or security of the system. The new ideas include:
- Run-time code synthesis: a systematic way of creating executable machine code at runtime to optimize frequently used kernel routines (queues, buffers, context switchers, interrupt handlers, and system call dispatchers) for specific situations, greatly reducing their execution time.
- Fine-grain scheduling: a new process-scheduling technique based on the idea of feedback that performs frequent scheduling actions and policy adjustments (at submillisecond intervals), resulting in an adaptive, self-tuning system that can support real-ti...
From Trace Generation to Visualization: A Performance Framework for Distributed Parallel Systems. In Proc. of SC2000: High Performance Networking and Computing, 2000
Cited by 20 (4 self)
In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM SP systems, a self-defining interval file format, an API for framework extensions, utilities for merging and statistics generation, and a visualization tool with preview and multiple time-space diagrams. The trace environment is extremely scalable, and combines MPI events with system activities in the same set of trace files, one for each SMP node. Since the amount of trace data may be very large, utilities are developed to convert and merge individual trace files into a self-defining interval trace file with multiple frame directories. The interval format allows the development of multiple time-space diagrams, such as thread-activity view, processor-activity view, etc., from the same interval file. A visualization tool, Jumpshot, is modified to visualize these views. A statistics utility is developed using the API, along with its graphics v...
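The merge step described in this abstract can be illustrated with a minimal sketch. It assumes each per-node trace is already sorted by start time and that a record is a hypothetical `(start, end, node, event)` tuple. Neither the record layout nor the name `merge_traces` comes from the paper, and the actual utilities also build frame directories, which this sketch omits.

```python
import heapq

def merge_traces(per_node_traces):
    """Merge per-node record lists (each sorted by start time) into one time-ordered list.

    Each record is an assumed (start, end, node, event) tuple, not the paper's format.
    """
    # heapq.merge performs a k-way merge without re-sorting the already-sorted inputs.
    return list(heapq.merge(*per_node_traces, key=lambda rec: rec[0]))

# Two hypothetical per-node traces, one per SMP node.
node0 = [(0.0, 1.5, 0, "MPI_Send"), (2.0, 2.4, 0, "MPI_Recv")]
node1 = [(0.5, 1.0, 1, "MPI_Recv"), (3.0, 3.2, 1, "MPI_Barrier")]
merged = merge_traces([node0, node1])
```

A k-way merge keeps memory use proportional to the number of nodes rather than the total trace size when the inputs are streamed, which matters when the trace data is very large.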
Observable Clock Synchronization, 1994
Cited by 20 (4 self)
While the synchronization of time-of-day clocks ordinarily requires information flow in both directions between the clocks, this information need not flow directly via messages. However, to take advantage of indirect information flow, we have to make a number of complex assumptions about the behavior of the communication medium. To facilitate the verification of such assumptions, we develop a relativistic theory of observable clock synchronization that does not use or depend on a Newtonian framework or real time. Within the context of this theory, we focus on the problem of estimating the time on a remote clock. We generalize the concept of rapport to capture the situation when such an estimate is sufficient for clock synchronization purposes. With a single property, called the Observable Drift Property, we characterize the information flow required for obtaining rapport. We compare our relativized and observable concepts with analogs based on the notion of real time in order to show t...
Performance Monitoring in a Myrinet-Connected Shrimp Cluster, 1998
Cited by 15 (2 self)
Performance monitoring is a crucial aspect of parallel programming. Extracting the best possible performance from the system is the main goal of parallel programming, and monitoring tools are often essential to achieving that goal. A common tradeoff arises in determining at which system level to monitor performance information and present results. High-level monitoring approaches can often gather data directly tied to the software programming model, but may abstract away crucial low-level hardware details. Low-level monitoring approaches can gather fairly complete performance information about the underlying system, but often at the expense of portability and flexibility. In this paper we discuss a compromise approach between the portability and flexibility of high-level monitoring and the detailed data awareness of low-level monitoring. We present a firmware-based performance monitor we designed for a Myrinet-connected Shrimp cluster. This monitor combines the portability and flexibilityty...
Synchronized Universal Time Coordinated for Distributed Real-Time Systems. Control Engineering Practice, 1995
Cited by 14 (8 self)
This paper presents a novel technique for establishing a highly accurate global time in fault-tolerant, large-scale distributed real-time systems. Unlike the usual clock synchronization approaches, the proposed clock validation technique provides a precise system time that also relates to an external time standard like UTC with high accuracy. The underlying idea is to validate time information of external time sources like GPS-receivers against a global time maintained by the local clocks in the system. As an example, a promising interval-based clock validation algorithm ICV
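The idea of validating an external time source against the internally maintained global time can be sketched with a generic interval consistency check. This is not the ICV algorithm, whose details are not given in the abstract. The sketch assumes both the internal global time and the external reading are expressed as `(low, high)` intervals guaranteed to contain the true time, and the function name is hypothetical.

```python
def validate_external_time(internal, external):
    """Return the intersection of two (low, high) time intervals, or None if disjoint.

    Assumption: each interval is guaranteed to contain the true time when its
    source is correct, so a non-empty intersection means the sources agree.
    """
    low = max(internal[0], external[0])
    high = min(internal[1], external[1])
    return (low, high) if low <= high else None

# Hypothetical readings in seconds.
internal_time = (100.0, 100.4)   # wider interval kept by the local clocks
good_gps = (100.1, 100.2)        # consistent, more accurate external reading
faulty_gps = (105.0, 105.1)      # inconsistent external reading

accepted = validate_external_time(internal_time, good_gps)
rejected = validate_external_time(internal_time, faulty_gps)
```

Under these assumptions, an accepted reading narrows the system's time interval to the more accurate external value, while a reading that does not overlap the internal interval is discarded as faulty.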
Quality of Service (QoS) Metrics for Continuous Media. Multimedia Tools and Applications, 1996
Cited by 12 (5 self)
This paper presents quality of service (QoS) metrics for continuity and synchronization specifications in continuous media (CM). Proposed metrics specify continuity and synchronization, with tolerable limits on average and bursty defaults from perfect continuity, timing and synchronization constraints. These metrics can be used in a distributed environment for resource allocation. Continuity specification of a CM stream consists of its sequencing, display rate and drift profiles. The sequencing profile of a CM stream consists of tolerable aggregate and consecutive frame miss ratios. Rate profiles specify the average rendition rate and its variation. Given a rate profile, the ideal time unit for frame display is determined as an offset from the beginning of the stream. Drift profile specifies the average and bursty deviation of schedules for frames from such fixed points in time. Synchronization requirements of a collection of CM streams are specified by mixing, rate and synchronization...
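As a rough illustration of the sequencing profile, the sketch below computes an aggregate miss ratio and the longest run of consecutive misses for a stream. The input representation (one boolean per frame) and the function name are assumptions made for this example. The paper's exact definitions, in particular how the consecutive miss measure is normalized, may differ.

```python
def sequencing_metrics(displayed):
    """Compute (aggregate miss ratio, longest consecutive miss run) for a stream.

    displayed: list of bools, True if the frame was shown, False if it was missed.
    This representation is an assumption for illustration only.
    """
    total = len(displayed)
    missed = displayed.count(False)
    longest = run = 0
    for shown in displayed:
        run = 0 if shown else run + 1   # extend or reset the current miss run
        longest = max(longest, run)
    aggregate_ratio = missed / total if total else 0.0
    return aggregate_ratio, longest

# Hypothetical stream of eight frames with three misses, two of them back to back.
frames = [True, False, False, True, True, False, True, True]
ratio, longest_run = sequencing_metrics(frames)
```

A resource allocator could compare both values against the tolerable limits in a stream's profile: the aggregate ratio bounds average loss, while the run length bounds bursty loss.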
Monitoring Shared Virtual Memory Performance on a Myrinet-based PC Cluster. In International Conference on Supercomputing, 1998
Cited by 9 (3 self)
Network-connected clusters of PCs or workstations are becoming a widespread parallel computing platform. Performance methodologies that use either simulation or high-level software instrumentation cannot adequately measure the detailed behavior of such systems. The availability of new network technologies based on programmable network interfaces opens a new avenue of research in analyzing and improving the performance of software shared memory protocols. We have developed monitoring firmware embedded in the programmable network interfaces of a Myrinet-based PC cluster. Timestamps on network packets facilitate the collection of low-level statistics on, e.g., network latencies, interrupt handler times and inter-node synchronization. This paper describes our use of the low-level software performance monitor to measure and understand the performance of a Shared Virtual Memory (SVM) system implemented on a Myrinet-based cluster, running the SPLASH-2 benchmarks. We measured time spent in vari...
Strategies For The Modelling And Simulation Of Asynchronous Computer Architectures, 1995
Cited by 2 (1 self)
Preface
Acknowledgements
1 Introduction
1.1 Background
1.2 Motivation and Objectives
1.3 Structure of the Thesis
1.3.1 Related Publications
2 The Quest for High Performance
2.1 Introduction
2.2 Bit and Instruction Level Parallelism
2.3 Reduced Instruction Set Computers
2.4 The Limits of Sequential Computation
2.5 Parallel Computer Architectures
2.5.1 SIMD
2.5.2 MIMD
2.5.2.1 Shared Memory MIMD Architectures
2.5.2.2 Distributed M...

