Debugging Heterogeneous Distributed Systems Using Event-Based Models of Behavior
ACM Transactions on Computer Systems, 1995
Cited by 140 (0 self)
... which debugging is treated as a process of creating models of expected program behaviors and comparing these to the actual behaviors exhibited by the program. The use of EBBA techniques can enhance debugging-tool transparency, reduce latency and uncertainty for fundamental debugging activities, and accommodate diverse, heterogeneous architectures. Using events and behavior models as a basic mechanism provides a uniform view of heterogeneous systems and enables analysis to be performed in well-defined ways. Their use also enables EBBA users to extend and reuse knowledge gained in solving previous problems to new situations. We describe our behavior-modeling algorithm, which matches actual behavior to models and automates many behavior-analysis steps. The algorithm matches behavior in as many ways as possible and resolves these candidate matches to return the best match to the user. It deals readily with partial behavior matches and incomplete information. In particular, we describe a tool set we have built. The tool set has been used to investigate the behavior of a wide range of programs. The tools are modular and can be distributed readily throughout a system.
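The abstract describes matching actual behavior against a model "in as many ways as possible" and returning the best, possibly partial, match. The sketch below illustrates that idea in a deliberately simplified form and is not EBBA's actual algorithm: it assumes a model is just an ordered list of event kinds, tries every trace position where the model's first event occurs, matches the remaining events as a subsequence (skipping unrelated events), and keeps the candidate that matches the most model events, preferring the tightest span. The `Event`, `Match`, and `best_match` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    kind: str     # event type, e.g. "send" or "recv" (hypothetical names)
    process: str  # process that produced the event


@dataclass
class Match:
    start: int                  # trace index where this candidate begins
    positions: Tuple[int, ...]  # trace indices of the matched model events
    matched: int                # how many model events were matched


def match_from(model: Sequence[str], trace: Sequence[Event], start: int) -> Match:
    """Match model events in order as a subsequence of trace[start:].
    Greedy earliest matching yields the longest matchable model prefix."""
    positions = []
    i = 0
    for idx in range(start, len(trace)):
        if i < len(model) and trace[idx].kind == model[i]:
            positions.append(idx)
            i += 1
    return Match(start, tuple(positions), len(positions))


def best_match(model: Sequence[str], trace: Sequence[Event]) -> Optional[Match]:
    """Try every start where the model's first event occurs; keep the candidate
    with the most matched events, breaking ties by the shortest span."""
    if not model:
        return None
    candidates = [match_from(model, trace, s)
                  for s, e in enumerate(trace) if e.kind == model[0]]
    if not candidates:
        return None
    return min(candidates,
               key=lambda m: (-m.matched, m.positions[-1] - m.positions[0]))
```

A partial match is reported rather than discarded: for a model `["send", "recv", "ack"]` against a trace with no `ack`, the result says two of three model events were found and where.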
Breakpoints and Halting in Distributed Programs
1988
Cited by 75 (0 self)
Interactive debugging requires that the programmer be able to halt a program at interesting points in its execution. This paper presents an algorithm for halting a distributed program in a consistent state, and presents a definition of distributed breakpoints with an algorithm for implementing the detection of these breakpoints. The Halting Algorithm extends Chandy and Lamport's algorithm for recording global state and solves the problem of processes that are not fully connected or that communicate infrequently. The definition of distributed breakpoints is based on those events that can be detected in a distributed system. Events that can be partially ordered are detectable and form the basis for the breakpoint predicates; from the breakpoint definition follows an algorithm that a distributed debugger can use to detect these breakpoints. Index Items: Distributed Programming, Distributed Debugging, Halting Algorithm, Distributed Breakpoints.
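The abstract bases breakpoint predicates on events that can be partially ordered. It does not say how that order is tracked; the sketch below assumes vector clocks, one standard way to capture the happened-before relation, and uses it for a hypothetical two-event breakpoint that fires only when the first event causally precedes the second. The `Process` class and `ordered_breakpoint_hit` are illustrative names, not the paper's interface.

```python
from typing import Dict

VectorClock = Dict[str, int]  # process name -> count of events seen from it


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event precedes the other, so no ordering can be asserted."""
    return not happened_before(a, b) and not happened_before(b, a)


class Process:
    def __init__(self, name: str):
        self.name = name
        self.clock: VectorClock = {}

    def local_event(self) -> VectorClock:
        self.clock[self.name] = self.clock.get(self.name, 0) + 1
        return dict(self.clock)  # snapshot used as this event's timestamp

    def send(self) -> VectorClock:
        return self.local_event()  # the returned stamp travels with the message

    def receive(self, msg_stamp: VectorClock) -> VectorClock:
        # Merge the sender's knowledge, then count the receive itself.
        for proc, count in msg_stamp.items():
            self.clock[proc] = max(self.clock.get(proc, 0), count)
        return self.local_event()


def ordered_breakpoint_hit(first: VectorClock, second: VectorClock) -> bool:
    """Hypothetical breakpoint: halt only if `first` happened before `second`."""
    return happened_before(first, second)
```

Events stamped as concurrent are exactly the cases where a predicate such as "A then B" has no well-defined answer, which is why the breakpoint definition is restricted to partially ordered events.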
Monitoring, Testing, and Debugging of Distributed Real-Time Systems
2000
Cited by 56 (1 self)
Testing is an important part of any software development project and can typically account for more than half of the development cost. For safety-critical computer-based systems, testing is even more important due to stringent reliability and safety requirements. However, most safety-critical computer-based systems are real-time systems, and the majority of current testing and debugging techniques have been developed for sequential (non-real-time) programs. These techniques are not directly applicable to real-time systems, since they disregard issues of timing and concurrency. This means that existing techniques for reproducible testing and debugging cannot be used. Reproducibility is essential for regression testing and cyclic debugging, where the same test cases are run repeatedly with the intention of verifying modified program code or tracking down errors. In consumer and industrial applications, the current trend is to move from single microcontrollers to sets of distributed microcontrollers, which are even more challenging than real-time systems per se, since multiple loci of observation and control must additionally be considered. In this thesis we try to remedy these problems by presenting an integrated approach to monitoring, testing, and debugging of distributed real-time systems. For monitoring...
Distributed Performance Monitoring: Methods, Tools and Applications
IEEE Transactions on Parallel and Distributed Systems, 1994
Cited by 25 (4 self)
A method for analyzing the functional behavior and the performance of programs in distributed systems is presented. We use hybrid monitoring, a technique which combines advantages of both software monitoring and hardware monitoring. The paper contains a description of a hardware monitor and a software package (ZM4/SIMPLE) which make our concepts available to programmers, assisting them in debugging and tuning of their code. A short survey of related monitor systems highlights the distinguishing features of our implementation. As an application of our monitoring and evaluation system, the analysis of a parallel ray tracing program running on the SUPRENUM multiprocessor is described. It is shown that monitoring and modeling both rely on a common abstraction of a system's dynamic behavior and therefore can be integrated into one comprehensive methodology. This methodology is supported by a set of tools. Keywords (Index Terms, Key Phrases): hardware monitoring, hybrid monitoring, event-dri...
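In hybrid monitoring, software instrumentation emits event tokens while a hardware monitor records them with timestamps, so traces from several nodes can be combined into one view of the system's dynamic behavior. The sketch below shows only that combination step and a simple evaluation on top of it. It assumes every record already carries a timestamp from a common global clock and that each per-node trace is sorted; the `TraceEvent` layout and token names are hypothetical, not the ZM4/SIMPLE trace format.

```python
import heapq
from typing import Iterable, List, NamedTuple, Tuple


class TraceEvent(NamedTuple):
    timestamp_ns: int  # from the monitor's global clock (assumed)
    node: str          # node whose instrumentation emitted the token
    token: str         # event token, e.g. "compute_begin" (hypothetical)


def merge_traces(per_node: Iterable[List[TraceEvent]]) -> List[TraceEvent]:
    """Merge per-node traces (each sorted by time) into one global trace."""
    return list(heapq.merge(*per_node, key=lambda e: e.timestamp_ns))


def activity_durations(trace: List[TraceEvent], begin: str,
                       end: str) -> List[Tuple[str, int]]:
    """Pair begin/end tokens per node and report how long each activity took."""
    open_at = {}
    durations = []
    for e in trace:
        if e.token == begin:
            open_at[e.node] = e.timestamp_ns
        elif e.token == end and e.node in open_at:
            durations.append((e.node, e.timestamp_ns - open_at.pop(e.node)))
    return durations
```

Because the merge depends only on timestamps from a shared clock, the interleaving across nodes is meaningful; with unsynchronized per-node clocks the merged order could not be trusted.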
Standardization of Event Traces Considered Harmful or Is an Implementation of Object-Independent Event Trace Monitoring and Analysis Systems Possible?
Proc. CNRS-NSF Workshop on Environments and Tools for Parallel Scientific Computing, St. Hilaire du Touvet, France, Elsevier, Advances in Parallel Computing, 1993
Cited by 16 (8 self)
Programming non-sequential computer systems is hard! Many tools and environments have been designed and implemented to ease the use and programming of such systems. The majority of the analysis tools are event-based and use event traces for representing the dynamic behavior of the system under investigation, the object system. Most tools can only be used for one special object system, or a specific class of systems such as distributed shared memory machines. This limitation is not obvious because all tools provide the same basic functionality. This article discusses approaches to implementing object-independent event trace monitoring and analysis systems. The term object-independent means that the system can be used for the analysis of arbitrary (non-sequential) computer systems, operating systems, programming languages and applications. Three main topics are addressed: object-independent monitoring, standardization of event trace formats and access interfaces and the application-indepe...
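One way to decouple analysis tools from a particular object system, assumed here for illustration rather than taken from the article, is to ship a description of the event layout together with the trace, so that a generic reader can decode records it has never seen before. The sketch uses a hypothetical JSON-lines layout: the first line declares each event type's field names, and every later line is `[event_name, timestamp, *field_values]`.

```python
import json
from typing import Dict, Iterator, List


def write_trace(schema: Dict[str, List[str]], records: List[list]) -> str:
    """Serialize a self-describing trace: a schema header, then one record per line."""
    lines = [json.dumps({"schema": schema})]
    lines += [json.dumps(r) for r in records]
    return "\n".join(lines)


def read_trace(text: str) -> Iterator[dict]:
    """Decode any trace in this layout using only its own header,
    without built-in knowledge of the object system that produced it."""
    lines = text.splitlines()
    schema = json.loads(lines[0])["schema"]
    for line in lines[1:]:
        name, timestamp, *values = json.loads(line)
        fields = schema[name]
        if len(fields) != len(values):
            raise ValueError(f"record for {name!r} does not match its declaration")
        yield {"event": name, "time": timestamp, **dict(zip(fields, values))}
```

The analysis side depends only on the header, so a new object system needs a new schema rather than a new reader.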
Large-Scale Parallel Programming: Experience with the BBN Butterfly Parallel Processor
Proceedings of the First ACM Conference on Parallel Programming: Experience with Applications, Languages and Systems, 1988
Cited by 15 (10 self)
For three years, members of the Computer Science Department at the University of Rochester have used a collection of BBN Butterfly™ Parallel Processors to conduct research in parallel systems and applications. For most of that time, Rochester’s 128-node machine has had the distinction of being the largest shared-memory multiprocessor in the world. In the course of our work with the Butterfly we have ported three compilers, developed five major and several minor library packages, built two different operating systems, and implemented dozens of applications. Our experience clearly demonstrates the practicality of large-scale shared-memory multiprocessors with non-uniform memory access times. It also demonstrates that the problems inherent in programming such machines are far from adequately solved. Both locality and Amdahl’s law become increasingly important with a very large number of nodes. The availability of multiple programming models is also a concern; truly general-purpose parallel computing will require the development of environments that allow programs written under different models to coexist and interact. Most important, there is a continuing need for high-quality programming tools; widespread acceptance of parallel machines will require the development of programming environments comparable to those available on sequential computers.
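To see why Amdahl's law matters more as the node count grows, it helps to evaluate it at the 128 nodes mentioned above. With a parallelizable fraction p of the work and N nodes, the ideal speedup is 1 / ((1 - p) + p / N). The parallel fractions used below are illustrative values, not measurements from the Butterfly.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Ideal speedup when only `parallel_fraction` of the work scales with `nodes`."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / nodes)


for p in (0.95, 0.99):
    print(f"p={p}: speedup on 128 nodes = {amdahl_speedup(p, 128):.1f}")
```

Even with 95% of the work parallelized, 128 nodes give a speedup of only about 17, and no number of nodes can push it past 1 / 0.05 = 20; the serial fraction, not the machine size, sets the ceiling.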
A Debugger for Distributed Programs
1994
Cited by 11 (0 self)
... this paper supports dynamic rollback and replay of distributed programs. This feature is implemented entirely at the user level, without modifying the operating system kernel and without using synchronized clocks. DPD also implements a graphical user interface (GUI) that abstracts processes, events, and messages into a graph that helps the user in event analysis. Combined with the dynamic rollback and replay feature, the GUI provides a powerful tool for finding bugs in distributed programs.
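The abstract does not describe how DPD implements rollback and replay. A common user-level approach, assumed here, combines checkpoints of process state with a log of received messages: rolling back restores a checkpoint, and replay re-delivers the logged messages in their original order, which reproduces the execution as long as message handling is deterministic. The `ReplayableProcess` class is a hypothetical single-process illustration; it needs no synchronized clocks because replay order comes from the log, not from timestamps.

```python
import copy
from typing import Any, Callable, List, Optional, Tuple


class ReplayableProcess:
    def __init__(self, state: Any, handler: Callable[[Any, Any], Any]):
        self.state = state
        self.handler = handler  # (state, message) -> new state; must be deterministic
        self.log: List[Any] = []  # every delivered message, in delivery order
        self.checkpoints: List[Tuple[int, Any]] = []  # (log position, saved state)

    def checkpoint(self) -> None:
        self.checkpoints.append((len(self.log), copy.deepcopy(self.state)))

    def deliver(self, msg: Any) -> None:
        self.log.append(msg)  # record the delivery so it can be replayed
        self.state = self.handler(self.state, msg)

    def rollback(self, index: int = -1) -> int:
        """Restore a checkpoint; the message log is kept for replay."""
        log_pos, saved = self.checkpoints[index]
        self.state = copy.deepcopy(saved)
        return log_pos

    def replay(self, from_pos: int, upto: Optional[int] = None) -> None:
        """Re-deliver logged messages in the recorded order without re-logging them.
        `upto` lets a debugger stop partway, e.g. just before a suspect message."""
        end = len(self.log) if upto is None else upto
        for msg in self.log[from_pos:end]:
            self.state = self.handler(self.state, msg)
```

Stopping replay at an arbitrary log position is what lets a user step back to just before a suspicious message and inspect the state there.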
WebMon: A Performance Profiler for Web Transactions
In 4th IEEE International Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, 2002
Cited by 11 (1 self)
We describe WebMon, a tool for correlated, transaction-oriented performance monitoring of web services. Data collected with WebMon can be analyzed from a variety of perspectives: business, client, transaction, or systems. Maintainers of web services can use such analysis to better understand and manage the performance of their services. Moreover, WebMon’s data will enable the construction of more accurate performance prediction models for web services. Current web logging techniques create a log file per server, making it difficult to correlate data from log files with respect to a given transaction. Additionally, data about the quality of service perceived by the client is missing entirely. WebMon overcomes these limitations by providing heterogeneous instrumentation sensors and HTTP cookie-based correlators. In this paper, we present the design and implementation of WebMon and our experience in applying WebMon to an HP Library web service.
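The abstract says WebMon correlates data from heterogeneous sensors through HTTP cookies. The sketch below shows the general idea under stated assumptions, not WebMon's implementation: each transaction carries an identifier in a cookie (the name `webmon_txn` is hypothetical), every sensor logs the `Cookie` header it saw along with start and end times in milliseconds, and a correlator groups the records by transaction so that client-perceived time and server time appear side by side. Parsing uses the standard library's `http.cookies.SimpleCookie`.

```python
from collections import defaultdict
from http.cookies import SimpleCookie
from typing import Dict, List, Optional

TXN_COOKIE = "webmon_txn"  # hypothetical name of the correlation cookie


def txn_id_from_header(cookie_header: str) -> Optional[str]:
    """Extract the transaction id from a raw Cookie header, if present."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    morsel = jar.get(TXN_COOKIE)
    return morsel.value if morsel is not None else None


def correlate(records: List[dict]) -> Dict[str, Dict[str, int]]:
    """Group sensor records by transaction id.
    Each record: {'sensor', 'cookie', 'start_ms', 'end_ms'} (hypothetical layout).
    Returns {txn_id: {sensor: elapsed_ms}}; records without the cookie are skipped."""
    by_txn: Dict[str, Dict[str, int]] = defaultdict(dict)
    for r in records:
        txn = txn_id_from_header(r["cookie"])
        if txn is not None:
            by_txn[txn][r["sensor"]] = r["end_ms"] - r["start_ms"]
    return dict(by_txn)
```

With one client-side record of 500 ms and one web-server record of 200 ms for the same transaction, the correlated view shows that most of the client-perceived time was spent outside the web server, something per-server log files alone cannot reveal.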
Techniques for Performance Measurement of Parallel Programs
Cited by 8 (0 self)
Programmers of parallel systems require high-level tools to aid in analyzing the performance of applications. Performance tuning of parallel programs differs substantially from the analogous processes on sequential architectures for two main reasons: the inherent complexity of concurrent systems is greater, and the observability of concurrent systems is complicated by the effects instrumentation can have on the behavior of the system. The complexity of parallel architectures combined with non-determinism can make performance difficult to predict and analyze. Many approaches to helping users understand parallel programs have been proposed. This paper summarizes the problems associated with creating parallel performance measurement tools and describes some of the systems that have been built to solve these problems.
Design for Deterministic Monitoring of Distributed Real-Time Systems
2000
Cited by 7 (3 self)
In order to test or debug a system, we must observe its run-time behavior and judge how well the observations comply with the system requirements. There are two significant differences between debugging and testing of software for desktop computers and for embedded real-time systems: (1) it is more difficult to observe embedded computer systems, simply because they are embedded and thus have very few interfaces to the outside world, and (2) the very act of observing a real-time system or distributed real-time system can change its behavior. Monitoring of sequential software is straightforward, but for distributed real-time systems it is more complicated, since race conditions with respect to the order of access to shared resources occur naturally. Any intrusive observation, or probing, of the distributed real-time system affects the timing and consequently the outcome of the races. In this paper we present a method for deterministic observations of single tasking, multi-taskin...