An Overview of the Pablo Performance Analysis Environment
, 1992
Cited by 80 (6 self)
As massively parallel, distributed memory systems replace traditional vector supercomputers, effective application program optimization and system resource management become more than research curiosities --- they are crucial to achieving substantial fractions of peak performance for scientific application codes. By recording dynamic activity, either at the application or system software level, one can identify and remove performance bottlenecks. Pablo is a performance analysis environment designed to provide performance data capture, analysis, and presentation across a wide variety of scalable parallel systems. The Pablo environment includes software performance instrumentation, graphical performance data reduction and analysis, and support for mapping performance data to both graphics and sound. Current research directions include complete performance data immersion via head-mounted displays and the integration of Pablo with data parallel Fortran compilers based on the emerging High ...
Monitoring, Testing, and Debugging of Distributed Real-Time Systems
, 2000
Cited by 44 (1 self)
Testing is an important part of any software development project and can account for more than half of the development cost. For safety-critical computer-based systems, testing is even more important due to stringent reliability and safety requirements. However, most safety-critical computer-based systems are real-time systems, and the majority of current testing and debugging techniques have been developed for sequential (non-real-time) programs. These techniques are not directly applicable to real-time systems, since they disregard issues of timing and concurrency. This means that existing techniques for reproducible testing and debugging cannot be used. Reproducibility is essential for regression testing and cyclic debugging, where the same test cases are run repeatedly to verify modified program code or to track down errors. The current trend in consumer and industrial applications is a move from single microcontrollers to sets of distributed microcontrollers, which are even more challenging to handle than real-time systems per se, since multiple loci of observation and control must additionally be considered. In this thesis we try to remedy these problems by presenting an integrated approach to monitoring, testing, and debugging of distributed real-time systems. For monitoring
Runtime Monitoring of Timing Constraints in Distributed Real-Time Systems
- Real-Time Systems
, 1994
Cited by 37 (2 self)
Embedded real-time systems often operate under strict timing and dependability constraints. To ensure responsiveness, these systems must be able to provide the expected services in a timely manner even in the presence of faults. In this paper, we describe a run-time environment for monitoring of timing constraints in distributed real-time systems. In particular, we focus on the problem of detecting violations of timing assertions in an environment in which the real-time tasks run on multiple processors, and timing constraints can be either inter-processor or intra-processor constraints. Constraint violations are detected at the earliest possible time by deriving and checking intermediate constraints from the user-specified constraints. If the violations must be detected as early as possible, then the problem of minimizing the number of messages to be exchanged between the processors becomes intractable. We characterize a sub-class of timing constraints that occur commonly in distribu...
Efficient Run-Time Monitoring of Timing Constraints
- In IEEE Real-Time Technology and Applications Symposium
, 1997
Cited by 36 (4 self)
A real-time system operates under timing constraints which it may be unable to meet under some circumstances. The criticality of a timing constraint determines how a system is to react when a timing failure happens. For critical timing constraints, a timing failure should be detected as soon as possible. However, early detection of timing failures requires more resource usage which may be deemed excessive. While work in real-time system monitoring has progressed in recent years, the issue of tradeoff between detection latency and resource overhead has not been adequately considered. This paper presents an approach for monitoring timing constraints in real-time systems which is based on a simple and expressive specification method for defining the timing constraints to be monitored. Efficient algorithms are developed to catch violations of timing constraints at the earliest possible time. These algorithms have been implemented in a tool called JRTM (Java Run-time Timing-constraint Monit...
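The tradeoff between detection latency and resource overhead described in this abstract can be illustrated with a minimal Python sketch. It is not the JRTM tool or the paper's algorithm: the `DeadlineMonitor` class, its method names, and the single "end within `deadline` of start" constraint are all assumptions made for illustration. The monitor only notices a missed deadline when it is next invoked, so calling `advance` more often lowers detection latency at the cost of more monitoring work.

```python
from dataclasses import dataclass, field


@dataclass
class DeadlineMonitor:
    """Hypothetical monitor: each start must be followed by an end within `deadline`."""
    deadline: float
    pending: dict = field(default_factory=dict)      # task id -> start timestamp
    violations: list = field(default_factory=list)   # (task id, expiry, detection time)

    def on_start(self, task, now):
        """Record the start of a task at time `now`."""
        self.pending[task] = now

    def on_end(self, task, now):
        """Record the end of a task; a late end is flagged before it is cleared."""
        self.advance(now)
        self.pending.pop(task, None)

    def advance(self, now):
        """Timer tick: flag every pending task whose deadline has already passed."""
        for task, started in list(self.pending.items()):
            expiry = started + self.deadline
            if now > expiry:
                # Detection latency is (now - expiry); it shrinks with more frequent ticks.
                self.violations.append((task, expiry, now))
                del self.pending[task]


monitor = DeadlineMonitor(deadline=5.0)
monitor.on_start("a", 0.0)
monitor.on_start("b", 1.0)
monitor.on_end("a", 4.0)   # "a" finishes in time
monitor.advance(8.0)       # "b" expired at 6.0 and is detected at 8.0
print(monitor.violations)
```

In this sketch the violation for task `b` is reported without waiting for its end event, which is the sense in which a timer-driven check detects failures earlier than a check performed only when the task completes.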
Application-dependent dynamic monitoring of distributed and parallel systems
- IEEE Transactions on Parallel and Distributed Systems
, 1993
Cited by 32 (11 self)
Achieving high performance for parallel or distributed programs often requires substantial amounts of information about the programs themselves, about the systems on which they are executing, and about specific program runs. The monitoring system presented in this paper collects, analyzes, and makes application-dependent monitoring information available to the programmer and to the executing program. The system may be used for off-line program analysis, for on-line debugging, and for making on-line, dynamic changes to parallel or distributed programs to enhance their performance. We employ a high-level, uniform data model for the representation of program information and monitoring data. We show how this model may be used for the specification of program views and attributes for monitoring, and we demonstrate how such specifications can be translated into efficient, program-specific monitoring code that uses alternative mechanisms for the distributed analysis and collection to be performed for the specified views. The model's utility has been demonstrated on a wide variety of parallel machines, including several kinds of multiprocessors and a local area network. Index Terms: Application-dependent monitoring, distributed programs, dynamic monitoring, parallel programs, program
On-line Fault Detection of Sensor Measurements
- IEEE Sensors
, 2003
Cited by 18 (0 self)
On-line fault detection in sensor networks is of paramount importance due to the convergence of a variety of challenging technological, application, conceptual, and safety-related factors. We introduce a taxonomy for classification of faults in sensor networks and the first on-line model-based testing technique. The approach is generic in the sense that it can be applied to an arbitrary system of heterogeneous sensors with an arbitrary type of fault model, while it provides a flexible tradeoff between accuracy and latency. The key idea is to formulate on-line testing as a set of instances of a non-linear function minimization and consequently apply nonparametric statistical methods to identify the sensors that have the highest probability of being faulty. The optimization is conducted using the Powell nonlinear function minimization method. The effectiveness of the approach is evaluated in the presence of random noise using a system of light sensors.
Run-Time Monitoring of Real-Time Systems
- in Advances in Real-time Systems, Prentice-Hall
, 1995
Cited by 9 (0 self)
In designing real-time systems, we often make assumptions about the behavior of the system and its environment. These assumptions take many forms, such as upper bounds on interprocess communication delay, deadlines on the execution of tasks, or minimum separations between occurrences of two events. They are often made to deal with the unpredictability of the external environment or to simplify a problem that is otherwise intractable or very hard to solve. Such assumptions may be expressed as part of the formal specification of the system or as scheduling requirements on real-time computations. Despite the contributions of formal verification methods and real-time scheduling results in recent years, the need to perform run-time monitoring of these systems is not diminished, for several reasons: the execution environment of most systems is imperfect and the interaction with the external world introduces additional unpredictability; design assumptions can be violated
On-Chip Monitoring of Single- and Multiprocessor Hardware Real-Time Operating Systems
- In 8th International Conference on Real-Time Computing Systems and Applications. IEEE
, 2002
Cited by 8 (1 self)
This paper presents a novel hardware monitoring system that gives non-intrusive observability into the execution of hardware-accelerated Real-Time Operating Systems.
High-Level Views of Distributed Executions
- Proceedings of the 2nd International Workshop on Automated and Algorithmic Debugging
, 1995
Cited by 8 (1 self)
Due to the complexity of distributed applications, understanding their behaviour is a challenging task. The top-down use of suitable abstraction hierarchies is frequently proposed to manage this complexity. One commonly used abstraction is to group primitive events into abstract events. This paper discusses some of the problems encountered when displaying executions at abstract levels and presents a graphical representation for convex abstract events. Using convex events as building blocks for abstract execution visualizations avoids the identified problems. The proposed representation can easily be included in the process-time diagrams frequently employed to depict the behaviour of distributed applications. Such visualizations, in turn, are helpful during the construction, debugging, and monitoring of distributed applications as well as in trying to understand old "legacy" code in a program-understanding task. We enhanced the visualization component of a prototype distributed debugger with the facility to depict executions at various abstraction levels. Examples of the resulting abstract visualization for the execution of a non-trivial distributed application are discussed. These abstract visualizations are essential to minimize the complexity of the understanding process, and support top-down debugging.
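The abstract does not define convexity, so the following Python sketch assumes the usual order-theoretic meaning: a group of primitive events is convex if every event that lies between two of its members in the happened-before order also belongs to the group. The function names, the edge-list encoding of the execution, and the example events are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def reachable(edges, source):
    """Return every event reachable from `source` by following the given edges."""
    succ = defaultdict(list)
    for a, b in edges:
        succ[a].append(b)
    seen, stack = set(), [source]
    while stack:
        for nxt in succ[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_convex(edges, group):
    """Check convexity of `group` under the (acyclic) happened-before edges."""
    group = set(group)
    reverse = [(b, a) for a, b in edges]
    # Events that happen after some member, and events that happen before some member.
    after = set().union(*(reachable(edges, e) for e in group))
    before = set().union(*(reachable(reverse, e) for e in group))
    # Anything in both sets lies between two members and must itself be a member.
    return (after & before) <= group


# Process P runs p1, p2, p3; process Q runs q1, q2; P messages Q (p1 -> q1) and Q replies (q2 -> p3).
edges = [("p1", "p2"), ("p2", "p3"), ("q1", "q2"), ("p1", "q1"), ("q2", "p3")]
print(is_convex(edges, {"p1", "p2", "q1"}))   # True
print(is_convex(edges, {"p1", "p2", "p3"}))   # False: q1 and q2 lie between p1 and p3
```

The second call shows why grouping all events of one process is not necessarily convex: a message round trip through another process places that process's events between members of the group.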
Software Engineering for Parallel Systems: The TRAPPER Approach
- 28th Hawaii International Conference on System Sciences
, 1995
Cited by 7 (1 self)
TRAPPER is a graphical programming environment for parallel systems. The novel approach introduced with TRAPPER is the support of the different stages of the software engineering process with emphasis on the specific problems of parallel systems. The programming environment contains components for the software design, hardware configuration, mapping, monitoring, software visualization and performance monitoring of parallel applications and systems. TRAPPER provides a graphical design methodology which allows a hierarchical specification of the parallel structure of the application software. The design methodology is based on process graphs which are an appropriate graphical notation for the partitioning of the application into co-operating sequential processes. The graphical notations are used during the whole development cycle, thus providing a homogeneous view of the parallel system at all development stages.

