Causality Representation and Cancellation Mechanism in Time Warp Simulations
- In PADS ’01: Proceedings of the Fifteenth Workshop on Parallel and Distributed Simulation, 2001
Cited by 5 (0 self)
The Time Warp synchronization protocol allows causality errors and then recovers from them with the assistance of a cancellation mechanism. Cancellation can cause the rollback of several other simulation objects, which may trigger a cascading rollback situation where the rollback cycles back to the original simulation object. These cycles of rollback can cause the simulation to enter an unstable (or thrashing) state where little real forward simulation progress is achieved. To address this problem, knowledge of causal relations between events can be used during cancellation to avoid cascading rollbacks and to initiate early recovery operations from causality errors. In this paper, we describe a logical time representation for Time Warp simulations that is used to disseminate causality information. The new timestamp representation, called Total Clocks, has two components: (i) a virtual time component, and (ii) a vector of event counters similar to Vector clocks. The virtual time component provides a one-dimensional global simulation time, and the vector of event counters records event processing rates by the simulation objects. This time representation allows us to disseminate causality information during event execution that can be used to allow early recovery during cancellation. We propose a cancellation mechanism using Total Clocks that avoids cascading rollbacks in Time Warp simulations that have FIFO communication channels.
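The two-component timestamp described in this abstract can be pictured with a minimal Python sketch. The abstract does not give the paper's update rules, so the sketch assumes the standard vector-clock rules (element-wise maximum on receipt, then increment of the processing object's own counter). The names `TotalClock`, `process_event`, and `causally_precedes` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TotalClock:
    """Hypothetical two-part timestamp: virtual time plus event counters."""
    virtual_time: float        # one-dimensional global simulation time
    counters: Tuple[int, ...]  # one event counter per simulation object


def process_event(clock: TotalClock, obj: int, event_time: float,
                  sender_counters: Optional[Tuple[int, ...]] = None) -> TotalClock:
    """Return the clock after object `obj` processes an event at `event_time`.

    Assumption: counters follow ordinary vector-clock rules.
    """
    merged = list(clock.counters)
    if sender_counters is not None:
        # Fold in the causal history carried by the incoming event.
        merged = [max(a, b) for a, b in zip(merged, sender_counters)]
    merged[obj] += 1  # count the event just processed by `obj`
    return TotalClock(event_time, tuple(merged))


def causally_precedes(a: TotalClock, b: TotalClock) -> bool:
    """True if a's counter vector is component-wise <= b's and not equal."""
    return (all(x <= y for x, y in zip(a.counters, b.counters))
            and a.counters != b.counters)


start = TotalClock(0.0, (0, 0))
a = process_event(start, obj=0, event_time=5.0)
b = process_event(start, obj=1, event_time=7.0, sender_counters=a.counters)
c = process_event(start, obj=1, event_time=3.0)  # no causal link to a
```

Here `b` depends on `a`, while `a` and `c` are causally unrelated even though their virtual times are ordered. A cancellation mechanism needs exactly this kind of distinction to decide which events a rollback actually affects.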
POM: a Parallel Observable Machine
- In Proceedings of PARCO’95, 1995
Cited by 4 (2 self)
POM is a Parallel Observable Machine featuring mechanisms for building and observing distributed applications. It comes in the form of a library built upon the many communication kernels available on current parallel architectures and a loader that provides the user with a homogeneous syntax for launching parallel applications on any parallel platform. The primary goal of POM is not to offer numerous services to the application programmer, as is done in PVM [2], MPI [9] or P4 [3]. It mostly aims at masking the specificities of the various communication kernels of today's machines with no significant degradation of performance. In that sense, our approach is quite similar to that of the PICL [5] and PARMACS [4] projects. Yet, one of our main priorities while designing POM was to define a model of virtual machine and to clearly specify the semantics of the communications in this model. We also wanted to define an easily portable machine -- i.e. a machine...
POM: a Virtual Parallel Machine Featuring Observation Mechanisms
- PI 902, IRISA, 1995
Cited by 4 (2 self)
We describe in this paper a Parallel Observable virtual Machine (POM), which provides a homogeneous interface on top of the communication kernels of parallel architectures. POM was designed so as to be ported easily and efficiently to numerous parallel platforms. It provides sophisticated features for observing distributed executions. Key-words: distributed memory parallel computers, virtual machine, communication library, observation, traces. guidec@irisa.fr, maheo@irisa.fr. Unité de recherche INRIA Rennes, IRISA, Campus universitaire de Beaulieu, 35042 RENNES Cedex (France). Telephone: (33) 99 84 71 00 -- Fax: (33) 99 84 71. French title: POM : une machine parallèle virtuelle incorporant des mécanismes d'observation. French abstract (translated): We describe in this article an observable virtual parallel machine, POM. It offers a homogeneous interface on top of the communication systems of parallel architectures. It was designed for easy porting and ...
Self-Organizing Hierarchical Cluster Timestamps
- EuroPar'01 Parallel Processing, volume LNCS 2150 of Lecture Notes in Computer Science, 2001
Cited by 2 (2 self)
Distributed-system observation tools require an efficient data structure to store and query the partial order of execution. Such data structures typically use vector timestamps to efficiently answer precedence queries. Many current vector-timestamp algorithms either have a poor time/space complexity tradeoff or are static. This limits the scalability of such observation tools. One algorithm, centralized hierarchical cluster timestamps, potentially has a good time/space tradeoff, provided that the clusters accurately capture communication locality.
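To make the phrase "vector timestamps to efficiently answer precedence queries" concrete, the following Python sketch assigns plain Fidge/Mattern vector timestamps to a small trace and answers a precedence query with a single comparison. It illustrates only the baseline technique the abstract refers to, not the cluster-timestamp algorithm of the paper. The trace format and the names `assign_timestamps` and `precedes` are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, Tuple[int, ...]]  # (process id, vector timestamp)


def assign_timestamps(num_procs: int,
                      trace: List[Tuple[int, str, Optional[str]]]) -> List[Event]:
    """Stamp each (process, kind, message id) entry of a causally valid trace.

    kind is "local", "send" or "recv"; a "recv" must appear after its "send".
    """
    current = [[0] * num_procs for _ in range(num_procs)]
    in_flight: Dict[str, List[int]] = {}
    stamped: List[Event] = []
    for proc, kind, msg in trace:
        clock = list(current[proc])
        if kind == "recv":
            # Merge the sender's knowledge carried by the message.
            clock = [max(a, b) for a, b in zip(clock, in_flight[msg])]
        clock[proc] += 1  # every event advances its own component
        current[proc] = clock
        if kind == "send":
            in_flight[msg] = list(clock)
        stamped.append((proc, tuple(clock)))
    return stamped


def precedes(e: Event, f: Event) -> bool:
    """Constant-time test: e happened before f iff e != f and V_e[p_e] <= V_f[p_e]."""
    proc_e, stamp_e = e
    _, stamp_f = f
    return e != f and stamp_e[proc_e] <= stamp_f[proc_e]


trace = [(0, "send", "m1"), (1, "local", None), (1, "recv", "m1"), (2, "local", None)]
e0, e1, e2, e3 = assign_timestamps(3, trace)
```

Each query is one integer comparison, but every event stores a vector with one entry per process. That storage cost is the time/space tradeoff the abstract says limits scalability.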
Clustering Strategies for Cluster Timestamps
- Proceedings of the 2004 International Conference on Parallel Processing, 2004
Cited by 1 (1 self)
Distributed-system observation tools require an efficient data structure to store and query the partial order of execution. Such data structures typically use vector timestamps to efficiently answer precedence queries. Many current vector-timestamp algorithms either have a poor time/space complexity tradeoff or are static. This limits the scalability of such observation tools. The self-organizing hierarchical cluster timestamp, introduced by Ward and Taylor, potentially has a good time/space tradeoff, provided that the clusters accurately capture communication locality. However, the problem of accurately capturing communication locality has not been adequately addressed. In particular, the only clustering algorithm for which results have been presented is the merge-on-first-communication approach. That strategy has limited applicability, as it is very sensitive to the order of event processing and to the maximum cluster size permitted.
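The sensitivity to event order and maximum cluster size mentioned in this abstract can be illustrated with a small Python sketch. The abstract does not spell out the algorithm, so the sketch assumes a plain reading of the strategy's name: every process starts in its own cluster, and two clusters merge the first time a message crosses between them, unless the merged cluster would exceed the size limit. The function name `merge_on_first_communication` and its parameters are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def merge_on_first_communication(num_procs: int,
                                 messages: List[Tuple[int, int]],
                                 max_size: int) -> Set[FrozenSet[int]]:
    """Cluster processes greedily, in the order the messages are processed."""
    cluster_of: Dict[int, FrozenSet[int]] = {
        p: frozenset([p]) for p in range(num_procs)
    }
    for sender, receiver in messages:
        a, b = cluster_of[sender], cluster_of[receiver]
        if a == b or len(a) + len(b) > max_size:
            continue  # already together, or the merge would be too large
        merged = a | b
        for p in merged:
            cluster_of[p] = merged
    return set(cluster_of.values())


# Same three messages, processed in two different orders, max cluster size 2.
order_a = merge_on_first_communication(4, [(0, 1), (1, 2), (2, 3)], max_size=2)
order_b = merge_on_first_communication(4, [(1, 2), (0, 1), (2, 3)], max_size=2)
```

Under these assumptions `order_a` pairs the processes as {0, 1} and {2, 3}, while `order_b` forms {1, 2} and leaves processes 0 and 3 isolated. The communication pattern is identical in both runs, so the difference comes only from processing order.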
Issues in Scalable . . .
- 2001
Distributed-system management concerns the observation of a distributed computation and then using the information gained by that observation to control the computation. This necessitates collecting the information required to determine the partial order of execution, and then reasoning about that partial order. This in turn requires a partial-order data structure and, if the reasoning is being performed by a human, a system for visualizing that partial order. Both creating such a data structure and visualizing it are hard problems. Current partial-order data structure techniques suffer from various shortcomings. Potentially scalable mechanisms, such as Ore timestamps, are static. Dynamic algorithms, on the other hand, either require a significant search operation to answer basic questions, or they require a vector of size equal to the width of the partial order for each element stored in the order. Scalable visualization of a partial order is hard for the same reasons that drawing any large graph is hard. Any visualization that will be meaningful to a user requires appropriate abstractions on the data structure, while preserving the core meaning of the data structure. Such abstraction is difficult. This report formalizes these problems and identifies the specific difficulties that must be solved to enable scalable

