### Parallel Partial Order Reduction with Topological Sort Proviso

"... Abstract—Partial order reduction and distributed-memory processing are the two essential techniques to fight the wellknown state space explosion problem in explicit state model checking. Unfortunately, these two techniques have not been integrated yet to a satisfactory degree. While for verification ..."

Abstract
- Add to MetaCart

(Show Context)
Abstract—Partial order reduction and distributed-memory processing are two essential techniques for fighting the well-known state space explosion problem in explicit-state model checking. Unfortunately, these two techniques have not yet been integrated to a satisfactory degree. While for verification of safety properties there are a few rather successful approaches to parallel partial order reduction, for LTL model checking all suggested approaches are either too technically involved to be smoothly incorporated into the existing parallel algorithms, or they are simply weak in the sense that the achieved reduction in the size of the state space is minor. The main source of difficulties is the cycle proviso, which requires one fully expanded state on every cycle in the reduced state space graph. This can easily be achieved in the sequential case by employing a depth-first search strategy for state space generation. Unfortunately, this strategy is incompatible with parallel (hence distributed-memory) processing, which limits the application of the partial order reduction technique to the sequential case. In this paper we suggest a new technique that guarantees correct construction of the reduced state space graph w.r.t. the cycle proviso. Our new technique is fully compatible with the parallel graph traversal procedure while at the same time providing a reduction of the state space competitive with the serial case. The new technique has been implemented within the parallel and distributed-memory LTL model checker DIVINE, and its performance is reported in this paper.
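To make the cycle proviso concrete: a reduced state-space graph satisfies it exactly when, after deleting all fully expanded states, no cycle remains. The sketch below checks this with a DFS-based cycle detection; the graph representation and function names are illustrative, not taken from the paper.

```python
# Sketch: checking the cycle proviso on a reduced state-space graph.
# The proviso is violated iff some cycle contains no fully expanded
# state, i.e. iff a cycle survives after removing the fully expanded
# states from the graph. Names and representation are illustrative.

def violates_cycle_proviso(graph, fully_expanded):
    """graph: dict state -> list of successor states.
    fully_expanded: set of states expanded with all enabled transitions."""
    # Restrict the graph to states that are NOT fully expanded.
    restricted = {s: [t for t in succs if t not in fully_expanded]
                  for s, succs in graph.items() if s not in fully_expanded}
    # A cycle among the remaining states is a cycle with no fully
    # expanded state: detect it with a standard colouring DFS.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {s: WHITE for s in restricted}

    def has_cycle(s):
        colour[s] = GREY
        for t in restricted.get(s, []):
            if colour.get(t, BLACK) == GREY:
                return True           # back edge: cycle found
            if colour.get(t, BLACK) == WHITE and has_cycle(t):
                return True
        colour[s] = BLACK
        return False

    return any(colour[s] == WHITE and has_cycle(s) for s in restricted)

# A 3-cycle a -> b -> c -> a with only 'a' fully expanded satisfies
# the proviso; with no fully expanded state it does not.
g = {'a': ['b'], 'b': ['c'], 'c': ['a']}
print(violates_cycle_proviso(g, {'a'}))   # False: proviso holds
print(violates_cycle_proviso(g, set()))   # True: unbroken cycle
```

The difficulty the paper addresses is that a sequential DFS gets this property almost for free (close a cycle only at a fully expanded state), whereas a parallel breadth-like traversal has no such global order to exploit.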

### Verifying Very Large Industrial Circuits Using 100 Processes and Beyond

"... Abstract. Recent advances in scheduling and networking have cleared the way for efficient exploitation of large-scale distributed computing platforms, such as computational grids and huge clusters. Such infrastructures hold great promise for the highly resource-demanding task of verifying and checki ..."

Abstract
- Add to MetaCart

(Show Context)
Abstract. Recent advances in scheduling and networking have cleared the way for efficient exploitation of large-scale distributed computing platforms, such as computational grids and huge clusters. Such infrastructures hold great promise for the highly resource-demanding task of verifying and checking large models, provided that model checkers are designed with a high degree of scalability and flexibility in mind. In this paper we focus on the mechanisms required to execute a high-performance, distributed, symbolic model checker on top of a large-scale distributed environment. We develop a hybrid algorithm for slicing the state space and dynamically distributing the work among the worker processes. We show that the new approach is faster, more effective, and thus much more scalable than previous slicing algorithms. We then present a checkpoint-restart module that has very low overhead. This module can be used to combat failures, which become more likely as the computing platform grows. However, checkpoint-restart is even more useful to the scheduling system: it can be used to avoid reserving large numbers of workers, thus making the distributed computation work-efficient. Finally, we discuss for the first time the effect of variable reordering on the distributed model checker and show how the distributed system performs more efficient reordering than the sequential one. We implemented our contributions on a network of 200 processors, using a distributed scalable scheme that employs a high-performance industrial model checker from Intel. Our results show that the system was able to verify real-life models much larger than was previously possible.
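The two ingredients named above, slicing the state space among workers and dynamically redistributing the work, can be illustrated with a toy stand-in: a static hash-based slicing function plus a naive rebalancing pass. This is only a sketch of the general idea, not the paper's hybrid algorithm.

```python
# Toy illustration of state-space slicing and dynamic redistribution:
# a slicing function maps each state to an owning worker, and a
# rebalancing step moves states from overloaded to underloaded
# workers. Stand-in for the hybrid slicing algorithm described above.

import hashlib

def owner(state, n_workers):
    """Static slicing: hash the state's representation to a worker id."""
    digest = hashlib.sha256(repr(state).encode()).hexdigest()
    return int(digest, 16) % n_workers

def slice_states(states, n_workers):
    slices = {w: [] for w in range(n_workers)}
    for s in states:
        slices[owner(s, n_workers)].append(s)
    return slices

def rebalance(slices):
    """Move work from the heaviest to the lightest worker until the
    load difference is at most one state (dynamic distribution,
    grossly simplified)."""
    while True:
        heavy = max(slices, key=lambda w: len(slices[w]))
        light = min(slices, key=lambda w: len(slices[w]))
        if len(slices[heavy]) - len(slices[light]) <= 1:
            return slices
        slices[light].append(slices[heavy].pop())

states = [(i, i % 3) for i in range(100)]
balanced = rebalance(slice_states(states, 4))
print(sorted(len(v) for v in balanced.values()))  # [25, 25, 25, 25]
```

In a symbolic checker the unit of work is a BDD slice rather than a list of explicit states, and rebalancing must account for BDD sizes, but the owner-function-plus-migration structure is the same.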

### AN ANTICIPATED FIRING SATURATION ALGORITHM FOR SHARED-MEMORY ARCHITECTURES

"... Parallelising symbolic state-space generation algorithms, such as Saturation, is known to be difficult as it often incurs high parallel overheads. To improve efficiency, related work on a distributed-memory implementation of Saturation proposed using idle processors for speculatively firing events a ..."

Abstract
- Add to MetaCart

Parallelising symbolic state-space generation algorithms, such as Saturation, is known to be difficult, as it often incurs high parallel overheads. To improve efficiency, related work on a distributed-memory implementation of Saturation proposed using idle processors for speculatively firing events and caching the obtained results, in the hope that these results will be needed later on. This paper investigates a variant of this anticipated firing approach for shared-memory architectures, such as multi-core PCs. Rather than parallelising Saturation, the idea is to run the sequential Saturation algorithm on one core, while the others are given speculative work. Since computing the optimal strategy for selecting useful work is likely to be an NP-complete problem, the paper devises and implements various heuristics. The obtained experimental results show that moderate speed-ups can be achieved as a result of using anticipated firing. However, the proposed heuristics require further work in order to be truly useful in practice.
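The core mechanism, speculative firing feeding a shared result cache that the main Saturation loop consults, can be sketched as follows. The firing function and the neighbour heuristic below are toy placeholders, not the paper's heuristics; the point is only that speculation is safe because firing is deterministic and pure, so a cached result is always valid.

```python
# Sketch of anticipated (speculative) event firing: helpers fill a
# result cache for (event, node) pairs chosen by a heuristic, so the
# main saturation loop can reuse them. fire() and the heuristic are
# toy placeholders, not the paper's algorithm.

def fire(event, node):
    """Placeholder image computation: deterministic and pure, so a
    cached result is always correct when requested again."""
    return (node * 31 + event) % 97

class FiringCache:
    def __init__(self):
        self.table, self.hits, self.misses = {}, 0, 0

    def lookup_or_fire(self, event, node):
        key = (event, node)
        if key in self.table:
            self.hits += 1
        else:
            self.misses += 1
            self.table[key] = fire(event, node)
        return self.table[key]

def speculate(cache, recent_pairs):
    """Toy heuristic: pre-fire recently seen events on neighbouring
    nodes, hoping saturation will request them next."""
    for event, node in recent_pairs:
        for nearby in (node - 1, node + 1):
            cache.table.setdefault((event, nearby), fire(event, nearby))

cache = FiringCache()
cache.lookup_or_fire(5, 10)            # miss: computed on demand
speculate(cache, [(5, 10)])            # pre-fires (5, 9) and (5, 11)
cache.lookup_or_fire(5, 11)            # hit: speculation paid off
print(cache.hits, cache.misses)        # 1 1
```

Useless speculation only wastes otherwise idle cycles (plus cache memory), which is why the hard problem is the selection strategy, not correctness.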

### Caching, Hashing, and Garbage Collection for Distributed State Space Construction

"... The Saturation algorithm for symbolic state-space generation is a recent advance in exhaustive verification of complex systems, in particular globally-asynchronous/ locally-synchronous systems. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) ..."

Abstract
- Add to MetaCart

(Show Context)
The Saturation algorithm for symbolic state-space generation is a recent advance in the exhaustive verification of complex systems, in particular globally-asynchronous/locally-synchronous systems. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) to efficiently spread the memory load during its highly irregular exploration. A crucial factor in limiting the memory consumption in symbolic state-space generation is the ability to perform garbage collection to free up the memory occupied by dead nodes. However, garbage collection over a NOW requires a nontrivial communication overhead. In addition, operation cache policies become critical when analyzing large-scale systems using a symbolic approach. In this paper, we develop a garbage collection scheme and several operation cache policies to help the analysis of complex systems. Experiments show that our schemes improve the performance of the original distributed implementation, SmArTNow, in terms of both time and memory efficiency.
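One standard way to detect the dead decision-diagram nodes mentioned above is reference counting. The sketch below shows only the per-worker bookkeeping; in the distributed setting the counts (and the collection decision) would additionally have to be synchronised across workstations, which is exactly the communication overhead the abstract refers to. Class and method names are illustrative.

```python
# Sketch of dead-node garbage collection for decision-diagram nodes
# via reference counting. Only the local (per-worker) bookkeeping is
# shown; distributing it over a NOW adds synchronisation cost.

class NodeTable:
    def __init__(self):
        self.refcount = {}

    def add_node(self, node):
        self.refcount.setdefault(node, 0)

    def incref(self, node):
        self.refcount[node] += 1

    def decref(self, node):
        self.refcount[node] -= 1

    def collect(self):
        """Free nodes that are no longer referenced; return the
        number of nodes freed."""
        dead = [n for n, c in self.refcount.items() if c <= 0]
        for n in dead:
            del self.refcount[n]
        return len(dead)

table = NodeTable()
for n in ("n1", "n2", "n3"):
    table.add_node(n)
    table.incref(n)
table.decref("n2")                 # n2 becomes dead
print(table.collect())             # 1
print(sorted(table.refcount))      # ['n1', 'n3']
```

An operation cache interacts with this: cached results may keep otherwise-dead nodes pinned (or, if the cache holds stale node ids, must be purged at collection time), which is why cache policy and garbage collection have to be designed together.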

### DISTRIBUTED SATURATION

"... The Saturation algorithm for symbolic state-space generation, has been a recent breakthrough in the exhaustive verification of complex systems, in particular globally-asynchronous/locally-synchronous systems. The algorithm uses a very compact Multiway Decision Diagram (MDD) encoding for states and t ..."

Abstract
- Add to MetaCart

(Show Context)
The Saturation algorithm for symbolic state-space generation has been a recent breakthrough in the exhaustive verification of complex systems, in particular globally-asynchronous/locally-synchronous systems. The algorithm uses a very compact Multiway Decision Diagram (MDD) encoding for states and the fastest symbolic exploration algorithm to date. The distributed version of Saturation uses the overall memory available on a network of workstations (NOW) to efficiently spread the memory load during the highly irregular exploration. A crucial factor in limiting the memory consumption during symbolic state-space generation is the ability to perform garbage collection to free up the memory occupied by dead nodes. However, garbage collection over a NOW requires a nontrivial communication overhead. In addition, operation cache policies become critical when analyzing large-scale systems using the symbolic approach. In this technical report, we develop a garbage collection scheme and several operation cache policies to help solve extremely complex systems. Experiments show that our schemes improve the performance of the original distributed implementation, SmArTNow, in terms of time and memory efficiency.

### Speculative Image Computation for Distributed Symbolic Reachability Analysis

- Journal of Logic and Computation (Advance Access), 2009

"... The Saturation-style fixpoint iteration strategy for symbolic reachability analysis is particularly effective for globally asynchronous locally synchronous discrete-state systems. However, its inherently sequential nature makes it difficult to parallelize Saturation on a network workstations (NOW). ..."

Abstract
- Add to MetaCart

The Saturation-style fixpoint iteration strategy for symbolic reachability analysis is particularly effective for globally asynchronous, locally synchronous discrete-state systems. However, its inherently sequential nature makes it difficult to parallelize Saturation on a network of workstations (NOW). We therefore propose the idea of using idle workstation time to perform speculative image computations. Since unrestrained prediction may make excessive use of computational resources, we introduce a history-based approach that dynamically recognizes image computation (event firing) patterns and explores only firings that conform to these patterns. In addition, we employ an implicit encoding for the patterns, so that the actual image computation history can be efficiently preserved. Experiments not only show that image speculation works on a realistic model, but also indicate that the use of an implicit encoding together with two heuristics results in better informed speculation.
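The history-based restriction of speculation can be illustrated with the simplest possible pattern model: count which event tends to fire after which, and only speculate on the most frequent successor. The bigram table below is an explicit toy stand-in for the implicit encoding mentioned above; the paper's actual pattern representation is symbolic.

```python
# Sketch of history-based speculation: record which event tends to
# fire after which, then speculatively pre-fire only the most
# frequent successor. A bigram table is a toy stand-in for the
# implicit (symbolic) pattern encoding described above.

from collections import defaultdict

class FiringHistory:
    def __init__(self):
        self.follows = defaultdict(lambda: defaultdict(int))
        self.last = None

    def record(self, event):
        """Observe one fired event in the actual exploration."""
        if self.last is not None:
            self.follows[self.last][event] += 1
        self.last = event

    def predict(self, event):
        """Most frequently observed successor of `event`, or None if
        the event has never been seen followed by anything."""
        succs = self.follows.get(event)
        if not succs:
            return None
        return max(succs, key=succs.get)

h = FiringHistory()
for e in ["t1", "t2", "t1", "t2", "t1", "t3"]:
    h.record(e)
print(h.predict("t1"))   # 't2' (t1 was followed by t2 twice, t3 once)
print(h.predict("t3"))   # None (no successor observed yet)
```

Restricting speculation to conforming firings is what keeps idle-time work from flooding the node table with results that will never be requested.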

### Parallel and Distributed Model Checking in Eddy (Special Section on SPIN, DOI 10.1007/s10009-008-0094-x)

- 2008

"... Abstract Model checking of safety properties can be scaled up by pooling the CPU and memory resources of multiple computers. As compute clusters containing 100s of nodes, with each node realized using multi-core (e.g., 2) CPUs will be widespread, a model checker based on the parallel (shared memory) ..."

Abstract
- Add to MetaCart

(Show Context)
Abstract. Model checking of safety properties can be scaled up by pooling the CPU and memory resources of multiple computers. As compute clusters containing hundreds of nodes, each node built from multi-core (e.g., dual) CPUs, become widespread, a model checker based on the parallel (shared memory) and distributed (message passing) paradigms will use the hardware resources more efficiently. Such a model checker can be designed by having each node employ two shared-memory threads that run on the (typically) two CPUs of a node, with one thread responsible for state generation, and the other for efficient communication, including (1) performing overlapped asynchronous message passing, and (2) aggregating the states to be sent into larger chunks in order to improve communication network utilization. We present the design details of such a novel model checking architecture called Eddy. We describe the design rationale and the details of how the threads interact and yield control, exchange messages, and detect termination. We have realized an instance of this architecture, called Eddy_Murphi, for the Murphi modeling language, and we report its performance over the number of nodes as well as communication parameters such as those controlling state aggregation. A nearly linear reduction of compute time with increasing number of nodes is observed. Our thread task partition is modular, easy to port across different modeling languages, and easy to tune across a variety of platforms.
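The two-thread node design, one thread generating states and the other batching them into chunks before sending, can be sketched with a queue between two threads. The chunk size and the send stub below are illustrative placeholders, not Eddy's actual parameters or MPI machinery.

```python
# Sketch of the two-thread node design described above: one thread
# generates states, the other aggregates them into fixed-size chunks
# before "sending", amortising per-message overhead. Chunk size and
# the send stand-in are illustrative, not Eddy's actual parameters.

import queue, threading

CHUNK = 8
outbox = queue.Queue()
sent_chunks = []

def generator():
    """State-generation thread: pushes successor states to the outbox."""
    for state in range(20):
        outbox.put(state)
    outbox.put(None)                    # sentinel: generation finished

def communicator():
    """Communication thread: batches states into chunks, then sends."""
    chunk = []
    while True:
        state = outbox.get()
        if state is None:
            break
        chunk.append(state)
        if len(chunk) == CHUNK:
            sent_chunks.append(chunk)   # stand-in for an async MPI send
            chunk = []
    if chunk:
        sent_chunks.append(chunk)       # flush the final partial chunk

threads = [threading.Thread(target=generator),
           threading.Thread(target=communicator)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print([len(c) for c in sent_chunks])    # [8, 8, 4]
```

Decoupling the threads this way lets message latency overlap with state generation, which is where the reported near-linear scaling comes from.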

### Parallel and Distributed Model Checking in Eddy (Software Tools for Technology Transfer manuscript)

"... Abstract. Model checking of safety properties can be scaled up by pooling the CPU and memory resources of multiple computers. As compute clusters containing 100s of nodes, with each node realized using multi-core (e.g., 2) CPUs will be widespread, a model checker based on the parallel (shared memory ..."

Abstract
- Add to MetaCart

(Show Context)
Abstract. Model checking of safety properties can be scaled up by pooling the CPU and memory resources of multiple computers. As compute clusters containing hundreds of nodes, each node built from multi-core (e.g., dual) CPUs, become widespread, a model checker based on the parallel (shared memory) and distributed (message passing) paradigms will use the hardware resources more efficiently. Such a model checker can be designed by having each node employ two shared-memory threads that run on the (typically) two CPUs of a node, with one thread responsible for state generation, and the other for efficient communication, including (i) performing overlapped asynchronous message passing, and (ii) aggregating the states to be sent into larger chunks in order to improve communication network utilization. We present the design details of such a novel model checking architecture called Eddy. We describe the design rationale and the details of how the threads interact and yield control, exchange messages, and detect termination. We have realized an instance of this architecture, called Eddy Murphi, for the Murphi modeling language, and we report its performance over the number of nodes as well as communication parameters such as those controlling state aggregation. A nearly linear reduction of compute time with increasing number of nodes is observed. Our thread task partition is modular, easy to port across different modeling languages, and easy to tune across a variety of platforms. (Supported in part by NSF award CNS-0509379 and SRC Contract 2005-TJ-1318.)

### Parallelizing Deadlock Resolution in Symbolic Synthesis of Distributed Programs

"... Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock states, from where the program has no outgoing transitions. Of ..."

Abstract
- Add to MetaCart

(Show Context)
Previous work has shown that there are two major complexity barriers in the synthesis of fault-tolerant distributed programs: (1) generation of the fault-span, the set of states reachable in the presence of faults, and (2) resolving deadlock states, from which the program has no outgoing transitions. Of these, the former closely resembles model checking, and hence techniques for efficient verification are directly applicable to it. We therefore focus on expediting the latter with the use of multi-core technology. We present two approaches to parallelization based on different design choices. The first approach is based on the computation of equivalence classes of program transitions (called group computation) that are needed due to the issue of distribution (i.e., the inability of processes to atomically read and write all program variables). We show that in most cases the speedup of this approach is close to the ideal speedup, and in some cases it is superlinear. The second approach uses the traditional technique of partitioning deadlock states among multiple threads. However, our experiments show that the speedup for this approach is small. Consequently, our analysis demonstrates that the simple approach of parallelizing the group computation is likely to be the more effective method for using multi-core computing in the context of deadlock resolution.
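What makes the first approach parallelize so well is that the group of each candidate transition can be computed independently, so transitions can simply be partitioned among workers. The sketch below shows that structure; the toy `group_of` function (all variants over variables a process cannot read) is a hypothetical stand-in for the actual group computation imposed by distribution.

```python
# Sketch of parallelising the group computation: the equivalence
# class ("group") of each candidate transition is computed
# independently, so the transitions can be partitioned among worker
# threads. group_of() is a toy placeholder for the real read/write
# restrictions that distribution imposes.

from concurrent.futures import ThreadPoolExecutor

def group_of(transition, unreadable_vars):
    """Toy group: all variants of the transition over the variables
    the process cannot read (here, both boolean values of each)."""
    variants = [transition]
    for var in unreadable_vars:
        variants = [t + ((var, val),)
                    for t in variants for val in (0, 1)]
    return variants

def parallel_groups(transitions, unreadable_vars, workers=4):
    """Map group_of over the transitions with a thread pool; each
    task is independent, so no synchronisation is needed."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: group_of(t, unreadable_vars),
                             transitions))

transitions = [(("x", 0),), (("x", 1),)]
groups = parallel_groups(transitions, ["y"])
print([len(grp) for grp in groups])    # [2, 2]: two variants each
```

In a real synthesizer the groups are sets of BDD-encoded transitions and the speedup comes from parallelising genuinely expensive symbolic operations, which is consistent with the near-ideal (and occasionally superlinear, e.g. via cache effects) speedups reported above.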