## Parallel Logic Simulation of Digital Circuits (1998)

Citations: 2 (0 self)

### BibTeX

@TECHREPORT{Kim98parallellogic,
    author = {Hong Kyu Kim},
    title = {Parallel Logic Simulation of Digital Circuits},
    institution = {},
    year = {1998}
}

### Abstract

Parallel discrete event simulation (PDES) is efficient in simulating a large digital circuit. In this dissertation, two techniques are proposed to improve the performance of PDES in logic simulation. One is a partitioning algorithm and the other is a hybrid parallel simulation protocol. Experiments were performed to demonstrate that the two proposed techniques together provide significant reduction in parallel simulation time. Unlike most other partitioning algorithms, the proposed partitioning algorithm preserves circuit concurrency by assigning circuit gates that can be evaluated at about the same time to different processors. As a result, the concurrency preserving partitioning (CPP) algorithm can provide instantaneous load balancing, instead of only aggregated load balancing, throughout the period of a parallel simulation. This is especially important when the algorithm is used together with a Time Warp simulation where a high degree of concurrency can lead to fewer rollbacks and better performance. In addition, a new concurrency metric is proposed to evaluate partitioning algorithms before the execution of parallel simulations. Even though PDES can reduce the logic simulation time for large circuits considerably, it generates more events than necessary for certain high activity circuits and produces inconsistent speedup over different circuits. The proposed Event Lookahead Time Warp (ETW) algorithm can look ahead and combine and execute multiple events at each gate optimistically so that the probability of unnecessary events can be reduced. As a result, it can reduce rollback cost, obtain better load balance, and achieve more consistent execution times and reasonable speedups.
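The abstract's central partitioning idea, placing gates that can be evaluated at about the same time on different processors, can be illustrated with a small sketch. This is not the dissertation's CPP algorithm: it is a simplified stand-in that levelizes a circuit and deals the gates of each level out round-robin. The names `levelize`, `partition_by_level`, and the `fanin` dictionary format are assumptions made for this example only.

```python
from collections import defaultdict


def levelize(fanin):
    """Return {gate: level}. Gates with no fan-in (primary inputs) get level 0;
    every other gate sits one level above its deepest parent.
    Assumes the circuit graph is acyclic (combinational)."""
    level = {}

    def visit(gate):
        if gate not in level:
            parents = fanin.get(gate, ())
            level[gate] = 1 + max((visit(p) for p in parents), default=-1)
        return level[gate]

    for gate in fanin:
        visit(gate)
    return level


def partition_by_level(fanin, num_procs):
    """Hypothetical illustration: spread the gates of each level across
    processors round-robin, so gates that become ready together tend to
    land on different processors."""
    level = levelize(fanin)
    by_level = defaultdict(list)
    for gate, lv in level.items():
        by_level[lv].append(gate)

    assignment = {}
    for lv in sorted(by_level):
        for i, gate in enumerate(sorted(by_level[lv])):
            assignment[gate] = i % num_procs
    return assignment, level


# Tiny example circuit: g1 = AND(a, b), g2 = OR(c, d), g3 = NAND(g1, g2)
example_fanin = {
    "a": [], "b": [], "c": [], "d": [],
    "g1": ["a", "b"],
    "g2": ["c", "d"],
    "g3": ["g1", "g2"],
}
example_assignment, example_level = partition_by_level(example_fanin, 2)
```

In this example `g1` and `g2` share a level and end up on different processors, which is the kind of instantaneous balance the abstract contrasts with aggregate balance. A real partitioner would also have to weigh interprocessor communication, which this sketch ignores.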

### Citations

1109 |
An efficient heuristic procedure for partitioning graphs
- Kernighan, Lin
- 1970
Citation Context ...he amount of redundant computation. Speedups of 3 to 4 on an 8 node SparcServe 1000, a shared-memory multiprocessor, were achieved with large ISCAS circuits. Iterative Partitioning Min-cut algorithms [52, 28] were proposed to reduce the edge-cut of a graph (or a circuit). These graph-based heuristic algorithms divide a circuit into two pieces so that communication is minimized between two sub-circuits. Th... |

892 | Virtual time
- Jefferson
- 1985
Citation Context ...egories: conservative and optimistic. The Chandy-Misra algorithm [22] is a well-known conservative algorithm which strictly avoids the violation of the causality constraint. The Time Warp algorithm [35, 43] is a typical optimistic algorithm which optimistically processes events at first, detects the causality violation afterward, and then recovers from the causality error using a rollback mechanism. In ... |

722 |
Parallel discrete event Simulation
- Fujimoto
- 1990
Citation Context ...egories: conservative and optimistic. The Chandy-Misra algorithm [22] is a well-known conservative algorithm which strictly avoids the violation of the causality constraint. The Time Warp algorithm [35, 43] is a typical optimistic algorithm which optimistically processes events at first, detects the causality violation afterward, and then recovers from the causality error using a rollback mechanism. In ... |

386 |
Self-adjusting binary search trees
- Sleator, Tarjan
- 1985
Citation Context ...smallest-timestamp-first method can reduce rollbacks to a reasonable level. The most commonly used data structure to support the method is either a priority queue based on a splay tree data structure [48, 100] or the Calendar queue [13]. (More detail in performance comparison can be found in [85].) In either case, the events for different logical processes (gates/flip-flops) are mixed together as long as t... |

333 |
Combinational profiles of sequential benchmark circuits
- Brglez, Bryan, et al.
- 1989
Citation Context ....1 20717 38 304 1426 Circuit Systems To get reasonable test results for performance analysis, we use several kinds of benchmark circuits: ISCAS'85 benchmark circuits [11], ISCAS'89 benchmark circuits [12], n-bit accumulators with flip-flops, and n-bit array multipliers. They have different characteristics as test circuits. ISCAS'85 circuits are combinational circuits, while ISCAS'89 circuits are seque... |

246 |
Asynchronous distributed simulation via a sequence of parallel computations
- Chandy, Misra
- 1981
Citation Context ...rocess executes arriving events according to a specific scheduling policy. Asynchronous PDES mechanisms can be classified into two categories: conservative and optimistic. The Chandy-Misra algorithm [22] is a well-known conservative algorithm which strictly avoids the violation of the causality constraint. The Time Warp algorithm [35, 43] is a typical optimistic algorithm which optimistically process... |

181 |
An Introduction to Stochastic Modeling
- Taylor, Karlin
- 1994
Citation Context ...ssertation. The workload of simulations can be characterized as the number of events. This estimate is normally obtained from real sequential simulation [20, 102] or modeled by using probability analysis [1, 18, 32, 75, 101]. It is difficult to obtain accurate a priori estimates of the computational loads of processes and the communication frequency on various arcs of a circuit graph. Due to this difficulty, two kinds of ... |

169 |
Calendar Queues: A Fast O(1) Priority Queue Implementation of the Simulation Event Set Problem
- Brown
- 1988
Citation Context ...can reduce rollbacks to a reasonable level. The most commonly used data structure to support the method is either a priority queue based on a splay tree data structure [48, 100] or the Calendar queue [13]. (More detail in performance comparison can be found in [85].) In either case, the events for different logical processes (gates/flip-flops) are mixed together as long as those processes are assigned... |

162 | Transition Density: A New Measure of Activity in Digital Circuits - Najm - 1992 |

138 | Efficient algorithms for distributed snapshots and global virtual time approximation
- Mattern
- 1993
Citation Context ... synchronization messages. Since the global synchronization-based GVT computation caused the degradation of performance, distributed GVT computation algorithms were proposed to get better performance [30, 70, 71, 81]. Our GVT computation algorithm is a modification of the token passing algorithm proposed by Preiss [81]. In our token ring passing method, each processor does not have the exact GVT value, but has it... |

125 |
Multiple-way Network Partitioning
- Sanchis
- 1989
Citation Context ...ioning, either a vertex exchange scheme or a one-way vertex moving scheme is used to iteratively reduce the interprocessor communication subject to some constraints on processor workload [28, 52]. In [87], Sanchis applied iterative improvements to multiple-way partitioning with a more frequent information updating after each vertex exchange or moving and obtained better results. Nandy and Loucks appli... |

106 |
Time warp on a shared memory multiprocessor
- Fujimoto
- 1989
Citation Context ...ost of deadlock recovery. However, it is difficult to implement. Several other conservative simulation approaches have been developed in [25, 46, 82, 95]. Optimistic Simulation In the Time Warp algorithm [43, 34], the most popular optimistic paradigm, a process may execute an event as soon as the event arrives without considering possible data dependencies. It is possible to recover from an error afterward wh... |

87 |
An empirical comparison of priority-queues and event-set implementations
- Jones
- 1986
Citation Context ...smallest-timestamp-first method can reduce rollbacks to a reasonable level. The most commonly used data structure to support the method is either a priority queue based on a splay tree data structure [48, 100] or the Calendar queue [13]. (More detail in performance comparison can be found in [85].) In either case, the events for different logical processes (gates/flip-flops) are mixed together as long as t... |

87 |
Breathing Time Warp
- Steinman
- 1993
Citation Context ...niques are proposed by researchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management [2, 24, 39, 44, 66, 67], and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant... |

80 | Parallel Simulation Today - Nicol, Fujimoto - 1994 |

65 |
Global virtual time algorithms
- Bellenot
- 1990
Citation Context ... Passing to obtain good performance of parallel simulations, efficient and fast GVT algorithms are needed in parallel and distributed environments. Several GVT computation algorithms were proposed in [10, 65, 88] for garbage collection, but they incur significant message traffic. In [88], a central GVT manager is employed to send out a GVT-start message to all processors and receive acknowledge messages from ... |

63 | A message passing standard for MPP and workstations - Dongarra, Otto, et al. - 1996 |

60 |
Performance evaluation of the bounded time warp algorithm
- Turner, Xu
- 1992
Citation Context ...order to obtain better performance, many techniques are proposed by researchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management [2, 24, 39, 44, 66, 67], and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too... |

57 | STATA user’s guide
- Corporation
- 2008
Citation Context ... The Paragon XP/S supercomputer uses the i860XP 50MHz microprocessor, which includes a RISC integer core processing unit and three separate on-chip caches for page translation, data and instructions [41]. The Air Force Intel Paragon has 48 processors with 32Mbytes local memory and peak performance of 266 MFLOPS each. Each node has two identical 50 MHz Intel i-860XP processors and 32 Mbytes of memory.... |

54 |
Determining the global virtual time in a distributed simulation
- Lin, Lazowska
- 1989
Citation Context ... Passing to obtain good performance of parallel simulations, efficient and fast GVT algorithms are needed in parallel and distributed environments. Several GVT computation algorithms were proposed in [10, 65, 88] for garbage collection, but they incur significant message traffic. In [88], a central GVT manager is employed to send out a GVT-start message to all processors and receive acknowledge messages from ... |

53 |
The Yaddes distributed discrete event simulation specification language and execution environments
- Preiss
- 1989
Citation Context ... synchronization messages. Since the global synchronization-based GVT computation caused the degradation of performance, distributed GVT computation algorithms were proposed to get better performance [30, 70, 71, 81]. Our GVT computation algorithm is a modification of the token passing algorithm proposed by Preiss [81]. In our token ring passing method, each processor does not have the exact GVT value, but has it... |

53 |
SPEEDES: Synchronous Parallel Environment for Emulation and Discrete Event Simulation
- Steinman
- 1991
Citation Context ...the overhead, hybrid techniques are used to combine these two types of parallel simulation algorithms. A variation of Time Warp, which runs in a synchronous parallel environment, has been proposed in [97, 98]. In a parallel simulation algorithm, called ADAPT [46], logical processes may choose dynamically to use either a conservative approach or an optimistic approach. The main limitation is the mode needs... |

45 |
Distributed Simulation, Algorithms and Performance Analysis
- Samadi
- 1985
Citation Context ... Passing to obtain good performance of parallel simulations, efficient and fast GVT algorithms are needed in parallel and distributed environments. Several GVT computation algorithms were proposed in [10, 65, 88] for garbage collection, but they incur significant message traffic. In [88], a central GVT manager is employed to send out a GVT-start message to all processors and receive acknowledge messages from ... |

44 |
An Adaptive Memory Management Protocol for Time Warp Parallel Simulation
- Das, Fujimoto
- 1994
Citation Context ...esearchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management [2, 24, 39, 44, 66, 67], and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant rollback cost. Limiting optimism may be a ... |

43 | Probabilistic adaptive direct optimism control in time warp - Ferscha - 1995 |

42 |
A linear-time heuristic for improving network partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...he amount of redundant computation. Speedups of 3 to 4 on an 8 node SparcServe 1000, a shared-memory multiprocessor, were achieved with large ISCAS circuits. Iterative Partitioning Min-cut algorithms [52, 28] were proposed to reduce the edge-cut of a graph (or a circuit). These graph-based heuristic algorithms divide a circuit into two pieces so that communication is minimized between two sub-circuits. Th... |

37 | Optimal memory management for Time Warp parallel simulation
- Lin, Preiss
- 1991
Citation Context ...esearchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management [2, 24, 39, 44, 66, 67], and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant rollback cost. Limiting optimism may be a ... |

36 | A comparative study of parallel and sequential priority queue algorithms
- Rönngren, Ayani
- 1997
Citation Context ... used data structure to support the method is either a priority queue based on a splay tree data structure [48, 100] or the Calendar queue [13]. (More detail in performance comparison can be found in [85].) In either case, the events for different logical processes (gates/flip-flops) are mixed together as long as those processes are assigned to the same processor. Note that for a typical simulation, a... |

31 | Clustered Time Warp and logic simulation - Avril, Tropper - 1995 |

30 | Parallel logic simulation of VLSI systems
- Bailey, Briner, et al.
- 1994
Citation Context ...he global graph structure is important for large graphs. However, their work only considers minimizing the edge-cut of a graph. 21 2.4 PDES Performance Parallelization of sequential logic simulations [9] has been performed using techniques ranging from synchronous parallel simulation [74] to asynchronous optimistic simulation [3, 7, 18, 23]. Basically, synchronous simulations are performed by synchro... |

30 |
An analysis of several approaches to circuit partitioning for parallel logic simulation
- Smith, Underwood, et al.
- 1987
Citation Context ... compromise among three competing goals: interprocessor communication, load balancing, and concurrency. The only partitioning algorithm that considers concurrency is the string partitioning algorithm [89]. However, because it does not consider factors other than concurrency, it does not produce good performance [89]. Different parallel simulation protocols lead to major performance differences. For ex... |

28 |
High performance parallel logic simulation on a network of workstations
- Manjikian, Loucks
- 1993
Citation Context ...rcs of a circuit graph. Due to this difficulty, two kinds of real sequential simulation runs are used, one using estimates from a "pre-simulation" and the other using estimates from a full-simulation [69]. The use of pre-simulation information to characterize gate activity was evaluated and presented in [29, 20]. The following partitioning algorithms are commonly used in parallel logic simulation envi... |

27 |
A Static Partitioning and Mapping Algorithm for Conservative Parallel Simulations
- Boukerche, Tropper
- 1994
Citation Context ...function is required to measure multiple goals for a high quality of partitioning. The simulated annealing method was previously used with pre-simulation information [20], an adaptive search schedule [14], and an accurate cost function considering null messages for conservative parallel simulation [42, 49]. Even though simulated annealing methods [29, 40, 42, 105] produced good partitions, the runn... |

27 |
Virtual Time II: The Cancelback Protocol for Storage Management in Time Warp
- Jefferson
- 1990
Citation Context ...esearchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management [2, 24, 39, 44, 66, 67], and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant rollback cost. Limiting optimism may be a ... |

26 |
Parallel Discrete Event Simulation Using Shared Memory
- Malony, McCredie
- 1988
Citation Context ...roposed in order to eliminate the use of null messages at the cost of deadlock recovery. However, it is difficult to implement. Several other conservative simulation approaches have been developed in [25, 46, 82, 95]. Optimistic Simulation In the Time Warp algorithm [43, 34], the most popular optimistic paradigm, a process may execute an event as soon as the event arrives without considering possible data dependencie... |

25 |
VLSI design techniques for analog and digital circuits, McGraw-Hill
- Geiger, Allen, et al.
- 1990
Citation Context ... their simulation becomes more time-consuming. Circuit simulations can be classified into four groups: analog simulation, switch-level simulation, gate-level simulation, and function-level simulation [18, 37]. Only gate-level simulation is considered in this work where a circuit contains a set of logic gates such as NOT, AND, OR, NAND, NOR gates, and flip-flops. Gate-level logic simulation is a primary to... |

24 |
Corolla partitioning for distributed logic simulation of VLSI-circuits
- Sporrer, Bauer
- 1993
Citation Context ...e used in several partitioning algorithms. Recently several heuristic algorithms using clustering techniques have been proposed and their good performance for parallel logic simulation is reported in [96, 69]. Sporrer and Bauer [96] present a hierarchical partitioning approach which consists of a fine grained clustering phase and a coarse grained phase using a connectivity matrix. Considerable computation... |

23 | Lecsim: A levelized event driven compiled logic simulator - Wang, Maurer - 1990 |

22 | Parallelism analyzers for parallel discrete event simulation - Lin - 1992 |

21 | The effect of memory capacity on Time Warp performance - Akyildiz, Chen, et al. - 1993 |

21 | The Dynamic Load Balancing of Clustered Time Warp for Logic Simulation - Avril, Tropper |

20 | A general method for compiling event-driven simulations
- French, Lam, et al.
- 1995
Citation Context ...ut vector. In sequential compiled-code simulation, all gates are levelized statically to satisfy the requirement. Previously hybrid techniques combining the compiled-code and event-driven simulations [33, 60, 63, 73, 74, 104] were considered for sequential or parallel logic simulations so as to reduce the multiple evaluation problem of event-driven simulations. However, even though the sequential simulation produces good per... |

20 |
An evaluation of the Chandy-Misra-Bryant algorithm for digital logic simulation
- Soule, Gupta
- 1992
Citation Context ...roposed in order to eliminate the use of null messages at the cost of deadlock recovery. However, it is difficult to implement. Several other conservative simulation approaches have been developed in [25, 46, 82, 95]. Optimistic Simulation In the Time Warp algorithm [43, 34], the most popular optimistic paradigm, a process may execute an event as soon as the event arrives without considering possible data dependencie... |

18 |
Scheduling DAG’s for asynchronous multiprocessor execution
- Malloy, Lloyd, et al.
- 1994
Citation Context ...circuit is partitioned into disjoint subcircuits without overlapping. (2) The assignment of each node is based on parent nodes. This is different from other schemes that are based on children nodes [68, 89]. The advantage is in the higher concurrency that can be preserved in this scheme. (3) A node is assigned only after all its parent nodes are assigned and the greedy assignment is adopted. In some pre... |

17 |
A hierarchical compiled code event-driven logic simulator
- Lewis
- 1991
Citation Context ...ut vector. In sequential compiled-code simulation, all gates are levelized statically to satisfy the requirement. Previously hybrid techniques combining the compiled-code and event-driven simulations [33, 60, 63, 73, 74, 104] were considered for sequential or parallel logic simulations so as to reduce the multiple evaluation problem of event-driven simulations. However, even though the sequential simulation produces good per... |

17 | Discrete-Event Simulation and the Event Horizon Part 2 - Steinman - 1996 |

16 |
Performance analysis of synchronized iterative algorithm on multiprocessor systems
- Agrawal, Chakradhar
- 1992
Citation Context ...ssertation. The workload of simulations can be characterized as the number of events. This estimate is normally obtained from real sequential simulation [20, 102] or modeled by using probability analysis [1, 18, 32, 75, 101]. It is difficult to obtain accurate a priori estimates of the computational loads of processes and the communication frequency on various arcs of a circuit graph. Due to this difficulty, two kinds of ... |

15 |
An improved cost function for static partitioning of parallel circuit simulations using a conservative synchronization protocol
- Kapp, Hartrum, et al.
- 1995
Citation Context ...ot considered. In order to reduce this problem, similar algorithms based on depth-first search (DFS) or breadth-first search (BFS) techniques have been proposed to reduce the amount of communications [42, 49]. Cone Partitioning Smith et al. [89] introduced a cone partitioning technique to improve the communication overhead. Under their approach, a circuit is partitioned into subcircuits such that both the ... |

15 | Automated Parallelization of Timed Petri-Net Simulations - Nicol, Mao - 1995 |

14 | Parallel logic simulation of VLSI systems
- Chamberlain
- 1995
Citation Context ... event driven simulation is better for low-activity circuits. Most parallel logic simulations using discrete event-driven techniques do not consistently perform well over different circuits simulated [9, 21]. For some circuits with high activity, the PDES can perform poorly compared to other simulations because of multiple evaluations at each gate for the same input vector. In this chapter, an optimistic... |