## Parallel Logic Simulation of Digital Circuits (1998)

Citations: 2 (0 self)

### BibTeX

@TECHREPORT{Kim98parallellogic,

author = {Hong Kyu Kim},

title = {Parallel Logic Simulation of Digital Circuits},

institution = {},

year = {1998}

}

### Abstract

Parallel discrete event simulation (PDES) is an efficient way to simulate large digital circuits. In this dissertation, two techniques are proposed to improve the performance of PDES in logic simulation: a partitioning algorithm and a hybrid parallel simulation protocol. Experiments demonstrate that the two techniques together provide a significant reduction in parallel simulation time. Unlike most other partitioning algorithms, the proposed algorithm preserves circuit concurrency by assigning circuit gates that can be evaluated at about the same time to different processors. As a result, the concurrency preserving partitioning (CPP) algorithm can provide instantaneous load balancing, rather than only aggregated load balancing, throughout a parallel simulation. This is especially important when the algorithm is used with a Time Warp simulation, where a high degree of concurrency can lead to fewer rollbacks and better performance. In addition, a new concurrency metric is proposed to evaluate partitioning algorithms before parallel simulations are executed. Even though PDES can considerably reduce the logic simulation time for large circuits, it generates more events than necessary for certain high-activity circuits and produces inconsistent speedup across circuits. The proposed Event Lookahead Time Warp (ETW) algorithm looks ahead, combining and optimistically executing multiple events at each gate, so that the probability of unnecessary events is reduced. As a result, it can reduce rollback cost, obtain better load balance, and achieve more consistent execution times and reasonable speedups.
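The idea of placing gates that can be evaluated at about the same time on different processors can be illustrated with a small sketch. This is not the CPP algorithm from the dissertation, whose details are not given in the abstract. It is a simplified stand-in that assumes an acyclic (combinational) netlist, uses topological level as a proxy for "evaluated at about the same time", and ignores communication cost. The names `levelize`, `partition_by_level`, and the `fanin` dictionary layout are hypothetical.

```python
from collections import defaultdict, deque


def levelize(fanin):
    """Return {gate: level}; gates with no fan-in get level 0.

    `fanin` maps every gate name to the list of gates driving it.
    Assumption: the netlist is acyclic and every driver is also a key.
    """
    fanout = defaultdict(list)
    pending = {gate: len(drivers) for gate, drivers in fanin.items()}
    for gate, drivers in fanin.items():
        for driver in drivers:
            fanout[driver].append(gate)

    level = {gate: 0 for gate in fanin}
    ready = deque(sorted(g for g, n in pending.items() if n == 0))
    while ready:  # Kahn-style topological traversal
        gate = ready.popleft()
        for successor in fanout[gate]:
            level[successor] = max(level[successor], level[gate] + 1)
            pending[successor] -= 1
            if pending[successor] == 0:
                ready.append(successor)
    return level


def partition_by_level(fanin, num_procs):
    """Spread the gates of each level round-robin over `num_procs` processors."""
    level = levelize(fanin)
    by_level = defaultdict(list)
    for gate in sorted(fanin):
        by_level[level[gate]].append(gate)

    assignment = {}
    for lvl in sorted(by_level):
        for index, gate in enumerate(by_level[lvl]):
            assignment[gate] = index % num_procs
    return assignment
```

For a circuit in which `g1` and `g2` sit at the same level and both feed `g3`, this sketch puts `g1` and `g2` on different processors so they can be evaluated concurrently. A practical partitioner would also weigh interprocessor communication, which this sketch deliberately leaves out.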

### Citations

1120 |
An Efficient Heuristic Procedure for Partitioning Graphs
- Kernighan, Lin
- 1970
Citation Context ...he amount of redundant computation. Speedups of 3 to 4 on an 8 node SparcServe 1000, a shared-memory multiprocessor, were achieved with large ISCAS circuits. Iterative Partitioning Min-cut algorithms =-=[52, 28]-=- were proposed to reduce the edge-cut of a graph (or a circuit). These graph-based heuristic algorithms divide a circuit into two pieces so that communication is minimized between two sub-circuits. Th... |

899 | Virtual Time
- Jefferson
- 1985
Citation Context ...egories: conservative and optimistic. The Chandy-Misra's algorithm [22] is a well-known conservative algorithm which strictly avoids the violation of the causality constraint. The Time Warp algorithm =-=[35, 43]-=- is a typical optimistic algorithm which optimistically processes events at first, detects the causality violation afterward, and then recovers from the causality error using a rollback mechanism. In ... |

729 |
Parallel discrete event simulation
- Fujimoto
- 1990
Citation Context ...egories: conservative and optimistic. The Chandy-Misra's algorithm [22] is a well-known conservative algorithm which strictly avoids the violation of the causality constraint. The Time Warp algorithm =-=[35, 43]-=- is a typical optimistic algorithm which optimistically processes events at first, detects the causality violation afterward, and then recovers from the causality error using a rollback mechanism. In ... |

338 |
Combinational profiles of sequential benchmark circuits
- Brglez, Bryan, et al.
- 1989
Citation Context ....1 20717 38 304 1426 Circuit Systems To get reasonable test results for performance analysis, we use several kinds of benchmark circuits: ISCAS'85 benchmark circuits [11], ISCAS'89 benchmark circuits =-=[12]-=-, n-bit accumulators with flip-flops, and n-bit array multipliers. They have different characteristics as test circuits. ISCAS'85 circuits are combinational circuits, while ISCAS'89 circuits are seque... |

250 |
Asynchronous Distributed Simulation via a Sequence of Parallel Computations
- Chandy, Misra
- 1981
Citation Context ...rocess executes arriving events according to a specific scheduling policy. Asynchronous PDES mechanism can be classified into two categories: conservative and optimistic. The Chandy-Misra's algorithm =-=[22]-=- is a well-known conservative algorithm which strictly avoids the violation of the causality constraint. The Time Warp algorithm [35, 43] is a typical optimistic algorithm which optimistically process... |

171 |
Calendar Queues: A Fast O(1) Priority Queue Implementation for the Simulation Event Set Problem
- Brown
- 1988
Citation Context ...can reduce rollbacks to a reasonable level. The most commonly used data structure to support the method is either a priority queue based on a splay tree data structure [48, 100] or the Calendar queue =-=[13]-=-. (More detail in performance comparison can be found in [85].) In either case, the events for different logical processes (gates/flip-flops) are mixed together as long as those processes are assigned... |

106 |
Time warp on a shared memory multiprocessor
- Fujimoto
- 1989
Citation Context ...ost of deadlock recovery. However, it is difficult to implement. Several other conservative simulation approaches have been developed in [25, 46, 82, 95]. Optimistic Simulation In Time Warp algorithm =-=[43, 34]-=-, the most popular optimistic paradigm, a process may execute an event as soon as the event arrives without considering possible data dependencies. It is possible to recover from an error afterward wh... |

87 |
An empirical comparison of priority-queues and event-set implementations
- Jones
- 1986
Citation Context ...smallest-timestamp-first method can reduce rollbacks to a reasonable level. The most commonly used data structure to support the method is either a priority queue based on a splay tree data structure =-=[48, 100]-=- or the Calendar queue [13]. (More detail in performance comparison can be found in [85].) In either case, the events for different logical processes (gates/flip-flops) are mixed together as long as t... |

68 |
Global Virtual Time Algorithms
- Bellenot
- 1990
Citation Context ... Passing to obtain good performance of parallel simulations, efficient and fast GVT algorithms are needed in parallel and distributed environments. Several GVT computation algorithms were proposed in =-=[10, 65, 88]-=- for garbage collection, but they incur significant message traffic. In [88], a central GVT manager is employed to send out a GVT-start message to all processors and receive acknowledge messages from ... |

64 | A message passing standard for MPP and workstations - Dongarra, Otto, et al. - 1996 |

59 | SLAA4SYSTU user's guide
- Corporation
- 1992
Citation Context ... The Paragon XP/S supercomputer uses the i860XP 50MHz microprocessor, which includes a RISC integer core processing unit and three separated on-chip caches for page translation, data and instructions =-=[41]-=-. The Air Force Intel paragon has 48 processors with 32Mbytes local memory and peak performance of 266 MFLOPS each. Each node has two identical 50 MHz Intel i-860XP processors and 32 Mbytes of memory.... |

44 |
An Adaptive Memory Management Protocol for Time Warp Parallel Simulation
- Das, Fujimoto
- 1994
Citation Context ...esearchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management =-=[2, 24, 39, 44, 66, 67]-=-, and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant rollback cost. Limiting optimism may be a ... |

43 | Probabilistic adaptive direct optimism control in time warp - Ferscha - 1995 |

42 |
A linear-time heuristic for improving network partitions
- Fiduccia, Mattheyses
- 1982
Citation Context ...he amount of redundant computation. Speedups of 3 to 4 on an 8 node SparcServe 1000, a shared-memory multiprocessor, were achieved with large ISCAS circuits. Iterative Partitioning Min-cut algorithms =-=[52, 28]-=- were proposed to reduce the edge-cut of a graph (or a circuit). These graph-based heuristic algorithms divide a circuit into two pieces so that communication is minimized between two sub-circuits. Th... |

32 | Clustered Time Warp and logical simulation - Avril, Tropper - 1995 |

30 | Parallel logic simulation of VLSI systems
- Bailey, Briner, et al.
- 1994
Citation Context ...he global graph structure is important for large graphs. However, their work only considers minimizing the edge-cut of a graph. 21 2.4 PDES Performance Parallelization of sequential logic simulations =-=[9]-=- has been performed using techniques ranging from synchronous parallel simulation [74] to asynchronous optimistic simulation [3, 7, 18, 23]. Basically, synchronous simulations are performed by synchro... |

27 |
A Static Partitioning and Mapping Algorithm for Conservative Parallel Simulations
- Boukerche, Tropper
- 1994
Citation Context ...function is required to measure multiple goals for a high quality of partitioning. The simulated annealing method was previously used with pre-simulation information [20], an adaptive search schedule =-=[14]-=-, and an accurate cost function considering null messages for conservative parallel simulation [42, 49]. Even though simulated annealing methods [29, 40, 42, 105] produced good partitions, the runn... |

27 |
Virtual Time II: The Cancelback Protocol for Storage Management in Time Warp
- Jefferson
- 1990
Citation Context ...esearchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management =-=[2, 24, 39, 44, 66, 67]-=-, and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant rollback cost. Limiting optimism may be a ... |

25 |
VLSI design techniques for analog and digital circuits, McGraw-Hill
- Geiger, Allen, et al.
- 1990
Citation Context ... their simulation becomes more time-consuming. Circuit simulations can be classified into four groups: analog simulation, switch-level simulation, gate-level simulation, and function-level simulation =-=[18, 37]-=-. Only gate-level simulation is considered in this work where a circuit contains a set of logic gates such as NOT, AND, OR, NAND, NOR gates, and flip-flops. Gate-level logic simulation is a primary to... |

21 |
The effect of memory capacity on Time Warp performance
- Akyildiz, Chen, et al.
- 1993
Citation Context ...esearchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window [19, 91, 103], synchronization granularities [98], memory management =-=[2, 24, 39, 44, 66, 67]-=-, and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too far ahead and produce a significant rollback cost. Limiting optimism may be a ... |

21 | The Dynamic Load Balancing of Clustered Time Warp for Logic Simulation - Avril, Tropper |

20 | A general method for compiling event-driven simulations
- French, Lam, et al.
- 1995
Citation Context ...ut vector. In sequential compiled-code simulation, all gates are levelized statically to satisfy the requirement. Previously hybrid techniques combining the compiled-code and event-driven simulations =-=[33, 60, 63, 73, 74, 104]-=- were considered for sequential or parallel logic simulations so to reduce the multiple evaluation problem of event-driven simulations. However, even though the sequential simulation produces good per... |

16 |
Performance analysis of synchronized iterative algorithm on multiprocessor systems
- Agrawal, Chakradhar
- 1992
Citation Context ...ssertation. The workload of simulations can be characterized as number of events. This estimate is normally obtained from real sequential simulation [20, 102] or modeled by using probability analysis =-=[1, 18, 32, 75, 101]-=-. It is difficult to obtain accurate apriori estimates of the computational loads of processes and the communication frequency on various arcs of a circuit graph. Due to this difficulty, two kinds of ... |

15 |
An improved cost function for static partitioning of parallel circuit simulations using a conservative synchronization protocol
- Kapp, Hartrum, et al.
- 1995
Citation Context ...ot considered. In order to reduce this problem, similar algorithms based on depth-first search (DFS) or breadth-first search (BFS) techniques have been proposed to reduce the amount of communications =-=[42, 49]-=-. Cone Partitioning Smith et al [89] introduced a cone partitioning technique to improve the communication overhead. Under their approach, a circuit is partitioned into subcircuits such that both the ... |

14 | Parallel logic simulation of VLSI systems
- Chamberlain
- 1995
Citation Context ... event driven simulation is better for low-activity circuits. Most parallel logic simulations using discrete event-driven techniques do not consistently perform well over different circuits simulated =-=[9, 21]-=-. For some circuits with high activity, the PDES can perform poorly compared to other simulations because of multiple evaluations at each gate for the same input vector. In this chapter, an optimistic... |

12 |
Evaluating the use of pre-simulation in VLSI circuit partitioning
- Chamberlain, Henderson
- 1994
Citation Context ...c partitioning algorithms are considered in this dissertation. The workload of simulations can be characterized as number of events. This estimate is normally obtained from real sequential simulation =-=[20, 102]-=- or modeled by using probability analysis [1, 18, 32, 75, 101]. It is difficult to obtain accurate apriori estimates of the computational loads of processes and the communication frequency on various ... |

12 |
The time of next event algorithm
- Groselj, Tropper
- 1988
Citation Context ...approach. A local deadlock resolution algorithm, called the Time of Next Event algorithm (TNE), is proposed to obtain the greatest lower bound on the input links of processes on the same processor in =-=[38]-=-. The algorithm is improved later by Boukerche [15]. To obtain even more parallelism, an asynchronous optimistic approach (called Time Warp) was used. In this approach, by ignoring possible data-depen... |

11 |
A Case Against Event-driven Simulation for Digital System Design
- Jennings
- 1991
Citation Context ...d the frequent rollback overhead caused by overoptimism. For some circuits with high activity, the discrete-event optimistic simulation model is not a reasonable alternative over oblivious simulation =-=[45]-=-. Moreover, overoptimism in most optimistic simulations produces many erroneous events. As a result, most optimistic algorithms do not lead to good performance consistently over different circuits. 1.... |

8 |
How circuit size affects parallelism
- Bailey
- 1992
Citation Context ...rithms using a global clock have not achieved reasonable speedup due to the excessive computational load of global synchronizations. It suffers from not being able to explore asynchronous concurrency =-=[8, 17, 20, 29]-=-. In order to obtain more parallelism, the conservative Chandy-Misra approach [22] was used by Soule and Gupta [95]. They showed that a synchronous method is faster than a conservative approach. A loc... |

8 |
Multilevel Graph Partition and Sparse Matrix Ordering
- Karypis, Kumar
- 1995
Citation Context ...d in signal activity. However, multiple runs are required in order to obtain good quality of partitioning. Recently several partitioning algorithms have been combined at multiple levels. Karpis et al =-=[50]-=- present a graph partitioning algorithm that transforms a graph to a coarser and smaller graph, partitions this graph using a recursive bisection technique, and then uncoarsens it to construct a parti... |

7 |
SGTNE: Semi-Global Time of the Next Event Algorithm
- Boukerche, Tropper
- 1995
Citation Context ...lled the Time of Next Event algorithm (TNE), is proposed to obtain the greatest lower bound on the input links of processes on the same processor in [38]. The algorithm is improved later by Boukerche =-=[15]-=-. To obtain even more parallelism, an asynchronous optimistic approach (called Time Warp) was used. In this approach, by ignoring possible data-dependencies, events with different timestamps may be ex... |

6 |
Parallel gate-level circuit simulation on shared memory architectures
- Bagrodia, Chen, et al.
- 1995
Citation Context ...21 2.4 PDES Performance Parallelization of sequential logic simulations [9] has been performed using techniques ranging from synchronous parallel simulation [74] to asynchronous optimistic simulation =-=[3, 7, 18, 23]-=-. Basically, synchronous simulations are performed by synchronizing events with the same timestamp. Synchronous algorithms using a global clock have not achieved reasonable speedup due to the excessiv... |

6 |
Relaxing Synchronization in Distributed Simulated Annealing
- Hong, McMillin
Citation Context ...n information [20], an adaptive search schedule [14], and an accurate cost function considering null messages for conservative parallel simulation [42, 49]. Even though simulated annealing methods =-=[29, 40, 42, 105]-=- produced good partitions, the running time is usually prohibitively high in cases when high quality partitions are needed. Clustering Techniques and Combined Algorithms Clustering techniques are used... |

5 | Conservative circuit simulation on shared-memory multiprocessors - Rauber, et al. - 1996 |

4 |
Taking Advantage of Optimal On-Chip Parallelism for Parallel Discrete Event Simulation
- Briner, Ellis, et al.
- 1988
Citation Context ...rithms using a global clock have not achieved reasonable speedup due to the excessive computational load of global synchronizations. It suffers from not being able to explore asynchronous concurrency =-=[8, 17, 20, 29]-=-. In order to obtain more parallelism, the conservative Chandy-Misra approach [22] was used by Soule and Gupta [95]. They showed that a synchronous method is faster than a conservative approach. A loc... |

4 |
Parallel simulated annealing using speculative computation
- E, Franklin
- 1991
Citation Context ...e using estimates from a "pre-simulation" and the other using estimates from a full-simulation [69]. The use of pre-simulation information to characterize gate activity was evaluated and presented in =-=[29, 20]-=-. The following partitioning algorithms are commonly used in parallel logic simulation environments. Random Partitioning Random partitioning algorithm, the easiest and fastest technique, has been used... |

3 |
Fast parallel simulation of digital systems
- Jr, J
- 1991
Citation Context ...rhead of frequent global synchronizations. A major problem of parallel logic simulation techniques is that they do not produce consistent performance on different circuits and different input vectors =-=[16, 18, 21]-=-. Recently, clock cycle-based simulation has attracted a lot of interest even though it is only usable for synchronous circuits. However, the performance of parallel clock cycle-based simulation is ve... |

2 |
A unified framework for parallel event-driven logic simulation
- Arvind, Smart
- 1991
Citation Context ...21 2.4 PDES Performance Parallelization of sequential logic simulations [9] has been performed using techniques ranging from synchronous parallel simulation [74] to asynchronous optimistic simulation =-=[3, 7, 18, 23]-=-. Basically, synchronous simulations are performed by synchronizing events with the same timestamp. Synchronous algorithms using a global clock have not achieved reasonable speedup due to the excessiv... |

2 |
Parallel Mixed-Level Simulation of Digital Circuits Using Virtual Time
- Jr
- 1990
Citation Context ... their simulation becomes more time-consuming. Circuit simulations can be classified into four groups: analog simulation, switch-level simulation, gate-level simulation, and function-level simulation =-=[18, 37]-=-. Only gate-level simulation is considered in this work where a circuit contains a set of logic gates such as NOT, AND, OR, NAND, NOR gates, and flip-flops. Gate-level logic simulation is a primary to... |

2 |
Breaking the Barrier of Parallel Simulation
- Briner, Ellis, et al.
- 1991
Citation Context ...order to obtain better performance, many techniques are proposed by researchers who believe that the TW algorithm with low overhead will be the best: adaptive process scheduling [79], bounding window =-=[19, 91, 103]-=-, synchronization granularities [98], memory management [2, 24, 39, 44, 66, 67], and partitioning and load balancing [19, 57, 60, 69, 96]. In the Time Warp algorithm, logical processes may advance too... |

2 |
Reducing Null Messages in Misra's Distributed Discrete Event Simulation
- De Vries
- 1990
Citation Context ...roposed in order to eliminate the use of null messages at the cost of deadlock recovery. However, it is difficult to implement. Several other conservative simulation approaches have been developed in =-=[25, 46, 82, 95]-=-. Optimistic Simulation In Time Warp algorithm [43, 34], the most popular optimistic paradigm, a process may execute an event as soon as the event arrives without considering possible data dependencie... |

2 | Performance Analysis of Time Warp with Homogenous Processors and Exponential Task Times
- Gupta, Akyildiz, et al.
- 1991

2 | Parallel optimistic logic simulation with event lookahead - Kim, Jean - 1998 |

2 |
Concurrency preserving partitioning algorithm for parallel logic simulation
- Kim, Jean
- 1999
Citation Context ...ning algorithm is required to minimize interprocessor communication and to achieve better concurrency. Such partitioning algorithm was developed for parallel logic simulation and used in this chapter =-=[55, 54]-=-. 4.3 Governor Heap Among current scheduling algorithms, the smallest-timestamp-first method can reduce rollbacks to a reasonable level. The most commonly used data structure to support the method is ... |

1 |
Accelerated ATPG and Fault Grading via Analysis
- Brglez, Pownall, et al.
- 1985
Citation Context ...34 s38417 23843 28 106 1636 s38584.1 20717 38 304 1426 Circuit Systems To get reasonable test results for performance analysis, we use several kinds of benchmark circuits: ISCAS'85 benchmark circuits =-=[11]-=-, ISCAS'89 benchmark circuits [12], n-bit accumulators with flip-flops, and n-bit array multipliers. They have different characteristics as test circuits. ISCAS'85 circuits are combinational circuits,... |

1 |
Logic Simulation for VLSI Design on
- Chung
- 1992
Citation Context ...21 2.4 PDES Performance Parallelization of sequential logic simulations [9] has been performed using techniques ranging from synchronous parallel simulation [74] to asynchronous optimistic simulation =-=[3, 7, 18, 23]-=-. Basically, synchronous simulations are performed by synchronizing events with the same timestamp. Synchronous algorithms using a global clock have not achieved reasonable speedup due to the excessiv... |

1 |
Termination Detection for Diffusing Computation
- Dijkstra, Scholten
- 1980
Citation Context ... synchronization messages. Since the global synchronization-based GVT computation caused the degradation of performance, distributed GVT computation algorithms were proposed to get better performance =-=[30, 70, 71, 81]-=-. Our GVT computation algorithm is a modification of the token passing algorithm proposed by Preiss [81]. In our token ring passing method, each processor does not have the exact GVT value, but has it... |

1 |
On the Complexity of Boolean Functions Computed by Lazy Oracles
- Dunne, Leng, et al.
- 1995
Citation Context ...ssertation. The workload of simulations can be characterized as number of events. This estimate is normally obtained from real sequential simulation [20, 102] or modeled by using probability analysis =-=[1, 18, 32, 75, 101]-=-. It is difficult to obtain accurate apriori estimates of the computational loads of processes and the communication frequency on various arcs of a circuit graph. Due to this difficulty, two kinds of ... |

1 |
Improving Conservative VHDL Simulation Performance by Reduction of Feedback
- Hurford, Hartrum
- 1996
Citation Context ...ot considered. In order to reduce this problem, similar algorithms based on depth-first search (DFS) or breadth-first search (BFS) techniques have been proposed to reduce the amount of communications =-=[42, 49]-=-. Cone Partitioning Smith et al [89] introduced a cone partitioning technique to improve the communication overhead. Under their approach, a circuit is partitioned into subcircuits such that both the ... |

1 |
An Integrated framework for Parallel Simulation
- Jha
- 1996
Citation Context ...roposed in order to eliminate the use of null messages at the cost of deadlock recovery. However, it is difficult to implement. Several other conservative simulation approaches have been developed in =-=[25, 46, 82, 95]-=-. Optimistic Simulation In Time Warp algorithm [43, 34], the most popular optimistic paradigm, a process may execute an event as soon as the event arrives without considering possible data dependencie... |