The PARSEC benchmark suite: Characterization and architectural implications
- IN PRINCETON UNIVERSITY
, 2008
Cited by 150 (1 self)
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previous available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications which mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.
Hardware-assisted simulated annealing with application for fast fpga placement
- in Proceedings of the International Symposium on Field-Programmable Gate Arrays
, 2003
Cited by 24 (1 self)
To truly exploit FPGAs for rapid turn-around development and prototyping, placement times must be reduced to seconds; late-bound, reconfigurable computing applications may demand placement times as short as microseconds. In this paper, we show how a systolic structure can accelerate placement by assigning one processing element to each possible location for an FPGA LUT from a design netlist. We demonstrate that our technique approaches the same quality point as traditional simulated annealing as measured by a simple linear wirelength metric. Experimental results look ahead to compare quality against VPR’s fast placer when considering the minimum channel width required to route as the primary optimization criterion. Preliminary results from an FPGA implementation show the feasibility of accelerating simulated annealing by three orders of magnitude using this approach. This means we can place the largest design in the University of Toronto’s “FPGA
A Parallel Algorithm for Fault Simulation based on PROOFS
- in Proceedings of the International Conference on Computer Design
, 1995
Cited by 12 (8 self)
Fault simulation for sequential circuits numbers among the highly compute-intensive tasks in the integrated circuit design process. In the quest for rapid design turn-around, parallelization has been proposed to speed fault simulation. In this paper, we introduce ProperPROOFS, a parallel extension of the PROOFS fault simulation package. ProperPROOFS exploits parallelism based on fault partitioning, incorporating static and dynamic partitioning schemes and a new asynchronous and distributed method of fault redistribution. We present results for circuits in the ISCAS89 benchmark set across several parallel architectures. A detailed evaluation of results provides new insight into the use of fault partitioning to parallelize high performance serial fault simulation applications. 1 Introduction Fault simulation is used to determine the set of faults in a circuit that are covered by a set of test vectors. It is typically used during automatic test pattern generation in order to minimize te...
Performance Trade-Offs In A Parallel Test Generation/ Fault Simulation Environment
, 1991
Cited by 9 (3 self)
As parallel processing hardware becomes more common and affordable, multiprocessors are being increasingly used to accelerate VLSI CAD algorithms. The problem of partitioning faults in a parallel test generation/ fault simulation (TG/FS) environment has received very little attention in the past. In a parallel TG/FS environment, the fault partitioning method used can have a significant impact on the overall test length and speedup. We propose heuristics to partition faults for parallel test generation with minimization of both the overall run time and test length as an objective. Also, for efficient utilization of available processors, the work load has to be balanced at all times. Since it is very difficult to predict a priori how difficult it is to generate a test for a particular fault, we propose a load balancing method which uses static partitioning initially and then uses dynamic allocation of work for processors which become idle. We present a theoretical model to predict the pe...
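The load-balancing idea in this abstract (static partitioning first, then dynamic allocation of work to processors that become idle) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's heuristics: the names `static_makespan` and `simulate_dynamic`, the round-robin initial split, and the per-fault cost list are all hypothetical, and real per-fault costs are not known in advance.

```python
from collections import deque


def static_makespan(costs, num_procs):
    """Finish time when faults are dealt round-robin and never moved."""
    return max(sum(costs[p::num_procs]) for p in range(num_procs))


def simulate_dynamic(costs, num_procs):
    """Static round-robin split, then idle processors take work from the
    most heavily loaded queue. Returns (makespan, per-processor busy time).

    `costs` is a hypothetical list of per-fault test-generation costs.
    """
    # Static phase: deal faults round-robin to the processors.
    queues = [deque(costs[p::num_procs]) for p in range(num_procs)]
    clock = [0] * num_procs

    while any(queues):
        # The processor with the smallest clock is the next one to need work.
        p = min(range(num_procs), key=lambda i: clock[i])
        if not queues[p]:
            # Dynamic phase: the idle processor takes one pending fault
            # from the processor with the longest remaining queue.
            donor = max(range(num_procs), key=lambda i: len(queues[i]))
            queues[p].append(queues[donor].pop())
        clock[p] += queues[p].popleft()

    return max(clock), clock


if __name__ == "__main__":
    costs = [10, 1, 1, 1, 1, 1, 1, 1]  # one hard fault, seven easy ones
    print("static only:", static_makespan(costs, 2))
    print("with dynamic allocation:", simulate_dynamic(costs, 2)[0])
```

With one expensive fault among many cheap ones, the static split leaves one processor idle while the other finishes its queue; letting the idle processor pull pending faults shortens the overall run without changing the total work done.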
Parallel Genetic Algorithms for Simulation-Based Sequential Circuit Test Generation
- IEEE VLSI Design Conference
, 1997
Cited by 8 (0 self)
The problem of test generation belongs to the class of NP-complete problems, and it is becoming increasingly difficult as the complexity of VLSI circuits increases; long execution times pose an additional problem. Parallel implementations can potentially provide significant speedups while retaining good quality results. In this paper, we present three parallel genetic algorithms for simulation-based sequential circuit test generation. Simulation-based test generators are more capable of handling the constraints of complex design features than deterministic test generators. The three parallel genetic algorithm implementations are portable and scalable over a wide range of distributed and shared memory MIMD machines. Significant speedups were obtained, and fault coverages were similar to and occasionally better than those obtained using a sequential genetic algorithm, due to the parallel search strategies adopted.
Overcoming the Serial Logic Simulation Bottleneck in Parallel Fault Simulation
- Proc. of the IEEE International Test Conference (ITC)
, 1997
Cited by 7 (2 self)
We propose a new approach to parallelizing fault simulation in which the test set is partitioned among the available processors. The approach can be used for any of the sequential circuit fault simulation algorithms commonly used, and it can be implemented on various different parallel architectures. This approach for the first time overcomes the limitations of serial logic simulation. In addition, the excessive redundant computations required in the traditional fault-partitioning approach are also considerably reduced. Significant improvements in speedup were observed as compared to previous approaches. An average speedup of 5.7 was obtained for test set partitioning over 10 processors for the benchmark circuits studied. Although pessimistic fault coverage may be reported in some cases, the proposed approach was found to be very accurate for the circuits studied. I Introduction Fault simulation is an important step in the electronic design process and is used to identify faults that...
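To put the reported figure in context, the average speedup of 5.7 on 10 processors corresponds to a parallel efficiency (speedup divided by processor count) of roughly 57%. The helper below is a hypothetical illustration of that arithmetic only; it is not taken from the paper.

```python
def parallel_efficiency(speedup, num_procs):
    """Parallel efficiency: achieved speedup divided by the processor count."""
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    return speedup / num_procs


if __name__ == "__main__":
    # Figures quoted in the abstract: speedup 5.7 on 10 processors.
    print(f"{parallel_efficiency(5.7, 10):.0%}")
```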
A Class Library Approach To Concurrent Object-Oriented Programming With Applications To VLSI CAD
, 1994
Cited by 5 (3 self)
Table of contents (excerpt): PARALLEL ARCHITECTURE: 4.1 Thread Management; 4.2 Resource Management; 4.3 Communication Management; 4.4 Configuration Management; 4.5 Performance; 4.6 Evaluation; 4.7 Other Models and Implementations. 5 META-PROGRAMMABILITY: 5.1 Local Meta-programmability; 5.2 Global Meta-programmability; 5.3 Evaluation; 5.4 Other Models and Implementations ...
A Parallel Algorithm for State Assignment of Finite State Machines
- IEEE Transactions on Computers
, 1996
Cited by 3 (0 self)
Optimization of huge sequential circuits has become unmanageable in CAD of VLSI due to enormous time and memory requirements. In this paper, we report a parallel algorithm for the state assignment problem for finite state machines. Our algorithm has three significant contributions: It is an asynchronous parallel algorithm portable across different MIMD machines. Time and memory requirements reduce by a factor of P (the number of processors), enabling it to handle large problem sizes which the sequential algorithm fails to handle. The quality of the results for multiprocessor runs remains comparable to the sequential algorithm on which it is based. Index Terms: Encoding Hypercube, State Assignment, Memory Scalability, Conflict Resolution. 1 Introduction With the rapid improvement in VLSI technology, circuit design is becoming extremely complex and is placing increasing demands on CAD tools. Parallel processing is becoming an attractive solution to reduce the inordinate amount of time...
Parallel Algorithms for Layout Verification
, 1995
Cited by 3 (1 self)
[Figure 1.1: Overview of the ProperCAD II project. Recoverable labels: applications (ProperFAULT: fault simulator; ProperTEST: test generation; ProperPLACE: cell placement; ProperROUTE: wire routing; ProperSYN: logic synthesis; ProperSIM: logic simulation; ProperEXT: circuit extraction); parallel algorithm; ProperCAD II runtime object-oriented library; parallel architecture interface; Sun and HP; Sun SPARCserver 1000; distributed-memory; shared-memory.]
In the Actor paradigm, continuations fill the role of parallel function calls. Continuation calls differ from traditional C++ member function calls in the following ways: continuation execution occurs asynchronously, so that the actor invoking the continuation can overlap computations with the time required for the communication to take place, and continuation calls do not return a value, due to their asynchronous nature. Figure 1.2 shows a continuation call from one thread to another, in which a return value may eventually be returned from the second thread to the first b...
A Parallel Hierarchical Algorithm For Module Placement Based On Sparse Linear Equations
- In Proceedings of the 1996 International Conference on Circuits and Systems
, 1996
Cited by 3 (0 self)
We present a fast and effective module placement algorithm which is based on the PROUD algorithm. The PROUD algorithm uses a hierarchical decomposition technique and the solution of sparse linear systems of equations based on a resistive network analogy. It has been shown that the PROUD algorithm is suitable for solving the placement problem for very large circuits, and obtains placement qualities that are comparable to the best placement algorithms based on simulated annealing, but is several orders of magnitude faster. In this paper, we first report on an improved hierarchical placement algorithm which is based on perturbing the matrices in the matrix equation solution stage of the PROUD algorithm. The new modified PROUD algorithm performs much faster than the original PROUD algorithm. We subsequently propose parallel versions of the original and modified algorithms that combine both fine grain and coarse grain parallelism to obtain another order of magnitude improvement in the runti...

