Fast Timing-driven Partitioning-based Placement for Island Style FPGAs
In Proceedings of the ACM/IEEE Design Automation Conference, 2003
Cited by 14 (3 self)
In this paper we propose a partitioning-based placement algorithm for FPGAs. The method incorporates simple but effective heuristics that target delay minimization. The placement engine incorporates delay estimates obtained from previously placed and routed circuits using VPR [6]. As a result, the delay predictions during placement more closely resemble those observed after detailed routing, which in turn leads to better delay optimization. An efficient terminal alignment heuristic for delay minimization is employed to further optimize the delay of the circuit in the routing phase. Simulation results show that the proposed technique can achieve circuit delays (after routing) comparable to those obtained with VPR while achieving a 7-fold speedup in placement runtime.
Stream computations organized for reconfigurable execution
2006
Cited by 6 (0 self)
Reconfigurable systems can offer the high spatial parallelism and fine-grained, bit-level resource control traditionally associated with hardware implementations, along with the flexibility and adaptability characteristic of software. While reconfigurable systems create new opportunities for engineering and delivering high-performance programmable systems, the traditional approaches to programming and managing computations used for hardware systems (e.g., Verilog, VHDL) and software systems (e.g., C, Fortran, Java) are inappropriate and inadequate for exploiting reconfigurable platforms. To address this need, we develop a stream-oriented compute model, system architecture, and execution patterns which can capture and exploit the parallelism of spatial computations while simultaneously abstracting software applications from hardware details (e.g., timing, device capacity, and microarchitectural implementation details) and consequently allowing applications to scale to exploit newer, larger, and faster hardware platforms. Further, we describe hardware and software techniques that make this late-bound platform mapping viable and efficient.
GraphStep: A System Architecture for Sparse-Graph Algorithms
In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines, IEEE, 2006
Cited by 5 (3 self)
Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this “memory wall,” we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order of magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.
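The spreading-activation workload mentioned in this abstract can be illustrated with a small software sketch. The abstract does not specify the exact compute model, so the following is only an assumption about what a step-synchronous formulation might look like: in each step, every currently active node sends a weighted message along its outgoing edges, and each receiving node sums what arrives. The function name `spreading_activation`, the `decay` factor, and the adjacency-list layout are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def spreading_activation(edges, seeds, steps=2, decay=0.5):
    """Step-synchronous spreading activation (illustrative sketch only).

    edges: dict mapping node -> list of (neighbor, weight) pairs.
    seeds: dict mapping node -> initial activation value.
    decay: hypothetical attenuation applied to every message.
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)  # nodes that send messages in the next step
    for _ in range(steps):
        inbox = defaultdict(float)
        # Send phase: each active node emits one message per outgoing edge.
        for node, value in frontier.items():
            for neighbor, weight in edges.get(node, []):
                inbox[neighbor] += value * weight * decay
        # Receive phase: each node accumulates the messages addressed to it.
        for node, value in inbox.items():
            activation[node] += value
        frontier = dict(inbox)
    return dict(activation)


if __name__ == "__main__":
    graph = {"a": [("b", 1.0), ("c", 0.5)], "b": [("c", 1.0)]}
    print(spreading_activation(graph, {"a": 1.0}))
```

Because each node only touches its own state and its own edge list within a step, the per-node work in this sketch is independent, which is the property that would let nodes be spread across many small memories.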
New Timing and Routability Driven Placement Algorithms for FPGA Synthesis
Cited by 2 (1 self)
We present new timing and congestion driven FPGA placement algorithms with minimal runtime overhead. By predicting the post-routing critical edges and estimating congestion accurately, our algorithms simultaneously reduce the critical path delay and the minimum number of routing tracks. The core of our algorithm consists of a criticality-history record of connection edges and a congestion map. This approach is applied to the 20 largest MCNC benchmark circuits. Experimental results show that compared with VPR [2], our algorithms yield an average of 8.1% reduction (maximum 30.5%) in the critical path delay and 5% reduction in channel width. Meanwhile, the average runtime of our algorithms is only 2.3X that of VPR.
Acceleration of FPGA placement
2005
FPGAs are circuits that can be programmed (and reprogrammed) in the field. Logic functions are typically implemented with look-up tables and flip-flops. The routing between the various blocks is also programmable, with the ability to connect the output of virtually any logic block to any other. Through synthesis of a hardware-description language like Verilog or VHDL, user logic is mapped to these logic blocks. After this mapping, it is necessary to decide the physical location of these logic blocks and which routing resources should be dedicated to which nets. Performing this placement and routing optimally is a proven NP-complete problem [1]. There are several methods for providing solutions that are acceptable to the designer in a tractable amount of time. Some of these include simulated annealing (SA), force-directed placement, min-cut placement, placement by numerical optimization, and evolution-based placement [2]. In this paper, we are mostly concerned with SA, though evolutionary techniques will make a brief appearance.
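For readers unfamiliar with simulated-annealing placement, a minimal sketch follows. It is not taken from the paper: it assumes a plain rectangular grid of sites, a half-perimeter wirelength cost, pairwise block swaps as the only move, and a geometric cooling schedule. The names `hpwl` and `anneal_place` and all parameter defaults are hypothetical.

```python
import math
import random


def hpwl(nets, pos):
    """Half-perimeter wirelength: bounding-box width plus height, summed over nets."""
    total = 0
    for net in nets:
        xs = [pos[b][0] for b in net]
        ys = [pos[b][1] for b in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def anneal_place(blocks, nets, width, height, seed=0,
                 t_start=10.0, t_end=0.01, alpha=0.9, moves_per_temp=200):
    """Place blocks on a width x height grid by swapping pairs of blocks.

    Assumptions (illustrative only): at least two blocks, blocks given as a
    list, and a geometric cooling schedule t <- alpha * t.
    """
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    if len(blocks) > len(sites):
        raise ValueError("more blocks than sites")
    rng.shuffle(sites)
    pos = {b: sites[i] for i, b in enumerate(blocks)}  # random initial placement
    cost = hpwl(nets, pos)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            a, b = rng.sample(blocks, 2)
            pos[a], pos[b] = pos[b], pos[a]          # propose a swap
            new_cost = hpwl(nets, pos)
            delta = new_cost - cost
            # Always accept improvements; accept worsening moves with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
            else:
                pos[a], pos[b] = pos[b], pos[a]      # reject: undo the swap
        t *= alpha
    return pos, cost


if __name__ == "__main__":
    blocks = ["a", "b", "c", "d"]
    nets = [["a", "b"], ["c", "d"], ["a", "d"]]
    print(anneal_place(blocks, nets, width=2, height=2))
```

Recomputing the full cost after every swap keeps the sketch short; a practical placer would update the cost incrementally for only the nets touched by the move.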
Criticality History Guided FPGA Placement Algorithm for Timing Optimization
We present an efficient timing-driven placement algorithm for FPGAs. Our major contribution is a criticality history guided (CHG) approach that can simultaneously reduce the critical path delay and computation time. The proposed approach keeps track of the timing criticality history of each edge and utilizes this information to effectively guide the placer. We also present a cooling schedule that optimizes both timing and run time when combined with the CHG method. The proposed algorithm is applied to the 20 largest MCNC benchmark circuits. Experimental results show that compared with VPR [1], our placement algorithm yields an average of 21.7% reduction (maximum 45.8%) in the critical path delay and runs 2.2X faster than VPR. In addition, our approach outperforms other algorithms discussed in the literature in both delay and run time.
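The abstract does not say how the criticality history is stored or combined, so the sketch below is purely an assumption about one way such a record could work. It uses the common slack-based criticality, 1 - slack / d_max, and blends each new value into a per-edge running average. The blending weight `beta`, the `exponent` parameter, and the function names are hypothetical and should not be read as the authors' CHG method.

```python
def criticality(slack, d_max):
    """Slack-based criticality in [0, 1]: 1 means the edge lies on the critical path."""
    return 1.0 - slack / d_max


def update_history(history, slacks, d_max, beta=0.5):
    """Blend the latest criticality of each edge into its running history.

    beta is a hypothetical weight on the previous history value; an edge
    seen for the first time starts at its current criticality.
    """
    for edge, slack in slacks.items():
        crit = criticality(slack, d_max)
        prev = history.get(edge, crit)
        history[edge] = beta * prev + (1.0 - beta) * crit
    return history


def timing_cost(history, delays, exponent=1.0):
    """Sum of edge delays weighted by historical criticality (illustrative)."""
    return sum(delays[e] * history[e] ** exponent for e in delays)


if __name__ == "__main__":
    hist = {}
    update_history(hist, {"e1": 0.0, "e2": 5.0}, d_max=10.0)
    update_history(hist, {"e1": 10.0, "e2": 5.0}, d_max=10.0)
    print(hist, timing_cost(hist, {"e1": 2.0, "e2": 4.0}))
```

In this sketch an edge that was critical in an earlier timing analysis keeps some weight even after its slack recovers, which is one plausible reason a history could steer a placer better than the latest snapshot alone.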
General Terms
In this paper, we describe the application of two parallelization strategies to the Quartus II FPGA placer. The first uses a pipelining approach and achieves speedups of 1.3x on two processing cores. The second uses a parallel moves approach and achieves speedups of 2.2x on four cores. Unlike all previous parallel moves algorithms, ours is deterministic and always gives the same answer as the serial version of the algorithm, without any significant reduction in performance. We also describe a process to quantify multi-core performance effects, such as memory subsystem limitations and explicit synchronization overhead, and fully describe these effects on a CAD tool for the first time. Memory limitations alone are found to cost up to 35% of total runtime. Unlike previous algorithms, our algorithms have negligible explicit synchronization overhead. These results are relevant to both CAD designers and to any developers seeking to parallelize existing software.
Embedded Systems and Applications Group
Future nanoscale devices will exhibit different characteristics than today's silicon devices. While the exponential growth of non-recurring expenses (NRE, mostly due to mask sets) can be anticipated even for new technologies, problems such as the dramatically increased defect density require new approaches to build functional devices at reasonable prices. Improved CAD algorithms can help to solve these problems, or in some cases, they can be seen as an enabling technology to broaden the use of paradigms such as reconfigurable computing. In this work we discuss in which stages of design, manufacturing, and deployment new CAD algorithms are required.
1 How to Make Productive Use of Billions of Logic Gates
Following the road of Moore’s law, the number of transistors on a chip doubles every 24 months. After being valid for more than 40 years, the end of Moore’s law has been forecast many times now. Yet, technological advances have kept the progress intact. While the technological forecast for the next 5 to 10 years still concentrates on traditional CMOS logic realized on silicon, it seems likely that other technologies will take over in this time frame. A good candidate is CMOL [SFG+03][DL05], which uses carbon nano
A C to Register Transfer Level Algorithm Using Structured Circuit Templates: A Case Study with Simulated Annealing
2008
A tool flow is presented for deriving simulated annealing accelerator circuits on a field-programmable gate array (FPGA) from C source code by exploring architecture solutions that conform to a preset template through scheduling and mapping algorithms. A case study carried out on simulated annealing-based Autonomous Mission Planning and Scheduling (AMPS) software used for autonomous spacecraft systems is explained. The goal of the research is an automated method for the derivation of a hardware design that maximizes performance while minimizing the FPGA footprint. Results obtained are compared with a peer C to register transfer level (RTL) logic tool, a state-of-the-art space-borne embedded processor and a commodity desktop processor for a variety of problems. The automatically derived hardware circuits consistently outperform other methods by one or more orders of magnitude. (131 pages)
Acknowledgments: Several people deserve recognition for the success of this document and what it represents. First, Dr. Aravind Dasu has stuck with me for four years of highs and lows. Dr.

