Results 1-10 of 59
Fractional cut: improved recursive bisection placement
 in Proc. Int. Conf. on Computer Aided Design
, 2003
Cited by 33 (7 self)
In this paper, we present improvements to recursive-bisection-based placement. In contrast to prior work, our horizontal cut lines are not restricted to row boundaries; this avoids a “narrow region” problem. To support these new cut-line positions, a dynamic-programming-based legalization algorithm has been developed. The combination of these has improved the stability and lowered the wire lengths produced by our Feng Shui placement tool. On benchmarks derived from industry partitioning examples, our results are close to those of the annealing-based tool Dragon, while taking only a fraction of the run time. On synthetic benchmarks, our wire lengths are nearly 23% better than those of Dragon. For both benchmark suites, our results are substantially better than those of the recursive-bisection-based tool Capo and the analytic placement tool Kraftwerk.
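A minimal sketch of why freeing the cut line from row boundaries can help, assuming uniform row height and a cut placed so that the region's height splits in the same proportion as the cell area (both are assumptions for illustration, not Feng Shui's actual rule). The helper names `fractional_cut`, `row_snapped_cut` and `height_area_mismatch` are hypothetical. In a three-row region whose cells split 50/50, the fractional cut matches the area split exactly, while snapping to a row boundary gives one side two-thirds of the rows for half of the area:

```python
import math


def fractional_cut(y_bottom: float, y_top: float,
                   area_below: float, area_above: float) -> float:
    """Horizontal cut placed so the height split matches the cell-area split (assumed rule)."""
    total = area_below + area_above
    if total <= 0:
        raise ValueError("total cell area must be positive")
    return y_bottom + (y_top - y_bottom) * area_below / total


def row_snapped_cut(y_bottom: float, y_top: float, area_below: float,
                    area_above: float, row_height: float) -> float:
    """The same cut, rounded to the nearest row boundary (ties round up)."""
    y = fractional_cut(y_bottom, y_top, area_below, area_above)
    rows_below = math.floor((y - y_bottom) / row_height + 0.5)
    return y_bottom + rows_below * row_height


def height_area_mismatch(y_bottom: float, y_top: float, cut_y: float,
                         area_below: float, area_above: float) -> float:
    """|share of region height below the cut - share of cell area assigned below it|."""
    height_share = (cut_y - y_bottom) / (y_top - y_bottom)
    area_share = area_below / (area_below + area_above)
    return abs(height_share - area_share)


if __name__ == "__main__":
    # A three-row region (row height 1.0) whose cells must be split 50/50.
    frac = fractional_cut(0.0, 3.0, 50.0, 50.0)
    snapped = row_snapped_cut(0.0, 3.0, 50.0, 50.0, 1.0)
    print(frac, snapped)  # 1.5 2.0
    print(height_area_mismatch(0.0, 3.0, frac, 50.0, 50.0))     # 0.0
    print(height_area_mismatch(0.0, 3.0, snapped, 50.0, 50.0))  # about 0.167
```

The mismatch grows as regions shrink to a few rows, which is one way to read the “narrow region” problem; the sketch does not model the legalization step that fractional cut positions require.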
Faster SAT and Smaller BDDs via Common Function Structure
 University of Michigan
, 2001
Cited by 29 (8 self)
The increasing popularity of SAT and BDD techniques in verification and synthesis encourages the search for additional speedups. Since typical SAT and BDD algorithms are exponential in the worst case, the structure of real-world instances is a natural source of improvements. While SAT and BDD techniques are often presented as mutually exclusive alternatives, our work points out that both can be improved via the same structural properties of instances. Our proposed methods are based on efficient problem partitioning and can be easily applied as preprocessing with arbitrary SAT solvers and BDD packages without source-code modifications. Finding a better variable ordering is a well-recognized problem for both SAT solvers and BDD packages. Currently, all leading-edge variable-ordering algorithms are dynamic, in the sense that they are invoked many times in the course of the “host” algorithm that solves SAT or manipulates BDDs. Examples include the DLCS ordering for SAT solvers and variable sifting during BDD manipulations. In this work we propose a universal variable ordering, MINCE (MIN Cut Etc.), that preprocesses a given Boolean formula in CNF. MINCE is completely independent of target algorithms and outperforms both DLCS for SAT and variable sifting for BDDs. We argue that MINCE tends to capture structural properties of Boolean functions arising from real-world applications. Our contribution is validated on the ISCAS circuits and the DIMACS benchmarks. Empirically, our technique often outperforms existing techniques by a factor of two or more. Our results motivate the search for stronger dynamic ordering heuristics and combined static/dynamic techniques.
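One way to quantify whether a static order reflects formula structure, assumed here for illustration rather than taken from the abstract, is the span of each clause under the order (the distance between the clause's first and last variable). The sketch sums clause spans for a small hypothetical CNF under two orders; `total_clause_span` is a hypothetical name and literals use the DIMACS sign convention.

```python
from typing import Dict, Sequence


def total_clause_span(clauses: Sequence[Sequence[int]], order: Sequence[int]) -> int:
    """Sum over clauses of (last position - first position) of the clause's variables.

    Literals follow the DIMACS sign convention: -3 means "not x3".
    """
    position: Dict[int, int] = {var: i for i, var in enumerate(order)}
    total = 0
    for clause in clauses:
        positions = [position[abs(lit)] for lit in clause]
        total += max(positions) - min(positions)
    return total


# Two tightly connected groups of variables, {1, 2, 3} and {4, 5, 6},
# joined by a single clause (x3 OR x4). Hypothetical example formula.
CNF = [[1, -2], [2, 3], [-1, 3], [4, 5], [-5, 6], [4, -6], [3, 4]]

if __name__ == "__main__":
    interleaved = [1, 4, 2, 5, 3, 6]
    grouped = [1, 2, 3, 4, 5, 6]
    print(total_clause_span(CNF, interleaved))  # 19
    print(total_clause_span(CNF, grouped))      # 9
```

Keeping tightly connected variables adjacent shrinks the total span from 19 to 9; a min-cut-based ordering aims at this kind of locality.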
Multiobjective hypergraph partitioning algorithms for cut and maximum subdomain degree minimization
 IEEE Transactions on Computer-Aided Design
, 2005
Cited by 29 (0 self)
In this paper we present a family of multiobjective hypergraph partitioning algorithms based on the multilevel paradigm, which are capable of producing solutions in which both the cut and the maximum subdomain degree are simultaneously minimized. This type of partitioning is critical for existing and emerging applications in VLSI CAD, as it allows the interconnects both to be minimized and to be evenly distributed across the physical devices. Our experimental evaluation on the ISPD98 benchmarks shows that, compared with the solutions produced by hMETIS, our algorithms produce solutions whose maximum subdomain degree is reduced by up to 36% while achieving comparable quality in terms of cut.
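To make the two objectives concrete, the sketch computes both for a given k-way partition. The cut is the number of hyperedges spanning more than one part; the subdomain degree of a part is taken here, as an assumption, to be the number of cut hyperedges incident to that part (unweighted). The function name and example hypergraph are hypothetical, and the code only evaluates partitions; it does not implement the multilevel algorithms.

```python
from typing import List, Sequence, Tuple


def cut_and_max_subdomain_degree(hyperedges: Sequence[Sequence[int]],
                                 part: Sequence[int], k: int) -> Tuple[int, int]:
    """Return (cut, maximum subdomain degree) for a k-way partition.

    cut: number of hyperedges whose pins lie in more than one part.
    subdomain degree of part p (assumed definition): number of cut
    hyperedges with at least one pin in p.
    """
    degree: List[int] = [0] * k
    cut = 0
    for edge in hyperedges:
        parts = {part[v] for v in edge}
        if len(parts) > 1:
            cut += 1
            for p in parts:
                degree[p] += 1
    return cut, max(degree)


# A path of 2-pin nets plus one 3-pin net (hypothetical example).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (0, 1, 2)]

if __name__ == "__main__":
    contiguous = [0, 0, 1, 1, 2, 2, 3, 3]  # part id of each vertex
    scrambled = [0, 0, 1, 2, 2, 1, 3, 3]
    print(cut_and_max_subdomain_degree(EDGES, contiguous, 4))  # (4, 3)
    print(cut_and_max_subdomain_degree(EDGES, scrambled, 4))   # (5, 5)
```

A multiobjective partitioner would search for partitions that lower the second number without giving up much on the first.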
Packet-Switched vs. Time-Multiplexed FPGA Overlay Networks
 in Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines. IEEE
, 2006
Cited by 27 (9 self)
Dedicated, spatially configured FPGA interconnect is efficient for applications that require high-throughput connections between processing elements (PEs) but with a limited degree of PE interconnectivity (e.g. wiring up gates and datapaths). Applications that virtualize PEs may require a large number of distinct PE-to-PE connections (e.g. using one PE to simulate 100s of operators, each requiring input data from thousands of other operators), but with each connection having low throughput compared with the PE’s operating cycle time. In these highly interconnected conditions, dedicating spatial interconnect resources to all possible connections is costly and inefficient. Alternatively, we can time-share physical network resources by virtualizing interconnect links, either by statically scheduling the sharing of resources prior to runtime or by dynamically negotiating resources at runtime. We explore the tradeoffs (e.g. area, route latency, route quality) between time-multiplexed and packet-switched networks overlaid on top of commodity FPGAs. We demonstrate modular and scalable networks that operate on a Xilinx XC2V6000-4 at 166 MHz. For our applications, time-multiplexed, offline scheduling offers up to a 63% performance increase over online, packet-switched scheduling for equivalent topologies. When designs are compared at equivalent area, packet-switching is up to 2× faster for small-area designs, while time-multiplexing is up to 5× faster for larger-area designs. When limited to the capacity of an XC2V6000, if all communication is known, time-multiplexed routing outperforms packet-switching; however, when the active set of links drops below 40% of the potential links, packet-switched routing can outperform time-multiplexing.
Parkway 2.0: A parallel multilevel hypergraph partitioning tool
 in Proc. 19th International Symposium on Computer and Information Sciences
, 2004
Cited by 22 (2 self)
Algorithms for serial hypergraph partitioning have been studied extensively [9, 2, 14], and tool support exists (e.g. hMeTiS [13] and PaToH [4]). However, these ...
Multilevel direct K-way hypergraph partitioning with multiple constraints and fixed vertices
, 2007
GraphStep: A System Architecture for Sparse-Graph Algorithms
 In Proceedings of the IEEE Symposium on Field-Programmable Custom Computing Machines. IEEE
, 2006
Cited by 17 (7 self)
Many important applications are organized around long-lived, irregular sparse graphs (e.g., data and knowledge bases, CAD optimization, numerical problems, simulations). The graph structures are large, and the applications need regular access to a large, data-dependent portion of the graph for each operation (e.g., the algorithm may need to walk the graph, visiting all nodes, or propagate changes through many nodes in the graph). On conventional microprocessors, the graph structures exceed on-chip cache capacities, making main-memory bandwidth and latency the key performance limiters. To avoid this “memory wall,” we introduce a concurrent system architecture for sparse graph algorithms that places graph nodes in small distributed memories paired with specialized graph processing nodes interconnected by a lightweight network. This gives us a scalable way to map these applications so that they can exploit the high-bandwidth and low-latency capabilities of embedded memories (e.g., FPGA Block RAMs). On typical spreading-activation queries on the ConceptNet Knowledge Base, a sample application, this translates into an order-of-magnitude speedup per FPGA compared to a state-of-the-art Pentium processor.
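A sequential sketch of the message-passing pattern behind spreading activation, written as alternating send and update phases over a small hypothetical graph. The decay factor, threshold rule, graph contents and all names are assumptions for illustration; on the architecture described above, each node's state and edge list would instead live in a distributed memory next to a graph processing node, with the phases running concurrently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# node -> list of (neighbor, edge weight)
Graph = Dict[str, List[Tuple[str, float]]]


def spread_activation(graph: Graph, seeds: Dict[str, float], decay: float = 0.5,
                      threshold: float = 0.15, max_steps: int = 10) -> Dict[str, float]:
    """Bulk-synchronous spreading activation: in each step, nodes that changed
    send messages to their neighbors, messages are summed per destination, and
    only destinations whose incoming total reaches `threshold` stay active."""
    activation: Dict[str, float] = defaultdict(float)
    activation.update(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        if not frontier:
            break
        # Send phase: each active node emits weighted, decayed messages.
        inbox: Dict[str, float] = defaultdict(float)
        for node, value in frontier.items():
            for neighbor, weight in graph.get(node, []):
                inbox[neighbor] += value * weight * decay  # reduce messages by sum
        # Update phase: accumulate strong updates; those below threshold are discarded.
        frontier = {}
        for node, incoming in inbox.items():
            if incoming >= threshold:
                activation[node] += incoming
                frontier[node] = incoming
    return dict(activation)


# Hypothetical toy knowledge graph.
GRAPH: Graph = {
    "dog": [("animal", 1.0), ("pet", 0.8)],
    "pet": [("animal", 0.5), ("cat", 1.0)],
    "animal": [("living thing", 1.0)],
    "cat": [],
}

if __name__ == "__main__":
    print(spread_activation(GRAPH, {"dog": 1.0}))
```

In the second step the message from "pet" back to "animal" (0.1) falls below the threshold and is dropped, which is how such a rule keeps the active set, and hence the per-step message traffic, small.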
Optimality, Scalability and Stability Study of Partitioning and Placement Algorithms
, 2003
Cited by 15 (4 self)
state-of-the-art partitioning and placement algorithms. We present algorithms to construct two classes of benchmarks, one for partitioning and the other for placement, which have known upper bounds on their optimal solutions and can match any given net distribution vector. Using these partitioning and placement benchmarks, we studied the optimality of state-of-the-art algorithms by comparing their solutions with the upper bounds of the optimal solutions, and their scalability and stability by varying the sizes and characteristics of the benchmarks. The conclusions from this study are: 1) State-of-the-art multilevel two-way partitioning algorithms scale very well and are able to find solutions very close to the upper bounds of the optimal solutions of our benchmarks. This suggests that existing circuit partitioning techniques are fairly mature; there is not much room for improvement in cut-size minimization for problems of the current sizes. Multiway partitioning algorithms, on the other hand, do not perform as well: their results can be up to 18% worse than our estimated upper bounds. 2) State-of-the-art placement algorithms produce significantly inferior results compared with the estimated optimal solutions, so there is still significant room for improvement in circuit placement. 3) Existing placement algorithms are not stable: their effectiveness varies considerably depending on the characteristics of the benchmarks. New hybrid techniques are probably needed for future-generation placement engines that are more scalable and stable.
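A simplified illustration of the planted-solution idea for partitioning benchmarks; this is not the paper's construction. It matches a given net distribution vector (number of nets per pin count) and fixes exactly how many nets cross a planted balanced bisection, so that count is an upper bound on the optimal bisection cut against which a partitioner's result can be compared. The function names and parameters are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def planted_bisection_benchmark(n_vertices: int, net_distribution: Dict[int, int],
                                n_crossing: int, seed: int = 0
                                ) -> Tuple[List[List[int]], List[int]]:
    """Random hypergraph with a planted balanced bisection.

    net_distribution maps a pin count k to the number of k-pin nets.
    Exactly n_crossing nets straddle the planted bisection, so its cut
    (n_crossing) is an upper bound on the optimal bisection cut.
    """
    if n_vertices < 4 or n_vertices % 2:
        raise ValueError("need an even number of vertices, at least 4")
    rng = random.Random(seed)
    half = n_vertices // 2
    side = [0] * half + [1] * half
    left, right = list(range(half)), list(range(half, n_vertices))
    pin_counts = [k for k, count in sorted(net_distribution.items()) for _ in range(count)]
    if not 0 <= n_crossing <= len(pin_counts):
        raise ValueError("n_crossing exceeds the number of nets")
    rng.shuffle(pin_counts)
    nets: List[List[int]] = []
    for i, k in enumerate(pin_counts):
        if not 2 <= k <= half:
            raise ValueError("net sizes must lie in [2, n_vertices // 2]")
        if i < n_crossing:
            a = rng.randint(1, k - 1)  # pins on the left; the rest go right
            pins = rng.sample(left, a) + rng.sample(right, k - a)
        else:
            pins = rng.sample(rng.choice((left, right)), k)  # net stays inside one half
        nets.append(sorted(pins))
    return nets, side


def cut_size(nets: List[List[int]], side: List[int]) -> int:
    """Number of nets with pins on both sides of a bisection."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)


if __name__ == "__main__":
    nets, side = planted_bisection_benchmark(20, {2: 30, 3: 10, 4: 5}, n_crossing=6)
    print(len(nets), cut_size(nets, side))  # 45 6
```

The construction only guarantees an upper bound: the true optimum may be lower, so a gap between a partitioner's cut and the planted cut overstates its suboptimality, never understates it.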
MINCE: A Static Global Variable-Ordering for SAT and BDD
, 2001
Cited by 14 (0 self)
Many popular algorithms that work with Boolean functions depend dramatically on the order of variables in the input representations of those functions. Such algorithms include satisfiability (SAT) solvers, which are critical in formal verification, and Binary Decision Diagram (BDD) manipulation algorithms, which are increasingly popular in synthesis and verification. Finding better variable orderings is a well-recognized problem in each of those contexts. Currently, all leading-edge variable-ordering algorithms are dynamic in the sense that they are invoked many times in the course of the "host" algorithm that solves SAT or manipulates BDDs. Examples include the DLIS ordering for SAT solvers and variable sifting during BDD manipulations. In this work we propose a universal variable ordering, MINCE (MIN Cut Etc.), that preprocesses a given Boolean formula in CNF. MINCE is completely independent of target algorithms and outperforms both DLIS for SAT and variable sifting for BDDs. We argue that MINCE tends to capture structural properties of Boolean functions arising from real-world applications.
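The name MINCE suggests ordering variables by recursive min-cut bisection of the CNF's variable-clause hypergraph; the toy below assumes that reading. Each split is found by brute force over balanced bipartitions, a stand-in for a real hypergraph partitioner that is only feasible for a handful of variables. The names `clause_cut` and `min_cut_order` and the example formula are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Set, Tuple


def clause_cut(clauses: Sequence[Sequence[int]], part_a: Set[int], part_b: Set[int]) -> int:
    """Count clauses whose variables fall in both part_a and part_b."""
    cut = 0
    for clause in clauses:
        variables = {abs(lit) for lit in clause}
        if variables & part_a and variables & part_b:
            cut += 1
    return cut


def min_cut_order(clauses: Sequence[Sequence[int]],
                  variables: Optional[List[int]] = None) -> List[int]:
    """Toy recursive min-cut bisection ordering (exponential: tiny inputs only).

    Each block of variables is split into two halves that cut the fewest
    clauses (ties keep the first split found); the order is the left half's
    order followed by the right half's. Variables outside the current block
    are ignored (no terminal propagation), a simplification.
    """
    if variables is None:
        variables = sorted({abs(lit) for clause in clauses for lit in clause})
    if len(variables) <= 1:
        return list(variables)
    best: Optional[Tuple[int, List[int], List[int]]] = None
    for left in combinations(variables, len(variables) // 2):
        part_a = set(left)
        part_b = set(variables) - part_a
        cut = clause_cut(clauses, part_a, part_b)
        if best is None or cut < best[0]:
            best = (cut, sorted(part_a), sorted(part_b))
    _, left_vars, right_vars = best
    return min_cut_order(clauses, left_vars) + min_cut_order(clauses, right_vars)


CNF = [[1, -2], [2, 3], [-1, 3], [4, 5], [-5, 6], [4, -6], [3, 4]]

if __name__ == "__main__":
    print(min_cut_order(CNF))  # [1, 2, 3, 4, 5, 6]
```

The top-level split separates {1, 2, 3} from {4, 5, 6}, cutting only the clause (x3 OR x4), so each tightly connected group ends up contiguous in the order.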
Parallelizing sparse Matrix Solve for SPICE circuit simulation using FPGAs
 In: Proc. Field-Programmable Technology
, 2009
Cited by 11 (3 self)
Fine-grained dataflow processing of sparse Matrix-Solve computation ($A\vec{x} = \vec{b}$) in the SPICE circuit simulator can provide an order-of-magnitude performance improvement on modern FPGAs. Matrix Solve is the dominant component of the simulator, especially for large circuits, and is invoked repeatedly during the simulation, once for every iteration. We process sparse-matrix computation generated from the SPICE-oriented KLU solver in dataflow fashion across multiple spatial floating-point operators coupled to high-bandwidth on-chip memories and interconnected by a low-latency network. Using this approach, we are able to show speedups of 1.2-64× (geometric mean of 8.8×) for a range of circuits and benchmark matrices when comparing double-precision implementations on a 250 MHz Xilinx Virtex-5 FPGA (65 nm) and an Intel Core i7 965 processor (45 nm).
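A much-simplified illustration of exposing fine-grained dataflow parallelism in a sparse solve: forward substitution with a lower-triangular factor, with rows grouped into levels so that every row on a level depends only on earlier levels. Level scheduling is a standard technique assumed here for illustration; the abstract does not say the paper uses it, and KLU's factorization and data structures are not modeled. The row-list matrix format and the names `dataflow_levels` and `forward_solve_by_level` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Row i of a sparse lower-triangular matrix: [(column j, value)] with j <= i.
SparseLower = Sequence[Sequence[Tuple[int, float]]]


def dataflow_levels(L: SparseLower) -> List[int]:
    """Level of row i = 1 + deepest row it depends on; rows on one level are independent."""
    level = [0] * len(L)
    for i, row in enumerate(L):
        level[i] = max((level[j] + 1 for j, _ in row if j < i), default=0)
    return level


def forward_solve_by_level(L: SparseLower, b: Sequence[float]) -> Tuple[List[float], List[int]]:
    """Solve L x = b one level at a time; rows within a level could run in parallel."""
    n = len(L)
    level = dataflow_levels(L)
    x = [0.0] * n
    for lvl in range(max(level, default=-1) + 1):
        for i in (r for r in range(n) if level[r] == lvl):
            acc, diag = b[i], 0.0
            for j, value in L[i]:
                if j == i:
                    diag = value
                else:
                    acc -= value * x[j]  # x[j] was produced on an earlier level
            if diag == 0.0:
                raise ValueError(f"zero or missing diagonal in row {i}")
            x[i] = acc / diag
    return x, level


L_EXAMPLE = [
    [(0, 2.0)],
    [(1, 1.0)],
    [(0, 1.0), (2, 4.0)],
    [(1, 3.0), (2, 1.0), (3, 1.0)],
]

if __name__ == "__main__":
    x, level = forward_solve_by_level(L_EXAMPLE, [4.0, 3.0, 10.0, 20.0])
    print(x)      # [2.0, 3.0, 2.0, 9.0]
    print(level)  # [0, 0, 1, 2]
```

Rows 0 and 1 have no dependencies and could be processed concurrently; in a spatial dataflow implementation each multiply-subtract could instead fire as soon as its input x[j] arrives, without waiting for a whole level.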