Results 1  10
of
122
Retiming Synchronous Circuitry
 ALGORITHMICA
, 1991
"... This paper describes a circuit transformation called retiming in which registers are added at some points in a circuit and removed from others in such a way that the functional behavior of the circuit as a whole is preserved. We show that retiming can be used to transform a given synchronous circui ..."
Abstract

Cited by 372 (3 self)
 Add to MetaCart
This paper describes a circuit transformation called retiming in which registers are added at some points in a circuit and removed from others in such a way that the functional behavior of the circuit as a whole is preserved. We show that retiming can be used to transform a given synchronous circuit into a more efficient circuit under a variety of different cost criteria. We model a circuit as a graph in which the vertex set Visa collection of combinational logic elements and the edge set E is the set of interconnections, each of which may pass through zero or more registers. We give an 0(V E lgV) algorithm for determining an equivalent retimed circuit with the smallest possible clock period. We show that the problem of determining an equivalent retimed circuit with minimum state (total number of registers) is polynomialtime solvable. This result yields a polynomialtime optimal solution to the problem of pipelining combinational circuitry with minimum register cost. We also give a characterization of optimal retiming based on an efficiently solvable mixedinteger linearprogramming problem.
Sparse matrix solvers on the GPU: conjugate gradients and multigrid
 ACM Trans. Graph
, 2003
"... Permission to make digital/hard copy of part of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given ..."
Abstract

Cited by 277 (3 self)
 Add to MetaCart
Permission to make digital/hard copy of part of all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission
Power Minimization in IC Design: Principles and Applications
 ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS
, 1996
"... Low power has emerged as a principal theme in today’s electronics industry. The need for low power has caused a major paradigm shift in which power dissipation is as important as performance and area. This article presents an indepth survey of CAD methodologies and techniques for designing low powe ..."
Abstract

Cited by 190 (28 self)
 Add to MetaCart
Low power has emerged as a principal theme in today’s electronics industry. The need for low power has caused a major paradigm shift in which power dissipation is as important as performance and area. This article presents an indepth survey of CAD methodologies and techniques for designing low power digital CMOS circuits and systems and describes the many issues facing designers at architectural, logic and physical levels of design abstraction. It reviews some of the techniques and tools that have been proposed to overcome these difficulties and outlines the future challenges that must be met to design low power, high performance systems.
PathBased Scheduling for Synthesis
 IEEE TRANSACTIONS ON COMPUTERAIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS
, 1991
"... In the context of synthesis, scheduling assigns operations to control steps. Operations are the atomic components used for describing behavior, for example, arithmetic and Boolean operations. They are ordered partially by data dependencies (dataflow graph) and by control constructs such as condit ..."
Abstract

Cited by 104 (0 self)
 Add to MetaCart
In the context of synthesis, scheduling assigns operations to control steps. Operations are the atomic components used for describing behavior, for example, arithmetic and Boolean operations. They are ordered partially by data dependencies (dataflow graph) and by control constructs such as conditional branches and loops (controlflow graph). A control step usually corresponds to one state, one clock cycle, or one microprogram step. This paper presents a new, pathbased scheduling algorithm. It yields solutions with the minimum number of control steps, taking into account arbitrary constraints that limit the amount of operations in each control step. The result is a finite state machine that implements the control. Although the complexity of the algorithm is proportional to the number of paths in the controlflow graph, it is shown to be practical for large examples with thousands of nodes.
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
"... Interprocessor communication (PC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essenti ..."
Abstract

Cited by 72 (11 self)
 Add to MetaCart
Interprocessor communication (PC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compiletime heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamiclevel scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising perforrnance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graphanalysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
A survey of optimization techniques targeting low power circuits
 in Proc. Design Automation Conf
, 1995
"... Abstract—We survey stateoftheart optimization methods that target low power dissipation in VLSI circuits. Optimizations at the circuit, logic, architectural and system levels are considered. Keywords—low power, optimization, synthesis I. ..."
Abstract

Cited by 66 (0 self)
 Add to MetaCart
(Show Context)
Abstract—We survey stateoftheart optimization methods that target low power dissipation in VLSI circuits. Optimizations at the circuit, logic, architectural and system levels are considered. Keywords—low power, optimization, synthesis I.
Logic Decomposition during Technology Mapping. submitted to
 IEEE Trans. CAD
, 1995
"... A problem in technology mapping is that quality of the final implementation depends significantly on the initially provided circuit structure. To resolve this problem, conventional techniques iteratively but separately apply technology independent transformations and technology mapping. In this pape ..."
Abstract

Cited by 62 (1 self)
 Add to MetaCart
(Show Context)
A problem in technology mapping is that quality of the final implementation depends significantly on the initially provided circuit structure. To resolve this problem, conventional techniques iteratively but separately apply technology independent transformations and technology mapping. In this paper, we propose a procedure which performs logic decomposition and technology mapping simultaneously. We show that the procedure effectively explores all possible algebraic decompositions. It finds an optimal tree implementation over all the circuit structures examined, while the run time is typically logarithmic in the number of decompositions. 1
Optimizing TwoPhase, LevelClocked Circuitry (Extended Abstract)
"... We investigate two strategies for reducing the clock period of a twophase, levelclocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edgetriggered latches into a faster ..."
Abstract

Cited by 59 (16 self)
 Add to MetaCart
(Show Context)
We investigate two strategies for reducing the clock period of a twophase, levelclocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edgetriggered latches into a faster levelclocked one. We model a twophase circuit as a graph whose vertex set V is a collection of combinational logic blocks, and whose edge set E is a set of interconnections. Each interconnection passes through 0 or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomialtime algorithms for problems involving the timing verification and optimization of twophase circuitry. Included are algorithms for ffl verifyi...
Scheduling And Behavioral Transformations For Parallel Systems
, 1993
"... In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually ..."
Abstract

Cited by 43 (3 self)
 Add to MetaCart
In a parallel system, either a VLSI architecture in hardware or a parallel program in software, the quality of the final design depends on the ability of a synthesis system to exploit the parallelism hidden in the input description of applications. Since iterative or recursive algorithms are usually the most timecritical parts of an application, the parallelism embedded in the repetitive pattern of an iterative algorithm needs to be explored. This thesis studies techniques and algorithms to expose the parallelism in an iterative algorithm so that the designer can find an implementation achieving a desired execution rate. In particular, the objective is to find an efficient schedule to be executed iteratively. A form of dataflow graphs is used to model the iterative part of an application, e.g. a digital signal filter or the while/for loop of a program. Nodes in the graph represent operations to be performed and edges represent both intraiteration and interiteration precedence relat...
Understanding Retiming through Maximum AverageDelay Cycles
 Mathematical Systems Theory
, 1994
"... A synchronous circuit built of functional elements and registers is a simple implementation of the semisystolic model of computation that can be used to design parallel algorithms. Retiming is a wellknown technique that transforms a given circuit into a faster circuit by relocating its registers. W ..."
Abstract

Cited by 36 (8 self)
 Add to MetaCart
(Show Context)
A synchronous circuit built of functional elements and registers is a simple implementation of the semisystolic model of computation that can be used to design parallel algorithms. Retiming is a wellknown technique that transforms a given circuit into a faster circuit by relocating its registers. We give tight bounds on the minimum clock period that can be achieved by retiming a synchronous circuit. These bounds are expressed in terms of the maximum delaytoregister ratio of the cycles in the circuit graph and the maximum propagation delay d max of the circuit components. Our bounds do not depend on the size of the circuit, and they are of theoretical as well as practical interest. They characterize exactly the minimum clock period that can be achieved by retiming a unitdelay circuit, and they lead to more efficient algorithms for several important problems related to retiming. Specifically, we give an O(V 1=2 E lg V ) algorithm for minimum clock period retiming of unitdelay circu...