Recent Developments in High-Level Synthesis
 ACM Transactions on Design Automation of Electronic Systems
, 1997
Cited by 42 (0 self)
Youn-Long Lin, Department of Computer Science, Tsing Hua University, Hsin-Chu, Taiwan 30043, R.O.C. Abstract: We survey recent developments in high-level synthesis technology for VLSI design. The need for higher-level design automation tools is first discussed. We then describe some basic techniques for various subtasks of high-level synthesis. Techniques proposed in the past few years (since 1994) for these subtasks are surveyed. We also survey some new synthesis objectives, including testability, power efficiency, and reliability. Keywords: High ...
Synthesis of application specific instruction sets
 IEEE TCAD
, 1995
Cited by 34 (6 self)
An instruction set serves as the interface between hardware and software in a computer system. In an application-specific environment, the system performance can be improved by designing an instruction set that matches the characteristics of the hardware and the application. We present a systematic approach to generate application-specific instruction sets so that software applications can be efficiently mapped to a given pipelined microarchitecture. The approach synthesizes instruction sets from application benchmarks, given a machine model, an objective function, and a set of design constraints. In addition, assembly code is generated to show how the benchmarks can be compiled with the synthesized instruction set. The problem of designing instruction sets is formulated as a modified scheduling problem. A binary tuple is proposed to model the semantics of instructions and integrate the instruction formation process into the scheduling process. A simulated annealing scheme is used to solve for the schedules. Experiments have shown that the approach is capable of synthesizing powerful instructions for modern pipelined microprocessors, and that it runs in reasonable time with a modest amount of memory for large applications.
Optimal Code Placement of Embedded Software for Instruction Caches
 In Proc. of European Design and Test Conference
, 1996
Cited by 25 (5 self)
This paper presents a new code placement method for embedded software to maximize hit ratios of instruction caches. We formulate the code placement problem as an integer linear programming problem. One of the advantages of our method is that code can be moved beyond boundaries of functions, so that code placement is optimized globally. Experimental results show our method achieves 35% (max 45%) reduction of cache misses. 1 Introduction In the design of an embedded system, several design goals such as high performance, low cost, and low power consumption of the system must be achieved simultaneously. But these design goals are often mutually exclusive. Consider a system which consists of a processor core, main memories and cache memories. The performance of the system is expressed by the following formula:
$$\text{Performance} = \frac{1}{\text{Execution time}} = \frac{F}{IC \times (CPI + (1 - CHR) \times CMP)} \qquad (1)$$
where $F$, $IC$, $CPI$, $CHR$ and $CMP$ denote the clock frequency, the instruction count to be executed, clock cycles per in...
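As a rough illustration, Equation (1) can be read as Performance = F / (IC × (CPI + (1 − CHR) × CMP)). The sketch below evaluates that reading. It assumes that CHR is the cache hit ratio (a fraction in [0, 1]) and CMP is the cache miss penalty in clock cycles; the truncated text does not confirm these meanings, and the function and parameter names are hypothetical.

```python
def performance(f_hz, ic, cpi, chr_, cmp_):
    """Evaluate F / (IC * (CPI + (1 - CHR) * CMP)).

    Assumed meanings (not confirmed by the truncated source):
      f_hz : clock frequency F in Hz
      ic   : number of instructions executed (IC)
      cpi  : clock cycles per instruction on a cache hit (CPI)
      chr_ : cache hit ratio CHR, a fraction in [0, 1]
      cmp_ : cache miss penalty CMP in clock cycles
    """
    if not 0.0 <= chr_ <= 1.0:
        raise ValueError("hit ratio must lie in [0, 1]")
    return f_hz / (ic * (cpi + (1.0 - chr_) * cmp_))


def execution_time(f_hz, ic, cpi, chr_, cmp_):
    """Execution time in seconds, the reciprocal of performance."""
    return 1.0 / performance(f_hz, ic, cpi, chr_, cmp_)


if __name__ == "__main__":
    # Hypothetical numbers: 100 MHz clock, one million instructions,
    # CPI of 1, and a 10-cycle miss penalty.
    for hit_ratio in (0.90, 0.95, 0.99):
        t = execution_time(100e6, 1_000_000, 1.0, hit_ratio, 10.0)
        print(f"hit ratio {hit_ratio:.2f}: {t * 1e3:.3f} ms")
```

Under this reading, raising the hit ratio shrinks only the (1 − CHR) × CMP term, which is the term a code placement method would target.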
Synthesis of instruction sets for pipelined microprocessors
 in Proc. 31st DAC
, 1994
Cited by 21 (4 self)
We present a systematic approach to synthesize an instruction set such that the given application software can be efficiently mapped to a parameterized, pipelined microarchitecture. In addition, the assembly code is generated to show how the application can be compiled with the synthesized instruction set. The design of instruction sets is formulated as a modified scheduling problem. A binary tuple is proposed to model the semantics of instructions and integrate the instruction formation process into the scheduling process. A simulated annealing scheme is used to solve for the schedules. Experiments have shown that the approach is capable of synthesizing powerful instructions for modern pipelined microprocessors. The synthesis algorithm ran in reasonable time with a modest amount of memory for large benchmarks.
A transformation-based method for loop folding
 IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 1994
Cited by 20 (2 self)
We propose a transformation-based scheduling algorithm for the following problem: given a loop construct, a target initiation interval and a set of resource constraints, schedule the loop in a pipelined fashion such that the time to execute one iteration of the loop is minimized. The iteration time is an important quality measure of a data path design because it affects both storage and control costs. Our algorithm first performs an As Soon As Possible Pipelined (ASAPp) scheduling regardless of the resource constraints. It then resolves resource constraint violations by rescheduling some operations. The software system implementing the proposed algorithm, called Theda.Fold, can deal with behavioral loop descriptions that contain chained, multicycle and/or structural pipelined operations as well as those having data dependencies across iteration boundaries. Experiments on a number of benchmarks are reported.
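To make the first step concrete, here is a minimal sketch of plain as-soon-as-possible (ASAP) scheduling on an acyclic dependence graph with no resource constraints. It is not the ASAPp pipelined variant or the Theda.Fold implementation described in the abstract: it ignores the initiation interval, loop-carried dependencies, and chaining. The function name, the input format, and the default latency of one step are all assumptions made for illustration.

```python
from collections import deque


def asap_schedule(ops, deps, latency=None):
    """Return {op: earliest start step} for an acyclic dependence graph.

    ops     : iterable of operation names
    deps    : iterable of (pred, succ) pairs; succ needs pred's result
    latency : optional {op: steps}; operations not listed take 1 step
    """
    ops = list(ops)
    latency = latency or {}
    succs = {op: [] for op in ops}
    pending = {op: 0 for op in ops}  # number of unscheduled predecessors
    for pred, succ in deps:
        succs[pred].append(succ)
        pending[succ] += 1

    start = {op: 0 for op in ops}
    ready = deque(op for op in ops if pending[op] == 0)
    scheduled = 0
    while ready:
        op = ready.popleft()
        scheduled += 1
        finish = start[op] + latency.get(op, 1)
        for succ in succs[op]:
            # A successor cannot start before every predecessor finishes.
            start[succ] = max(start[succ], finish)
            pending[succ] -= 1
            if pending[succ] == 0:
                ready.append(succ)

    if scheduled != len(ops):
        raise ValueError("dependence graph contains a cycle")
    return start


if __name__ == "__main__":
    # Hypothetical loop body: c = a * b (b is a 2-step multiply input), d = c + 1
    schedule = asap_schedule(
        ops=["a", "b", "c", "d"],
        deps=[("a", "c"), ("b", "c"), ("c", "d")],
        latency={"b": 2},
    )
    print(schedule)
```

A resource-constrained scheduler would then inspect each step of this schedule and move operations to later steps wherever more operations of one type are scheduled than there are functional units available.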
A Mathematical Formulation of the Loop Pipelining Problem
 XI Design of Integrated Circuits and Systems Conference (DCIS'96)
, 1995
Cited by 9 (2 self)
A mathematical model for the loop pipelining problem is presented. The model considers several parameters for optimization and supports any combination of resource and timing constraints. The unrolling degree of the loop is one of the variables explored by the model. By using Farey's series, an optimal exploration of the unrolling degree is performed and optimal solutions not considered by other methods are obtained. Finding an optimal schedule that minimizes resource requirements (including registers) is solved by an ILP model. A novel paradigm called branch and prune is proposed to efficiently converge towards the optimal schedule and prune the search tree for integer solutions, thus drastically reducing the running time. This is the first formulation that combines the unrolling degree of the loop with timing and resource constraints in a mathematical model that guarantees optimal solutions. 1 Introduction It is well known that loops monopolize most execution time of programs. I...
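For readers unfamiliar with Farey's series, the sketch below enumerates the Farey sequence of order n: all reduced fractions between 0 and 1 with denominator at most n, in increasing order. It uses the standard next-term recurrence. How the paper maps these fractions onto unrolling degrees is not stated in the abstract, so this shows only the sequence itself; the function name is hypothetical.

```python
from fractions import Fraction


def farey(n):
    """Yield the Farey sequence of order n in increasing order."""
    if n < 1:
        raise ValueError("order must be at least 1")
    # (a/b, c/d) are two consecutive terms; the sequence starts 0/1, 1/n.
    a, b, c, d = 0, 1, 1, n
    yield Fraction(a, b)
    while c <= n:
        # Standard recurrence for the term that follows a/b and c/d.
        k = (n + b) // d
        a, b, c, d = c, d, k * c - a, k * d - b
        yield Fraction(a, b)


if __name__ == "__main__":
    print(", ".join(str(f) for f in farey(5)))
```

Each term is generated in constant time from the previous two, so candidate ratios can be visited in sorted order without building and sorting the full set of fractions.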
Co-Synthesis of Instruction Sets and Microarchitectures
, 1994
Cited by 5 (1 self)
The design of an instruction set processor includes several related design tasks: instruction set design, microarchitecture design, and code generation. Although there have been automatic approaches for each individual task, the investigation of the interaction between these tasks still primarily relies on designers' experience and ingenuity. It is thus the goal of this research to develop formal models and algorithms to investigate such interaction systematically. This dissertation presents a two-phase co-synthesis approach to the problem. At the architectural level, given a set of application benchmarks and a pipeline structure, the ASIA (Automatic Synthesis of Instruction set Architecture) design automation system generates an instruction set and allocates hardware resources which best fit the applications, and, at the same time, maps the applications to assembly code with the synthesized instruction set. This approach formulates the co-design problem as a modified scheduling/allocat...
A Genetic Approach to the Overlapped Scheduling of Iterative Data-Flow Graphs for Target Architectures with Communication Delays
 ProRISC Workshop on Circuits, Systems and Signal Processing
, 1997
Cited by 5 (2 self)
This paper presents a method to solve the overlapped fully-static multiprocessor scheduling problem. An iterative dataflow graph (IDFG) is mapped on a target architecture that allows fine-grain parallelism. The goal is the minimization of the iteration period. The method can deal with nonzero delay times to communicate data between processors as well as with link capacities in the interconnection network. Excellent results for benchmark IDFGs have been obtained by the method, which consists of three layers, each concentrating on a different aspect of the optimization problem. I. Introduction An algorithm that contains computations that can be executed simultaneously offers possibilities of exploiting this parallelism by implementing it on appropriate hardware such as a multiprocessor system. The class of algorithms considered in this paper is limited to algorithms that can be represented by homogeneous synchronous dataflow graphs [1], also called iterative dataflow graphs (ID...
An integer linear programming approach to the overlapped scheduling of iterative dataflow graphs for target architectures with communication delays
 In PROGRESS 2000 Workshop on Embedded Systems
, 2000
Cited by 4 (0 self)
Abstract: This paper considers the scheduling of homogeneous synchronous dataflow graphs, also called iterative dataflow graphs (IDFGs), on a multiprocessor system. Algorithms described by such graphs consist of a core computation that is iterated “infinitely often”. The computation does not contain data-dependent decisions. All scheduling decisions for such algorithms can be taken at compile time. Fine-grain parallelism is assumed, where the basic tasks are primitive operations (such as additions) and the interprocessor communication times are just a few clock cycles. Scheduling methods for such a model have recently been presented by several authors. These approaches assign operations to processors and data transfers to links at appropriate times. The work presented here extends the one reported in [16] based on integer linear programming. Optimal results to problems of reasonable size were found after acceptable computation times.
Automatic Resolutions of Pipeline Hazards
 University of Southern California
, 1993
Cited by 4 (3 self)
Abstract: One major problem in pipeline synthesis is the detection and resolution of pipeline hazards. In this paper we present a new solution to the problem in the domain of pipelined application-specific instruction set processors, based on a hardware/software concurrent engineering approach. An extended taxonomy of inter-instruction dependencies is proposed for the analysis of pipeline hazards. Hardware/software resolution candidates are then associated with these dependencies. Algorithms using the taxonomy and the resolutions are developed to detect and resolve pipeline hazards, and to explore the hardware and software design space. Application benchmarks are used to evaluate the designs and guide the design decisions. The power of these tools is demonstrated through the pipeline synthesis of two processors, including an industrial one. Compared with other approaches, our method achieves higher throughput and provides a way to explore the hardware/software tradeoff. Our method can be combined with current approaches to achieve even higher performance since they are orthogonal.