Hardware-Software Co-Design of Embedded Reconfigurable Architectures
2000
Cited by 56 (2 self)
In this paper we describe a new hardware/software partitioning approach for embedded reconfigurable architectures consisting of a general-purpose processor (CPU), a dynamically reconfigurable datapath (e.g. an FPGA), and a memory hierarchy. We have developed a framework called Nimble that automatically compiles system-level applications specified in C to executables on the target platform. A key component of this framework is a hardware/software partitioning algorithm that performs fine-grained partitioning (at loop and basic-block levels) of an application to execute on the combined CPU and datapath. The partitioning algorithm optimizes the global application execution time, including the software and hardware execution times, communication time and datapath reconfiguration time. Experimental results on real applications show that our algorithm is effective in rapidly finding close-to-optimal solutions.
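The objective above (software time, hardware time, communication time and reconfiguration time summed into one global execution time) can be illustrated with a small cost model. This is not Nimble's partitioning algorithm, which is a heuristic working at loop and basic-block granularity; the sketch simply enumerates every CPU/datapath assignment for a few hypothetical loops and keeps the fastest. The `Loop` fields, the timing numbers, and the assumption that every hardware-mapped loop pays its own reconfiguration cost are illustrative simplifications.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Loop:
    name: str
    sw_time: float        # execution time on the CPU
    hw_time: float        # execution time on the reconfigurable datapath
    comm_time: float      # CPU <-> datapath data transfer if mapped to hardware
    reconfig_time: float  # time to load this loop's configuration (simplified: paid per loop)

def total_time(loops, in_hw):
    """Global execution time of one assignment (in_hw[i] is True if loop i runs in hardware)."""
    total = 0.0
    for loop, hw in zip(loops, in_hw):
        if hw:
            total += loop.hw_time + loop.comm_time + loop.reconfig_time
        else:
            total += loop.sw_time
    return total

def best_partition(loops):
    """Try all 2**n CPU/datapath assignments and return (time, assignment) for the fastest."""
    best = None
    for assignment in product((False, True), repeat=len(loops)):
        t = total_time(loops, assignment)
        if best is None or t < best[0]:
            best = (t, assignment)
    return best

# Hypothetical loops, times in arbitrary units.
example = [Loop("fir", 100, 10, 5, 20), Loop("init", 8, 1, 4, 20)]
print(best_partition(example))  # (43.0, (True, False))
```

The expensive `fir` loop moves to the datapath, while the short `init` loop stays on the CPU because its communication and reconfiguration overhead outweigh the gain. Exhaustive search grows as 2^n, which is why a practical partitioner needs a fast heuristic instead.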
Spatial Computation
In International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004
Cited by 37 (10 self)
This paper describes a computer architecture, Spatial Computation (SC), which is based on the translation of high-level language programs directly into hardware structures. SC program implementations are completely distributed, with no centralized control. SC circuits are optimized for wires at the expense of computation units. In this paper we investigate a particular implementation of SC: ASH (Application-Specific Hardware). Under the assumption that computation is cheaper than communication, ASH replicates computation units to simplify interconnect, building a system that uses very simple, completely dedicated communication channels. As a consequence, communication on the datapath never requires arbitration; the only arbitration required is for accessing memory. ASH relies on very simple hardware primitives, using no associative structures, no multiported register files, no scheduling logic, no broadcast, and no clocks. As a result, ASH hardware is fast and extremely power-efficient.
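Why dedicated channels remove arbitration can be shown with a toy token-flow simulation. This is a minimal sketch, assuming an acyclic graph of pure operations with no memory accesses (memory being the one place ASH does arbitrate). Each hypothetical `Node` stands for its own hardware unit with its own output wire and fires once all of its producers have fired; the `rounds` count is only the length of the longest dependence chain, not a clock count, since ASH circuits are unclocked.

```python
import operator

class Node:
    """One dedicated operator: its own hardware unit with its own output wire."""
    def __init__(self, name, op, inputs):
        self.name = name
        self.op = op          # pure function computed by this unit
        self.inputs = inputs  # names of the wires (producers) feeding this unit

def run_dataflow(nodes, sources):
    """Fire every node once all of its input wires carry a value.

    Returns (values, rounds). Each wire has exactly one producer, so nothing
    ever has to arbitrate for it; `rounds` is the longest dependence chain.
    """
    values = dict(sources)  # values already present on the input wires
    waiting = list(nodes)
    rounds = 0
    while waiting:
        ready = [n for n in waiting if all(s in values for s in n.inputs)]
        if not ready:
            raise ValueError("unsatisfied input or cycle in the graph")
        for n in ready:  # all ready units fire concurrently in this round
            values[n.name] = n.op(*(values[s] for s in n.inputs))
        waiting = [n for n in waiting if n not in ready]
        rounds += 1
    return values, rounds

# (a + b) * (a - b): a separate adder and subtractor, both fed by the wires for a and b.
graph = [
    Node("mul", operator.mul, ["add", "sub"]),
    Node("add", operator.add, ["a", "b"]),
    Node("sub", operator.sub, ["a", "b"]),
]
values, rounds = run_dataflow(graph, {"a": 7, "b": 3})
print(values["mul"], rounds)  # 40 2
```

The adder and subtractor are separate units rather than one shared ALU, mirroring the trade of extra computation units for simpler interconnect.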
Pegasus: An efficient intermediate representation
2002
Cited by 28 (9 self)
We present Pegasus, a compact and expressive intermediate representation for imperative languages. The representation is suitable for target architectures supporting predicated execution and aggressive speculation. In Pegasus, information about the global dataflow of the program is encoded in local structures, enabling compact and efficient algorithms for program optimizations. As a proof of the versatility of Pegasus, we have used it in a compiler translating C programs to hardware implementations.
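Predicated execution with aggressive speculation can be illustrated by if-conversion: both sides of a branch are computed unconditionally, and a multiplexor driven by the branch predicate selects the live result. The sketch below is a generic illustration, not Pegasus's actual data structures; `mux`, `branchy`, and `predicated` are hypothetical names.

```python
def branchy(x):
    """Original control flow with a conditional branch."""
    if x > 0:
        y = x * 2
    else:
        y = -x
    return y

def mux(p, if_true, if_false):
    """Two-input multiplexor selected by predicate p."""
    return if_true if p else if_false

def predicated(x):
    """Same computation with the branch removed."""
    p = x > 0            # predicate computed once
    t = x * 2            # "then" side, executed speculatively
    f = -x               # "else" side, executed speculatively
    return mux(p, t, f)  # the predicate picks which speculative value survives

print([predicated(x) for x in (-2, 0, 3)])  # [2, 0, 6]
```

Speculating this way is only safe for side-effect-free operations; stores and other side effects must remain guarded by their predicate.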
Compiling application-specific hardware
Montpellier (La Grande-Motte), 2002
Cited by 27 (6 self)
In this paper we describe ASH, an architectural framework for implementing Application-Specific Hardware. ASH is based on automatic hardware synthesis from high-level languages. The generated circuits use only localized computation structures; consequently, we expect these circuits to be fast, to use little power, and to scale well with program complexity. We present in detail CASH, a scalable compiler framework for ASH, which generates hardware from programs written in C. Our compiler exploits instruction-level parallelism by using aggressive speculation and dynamic scheduling. Based on this compilation scheme, we evaluate the computational resources necessary for implementing complex integer-based programs, and we suggest architectural features that would support the ASH framework.
Reconfigurable Instruction Set Processors: A survey
IEEE Transactions on Software Engineering, 2000
Cited by 19 (3 self)
Future interactive multimedia applications are characterized by a large variety of compression algorithms with highly parallel nested loops. Designing custom processors for this wide range of applications would be inefficient, given the uncertainty about what will actually be executed. Instead, we must find ways to cope with such dynamic, compute-intensive tasks. Reconfigurable instruction set processors can cope with this dynamism by specializing the hardware to the algorithm at hand at runtime. They achieve this through a flexible fabric of coarse-grained processing elements that can be reconfigured to perform different complex algorithms. This paper analyzes the performance improvements obtained by such programmable structures and discusses some of the critical issues, such as reconfiguration times.
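Why reconfiguration time is critical can be made concrete with a simple amortization model, offered as an illustrative assumption rather than the survey's own analysis. If a kernel runs N iterations taking t_sw each in software and t_hw each on the fabric, and loading the configuration costs T_r, the effective speedup is S(N) = N·t_sw / (T_r + N·t_hw). It exceeds 1 only once N > T_r / (t_sw - t_hw), and it approaches t_sw / t_hw as N grows. The function names below are hypothetical.

```python
import math

def effective_speedup(n_iter, t_sw, t_hw, t_reconf):
    """Speedup of n_iter iterations on the fabric, reconfiguration cost included."""
    return (n_iter * t_sw) / (t_reconf + n_iter * t_hw)

def break_even_iterations(t_sw, t_hw, t_reconf):
    """Smallest iteration count for which reconfiguring is strictly faster.

    Solves n * t_sw > t_reconf + n * t_hw for the smallest integer n.
    Returns None if the hardware is not faster per iteration.
    """
    if t_hw >= t_sw:
        return None
    return math.floor(t_reconf / (t_sw - t_hw)) + 1

# Arbitrary time units: 10 per iteration in software, 2 in hardware, 1000 to reconfigure.
print(break_even_iterations(10, 2, 1000))                # 126
print(round(effective_speedup(10_000, 10, 2, 1000), 2))  # 4.76
```

With these example numbers, reconfiguring pays off only from 126 iterations onward, and the speedup can never exceed t_sw / t_hw = 5 however long the kernel runs.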
C to Asynchronous Dataflow Circuits: An End-to-End Toolflow
2004
Cited by 19 (8 self)
We present a complete toolflow that translates ANSI-C programs into asynchronous circuits. The toolflow is built around a compiler that converts C into a functional dataflow intermediate representation, exposing instruction-level, pipeline and memory parallelism.
Resolution, Optimization, and Encoding of Pointer Variables for the Behavioral Synthesis from C
2001
Cited by 10 (2 self)
As designers may model mixed hardware-software systems using a subset of C or C++, we present SpC, a solution to synthesize and optimize hardware models with pointers. In hardware, a pointer is not only the address of data in memory; it may also reference data mapped to registers, ports, or wires. Pointer analysis is used to find, at compile time, the set of locations each pointer may reference in a program. In this paper, we address the problem of synthesizing and optimizing pointers to multiple variables or array elements. The values of the pointers are encoded, and branching statements are used to dynamically access the data referenced by pointers. A heuristic is used to encode the pointer values efficiently. Compiler techniques are also used to reduce storage before loads and stores. An implementation using the SUIF framework (Wilson et al., 1994; SUIF Compiler Framework) is presented, followed by some case studies and experimental results.
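Encoding pointer values and dereferencing through branches can be sketched as follows. Suppose pointer analysis has found that a pointer `p` may reference only the variables `a`, `b`, or `c`, each mapped to its own register. Then `p` needs to hold only a small code selecting one of them, and `*p` becomes a case statement over that code. The class below is hypothetical and uses a naive dense encoding (code i for the i-th target); the paper chooses its encoding with a heuristic instead.

```python
class EncodedPointer:
    """Pointer whose points-to set is known at compile time (hypothetical sketch).

    The pointer register holds a small code instead of a memory address;
    each dereference is a case statement that branches on that code.
    """
    def __init__(self, points_to):
        # Naive dense encoding: the i-th possible target gets code i.
        self.codes = {name: i for i, name in enumerate(points_to)}
        self.code = None

    def bits(self):
        """Width of the pointer register needed to hold any code."""
        return max(1, (len(self.codes) - 1).bit_length())

    def assign(self, name):
        """p = &name"""
        self.code = self.codes[name]

    def load(self, regs):
        """*p: one branch per possible target."""
        for name, code in self.codes.items():
            if self.code == code:
                return regs[name]
        raise ValueError("pointer was never assigned")

    def store(self, regs, value):
        """*p = value: same branching, writing to the selected register."""
        for name, code in self.codes.items():
            if self.code == code:
                regs[name] = value
                return
        raise ValueError("pointer was never assigned")

regs = {"a": 1, "b": 2, "c": 3}  # variables mapped to registers
p = EncodedPointer(["a", "b", "c"])
p.assign("b")
p.store(regs, p.load(regs) + 40)  # *p = *p + 40
print(regs, p.bits())  # {'a': 1, 'b': 42, 'c': 3} 2
```

A 2-bit code replaces a full-width address here, which is the kind of storage saving that makes a good encoding worthwhile.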
Peer-to-peer hardware-software interfaces for reconfigurable fabrics
In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM), 2002
Cited by 6 (2 self)
In this paper we describe a peer-to-peer interface between processor cores and reconfigurable fabrics. The main advantage of the peer-to-peer model is that it greatly expands the scope of application for reconfigurable computing and hence its potential benefits. The primary extension in our model is that “code” on the reconfigurable hardware unit is allowed to invoke routines both on the reconfigurable unit itself and on the fixed logic processor. We describe the software constructs and compilation mechanisms needed for such an architecture, including a detailed description of the interface between the two parts of the application.
Stream computations organized for reconfigurable execution
2006
Cited by 6 (0 self)
Reconfigurable systems can offer the high spatial parallelism and fine-grained, bit-level resource control traditionally associated with hardware implementations, along with the flexibility and adaptability characteristic of software. While reconfigurable systems create new opportunities for engineering and delivering high-performance programmable systems, the traditional approaches to programming and managing computations used for hardware systems (e.g., Verilog, VHDL) and software systems (e.g., C, Fortran, Java) are inappropriate and inadequate for exploiting reconfigurable platforms. To address this need, we develop a stream-oriented compute model, system architecture, and execution patterns which can capture and exploit the parallelism of spatial computations while simultaneously abstracting software applications from hardware details (e.g., timing, device capacity, and microarchitectural implementation details) and consequently allowing applications to scale to exploit newer, larger, and faster hardware platforms. Further, we describe hardware and software techniques that make this late-bound platform mapping viable and efficient.
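A toy version of a stream-oriented model helps show how applications can be abstracted from timing. In this sketch, operators communicate only through bounded FIFO streams and fire whenever they have an input token and room for an output token. Because no operator observes time directly, in this model the network produces the same output stream whatever firing order the scheduler picks. `Stream`, `Map`, and `run` are hypothetical names, the capacities stand in for hardware queue depths, and this is not the paper's actual compute model or runtime.

```python
from collections import deque

class Stream:
    """Bounded FIFO between two operators; capacity models a hardware queue depth."""
    def __init__(self, capacity):
        self.q = deque()
        self.capacity = capacity

    def can_put(self):
        return len(self.q) < self.capacity

    def can_get(self):
        return len(self.q) > 0

    def put(self, x):
        self.q.append(x)

    def get(self):
        return self.q.popleft()

class Map:
    """Operator that consumes one token and emits f(token) when it can make progress."""
    def __init__(self, f, inp, out):
        self.f, self.inp, self.out = f, inp, out

    def step(self):
        if self.inp.can_get() and self.out.can_put():
            self.out.put(self.f(self.inp.get()))
            return True
        return False

def run(operators):
    """Keep firing operators until none can make progress."""
    progress = True
    while progress:
        progress = False
        for op in operators:
            progress |= op.step()

src, mid, dst = Stream(10), Stream(2), Stream(10)
for x in range(5):
    src.put(x)
# Operators listed "backwards" on purpose: the firing order does not change the result.
run([Map(lambda v: v + 1, mid, dst), Map(lambda v: v * v, src, mid)])
print(list(dst.q))  # [1, 2, 5, 10, 17]
```

Shrinking or growing the `mid` capacity changes how many firings overlap, but not the values that reach `dst`, which is the property that lets a mapping scale to larger or faster hardware without changing the application.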
Application-Specific Hardware: Computing without CPUs
2001
Cited by 5 (0 self)
In this paper we propose a new architecture for general-purpose computing that combines a reconfigurable-hardware substrate and compiler technology to generate Application-Specific Hardware (ASH). The novelty of this architecture is that resources are not shared: each different static program instruction can have its own dedicated hardware implementation. ASH enables the synthesis of circuits with only local computation structures, which promise to be fast and inexpensive and to use very little power. This paper also presents a scalable compiler framework for ASH, which generates hardware from programs written in C, and evaluates the resources necessary for implementing realistic programs.

