Results 1 - 10
of
88
Theory of latency-insensitive design
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
, 2001
"... Abstract—The theory of latency-insensitive design is presented as the foundation of a new correct-by-construction methodology to design complex systems by assembling intellectual property components. Latency-insensitive designs are synchronous distributed systems and are realized by composing functi ..."
Abstract
-
Cited by 75 (10 self)
- Add to MetaCart
Abstract—The theory of latency-insensitive design is presented as the foundation of a new correct-by-construction methodology to design complex systems by assembling intellectual property components. Latency-insensitive designs are synchronous distributed systems and are realized by composing functional modules that exchange data on communication channels according to an appropriate protocol. The protocol works on the assumption that the modules are stallable, a weak condition to ask them to obey. The goal of the protocol is to guarantee that latency-insensitive designs composed of functionally correct modules behave correctly independently of the channel latencies. This allows us to increase the robustness of a design implementation because any delay variations of a channel can be “recovered ” by changing the channel latency while the overall system functionality remains unaffected. As a consequence, an important application of the proposed theory is represented by the latency-insensitive methodology to design large digital integrated circuits by using deep submicrometer technologies. Index Terms—Deep submicrometer design, formal methods, latency-insensitive protocols, system design. I.
An introduction to asynchronous circuit design
- THE ENCYCLOPEDIA OF COMPUTER SCIENCE AND TECHNOLOGY
, 1997
"... ..."
RAPPID: An Asynchronous Instruction Length Decoder
, 1999
"... This paper describes an investigation of potential advantages and risks of applying an aggressive asynchronous design methodology to Intel Architecture. RAPPID ("Revolving Asynchronous Pentium Processor Instruction Decoder"), a prototype IA32 instruction length decoding and steering unit, was implem ..."
Abstract
-
Cited by 50 (31 self)
- Add to MetaCart
This paper describes an investigation of potential advantages and risks of applying an aggressive asynchronous design methodology to Intel Architecture. RAPPID ("Revolving Asynchronous Pentium Processor Instruction Decoder"), a prototype IA32 instruction length decoding and steering unit, was implemented using self-timed techniques. RAPPID chip was fabricated on a 0.25µ CMOS process and tested successfully. Results show significant advantages---in particular, performance of 2.5-4.5 instructions/nS---with manageable risks using this design technology. RAPPID achieves three times the throughput and half the latency, dissipating only half the power and requiring about the same area as an existing 400MHz clocked circuit.
Computer-Aided Synthesis And Verification Of Gate-Level Timed Circuits
, 1995
"... In recent years, there has been a resurgence of interest in the design of asynchronous circuits due to their ability to eliminate clock skew problems, achieve average case performance, adapt to processing and environmental variations, provide component modularity, and lower system power requirement ..."
Abstract
-
Cited by 42 (16 self)
- Add to MetaCart
In recent years, there has been a resurgence of interest in the design of asynchronous circuits due to their ability to eliminate clock skew problems, achieve average case performance, adapt to processing and environmental variations, provide component modularity, and lower system power requirements. Traditional academic asynchronous designs methods use unbounded delay assumptions, resulting in circuits that are verifiable, but ignore timing for simplicity, leading to unnecessarily conservative designs. In industry, however, timing is critical to reduce both chip area and circuit delay. Due to a lack of formal methods that handle timing information correctly, circuits with timing constraints usually require extensive simulation to gain confidence in the design. This thesis bridges this gap by introducing timed circuits in which explicit timing information is incorporated into the specification and utilized throughout the design procedure to optimize the implementation. Our timed circu...
Low-Power Encodings for Global Communication in CMOS VLSI
, 1997
"... Technology trends and especially portable applications are adding a third dimension (power) to the previously two-dimensional (speed, area) VLSI design space [30]. A large portion of power dissipation in high performance CMOS VLSI is due to the inherent difficulties in global communication at high r ..."
Abstract
-
Cited by 37 (2 self)
- Add to MetaCart
Technology trends and especially portable applications are adding a third dimension (power) to the previously two-dimensional (speed, area) VLSI design space [30]. A large portion of power dissipation in high performance CMOS VLSI is due to the inherent difficulties in global communication at high rates and we propose several approaches to address the problem. These techniques can be generalized at different levels in the design process. Global communication typically involves driving large capacitive loads which inherently require significant power. However, by carefully choosing the data representation, or encoding, of these signals, the average and peak power dissipation can be minimized. Redundancy can be added in space (number of bus lines), time (number of cycles) and voltage (number of distinct amplitude levels). The proposed codes can be used on a class of terminated off-chip board-level buses with level signaling, or on tri-state on-chip buses with level or transition signalin...
Towards Nanocomputer Architecture
, 2002
"... At the nanometer scale, the focus of micro-architecture will move from processing to communication. Most general computer architectures to date have been based on a "stored program" paradigm that differentiates between memory and processing and relies on communication over busses and other (relative ..."
Abstract
-
Cited by 28 (0 self)
- Add to MetaCart
At the nanometer scale, the focus of micro-architecture will move from processing to communication. Most general computer architectures to date have been based on a "stored program" paradigm that differentiates between memory and processing and relies on communication over busses and other (relatively) long distance mechanisms. Nanometer-scale electronics -- nanoelectronics - promises to fundamentally change the ground-rules. Processing will be cheap and plentiful, interconnection expensive but pervasive. This will tend to move computer architecture in the direction of locallyconnected, reconfigurable hardware meshes that merge processing and memory. If the overheads associated with reconfigurability can be reduced or even eliminated, architectures based on non-volatile, reconfigurable, finegrained meshes with rich, local interconnect offer a better match to the expected characteristics of future nanoelectronic devices.
Relative Timing
, 1999
"... Relative Timing is introduced as an informal method for aggressive asynchronous design. It is demonstrated on three example circuits (C-Element, FIFO, and RAPPID Tag Unit), facilitating transformations from speed-independent circuits to burst-mode, relative timed, and pulse-mode circuits. Relative t ..."
Abstract
-
Cited by 28 (12 self)
- Add to MetaCart
Relative Timing is introduced as an informal method for aggressive asynchronous design. It is demonstrated on three example circuits (C-Element, FIFO, and RAPPID Tag Unit), facilitating transformations from speed-independent circuits to burst-mode, relative timed, and pulse-mode circuits. Relative timing enables improved performance, area, power and testability in all three cases. 1. Introduction The design of RAPPID, the asynchronous instruction length decoder, took more than two years to complete [13]. Beyond investigating whether asynchronous design could improve performance, we also wanted to find out which design styles and circuit families are most suitable for aggressive circuit design. We started with Speed Independent (SI) and Extended Burst Mode (XBM) specifications. However, existing synthesis tools [5, 17] yielded results that were less than satisfactory for critical paths. Next, we turned to timed design and employed a metric timing synthesis tool [9]. The resulting cir...
Elliptic curve cryptosystems on reconfigurable hardware
- MASTER’S THESIS, WORCESTER POLYTECHNIC INST
, 1998
"... Security issues will play an important role in the majority of communication and computer networks of the future. As the Internet becomes more and more accessible to the public, security measures will have to be strengthened. Elliptic curve cryptosystems allow for shorter operand lengths than other ..."
Abstract
-
Cited by 19 (0 self)
- Add to MetaCart
Security issues will play an important role in the majority of communication and computer networks of the future. As the Internet becomes more and more accessible to the public, security measures will have to be strengthened. Elliptic curve cryptosystems allow for shorter operand lengths than other public-key schemes based on the discrete logarithm in finite fields and the integer factorization problem and are thus attractive for many applications. This thesis describes an implementation of a crypto engine based on elliptic curves. The underlying algebraic structures are composite Galois fields GF((2 n) m) in a standard base representation. As a major new feature, the system is developed for a reconfigurable platform based on Field Programmable Gate Arrays (FPGAs). FPGAs combine the flexibility of software solutions with the security of traditional hardware implementations. In particular, it is possible to easily change all algorithm parameters such as curve coefficients, field order, or field representation. The thesis deals with the design and implementation of elliptic curve point multiplicationarchitectures. The architectures are described in VHDL and mapped to Xilinx FPGA devices. Architectures over Galois fields of different order and representation were implemented and compared. Area and timing measurements are provided for all architectures. It is shown that a full point multiplication on elliptic curves of real-world size can be implemented on commercially available FPGAs.
FM-QoS: Real-time Communication Using Selfsynchronizing Schedules
- in Proceedings of Supercomputing Conference
, 1997
"... FM-QoS employs a novel communication architecture based on network feedback to provide predictable communication performance (e.g. deterministic latencies and guaranteed bandwidths) for high speed cluster interconnects. Network feedback is combined with self-synchronizing communication schedules to ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
FM-QoS employs a novel communication architecture based on network feedback to provide predictable communication performance (e.g. deterministic latencies and guaranteed bandwidths) for high speed cluster interconnects. Network feedback is combined with self-synchronizing communication schedules to achieve synchrony in the network interfaces (NIs). Based on this synchrony, the network can be scheduled to provide predictable performance without special network QoS hardware. We describe the key element of the FM-QoS approach, feedback-based synchronization (FBS), which exploits network feedback to synchronize senders. We use Petri nets to characterize the set of selfsynchronizing communication schedules for which FBS is effective and to describe the resulting synchronization overhead as a function of the clock drift across the network nodes. Analytic modeling suggests that for clocks of quality 300 ppm (such as found in the Myrinet NI), a synchronization overhead less than 1% of the tota...
A Fine-Grain Phased Logic CPU
- in IEEE Computer Society Annual Symposium on VLSI (ISVLSI
, 2003
"... A five-stage pipelined CPU based on the MIPs ISA is mapped to a self-timed logic family known as Phased Logic (PL). The mapping is performed automatically from a netlist of D-Flip-Flops and 4-input Lookup Tables (LUT4s) to a netlist of Phased Logic gates. Each PL gate implements a 4-input Lookup Tab ..."
Abstract
-
Cited by 15 (3 self)
- Add to MetaCart
A five-stage pipelined CPU based on the MIPs ISA is mapped to a self-timed logic family known as Phased Logic (PL). The mapping is performed automatically from a netlist of D-Flip-Flops and 4-input Lookup Tables (LUT4s) to a netlist of Phased Logic gates. Each PL gate implements a 4-input Lookup Table in addition to control logic required for the PL control scheme. PL offers a speedup technique known as Early Evaluation that can be used to boost performance at the cost of additional PL gates. Several different PL gate-level implementations are produced to explore different architectural tradeoffs using early evaluation. Simulations run for five benchmark programs show an average speedup of 1.48 over the clocked netlist at the cost of 17% additional PL gates.

