Optimization of NULL convention self-timed circuits
 Integration, the VLSI Journal
, 2004
Cited by 11 (6 self)
Self-timed logic design methods are developed using Threshold Combinational Reduction (TCR) within the NULL Convention Logic (NCL) paradigm. NCL logic functions are realized using 27 distinct transistor networks implementing the set of all functions of four or fewer variables, thus facilitating a variety of gate-level optimizations. TCR optimizations are formalized for NCL and then assessed by comparing levels of gate delays, gate counts, transistor count, and power utilization of the resulting designs. The methods are illustrated to produce (1) fundamental logic functions that are 2.2 to 2.3 times faster and require 40 to 45% fewer transistors than conventional canonical designs, (2) a Full Adder with reduced critical path delay and transistor count over various alternative gate-level synthesis approaches, resulting in a circuit with at least 48% fewer transistors, half as many gate delays to generate the carry output, and the same number of gate delays to generate the sum output, as its nearest competitors, and (3) time, space, and power optimized increment circuits for a 4-bit up-counter, resulting in a throughput-optimized design that is 14% and 82% faster than area- and power-optimized designs, respectively, an area-optimized design that requires 22% and 42% fewer transistors than the speed- and power-optimized designs, respectively, and a power-optimized design that dissipates 63% and 42% less power than the speed- and area-optimized designs, respectively. Results demonstrate support for a variety of optimizations utilizing conventional Boolean minimization followed by table-driven gate substitutions, providing for an NCL design method that is readily automatable.
Self-timed architecture of a reduced instruction set computer
 Asynchronous Design Methodologies
, 1993
Cited by 7 (2 self)
An advanced Self-Timed Reduced Instruction Set Computer (ST-RISC) architecture is described. It is designed hierarchically, and is formally specified functionally at the various levels by a CSP-like language. The architectural features include decoupled data and branch processors, delayed branches with variable delay, unified data path and control, efficient non-redundant handshaking protocols, and novel self-timed building blocks such as combinational logic, master-slave registers, finite state machines, and FIFO elements.
Delay-insensitive gate-level pipelining
 Integration, the VLSI Journal
, 2001
Cited by 4 (1 self)
Gate-Level Pipelining (GLP) techniques are developed to design throughput-optimal delay-insensitive digital systems using NULL Convention Logic (NCL). Pipelined NCL systems consist of Combinational, Registration, and Completion circuits implemented using threshold gates equipped with hysteresis behavior. NCL Combinational circuits provide the desired processing behavior between Asynchronous Registers that regulate wavefront propagation. NCL Completion logic detects completed DATA or NULL output sets from each register stage. GLP techniques cascade registration and completion elements to systematically partition a combinational circuit and allow controlled overlapping of input wavefronts. Both full-word and bit-wise completion strategies are applied progressively to select the optimal size grouping of operand and output data bits. To illustrate the methodology, GLP is applied to a case study of a 4-bit by 4-bit unsigned multiplier, yielding a speedup of 2.25 over the non-pipelined version, while maintaining delay-insensitivity. Even though delay-insensitive design methodologies do not utilize clocked control
Speedup of Delay-Insensitive Digital Systems Using NULL Cycle Reduction
, 2001
Cited by 4 (2 self)
A NULL Cycle Reduction (NCR) technique is developed to increase the throughput of delay-insensitive digital systems. NCR reduces the time required to flush complete DATA wavefronts, commonly referred to as the NULL or Empty cycle. The NCR technique exploits parallelism by partitioning input wavefronts such that one circuit processes a DATA wavefront, while its duplicate processes a NULL wavefront. To illustrate the technique, NCR is applied to a case study of a dual-rail non-pipelined 4-bit by 4-bit unsigned multiplier, yielding a speedup of 1.61 over the stand-alone version, while maintaining delay-insensitivity. NCR is also applied to a single slow stage of a pipeline to boost the pipeline's overall throughput by 21%.
Design of a Logic Element for Implementing an Asynchronous FPGA
Cited by 1 (0 self)
A reconfigurable logic element (LE) is developed for use in constructing a NULL Convention Logic (NCL) FPGA. It can be configured as any of the 27 fundamental NCL gates, including resettable and inverting variations, and can utilize embedded registration for gates with three or fewer inputs. The developed LE is compared with a previous NCL LE, showing that the one developed herein yields a more area-efficient NCL circuit implementation. The NCL FPGA logic element is simulated at the transistor level using the 1.8 V, 180 nm TSMC CMOS process.
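The hysteresis behavior of the NCL gates mentioned in these abstracts can be illustrated with a small behavioral model. This is a minimal sketch, not the logic element from the paper: it assumes an unweighted m-of-n threshold gate (weighted, resettable, and inverting variants are omitted), and the class name `ThresholdGate` and its `update` method are hypothetical names chosen for illustration.

```python
class ThresholdGate:
    """Behavioral model of an unweighted m-of-n threshold gate with hysteresis.

    Assumption: the output is asserted once at least m inputs are asserted,
    and is deasserted only after all n inputs have returned to 0.
    """

    def __init__(self, m, n):
        if not 1 <= m <= n:
            raise ValueError("need 1 <= m <= n")
        self.m = m
        self.n = n
        self.out = 0  # assumed initial state: output deasserted (NULL)

    def update(self, inputs):
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        asserted = sum(1 for x in inputs if x)
        if asserted >= self.m:
            self.out = 1      # threshold met: assert the output
        elif asserted == 0:
            self.out = 0      # all inputs deasserted: release the output
        # otherwise hold the previous output (hysteresis)
        return self.out


gate = ThresholdGate(2, 3)
trace = [gate.update(v) for v in ([1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 0])]
```

In the trace, the third input pattern is below the threshold, but the gate keeps its output asserted until every input has returned to 0.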
Speedup of Self-Timed Digital Systems Using Early Completion
 The IEEE Computer Society Annual Symposium on VLSI
, 2002
Cited by 1 (0 self)
An Early Completion technique is developed to significantly increase the throughput of NULL Convention self-timed digital systems without impacting latency or compromising their self-timed nature. Early Completion performs the completion detection for registration stage i at the input of the register, instead of at the output of the register, as in standard NULL Convention Logic. This method requires that the single-rail completion signal from registration stage i+1, Ko_{i+1}, be used as an additional input to the completion detection circuitry for registration stage i, to maintain self-timed operation. However, Early Completion does necessitate an assumption of equipotential regions, introducing a few easily satisfiable timing assumptions, thus making the design potentially more delay-sensitive. To illustrate the technique, Early Completion is applied to a case study of the optimally pipelined 4-bit by 4-bit unsigned multiplier utilizing full-word completion, presented in [1], where a speedup of 1.21 is achieved while self-timed operation is maintained and latency remains unchanged.
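For orientation, the standard full-word completion detection that Early Completion is contrasted with can be sketched behaviorally. This is only an illustration under stated assumptions, not the circuit from the paper: it models detection over a dual-rail word at a register output and does not model the Early Completion variant or the extra Ko input from the next stage. The encoding (NULL as both rails low, DATA as exactly one rail high), the polarity of `ko` (1 requests DATA, 0 requests NULL), and the names `CompletionDetector`, `is_data`, and `is_null` are assumptions made for this sketch.

```python
def is_data(bit):
    """A dual-rail bit holds DATA when exactly one of its two rails is asserted."""
    rail0, rail1 = bit
    return rail0 != rail1


def is_null(bit):
    """A dual-rail bit is NULL when both rails are deasserted."""
    return bit == (0, 0)


class CompletionDetector:
    """Full-word completion detection with hysteresis (behavioral sketch)."""

    def __init__(self):
        self.ko = 1  # assumed reset state: word is NULL, so request DATA

    def update(self, word):
        if all(is_data(b) for b in word):
            self.ko = 0   # complete DATA set seen: request NULL
        elif all(is_null(b) for b in word):
            self.ko = 1   # complete NULL set seen: request DATA
        # partially arrived wavefront: hold the previous request
        return self.ko


det = CompletionDetector()
wavefronts = [
    [(0, 1), (0, 0)],  # DATA arriving, second bit still NULL
    [(0, 1), (1, 0)],  # complete DATA word
    [(0, 0), (1, 0)],  # NULL arriving, second bit still DATA
    [(0, 0), (0, 0)],  # complete NULL word
]
requests = [det.update(w) for w in wavefronts]
```

The request signal changes only on a complete DATA or complete NULL word, which is what lets successive wavefronts stay separated without a clock.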
Optimization of Robust Asynchronous Circuits by Local Input Completeness Relaxation
, 2007
As process, temperature, and voltage variations become significant in deep submicron design, timing closure becomes a critical challenge using synchronous CAD flows. One attractive alternative is to use robust asynchronous circuits, which gracefully accommodate timing discrepancies. However, these asynchronous circuits typically suffer from high area and latency overhead. In this paper, an optimization algorithm is presented which reduces the area and delay of these circuits by relaxing their overly restrictive style. The algorithm was implemented and experiments performed on a subset of MCNC circuits. On average, 49.2% of the gates could be implemented in a relaxed manner, 34.9% area improvement was achieved, and 16.1% delay improvement was achieved using a simple heuristic for targeting the critical path in the circuit. This is the first proposed approach that systematically optimizes asynchronous circuits based on the notion of local relaxation while still preserving the circuit’s overall timing robustness.
NULL Convention Multiply and Accumulate Unit with Conditional Rounding, Scaling, and Saturation
Approaches for maximizing throughput of self-timed multiply-accumulate units (MACs) are developed and assessed using the NULL Convention Logic (NCL) paradigm. In this class of self-timed circuits, the functional correctness is independent of any delays in circuit elements, through circuit construction, and independent of any wire delays, through the isochronic fork assumption [1, 2], where wire delays are assumed to be much less than gate delays. Therefore self-timed circuits provide distinct advantages for System-on-a-Chip applications. First, a number of alternative MAC algorithms are compared and contrasted in terms of throughput and area to determine which approach will yield the maximum throughput with the least area. It was determined that two algorithms that meet these criteria well are the Modified Baugh-Wooley and Modified Booth2 algorithms. Dual-rail non-pipelined versions of these algorithms were first designed using the Threshold Combinational Reduction (TCR) method [3]. The non-pipelined designs were then optimized for throughput using the Gate-Level Pipelining (GLP) method [4]. Finally, each design was simulated using Synopsys to quantify the advantage of the dual-rail pipelined Modified Baugh-Wooley MAC, which yielded a speedup of 2.5 over its