Asynchronous Design Methodologies: An Overview
- PROCEEDINGS OF THE IEEE
, 1995
Cited by 139 (0 self)
Asynchronous design has been an active area of research since at least the mid-1950s, but has yet to achieve widespread use. We examine the benefits and problems inherent in asynchronous computations and in some of the more notable design methodologies. These include Huffman asynchronous circuits, burst-mode circuits, micropipelines, template-based and trace theory-based delay-insensitive circuits, signal transition graphs, change diagrams, and compilation-based quasi-delay-insensitive circuits.
Point-to-point connectivity between neuromorphic chips using address-events
- IEEE Trans. Circuits Syst. II
, 2000
Cited by 65 (15 self)
I discuss connectivity between neuromorphic chips, which use the timing of fixed-height, fixed-width pulses to encode information. Address-events (log2(N)-bit packets that uniquely identify one of N neurons) are used to transmit these pulses in real time on a random-access, time-multiplexed communication channel. Activity is assumed to consist of neuronal ensembles: spikes clustered in space and in time. I quantify tradeoffs faced in allocating bandwidth, granting access, and queuing, as well as throughput requirements, and conclude that an arbitered channel design is the best choice. I implement the arbitered channel with a formal design methodology for asynchronous digital VLSI CMOS systems, after introducing the reader to this top-down synthesis technique. Following the evolution of three generations of designs, I show how the overhead of arbitrating, and of encoding and decoding, can be reduced in area (from N to √N) by organizing neurons into rows and columns, and reduced in time (from log2(N) to 2) by exploiting locality in the arbiter tree and in the row–column architecture, and clustered activity. Throughput is boosted by pipelining and by reading spikes in parallel. Simple techniques that reduce crosstalk in these mixed analog–digital systems are described.
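The address-event idea in this abstract can be illustrated with a small software sketch. It is not the paper's circuit: the function names, the row-major address layout, and the `(time, row, col)` spike format are assumptions made for illustration only. It shows how a log2(N)-bit address identifies one of N neurons, and how spikes from a row–column array might be serialized onto a single time-multiplexed channel.

```python
def address_bits(n_neurons):
    """Bits needed to give each of n_neurons a unique address, i.e. ceil(log2(N))."""
    if n_neurons < 1:
        raise ValueError("need at least one neuron")
    return (n_neurons - 1).bit_length()


def encode_event(row, col, n_cols):
    """Pack a (row, col) neuron position into one address (assumed row-major layout)."""
    if not 0 <= col < n_cols:
        raise ValueError("column out of range")
    return row * n_cols + col


def decode_event(address, n_cols):
    """Recover the (row, col) position from an address produced by encode_event."""
    return divmod(address, n_cols)


def spikes_to_events(spikes, n_cols):
    """Serialize (time, row, col) spikes into time-ordered (time, address) events.

    Sorting by time stands in for the shared, time-multiplexed channel;
    a real arbitered channel resolves contention in hardware instead.
    """
    return [(t, encode_event(r, c, n_cols)) for t, r, c in sorted(spikes)]
```

For a hypothetical 32 × 32 array (N = 1024), `address_bits(1024)` gives 10 bits per event, and the neuron at row 3, column 5 maps to address 101.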
GasP: A Minimal FIFO Control
, 2001
Cited by 60 (1 self)
The GasP family of asynchronous circuits provides controls for simple pipelines, for branching and joining pipelines, for round-robin scatter and gather, for data-dependent scatter and gather, and for join on demand through arbitration. The family is designed so that each stage operates at the speed of a three-inverter ring oscillator. Test chips in 0.35 micron technology exhibit throughput in excess of 1.5 giga data items per second.
Power and Performance Evaluation of Globally Asynchronous Locally Synchronous Processors
, 2002
Cited by 56 (7 self)
Due to shrinking technologies and increasing design sizes, it is becoming more difficult and expensive to distribute a global clock signal with low skew throughout a processor die. Asynchronous processor designs do not suffer from this problem since they do not have a global clock. However, a paradigm shift from synchronous to asynchronous is unlikely to happen in the processor industry in the near future. Hence the study of Globally Asynchronous Locally Synchronous (or GALS) systems is relevant. In this paper we use a cycle-accurate simulation environment to study the impact of asynchrony in a superscalar processor architecture. Our results show that, as expected, going from a synchronous to a GALS design causes a drop in performance, but elimination of the global clock does not lead to drastic power reductions. From a power perspective, GALS designs are inherently less efficient when compared to synchronous architectures. However, the flexibility offered by the independently controllable local clocks enables the effective use of other energy conservation techniques like dynamic voltage scaling. Our results show that for a 5-clock-domain GALS processor, the drop in performance ranges between 5% and 15%, while power consumption is reduced by 10% on average. Fine-grained voltage scaling reduces the gap between fully synchronous and GALS implementations, allowing for better power efficiency.
A Micropipelined ARM
, 1993
Cited by 53 (13 self)
An asynchronous implementation of the ARM microprocessor is described. The design is based on Sutherland's Micropipelines, and allows considerable internal asynchronous concurrency. The rationale for the work is presented, the organisation of the chip described, and the characteristics of the chip described. The design displays unusual properties such as nondeterministic (but bounded) prefetch depth beyond a branch instruction. This work demonstrates the feasibility of building complex asynchronous systems and gives an indication of the costs and benefits of the Micropipeline approach.
Keyword Codes: C.1.1; B.1.1; B.7.1
Keywords: Processor Architectures, Single Data Stream Architectures; Control Structures and Microprogramming, Control Design Styles; Integrated Circuits, Types and Design Styles
1. INTRODUCTION
The power dissipation of high-performance CMOS VLSI microprocessors is becoming an increasing problem. Even when battery power and portability are not an issue the 20 to 30...
Counterflow Pipeline Processor Architecture
, 1994
Cited by 44 (0 self)
The counterflow pipeline processor architecture (CFPP) is a proposal for a family of microarchitectures for RISC processors. The architecture derives its name from its fundamental feature, namely that instructions and results flow in opposite directions within a pipeline and interact as they pass. The architecture seeks geometric regularity in processor chip layout, purely local control to avoid performance limitations of complex global pipeline stall signals, and simplicity that might lead to provably correct processor designs. Moreover, CFPP designs allow asynchronous implementations, in contrast to conventional pipeline designs where the synchronization required for operand forwarding makes asynchronous designs unattractive. This paper presents the CFPP architecture and a proposal for an asynchronous implementation. Detailed performance simulations of a complete processor design are not yet available.
Keywords: processor design, RISC architecture, micropipelines, FIFO, asynchronou...
The Design of An Asynchronous Communications Chip
- IEEE Design & Test of Computers
, 1994
Cited by 22 (0 self)
In this paper we describe a low-power infra-red communications receiver chip designed using asynchronous techniques. We focus on aspects that were difficult to implement asynchronously, and contrast our techniques with synchronous ones, where possible. We also detail the methodology used to design the chip, and the asynchronous toolset that was created to support it.
1. Introduction
Asynchronous logic design is currently receiving much attention as an alternative to traditional synchronous design [2], [5], [6], [7], [8], [9], [13], [14]. Asynchronous designs do not use a global clock, simplifying global chip routing and eliminating problems due to clock skew. Furthermore, a large portion of a synchronous chip's power budget is dedicated to driving the clock, whereas elimination of the global clock in an asynchronous chip allows it to achieve near-zero standby power when quiescent. For this reason asynchronous design is particularly attractive for battery-operated applications. Unfortu...
Scanning the Technology: Applications of Asynchronous Circuits
- Proceedings of the IEEE
, 1999
Cited by 22 (2 self)
A comparison with synchronous circuits suggests four opportunities for the application of asynchronous circuits: high performance, low power, improved noise and EMC properties, and a natural match with heterogeneous system timing. In this overview article each opportunity is reviewed in some detail, illustrated by examples, compared with synchronous alternatives, and accompanied by numerous pointers to the literature. Conditions for applying asynchronous circuit technology, such as the existence and availability of CAD tools, circuit libraries, and effective test approaches, are discussed briefly. Asynchronous circuits do offer advantages for many applications, and their design methods and tools are now starting to become mature.
Bubbles Can Make Self-Timed Pipelines Fast
, 1990
Cited by 20 (6 self)
We explore the practical limits on throughput imposed by timing in a long, self-timed, circulating pipeline (ring). We consider models with both fixed and random delays and derive exact results for pipelines where these delays are fixed or exponentially distributed random variables. We also give relationships that provide upper and lower bounds on throughput for any pipeline where the delays are independent random variables. In each of these cases, we show that the asymptotic processor utilization is independent of the length of the pipeline; thus, linear speedup is achieved. We present conditions under which this utilization approaches 100%.
Self-Timed Logic Using Current-Sensing Completion Detection (CSCD)
- Journal of VLSI Signal Processing
, 1994
Cited by 18 (0 self)
This article proposes a completion-detection method for efficiently implementing Boolean functions as self-timed logic structures. Current-Sensing Completion Detection, CSCD, allows self-timed circuits to be designed using single-rail variable encoding (one signal wire per logic variable) and implemented in about the same silicon area as an equivalent synchronous implementation. Compared to dual-rail encoding methods, CSCD can reduce the number of signal wires and transistors used by approximately 50%. CSCD implementations improve performance over equivalent dual-rail designs because of: (1) reduced parasitic capacitance, (2) removal of spacer tokens in the data stream, and (3) computation state similarity of consecutive data variables. Several CSCD configurations are described and evaluated, and transistor-level implementations are provided for comparison.
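To make the comparison with dual-rail encoding concrete, the following sketch models the conventional dual-rail baseline that the abstract contrasts with CSCD. It does not model CSCD itself, which senses supply current in the circuit. The function names and the (true wire, false wire) convention are assumptions chosen for illustration.

```python
SPACER = (0, 0)  # both wires low: the "no data" token sent between data items


def dual_rail_encode(bits):
    """Encode each bit as (true_wire, false_wire); exactly one wire is high."""
    return [(1, 0) if b else (0, 1) for b in bits]


def is_complete(pairs):
    """Completion detection: every pair carries valid data (exactly one wire high)."""
    return all(t != f for t, f in pairs)


def is_spacer(pairs):
    """True when every pair has returned to the spacer state."""
    return all(p == SPACER for p in pairs)


def wire_count(n_vars, dual_rail):
    """Signal wires for n_vars logic variables: two per variable in dual-rail, else one."""
    return 2 * n_vars if dual_rail else n_vars
```

For 8 variables this gives 16 wires in dual-rail against 8 in single-rail, which is the roughly 50% wire reduction the abstract cites. The spacer state is the per-item overhead that the abstract says CSCD removes from the data stream.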

