Generalized Latency-Insensitive Systems for Single-Clock and Multi-Clock Architectures
- In Design, Automation and Test in Europe (DATE’04), 2004
Cited by 11 (1 self)
Latency-insensitive systems were recently proposed by Carloni et al. as a correct-by-construction methodology for single-clock system-on-a-chip (SoC) design using predesigned IP blocks. Their approach overcomes the problem of long latencies of global interconnects in deep-submicron technologies, while still maintaining much of the inherent simplicity of synchronous design. In particular, wires whose latency is greater than a clock cycle are segmented using “relay stations,” and IP blocks are made robust to arbitrary communication delays. This paper shows, however, that significant extensions are needed to make latency-insensitive systems useful for the practical design of large-scale SoC’s. In particular, this paper proposes three extensions. The first extension allows each synchronous module to treat its input and output channels in a much more flexible manner, i.e., with greater decoupling. The second extension generalizes inter-module communication from point-to-point channels to more complex networks of arbitrary topologies. Finally, the third extension is to target multi-clock SoC’s. The net impact of our extensions is the potential for improved throughput, reduced power consumption, and greater flexibility in design.
A High Performance, Energy Efficient, GALS Processor Microarchitecture with Reduced Implementation Complexity
- In International Symposium on Performance Analysis of Systems and Software, 2005
Cited by 8 (3 self)
As the costs and challenges of global clock distribution grow with each new microprocessor generation, a Globally Asynchronous, Locally Synchronous (GALS) approach becomes an attractive alternative. One proposed GALS approach, called a Multiple Clock Domain (MCD) processor, achieves impressive energy savings for a relatively low performance cost. However, the approach requires separating the processor into four domains, including separating the integer and memory domains, which complicates load scheduling, and the implementation of 32 voltage and frequency levels in each domain. In addition, the hardware-based control algorithm, though effective overall, produces a significant performance degradation for some applications. In this paper, we devise modifications to the MCD design that retain many of its benefits while greatly reducing the implementation complexity. We first determine that the synchronization channels that are most responsible for the MCD performance degradation are those involving cache access, and propose merging the integer and memory domains to virtually eliminate this overhead. We further propose significantly reducing the number of voltage levels, separating the Reorder Buffer into its own domain to permit front-end frequency scaling, separating the L2 cache to permit standard power optimizations to be used, and a new online algorithm that provides consistent results across our benchmark suite. The overall result is a significant reduction in the performance degradation of the original MCD approach and greater energy savings, with a greatly simplified microarchitecture that is much easier to implement.
High Rate Data Synchronization in GALS SoCs
- IEEE Transactions on VLSI, 2006
Cited by 4 (4 self)
Globally asynchronous, locally synchronous (GALS) systems-on-chip (SoCs) may be prone to synchronization failures if the delay of their locally-generated clock tree is not considered. This paper presents an in-depth analysis of the problem and proposes a novel solution. The problem is analyzed considering the magnitude of clock tree delays, the cycle times of the GALS module, and the complexity of the asynchronous interface controllers using a timed signal transition graph (STG) approach. In some cases, the problem can be solved by extracting all the delays and verifying whether the system is susceptible to metastability. In other cases, when high data bandwidth is not required, matched-delay asynchronous ports may be employed. A novel architecture for synchronizing inter-modular communications in GALS, based on locally delayed latching (LDL), is described. LDL synchronization does not require pausable clocking, is insensitive to clock tree delays, and supports high data rates. It replaces complex global timing constraints with simpler localized ones. Three different LDL ports are presented. The risk of metastability in the synchronizer is analyzed in a technology-independent manner. Index Terms: asynchronous circuits, globally asynchronous, locally synchronous (GALS), synchronization, system-on-chip (SoC).
Generalized Latency-Insensitive Systems for GALS Architectures
- To appear in Proceedings of FMGALS’03, 2003
Cited by 3 (0 self)
Latency-insensitive systems were recently proposed by Carloni et al. for the design of single-clock systems-on-a-chip (SoC’s) using predesigned IP blocks. The goal of this paper is to extend and generalize latency-insensitive systems in such a way that they can be applied to GALS architectures with multiple clocks. In particular, we propose two extensions. The first extension allows each synchronous module to treat its input and output channels in a much more flexible manner (i.e., greater decoupling). As a result, significant improvement in throughput as well as power consumption may be obtained. The second extension generalizes inter-module communication from point-to-point channels to more complex networks of arbitrary topologies.
A scalable dual-clock FIFO for data transfers between arbitrary and haltable clock domains
- IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 2007
Cited by 3 (2 self)
A robust, scalable, and power efficient dual-clock first-input first-out (FIFO) architecture which is useful for transferring data between modules operating in different clock domains is presented. The architecture supports correct operation in applications where multiple clock cycles of latency exist between the data producer, FIFO, and the data consumer; and with arbitrary clock frequency changes, halting, and restarting in either or both clock domains. The architecture is demonstrated in both a 0.18-μm CMOS full-custom design and a 0.18-μm CMOS standard cell design used in a globally asynchronous locally synchronous array processor. It achieves 580-MHz operation and 10.3-mW power dissipation while performing simultaneous FIFO
Interface Design for Rationally Clocked GALS Systems
- In Proceedings of IEEE International Symposium on Asynchronous Circuits and Systems, 2006
Cited by 2 (1 self)
We investigate the problem of designing interface circuits for rationally clocked modules in GALS systems. As a key contribution, we show that knowledge of flow-control protocols can be used to significantly optimize synchronization mechanisms. We present delay-augmented netcharts as a formalism for representing communication protocols and describe techniques to analyze them. We use the results of our analysis to design a simple yet generic interface that is optimized for the given protocol and is free from synchronization failures. We show by means of case studies the inherent advantages of our methodology over an existing solution technique.
Contents lists available at ScienceDirect
INTEGRATION, the VLSI journal. Journal homepage: www.elsevier.com/locate/vlsi

