Theory of latency-insensitive design
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2001
Cited by 75 (10 self)
The theory of latency-insensitive design is presented as the foundation of a new correct-by-construction methodology to design complex systems by assembling intellectual property components. Latency-insensitive designs are synchronous distributed systems and are realized by composing functional modules that exchange data on communication channels according to an appropriate protocol. The protocol works on the assumption that the modules are stallable, a weak condition for them to satisfy. The goal of the protocol is to guarantee that latency-insensitive designs composed of functionally correct modules behave correctly independently of the channel latencies. This increases the robustness of a design implementation, because any delay variation on a channel can be "recovered" by changing the channel latency while the overall system functionality remains unaffected. As a consequence, an important application of the proposed theory is the latency-insensitive methodology for designing large digital integrated circuits in deep submicrometer technologies. Index Terms: Deep submicrometer design, formal methods, latency-insensitive protocols, system design.
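To make the latency-insensitivity claim concrete, here is a minimal sketch, assuming a toy cycle-by-cycle model: channels delay a stream by some number of clock cycles and fill the gap with "void" tokens, and a stallable two-input adder is wrapped in a simple shell that fires only when both inputs hold valid data. The names `TAU`, `channel`, `shell_adder`, and `valid` are hypothetical and not taken from the paper, and the unbounded queues stand in for the bounded buffers with back-pressure a real protocol would use.

```python
from collections import deque
from itertools import zip_longest

TAU = object()  # hypothetical "void" token: no valid data on the channel this cycle


def channel(stream, latency):
    """Model a channel with `latency` clock cycles of delay (e.g. relay stations).

    Token order is preserved; the first `latency` cycles carry only void tokens.
    """
    return [TAU] * latency + list(stream)


def shell_adder(a_stream, b_stream):
    """Wrap a stallable two-input adder in a simple shell.

    Valid tokens are queued per input; the adder fires only when both queues
    are non-empty and otherwise stalls, emitting a void token. Unbounded queues
    are a simplification: real shells use bounded buffers plus stop signals.
    """
    qa, qb = deque(), deque()
    out = []
    for a, b in zip_longest(a_stream, b_stream, fillvalue=TAU):
        if a is not TAU:
            qa.append(a)
        if b is not TAU:
            qb.append(b)
        if qa and qb:
            out.append(qa.popleft() + qb.popleft())
        else:
            out.append(TAU)  # stall this cycle
    return out


def valid(stream):
    """Drop void tokens, keeping only the informative part of a stream."""
    return [x for x in stream if x is not TAU]


xs, ys = [1, 2, 3, 4], [10, 20, 30, 40]
balanced = shell_adder(channel(xs, 1), channel(ys, 1))
skewed = shell_adder(channel(xs, 1), channel(ys, 4))
print(valid(balanced), valid(skewed))  # [11, 22, 33, 44] [11, 22, 33, 44]
print(len(balanced), len(skewed))      # 5 8
```

Changing the latency of the `ys` channel from 1 to 4 cycles only shifts when results appear (more void tokens, longer run); the sequence of valid results is unchanged, which is the property the protocol is meant to guarantee.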
Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery
Cited by 58 (2 self)
Diminutive devices and high clock frequencies in future microprocessor generations are raising concerns about transient soft failures in hardware, necessitating fault detection and recovery mechanisms even in commodity processors. In this paper, we propose a fault-tolerant extension for the modern superscalar out-of-order datapath that requires only modest additional hardware. In the proposed extension, errors are detected by verifying the redundant results of dynamically replicated threads of execution, while the error-recovery scheme uses the instruction-rewind mechanism to restart at the failed instruction. We study the performance impact of augmenting superscalar microarchitectures with this fault-tolerance mechanism, using an analytical performance model in conjunction with a performance simulator. Simulation results for 11 SPEC95 and SPEC2000 benchmarks show that, in the absence of faults, error detection reduces throughput by 2% to 45%, which is in line with other proposed detection schemes. In the presence of transient faults, the fast error-recovery scheme adds very little further slowdown.
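The detect-by-redundancy, recover-by-rewind idea can be illustrated with a minimal sketch on a toy single-register machine. This is not the paper's datapath: it assumes each instruction is executed twice, a hypothetical fault injector flips one bit of the primary copy's result, and a mismatch causes the instruction to be re-executed instead of committed. The function name `run_with_redundancy` and the `faults` encoding are illustrative only, and the sketch assumes a single fault never corrupts both copies identically.

```python
def run_with_redundancy(program, x, faults):
    """Execute `program` with per-instruction redundant execution and rewind.

    program: list of functions int -> int (a toy single-register machine).
    faults:  set of (pc, attempt) pairs at which a transient single-bit upset
             corrupts the primary copy's result (hypothetical fault injector).
    Returns (final_value, number_of_rewinds).
    """
    pc, attempt, rewinds = 0, 0, 0
    while pc < len(program):
        op = program[pc]
        primary = op(x)
        if (pc, attempt) in faults:
            primary ^= 1          # inject a transient bit flip
        shadow = op(x)            # redundant copy of the same instruction
        if primary != shadow:
            # Mismatch: do not commit; rewind and re-execute this instruction.
            rewinds += 1
            attempt += 1
            continue
        x = primary               # results agree: commit
        pc, attempt = pc + 1, 0
    return x, rewinds


program = [lambda v: v + 3, lambda v: v * 5, lambda v: v - 7]
print(run_with_redundancy(program, 2, set()))                     # (18, 0)
print(run_with_redundancy(program, 2, {(1, 0)}))                  # (18, 1)
print(run_with_redundancy(program, 2, {(0, 0), (0, 1), (2, 0)}))  # (18, 3)
```

The committed result is the same with or without faults; the only cost of a transient fault is the re-executed instruction, which mirrors the abstract's observation that recovery adds little slowdown.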
Optimizing pipelines for power and performance
In International Symposium on Microarchitecture (MICRO35), Nov. 2002. Selected as one of the four Best IBM Research Papers in Computer Science, Electrical Engineering and Math published in 2002.
Cited by 36 (3 self)
During the concept and definition phase of next-generation high-end processors, power and performance must be weighed appropriately to deliver competitive cost/performance. A CPI-centric view alone is not enough in early-stage definition studies. One of the fundamental issues confronting the architect at this stage is the choice of pipeline depth and target frequency. In this paper we present an optimization methodology that starts with an analytical power-performance model to derive the optimal pipeline depth for a superscalar processor. The results are validated and further refined using detailed simulation-based analysis. As part of the power-modeling methodology, we have developed equations that model the variation of energy as a function of pipeline depth. Our results using a set of SPEC2000 applications show that when both power and performance are considered in the optimization, the optimal clock period is around 18 FO4. We also provide a detailed sensitivity analysis of the optimal pipeline depth against key assumptions of these energy models.
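Why power pulls the optimum toward shallower pipelines can be seen with a deliberately simple analytical model. The sketch below is not the paper's model: it assumes a textbook-style cycle time of `T_OVER + T_LOGIC / p` FO4, a CPI that grows linearly with depth, dynamic power proportional to frequency times latch count, and the BIPS^3/W power-performance metric. All parameter values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical model parameters (illustrative only, not taken from the paper):
T_LOGIC = 140.0  # total logic depth of the unpipelined machine, in FO4
T_OVER = 2.5     # per-stage latch/clocking overhead, in FO4
GAMMA = 0.05     # CPI penalty per extra stage (hazard/stall sensitivity)
ETA = 1.0        # latch count (hence power per cycle) grows as depth**ETA


def cycle_time(p):
    """Clock period in FO4 for a p-stage pipeline."""
    return T_OVER + T_LOGIC / p


def cpi(p):
    """Cycles per instruction grows linearly with depth in this toy model."""
    return 1.0 + GAMMA * p


def bips(p):
    """Throughput (arbitrary units): frequency divided by CPI."""
    return 1.0 / (cycle_time(p) * cpi(p))


def power(p):
    """Dynamic power ~ frequency * latch count (clock gating, leakage ignored)."""
    return (1.0 / cycle_time(p)) * p ** ETA


depths = range(1, 61)
p_perf = max(depths, key=bips)                                  # performance only
p_pwr = max(depths, key=lambda p: bips(p) ** 3 / power(p))      # BIPS^3/W

print(p_perf, round(cycle_time(p_perf), 1))              # 33 6.7
print(p_pwr, round(cycle_time(p_pwr), 1))                # 7 22.5
print(round(math.sqrt(T_LOGIC / (GAMMA * T_OVER)), 1))   # 33.5, closed-form perf optimum
```

With these made-up numbers the performance-only optimum is a very deep pipeline, while adding the power term moves the optimum to a much longer clock period; the exact values depend entirely on the assumed parameters and should not be read as the paper's results.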
Performance Analysis and Optimization of Latency Insensitive Systems
2000
Cited by 23 (6 self)
Latency-insensitive design has recently been proposed in the literature as a way to design complex digital systems whose functional behavior is robust with respect to arbitrary variations in interconnect latency. However, this approach does not guarantee the same robustness for the performance of the design, which can in fact suffer significant losses. This paper presents a simple yet rigorous method to (1) model the key properties of a latency-insensitive system, (2) analyze the impact of interconnect latency on the overall throughput, and (3) optimize the performance of the final implementation.
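One common way to reason about the throughput loss is to treat the system as a graph of shells connected by channels and look at its cycles. The sketch below assumes (this is an assumption, not a restatement of the paper's method) that every shell output starts with one valid token, relay stations start empty, and each register stage costs one clock cycle; a feedback cycle with S shells and R relay stations can then deliver at most S / (S + R) valid tokens per clock, and the slowest cycle bounds the whole system. The function names and the three-module design are hypothetical.

```python
from fractions import Fraction


def simple_cycles(nodes, succ):
    """Enumerate simple directed cycles, each reported once from its lowest-ordered node.

    Plain DFS; fine for small graphs (Johnson's algorithm scales better).
    """
    order = {n: i for i, n in enumerate(nodes)}
    for start in nodes:
        stack = [(start, [start])]
        while stack:
            v, path = stack.pop()
            for w in succ.get(v, []):
                if w == start:
                    yield path
                elif order[w] > order[start] and w not in path:
                    stack.append((w, path + [w]))


def lis_throughput(relays):
    """Sustainable throughput of a latency-insensitive system (toy model).

    relays: dict mapping channel (u, v) to its number of relay stations.
    Each cycle with S shells and R relay stations sustains S / (S + R).
    """
    nodes = sorted({u for u, _ in relays} | {v for _, v in relays})
    succ = {}
    for u, v in relays:
        succ.setdefault(u, []).append(v)
    theta = Fraction(1)  # with no feedback cycles the system can run at full rate
    for cyc in simple_cycles(nodes, succ):
        hops = list(zip(cyc, cyc[1:] + cyc[:1]))
        shells = len(hops)
        latency = sum(1 + relays[h] for h in hops)
        theta = min(theta, Fraction(shells, latency))
    return theta


# Hypothetical design: A <-> B (1 relay station), B <-> C (2 relay stations).
design = {("A", "B"): 1, ("B", "A"): 0, ("B", "C"): 2, ("C", "B"): 0}
print(lis_throughput(design))  # 1/2: the B-C loop limits the whole system
```

In this toy model, inserting relay stations on a feed-forward channel costs nothing, while inserting them inside a feedback loop directly lowers throughput, which is why placement of relay stations matters for performance even though functionality is preserved.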
Integrated analysis of power and performance for pipelined microprocessors
IEEE Transactions on Computers, 2004
Cited by 18 (8 self)
Achieving Typical Delays in Synchronous Systems via Timing Error Toleration
2000
Cited by 15 (3 self)
This paper introduces a hardware method for improving the performance of any synchronous digital system. We exploit the well-known observation that typical delays in synchronous systems are much smaller than the worst-case delays they are usually designed for, typically by factors of two, three, or more. Our proposed family of hardware solutions employs timing error toleration (TIMERRTOL) to take advantage of this characteristic. Briefly, TIMERRTOL operates the system at speeds corresponding to typical delays, detects when timing errors occur, and then allocates more time for the signals to settle to their correct values. Reference paths in the circuitry operate at lower speeds so that they always exhibit correct values (i.e., they are timed for worst-case delays). The nominal speedup of each solution equals the ratio of worst-case to typical delays for the application system. The increases in cost and power dissipation are reasonable. We present the basic designs for a family of three solutions, and...
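The trade-off behind timing error toleration can be estimated with simple arithmetic. The sketch below is a hypothetical cost model, not one of the paper's three designs: the system is clocked at the typical-delay period, an operation whose delay exceeds that period is flagged as a timing error and granted `penalty_cycles` extra fast cycles, and the baseline clocks every operation at the worst-case period. The function name, parameters, and delay values are assumptions for illustration.

```python
def timerrtol_speedup(delays, t_typical, t_worst, penalty_cycles=1):
    """Toy estimate of timing-error-toleration speedup over worst-case clocking.

    delays: actual settling time of each operation (same unit as the periods).
    Operations slower than t_typical are detected as timing errors (in hardware,
    by comparison with a slower reference path) and get extra fast cycles.
    """
    total = 0.0
    for d in delays:
        cycles = 1 if d <= t_typical else 1 + penalty_cycles
        # The recovery window must be long enough for the value to settle.
        assert d <= cycles * t_typical, "recovery window too short"
        total += cycles * t_typical
    baseline = len(delays) * t_worst  # every operation takes one worst-case period
    return baseline / total


# Hypothetical delays in ns: one of eight operations exceeds the 5 ns typical period.
delays = [4.0, 4.5, 3.0, 9.0, 4.0, 4.8, 2.0, 4.9]
speedup = timerrtol_speedup(delays, t_typical=5.0, t_worst=10.0)
print(round(speedup, 2))  # 1.78
```

With no timing errors the estimate reduces to `t_worst / t_typical` (here 2.0), which corresponds to the nominal speedup described in the abstract; each detected error eats into that margin.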
The Theory of Latency Insensitive Design
2001
Cited by 15 (2 self)
The theory of latency-insensitive design is presented as the foundation of a new correct-by-construction methodology to design complex systems by assembling Intellectual Property components. Latency-insensitive designs are synchronous distributed systems and are realized by composing functional modules that exchange data on communication channels according to an appropriate protocol. The protocol works on the assumption that the modules are stallable, a weak condition for them to satisfy. The goal of the protocol is to guarantee that latency-insensitive designs composed of functionally correct modules behave correctly independently of the channel latencies. This increases the robustness of a design implementation, because any delay variation on a channel can be "recovered" by changing the channel latency while the overall system functionality remains unaffected. As a consequence, an important application of the proposed theory is the latency-insensitive methodology for designing large digital integrated circuits in Deep Sub-Micron technologies. Keywords: Formal Methods, System Design, Deep Sub-Micron Design, Latency Insensitive Protocols
Coming challenges in microarchitecture and architecture
Proc. IEEE, 2001
Cited by 14 (0 self)
In the past several decades, the world of computers, and especially that of microprocessors, has witnessed phenomenal advances. Computers have exhibited ever-increasing performance and decreasing costs, making them more affordable and, in turn, accelerating additional software and hardware development that fueled this process even more. The technology that enabled this exponential growth is a combination of advances in process technology, microarchitecture, architecture, and design and development tools. While the pace of this progress has been quite impressive over the last two decades, it has become increasingly difficult to sustain. New process technology requires more expensive megafabs, and new performance levels require larger dies, higher power consumption, and enormous design and validation effort. Furthermore, as CMOS technology continues to advance, microprocessor design is exposed to a new set of challenges. In the near future, microarchitecture will have to consider and explicitly manage the limits of semiconductor technology, such as wire delays, power dissipation, and soft errors. In this paper, we describe the role of microarchitecture in the computer world, present the challenges ahead of us, and highlight areas where microarchitecture can help address these challenges. Keywords: Design tradeoffs, microarchitecture, microarchitecture trends, microprocessor, performance improvements, power issues, technology scaling.
Uniprocessor Performance Enhancement Through Adaptive Clock Frequency Control
2003
Cited by 8 (3 self)
This paper proposes a Timing Error Avoidance technique (TEAtime) to realize typical delays using standard synchronous design methodologies. The extra cost is very small, while the performance gains are substantial. The technique is applicable to any synchronous digital system. Correct results are ensured if the design guidelines are followed. Neither the base cycle time nor the cycle count is affected by TEAtime. It is also easy to modify current designs to take advantage of TEAtime. To demonstrate TEAtime's capabilities and correct operation, we implemented a simple CPU and memory on a Xilinx FPGA (Field Programmable Gate Array) and ran it under various operating conditions. Over a wide range of temperatures, TEAtime demonstrated performance improvements of about 34% over the baseline machine's worst-case specified performance. TEAtime adapted automatically to changing conditions, always stabilizing to a steady operating clock frequency.
The remainder of this paper is organized as follows. Related work is reviewed in Section II. In Section III the basic ideas of timing error avoidance are presented, using our test CPU as a case study. Our experimental methodology is described in Section IV, with the experimental results presented in Section V. We conclude in Section VI.
II. RELATED WORK
There has been prior work somewhat similar to ours, but none that encompasses all of the attributes of our technique, let alone actually demonstrates its functioning and characteristics with a real prototype. The closest work we are aware of is [10]. In this work a microcontroller has been modified so that it can self-tune its clock for "maximum" frequency. It does this by periodically pausing computation for up to 68 cycle...
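The "always stabilizing to a steady operating clock frequency" behavior can be illustrated with a minimal feedback loop. This sketch is an assumption-laden toy, not TEAtime's actual circuit: a hypothetical checker path models the critical path plus a safety `margin`, and each control interval the clock period is lengthened by `step` if the checker would fail and shortened otherwise. The function name and all numbers are illustrative; real operating conditions are assumed to drift slowly relative to the control loop.

```python
def tea_time_periods(critical_delay, start_period, step, margin, iterations):
    """Toy adaptive-clock controller in the spirit of timing error avoidance.

    A checker path models the critical path plus a safety margin. If the
    checker would fail at the current period the clock is slowed by `step`;
    otherwise it is sped up by `step`. With step <= margin the real critical
    path never sees a period shorter than its delay. Times are integers
    (e.g. picoseconds) to avoid rounding issues.
    """
    assert start_period >= critical_delay and step <= margin
    period, history = start_period, []
    for _ in range(iterations):
        if critical_delay + margin > period:
            period += step   # checker failed: back off
        else:
            period -= step   # checker passed: try a faster clock
        history.append(period)
    return history


hist = tea_time_periods(critical_delay=7000, start_period=10000,
                        step=100, margin=300, iterations=100)
print(min(hist), sorted(set(hist[-10:])))  # 7200 [7200, 7300]
```

The loop settles into a narrow two-value dither just above the checker threshold and never drops below the real critical delay; a practical controller would likely add hysteresis so the frequency holds at a single value.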
Efficient Profile-Based Evaluation of Randomising Set Index Functions For Cache Memories
In 2nd International Symposium on Performance Analysis of Systems and Software, 2001
Cited by 7 (2 self)
The performance of direct-mapped caches is degraded by conflict misses. It has been shown that conflict misses can be reduced by using randomising set index functions, which avoid repeated conflicts. However, optimising the set index function requires time-consuming simulations, because the design space of randomising set index functions is very large. Therefore, we developed a profile-based technique that allows one to quickly estimate the miss ratio incurred by a set index function. Using this technique, one can perform a fast initial exploration of the design space of set index functions, followed by a slower but more accurate analysis using simulation. The profile-based technique is based on a new representation of randomising set index functions using null spaces. It consists of two phases: in the first phase, a program is profiled; in the second, a score is computed from the profile data and the null space of a set index function. We show that the computed score closely reflects the miss ratio incurred by that set index function. Computing a score is a simple operation that requires no simulation time. Therefore, only one profiling run is required to estimate the miss ratios for a wide range of set index functions.
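The null-space view is easiest to see for XOR-based set index functions, which are linear over GF(2): the set index is `H * a` for a binary matrix `H`, so two block addresses `a` and `b` map to the same set exactly when `a XOR b` lies in the null space of `H`. The sketch below assumes such XOR-based functions and uses a hypothetical pairwise conflict count as the "score"; it is a simplification for illustration, not the paper's profile-based scoring method. Each row of `H` is stored as an integer bitmask.

```python
def parity(x):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1


def set_index(rows, addr):
    """XOR-based set index: bit i is the parity of (rows[i] AND addr).

    Each row is one row of a binary matrix H over GF(2), so index = H * addr.
    """
    idx = 0
    for i, row in enumerate(rows):
        idx |= parity(row & addr) << i
    return idx


def in_null_space(rows, v):
    """True if H * v = 0, i.e. v lies in the null space of H."""
    return all(parity(row & v) == 0 for row in rows)


def conflict_pairs(rows, addrs):
    """Hypothetical score: number of address pairs that map to the same set.

    a and b share a set exactly when a XOR b is in the null space of H,
    so no cache simulation is needed to count them.
    """
    return sum(
        in_null_space(rows, a ^ b)
        for i, a in enumerate(addrs)
        for b in addrs[i + 1:]
    )


conventional = [0b01, 0b10]             # index = low two block-address bits
xor_based = [0b00010001, 0b00100010]    # bit0^bit4, bit1^bit5
stride16 = [0, 16, 32, 48]              # block addresses with a power-of-two stride

print([set_index(conventional, a) for a in stride16])  # [0, 0, 0, 0]
print([set_index(xor_based, a) for a in stride16])     # [0, 1, 2, 3]
print(conflict_pairs(conventional, stride16), conflict_pairs(xor_based, stride16))  # 6 0
```

The conventional bit-selection function sends the whole power-of-two stride to one set, while the XOR-based function spreads it over all four sets; the conflict count is derived purely from the null space, without simulating the cache.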

