Results 1 - 10
of
16
Blueshift: Designing processors for timing speculation from the ground up
- In Proceesings of the 15th IEEE International Symposium on High Performance Computer Architecture (HPCA),pages
, 2009
"... Several recent processor designs have proposed to enhance performance by increasing the clock frequency to the point where timing faults occur, and by adding error-correcting support to guarantee correctness. However, such Timing Speculation (TS) proposals are limited in that they assume traditional ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Several recent processor designs have proposed to enhance performance by increasing the clock frequency to the point where timing faults occur, and by adding error-correcting support to guarantee correctness. However, such Timing Speculation (TS) proposals are limited in that they assume traditional design methodologies that are suboptimal under TS. In this paper, we present a new approach where the processor itself is designed from the ground up for TS. The idea is to identify and optimize the most frequently-exercised critical paths in the design, at the expense of the majority of the static critical paths, which are allowed to suffer timing errors. Our approach and design optimization algorithm are called BlueShift. We also introduce two techniques that, when applied under BlueShift, improve processor performance: On-demand Selective Biasing (OSB) and Path Constraint Tuning (PCT). Our evaluation with modules from the OpenSPARC T1 processor shows that, compared to conventional TS, BlueShift with OSB speeds up applications by an average of 8 % while increasing the processor power by an average of 12%. Moreover, compared to a high-performance TS design, BlueShift with PCT speeds up applications by an average of 6 % with an average processor power overhead of 23 % — providing a way to speed up logic modules that is orthogonal to voltage scaling. 1
Deployment of Better Than Worst-Case Design: Solutions and Needs
, 2005
"... The advent of nanometer feature sizes in silicon fabrication has triggered a number of new design challenges for computer designers. These challenges include design complexity and operation in the presence of environmental and device uncertainty. To make things worse, these new challenges add to the ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
The advent of nanometer feature sizes in silicon fabrication has triggered a number of new design challenges for computer designers. These challenges include design complexity and operation in the presence of environmental and device uncertainty. To make things worse, these new challenges add to the many challenges that designers already face in order to scale system performance while meeting power and reliability budgets. Current design objectives are being met by applying even more engineers and increasing overall design times, an unsustainable trend. This paper overviews a novel design strategy, called Better Than Worst-Case design, that addresses these challenges through a methodology based on separating the concerns of performance and reliability by coupling complex design components with simple reliable checker mechanisms. We present the key aspects of Better Than Worst-Case Design and cover some recently proposed solutions that deploy this technique in application domains ranging from microprocessors to digital signal processors. We then highlight a few aspects that need to be addressed to make this approach more practical in general contexts and suggest possible solutions.
Logic Synthesis and Circuit Customization Using Extensive External Don’t-Cares
"... Traditional digital circuit synthesis flows start from an HDL behavioral definition and assume that circuit functions are almost completely defined, making don’t-care conditions rare. However, recent design methodologies do not always satisfy these assumptions. For instance, third-party IP blocks us ..."
Abstract
-
Cited by 4 (2 self)
- Add to MetaCart
Traditional digital circuit synthesis flows start from an HDL behavioral definition and assume that circuit functions are almost completely defined, making don’t-care conditions rare. However, recent design methodologies do not always satisfy these assumptions. For instance, third-party IP blocks used in a system-on-chip are often over-designed for the requirements at hand. By focusing only on the input combinations occurring in a specific application, one could resynthesize the system to greatly reduce its area and power consumption. Therefore we extend modern digital synthesis with a novel technique, called SWEDE, that makes use of extensive external don’t-cares. In addition, we utilize such don’t-cares present implicitly in existing simulation-based verification environments for circuit customization. Experiments indicate that SWEDE scales to large ICs with half-million input vectors and handles practical cases well.
Superscalar Processor Performance Enhancement Through Reliable Dynamic Clock Frequency Tuning ∗
"... Synchronous circuits are typically clocked considering worst case timing paths so that timing errors are avoided under all circumstances. In the case of a pipelined processor, this has special implications since the operating frequency of the entire pipeline is limited by the slowest stage. Our goal ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Synchronous circuits are typically clocked considering worst case timing paths so that timing errors are avoided under all circumstances. In the case of a pipelined processor, this has special implications since the operating frequency of the entire pipeline is limited by the slowest stage. Our goal, in this paper, is to achieve higher performance in superscalar processors by dynamically varying the operating frequency during run time past worst case limits. The key objective is to see the effect of overclocking on superscalar processors for various benchmark applications, and analyze the associated overhead, in terms of extra hardware and error recovery penalty, when the clock frequency is adjusted dynamically. We tolerate timing errors occurring at speeds higher than what the circuit is designed to operate at by implementing an efficient error detection and recovery mechanism. We also study the limitations imposed by minimum path constraints on our technique. Experimental results show that an average performance gain up to 57% across all benchmark applications is achievable.
Exploring Circuit Timing-aware Languages and Compilation
"... By adjusting the design of the ISA and enabling circuit timingsensitive optimizations in a compiler, we can more effectively exploit timing speculation. While there has been growing interest in systems that leverage circuit-level timing speculation to improve the performance and power-efficiency of ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
By adjusting the design of the ISA and enabling circuit timingsensitive optimizations in a compiler, we can more effectively exploit timing speculation. While there has been growing interest in systems that leverage circuit-level timing speculation to improve the performance and power-efficiency of processors, most of the innovation has been at the microarchitectural level. We make the observation that some code sequences place greater demand on circuit timing deadlines than others. Furthermore, by selectively replacing these codes with instruction sequences which are semantically equivalent but reduce activity on timing critical circuit paths, we can trigger fewer timing errors and hence reduce recovery costs. Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—code generation, compilers, optimization; C.0 [General]: Hardware/software interfaces
Flikker: Saving DRAM Refresh-power through Critical Data Partitioning
"... Energy has become a first-class design constraint in computer systems. Memory is a significant contributor to total system power. This paper introduces Flikker, an application-level technique to reduce refresh power in DRAM memories. Flikker enables developers to specify critical and non-critical da ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
Energy has become a first-class design constraint in computer systems. Memory is a significant contributor to total system power. This paper introduces Flikker, an application-level technique to reduce refresh power in DRAM memories. Flikker enables developers to specify critical and non-critical data in programs and the runtime system allocates this data in separate parts of memory. The portion of memory containing critical data is refreshed at the regular refresh-rate, while the portion containing non-critical data is refreshed at substantially lower rates. This partitioning saves energy at the cost of a modest increase in data corruption in the non-critical data. Flikker thus exposes and leverages an interesting trade-off between energy consumption and hardware correctness. We show that many applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application’s final outcome. We also find that Flikker can save between 20-25 % of the power consumed by the memory sub-system in a mobile device, with negligible impact on application performance. Flikker is implemented almost entirely in software, and requires only modest changes to the hardware.
through Critical Data Partitioning
, 2009
"... Mobile devices are left in sleep mode for long periods of time. But even while in sleep mode, the contents of DRAM memory need to be periodically refreshed, which consumes a significant fraction of power in mobile devices. This paper introduces Flicker, an application-level technique to reduce refre ..."
Abstract
- Add to MetaCart
Mobile devices are left in sleep mode for long periods of time. But even while in sleep mode, the contents of DRAM memory need to be periodically refreshed, which consumes a significant fraction of power in mobile devices. This paper introduces Flicker, an application-level technique to reduce refresh power in DRAM memories. Flicker enables developers to specify critical and non-critical data in programs and the runtime system allocates this data in separate parts of memory. The portion of memory containing critical data is refreshed at the regular refresh-rate, while the portion containing non-critical data is refreshed at substantially lower rates. This saves energy at the cost of a modest increase in data corruption in the non-critical data. Flicker thus explores a novel and interesting trade-off between energy consumption and hardware correctness. We show that many mobile applications are naturally tolerant to errors in the non-critical data, and in the vast majority of cases, the errors have little or no impact on the application’s final outcome. We also find that Flicker can save between 20-25 % of the power consumed by the memory subsystem in a mobile device, with negligible impact on application performance. Flicker is implemented almost entirely in software, and requires only modest changes to the application, operating system and hardware. 1
Logic Synthesis for Better Than Worst-case Designs
"... In this paper we present a novel metric for measuring and optimizing the performance of circuits that operate with the clock period smaller than the worst-case delay. In particular, we developed an efficient logic optimization operation “balance ” and a library mapping algorithm named BTWLibMap. Tog ..."
Abstract
- Add to MetaCart
In this paper we present a novel metric for measuring and optimizing the performance of circuits that operate with the clock period smaller than the worst-case delay. In particular, we developed an efficient logic optimization operation “balance ” and a library mapping algorithm named BTWLibMap. Together they are able to reduce the probability of a timing error by 2.3X while only incurring a 4 % area overhead.
A Variation-Tolerant Scheduler for Better Than Worst-Case Behavioral Synthesis
"... Abstract – There has been a recent shift in design paradigms, with many turning towards yield-driven approaches to synthesize and design systems. A major cause of this shift is the continual scaling of transistors, making process variation impossible to ignore. Better than worstcase (BTW) designs al ..."
Abstract
- Add to MetaCart
Abstract – There has been a recent shift in design paradigms, with many turning towards yield-driven approaches to synthesize and design systems. A major cause of this shift is the continual scaling of transistors, making process variation impossible to ignore. Better than worstcase (BTW) designs also exploit these variation effects, while also addressing performance limits due to worst-case analysis. In this paper we first present the variation-tolerant stallable-FSM architecture, which provides fault detection and recovery, allowing circuits to be clocked at better than worst-case delays. Then we propose the BTW scheduler, a 0-1 integer linear programming (ILP) scheduling algorithm with the objective of minimizing the expected latency, to provide a high-level synthesis aid for the stallable-FSM architecture. We implemented the algorithm and ran it through many benchmarks, comparing the results with scheduling algorithms based on worst-case analysis. Our results were promising, showing up to 41 % latency reduction for the BTW scheduler, and up to 43 % latency reduction when combined with the variation-tolerant architecture. 1.
ECE Departments
"... Abstract—Current processor designs have a critical operating point that sets a hard limit on voltage scaling. Any scaling beyond the critical voltage results in exceeding the maximum allowable error rate, i.e., there are more timing errors than can be effectively and gainfully detected or corrected ..."
Abstract
- Add to MetaCart
Abstract—Current processor designs have a critical operating point that sets a hard limit on voltage scaling. Any scaling beyond the critical voltage results in exceeding the maximum allowable error rate, i.e., there are more timing errors than can be effectively and gainfully detected or corrected by an error-tolerance mechanism. This limits the effectiveness of voltage scaling as a knob for reliability/power tradeoffs. In this paper, we present power-aware slack redistribution, a novel design-level approach to allow voltage/reliability tradeoffs in processors. Techniques based on power-aware slack redistribution reapportion timing slack of the frequently-occurring, nearcritical timing paths of a processor in a power- and area-efficient manner, such that we increase the range of voltages over which the incidence of operational (timing) errors is acceptable. This results in soft architectures- designs that fail gracefully, allowing us to perform reliability/power tradeoffs by reducing voltage up to the point that produces maximum allowable errors for our application. The goal of our optimization is to minimize the voltage at which a soft architecture encounters the maximum allowable error rate, thus maximizing the range over which voltage scaling is possible and minimizing power consumption for a given error rate. Our experiments demonstrate 23 % power savings over the baseline design at an error rate of 1%. Observed power reductions are 29%, 29%, 19%, and 20 % for error rates of 2%, 4%, 8%, and 16 % respectively. Benefits are higher in the face of error recovery using Razor. Area overhead of our techniques is up to 2.7%. I.

