Improving Dynamic Cluster Assignment for Clustered Trace Cache Processors
, 2003
Cited by 26 (3 self)
This work examines dynamic cluster assignment for a clustered trace cache processor (CTCP). Previously proposed cluster assignment techniques run into unique problems as issue width and cluster count increase. Realistic design conditions, such as variable data forwarding latencies between clusters and a heavily partitioned instruction window, increase the degree of difficulty for effective cluster assignment.
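As a rough illustration of what "cluster assignment" means, here is a minimal sketch of a generic dependence-based steering heuristic. It is not the technique proposed or evaluated in this paper. Under the stated assumptions, each instruction in a trace goes to the cluster holding one of its producers when that cluster still has space, and otherwise to the least-loaded cluster. The `Instr` type, `assign_clusters`, and the fixed per-cluster slot count are hypothetical simplifications that ignore forwarding latencies.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instr:
    dest: Optional[str]                            # destination register, or None
    srcs: List[str] = field(default_factory=list)  # source registers


def assign_clusters(trace: List[Instr], num_clusters: int, slots_per_cluster: int) -> List[int]:
    """Hypothetical dependence-based steering: returns a cluster index per instruction."""
    load = [0] * num_clusters  # instructions placed in each cluster so far
    producer_cluster = {}      # register -> cluster of its most recent producer
    assignment = []
    for ins in trace:
        chosen = None
        # Prefer a cluster that already holds a producer of one of the sources,
        # so that operand does not need to cross the inter-cluster network.
        for src in ins.srcs:
            c = producer_cluster.get(src)
            if c is not None and load[c] < slots_per_cluster:
                chosen = c
                break
        if chosen is None:
            # No usable producer cluster: fall back to the least-loaded cluster
            # (ties go to the lowest index). This may exceed capacity if all are full.
            chosen = min(range(num_clusters), key=lambda c: load[c])
        load[chosen] += 1
        if ins.dest is not None:
            producer_cluster[ins.dest] = chosen
        assignment.append(chosen)
    return assignment
```

For example, `assign_clusters([Instr("r1"), Instr("r2"), Instr("r3", ["r1"]), Instr("r4", ["r2"])], 2, 4)` keeps each consumer with its producer and yields `[0, 1, 0, 1]`. The abstract notes that heuristics like this become harder to apply as cluster count grows, forwarding latencies vary, and the window is heavily partitioned.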
Cool-Fetch: A compiler-enabled IPC estimation based framework for energy reduction
- In ACM Computer Architecture Letters
, 2002
Cited by 3 (0 self)
Abstract — With power consumption becoming an increasingly important factor, it is necessary to reevaluate traditional, power-intensive architectural techniques and their relative performance benefits. We believe that combined architecture-compiler efforts open up new and efficient ways to retain the performance benefits of modern architectures while addressing their power impact. In this paper, we present Cool-Fetch, an architecture-compiler based approach to reduce energy consumption in the processor. While we mainly target the fetch unit, an important side effect of our approach is that we also obtain energy savings in many other parts of the processor. The explanation is that the fetch unit often runs substantially ahead of execution, bringing instructions into different stages of the processor that may never be executed. We have found that although the degree of Instruction Level Parallelism (ILP) of a program tends to vary over time, it can be statically estimated by the compiler. Our Instructions Per Clock (IPC) estimation scheme uses monotonic dataflow analysis and simple heuristics to guide a fetch-throttling mechanism. We develop the necessary architecture support and include its power overhead. Using Mediabench and SPEC2000 applications, we obtain up to 15% total energy savings in the processor with generally little performance degradation. We also provide a comparison of Cool-Fetch with previously proposed hardware-only dynamic fetch-throttling schemes.
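To make the idea of a compile-time IPC estimate driving fetch throttling concrete, the following is a minimal sketch. It is not Cool-Fetch's actual analysis. It assumes a single basic block, unit instruction latency, and unlimited issue width, and it estimates IPC as instruction count divided by the dataflow critical-path length. The block is marked for throttling when that estimate falls below a threshold. `estimate_ipc`, `throttle_hint`, and the threshold of 2.0 are hypothetical.

```python
from typing import Dict, List, Tuple

# A basic block as (destination register, source registers) pairs in program order.
Block = List[Tuple[str, List[str]]]


def estimate_ipc(block: Block, latency: int = 1) -> float:
    """Instruction count / dataflow critical-path length (unit latency, infinite width assumed)."""
    ready: Dict[str, int] = {}  # register -> cycle at which its value becomes available
    finish = 0
    for dest, srcs in block:
        # An instruction can start once all of its in-block sources are ready.
        start = max((ready.get(s, 0) for s in srcs), default=0)
        done = start + latency
        ready[dest] = done
        finish = max(finish, done)
    return len(block) / finish if finish else 0.0


def throttle_hint(block: Block, threshold: float = 2.0) -> bool:
    """Hypothetical compiler hint: throttle fetch when little ILP is expected."""
    return estimate_ipc(block) < threshold
```

A three-instruction dependence chain has a critical path of 3 cycles, so its estimated IPC is 1.0 and it gets a throttling hint. Four independent instructions give an estimate of 4.0, and fetch proceeds at full speed. The monotonic dataflow analysis and heuristics mentioned in the abstract would refine an estimate like this.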
Performance and Energy Impact of Instruction-Level Value Predictor Filtering
, 2003
Cited by 3 (1 self)
This work evaluates value predictor access filtering and its effects on performance and dynamic energy consumption in a wide-issue, high-frequency processor. New and previously proposed filtering strategies are analyzed under realistic predictor constraints, such as port restrictions and table access latency. Filters restrict value predictor access for instructions with unconsumed predictions, poorly predicted instructions, and quickly executing instructions. Read access filtering improves the speedup due to value prediction from 16.1% to 23.6%, while reducing dynamic value predictor reads by 31.3%. Adding write filtering decreases update activity by 78.6%, while still providing a 14.8% speedup. The overall reduction in activity leads to a 52.6% decrease in value predictor energy consumption.
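The three read-filter criteria named above can be expressed as a simple per-instruction check. The sketch below is only illustrative. The per-PC `PCStats` record, its fields, and the `min_conf`/`min_latency` cutoffs are hypothetical, and the paper's actual filter hardware and parameters are not described in this abstract. Confidence and latency tracking are assumed to be updated elsewhere, on the write/update path.

```python
from dataclasses import dataclass


@dataclass
class PCStats:
    """Hypothetical per-instruction history used to decide predictor reads."""
    consumed_last_time: bool = True  # did any consumer use the last prediction?
    confidence: int = 0              # saturating confidence counter
    exec_latency: int = 1            # observed execution latency in cycles


def should_read_predictor(stats: PCStats, min_conf: int = 2, min_latency: int = 2) -> bool:
    if not stats.consumed_last_time:
        return False  # filter: the previous prediction went unconsumed
    if stats.confidence < min_conf:
        return False  # filter: the instruction is poorly predicted
    if stats.exec_latency < min_latency:
        return False  # filter: the instruction executes quickly, so little to gain
    return True
```

Each filtered lookup is one predictor read avoided, which is the source of the read-activity reduction the abstract reports. Write filtering would apply a similar check on the update path.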
On the Energy-Efficiency of Speculative Hardware
- 2005 ACM International Conference on Computing Frontiers
, 2005
Cited by 2 (0 self)
Microprocessor trends are moving towards wider architectures and more aggressive speculation. With increasing transistor budgets, energy consumption has become a critical design constraint. To address this problem, several researchers have proposed and evaluated energy-efficient variants of speculation mechanisms. However, such hardware is typically evaluated in isolation, and its impact on the energy consumption of the rest of the processor (for example, due to wrong-path execution) is ignored. Moreover, the available metrics that would provide a thorough evaluation of an architectural optimization employ somewhat complicated formulas with hard-to-measure parameters. In this paper, we introduce a simple method to accurately compare the energy efficiency of speculative architectures. Our metric is based on runtime analysis of the entire processor chip and thus captures the energy consumption due to both the positive and the negative effects of speculation. We demonstrate the usefulness of our metric using value speculation as an example, and find that some proposed value predictors, including low-power designs, are not energy-efficient.
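The core point of whole-chip accounting is that a speculative structure's own energy is only part of its cost. The sketch below illustrates this with a plain energy-delay product (EDP) computed over total chip energy. It is a generic stand-in, not the metric defined in the paper. `RunStats`, `chip_edp`, and `is_energy_efficient` are hypothetical names, and the numbers in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    cycles: int                 # execution time in cycles
    predictor_energy: float     # energy spent in the speculative structure itself
    rest_of_chip_energy: float  # everything else, including wrong-path/re-executed work

    @property
    def total_energy(self) -> float:
        return self.predictor_energy + self.rest_of_chip_energy


def chip_edp(run: RunStats) -> float:
    """Generic energy-delay product over the whole chip (not the paper's metric)."""
    return run.total_energy * run.cycles


def is_energy_efficient(baseline: RunStats, speculative: RunStats) -> bool:
    return chip_edp(speculative) < chip_edp(baseline)
```

With hypothetical numbers, a baseline of 100 cycles and 1000 energy units has an EDP of 100,000. A speculative design that saves 5 cycles but spends 50 units in the predictor and 60 more elsewhere on misspeculation has an EDP of 1110 × 95 = 105,450. It is therefore not energy-efficient overall, even though it is faster. Evaluating the predictor in isolation would miss the extra 60 units.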
Compiler-Based Adaptive Fetch Throttling for Energy-Efficiency
Cited by 1 (1 self)
Front-end instruction delivery accounts for a significant fraction of energy consumption in dynamically scheduled superscalar processors. Various front-end throttling techniques have been introduced to reduce the chip-wide energy consumption caused by redundant fetching. Hardware-based techniques, such as flow-based throttling, can reduce energy consumption considerably, but at the cost of a high performance loss. On the other hand, compiler-based, IPC-estimation-driven software fetch throttling (CFT) techniques result in relatively low performance degradation, which is desirable for high-performance processors. However, their energy savings are limited because they typically use a predefined, fixed, low IPC threshold to control throttling. In this paper, we propose a Compiler-based Adaptive Fetch Throttling (CAFT) technique that allows the throttling threshold to change dynamically at runtime. Instead of using a fixed threshold, our technique uses the Decode/Issue Difference (DID) to assist the fetch-throttling decision based on the statically estimated IPC. Changing the threshold dynamically makes it possible to throttle at a higher estimated IPC, which increases throttling opportunities and yields larger energy savings. We demonstrate that CAFT can increase energy savings significantly compared to CFT, while preserving its benefit of low performance loss. Our simulation results show that the proposed technique doubles the energy-delay product (EDP) savings of fixed-threshold throttling and achieves an average EDP saving of 6.7%.
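The contrast between a fixed threshold (CFT) and a DID-assisted adaptive threshold (CAFT) can be sketched as follows. The abstract does not give CAFT's exact policy. This sketch assumes a two-level rule: when the decode/issue backlog (DID) exceeds a limit, the instruction window is already well supplied, so throttling is allowed at a higher estimated IPC. The function names and all parameter values (`base_threshold`, `high_threshold`, `did_limit`) are hypothetical. An EDP-saving helper is included for the arithmetic behind the reported savings.

```python
def effective_threshold(did: int, base_threshold: float = 1.0,
                        high_threshold: float = 3.0, did_limit: int = 16) -> float:
    """Hypothetical adaptive rule: raise the IPC threshold when the decode/issue backlog is large."""
    return high_threshold if did > did_limit else base_threshold


def should_throttle(estimated_ipc: float, did: int, **params) -> bool:
    """Throttle fetch when the compiler's IPC estimate is below the current threshold."""
    return estimated_ipc < effective_threshold(did, **params)


def edp_saving(energy_base: float, delay_base: float,
               energy_new: float, delay_new: float) -> float:
    """Fractional energy-delay product saving relative to a baseline run."""
    return 1.0 - (energy_new * delay_new) / (energy_base * delay_base)
```

With a fixed low threshold of 1.0, a region estimated at IPC 2.0 is never throttled. With the adaptive rule, the same region is throttled whenever DID indicates a large backlog. As an arithmetic check with made-up numbers, cutting energy by 5% while slowing down by 2% gives `edp_saving(100, 100, 95, 102)` = 1 − 0.969 = 0.031, a 3.1% EDP saving.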
Factored Multi-core Architectures
, 2006
Technology scaling trends have forced designers to consider alternatives to deeply pipelining aggressive cores with large amounts of performance-accelerating hardware. One alternative is to factor out, or decouple, large structures from critical pipeline loops. In this work, we combine prior factoring techniques into a cohesive framework and extend this paradigm to more of the processor core. We propose an architecture in which the large structures and latency-tolerant performance accelerators are factored out of the processor core into helpers, and the small, fast µ-core can be augmented with these latency-tolerant helpers. This design reduces the number of accesses to the large, power-hungry structures and hence provides power savings with minimal impact on performance. This architecture also allows the use of slower, more power-efficient circuit designs for the helpers. Because the demands placed on the processor core vary between applications, and even between phases of an application, the benefit of any given set of helpers will vary tremendously. With a single core, these auxiliary structures can be turned on and off dynamically to tune the energy/performance of the machine to the needs of the running application. This is achieved by taking advantage of the dynamic reconfigurability, or polymorphism, of helpers, allowing a core to adapt to changing applications, workloads, or phases.
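A per-phase on/off decision for helpers could look like the sketch below. The abstract does not specify the control policy. This version assumes that, for each phase, the IPC with and without each helper and the helper's power draw can be sampled, and that a helper stays on only if its IPC gain per watt clears a cutoff. `HelperSample`, `configure_helpers`, the helper names, and `min_gain_per_watt` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class HelperSample:
    ipc_with: float     # IPC measured with the helper enabled during this phase
    ipc_without: float  # IPC measured with the helper disabled
    power_watts: float  # power drawn by the helper when enabled


def configure_helpers(samples: Dict[str, HelperSample],
                      min_gain_per_watt: float = 0.05) -> Dict[str, bool]:
    """Hypothetical per-phase policy: enable a helper only if it pays for its power."""
    config = {}
    for name, s in samples.items():
        gain = s.ipc_with - s.ipc_without
        if s.power_watts <= 0:
            enabled = gain > 0  # a free helper stays on whenever it helps at all
        else:
            enabled = gain / s.power_watts >= min_gain_per_watt
        config[name] = enabled
    return config
```

For instance, a hypothetical "prefetcher" helper that raises IPC from 1.2 to 1.5 at 2 W (0.15 IPC/W) stays on. A helper that adds only 0.01 IPC at 1 W is switched off for that phase. Re-running this decision at each phase boundary is one simple way to realize the dynamic reconfiguration the abstract describes.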
Leakage-efficient design of value predictors through state and non-state preserving techniques
, 2010

