Results 1 - 5 of 5
Operation Tables for Scheduling in the Presence of Incomplete Bypassing
- In CODES+ISSS ’04: Proceedings of the 2nd IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, 2004
Cited by 12 (9 self)
Register bypassing is a powerful and widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, bypassing has significant impact on cycle time, area, and power consumption of the processor. Due to the strict constraints on performance, cost and power consumption in embedded processors, architects need to evaluate and implement incomplete register bypassing mechanisms. However, traditional data hazard detection and/or avoidance techniques used in retargetable schedulers break down in the presence of incomplete bypassing. In this paper, we present the concept of Operation Tables, which can be used to detect data hazards, even in the presence of incomplete bypassing. Furthermore, our technique integrates the detection of both data and resource hazards, and can be easily employed in a compiler to generate better schedules. Our experimental results on the popular Intel XScale embedded processor platform show that even with a simple intra-basic block scheduling technique, we achieve up to 20% performance improvement over fully optimized GCC generated code on embedded applications from the MiBench suite.
PBExplore: A framework for compiler-in-the-loop exploration of partial bypassing in embedded processors
- In DATE ’05: Proceedings of the Conference on Design, Automation and Test in Europe, 2005
Cited by 8 (4 self)
Varying partial bypassing in pipelined processors is an effective way to make performance, area and energy tradeoffs in embedded processors. However, performance evaluation of partial bypassing in processors has been inaccurate, largely due to the absence of bypass-sensitive retargetable compilation techniques. Furthermore, no existing partial bypass exploration framework estimates the power and cost overhead of partial bypassing. In this paper, we present PBExplore: a framework for Compiler-in-the-Loop exploration of partial bypassing in processors. PBExplore accurately evaluates the performance of a partially bypassed processor using a generic bypass-sensitive compilation technique. It synthesizes the bypass control logic and estimates the area and energy overhead of each bypass configuration. PBExplore is thus able to effectively perform multi-dimensional exploration of the partial bypass design space. We present experimental results on the Intel XScale architecture on MiBench benchmarks and demonstrate the need, utility and exploration capabilities of PBExplore.
FLASH: Foresighted latency-aware scheduling heuristic for processors with customized datapaths
- In CGO, 2004
Cited by 4 (2 self)
Application-specific instruction set processors (ASIPs) have the potential to meet the challenging cost, performance, and power goals of future embedded processors by customizing the hardware to suit an application. A central problem is creating compilers that are capable of dealing with the heterogeneous and non-uniform hardware created by the customization process. The processor datapath provides an effective area to customize, but specialized datapaths often have non-uniform connectivity between the function units, making the effective latency of a function unit dependent on the consuming operation. Traditional instruction schedulers break down in this environment because they are locally greedy: they bind the best choice for a single operation even though that choice may be poor due to a lack of communication paths. To effectively schedule with non-uniform connectivity, we propose a foresighted latency-aware scheduling heuristic (FLASH) that performs lookahead across future scheduling steps to estimate the effects of a potential binding. FLASH combines a set of lookahead heuristics to achieve effective foresight with low compile-time overhead.
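The consumer-dependent latency described in this abstract can be illustrated with a small sketch. It is not the FLASH heuristic itself: the two function units, their latencies, and the names `EXEC`, `FORWARD`, `greedy_bind` and `lookahead_bind` are all hypothetical, chosen only to show how a locally greedy binding can lose to a one-step lookahead when connectivity is non-uniform.

```python
# Hypothetical two-unit datapath; none of these numbers come from the paper.
EXEC = {"fu0": 1, "fu1": 2}  # execution latency of each function unit (cycles)

# Extra cycles needed to move a result from producer unit to consumer unit.
# The fu0 -> fu1 path is assumed to lack a direct connection, hence the penalty.
FORWARD = {
    ("fu0", "fu0"): 0, ("fu0", "fu1"): 3,
    ("fu1", "fu0"): 0, ("fu1", "fu1"): 0,
}


def effective_latency(producer_fu, consumer_fu):
    """Cycles from producer issue until the consumer may issue."""
    return EXEC[producer_fu] + FORWARD[(producer_fu, consumer_fu)]


def greedy_bind(units):
    """Locally greedy choice: the unit that finishes the producer soonest."""
    return min(units, key=lambda fu: EXEC[fu])


def lookahead_bind(units, consumer_fu):
    """One-step lookahead: the unit that lets the consumer issue soonest."""
    return min(units, key=lambda fu: effective_latency(fu, consumer_fu))


units = ["fu0", "fu1"]
consumer_fu = "fu1"  # assumption: the consuming operation can only run on fu1

greedy_choice = greedy_bind(units)
lookahead_choice = lookahead_bind(units, consumer_fu)

print(greedy_choice, effective_latency(greedy_choice, consumer_fu))
print(lookahead_choice, effective_latency(lookahead_choice, consumer_fu))
```

In this toy setting the greedy binding picks the faster unit but pays the forwarding penalty, while the lookahead binding accepts a slower unit to reach the consumer sooner. A real scheduler would look further ahead and weigh resource conflicts as well.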
Retargetable Pipeline Hazard Detection for Partially Bypassed Processors
- IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2006
Cited by 1 (1 self)
the presence of Partial Bypassing”. This article extends the earlier work in several ways. It better motivates the need for Operation Tables. It describes more formally and completely the algorithms that use Operation Tables for pipeline hazard detection. It presents more experimental results, and demonstrates the need and usefulness of Operation Tables by varying the bypasses in a processor. Register bypassing is a widely used feature in modern processors to eliminate certain data hazards. Although complete bypassing is ideal for performance, it has a significant impact on the cycle time, area, and power consumption of the processor. Owing to the strict design constraints on the performance, cost, and power consumption of embedded processor systems, architects seek a compromise between the design parameters by implementing partial bypassing in processors. However, partial bypassing in processors presents challenges for compilation. Traditional data hazard detection and/or avoidance techniques used in retargetable compilers, which assume a constant value of operation latency, break down in the presence of partial bypassing. In this article, we present the concept of Operation Tables that can be used to accurately detect data hazards, even in the presence of incomplete bypassing. Operation Tables integrate the detection of all kinds of pipeline hazards in a unified framework, and can therefore be easily deployed in a compiler
Automatic Design Space Exploration of Register Bypasses in Embedded Processors
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Cited by 1 (0 self)
Register bypassing is a popular and powerful architectural feature that improves performance in pipelined processors by eliminating certain data hazards. However, extensive bypassing comes with a significant impact on cycle time, area and power consumption of the processor. Recent research therefore advocates the use of partial bypassing in processors. However, accurate performance evaluation of partially bypassed processors is still a challenge, primarily due to the lack of bypass-sensitive retargetable compilation techniques. No existing partial bypass exploration framework estimates the power and area overhead of partial bypassing. As a result, designers end up making sub-optimal design decisions during the exploration of the partial bypass design space. This article presents PBExplore: An automatic design

