Improving program efficiency by packing instructions into registers
- In Proceedings of the 2005 ACM/IEEE International Symposium on Computer Architecture, 2005
Cited by 14 (5 self)
New processors, both embedded and general purpose, often have conflicting design requirements involving space, power, and performance. Architectural features and compiler optimizations often target one or more design goals at the expense of the others. This paper presents a novel architectural and compiler approach to simultaneously reduce power requirements, decrease code size, and improve performance by integrating an instruction register file (IRF) into the architecture. Frequently occurring instructions are placed in the IRF. Multiple entries in the IRF can be referenced by a single packed instruction in ROM or L1 instruction cache. Unlike conventional code compression, our approach allows the frequent instructions to be referenced in arbitrary combinations. The experimental results show significant improvements in space and power, as well as some improvement in execution time when using only 32 entries. These advantages make packing instructions into registers an effective approach for improving overall efficiency.
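The packing idea in this abstract can be illustrated with a toy model. The sketch below is not the authors' implementation: it assumes instructions are plain strings, fills a hypothetical IRF with the most frequent ones, and greedily folds runs of IRF-resident instructions into one packed word. The names `build_irf` and `pack` and the `MAX_PACK` slot limit are illustrative assumptions; only the 32-entry size comes from the abstract.

```python
from collections import Counter

IRF_SIZE = 32   # entry count reported in the abstract
MAX_PACK = 5    # ASSUMPTION: IRF references per packed instruction (hypothetical)


def build_irf(instructions, size=IRF_SIZE):
    """Return the `size` most frequent instructions, most frequent first."""
    counts = Counter(instructions)
    return [ins for ins, _ in counts.most_common(size)]


def pack(instructions, irf, max_pack=MAX_PACK):
    """Greedily fold runs of IRF-resident instructions into packed words.

    Each packed word holds IRF indices in program order, so any
    combination of IRF entries can be referenced.
    """
    index = {ins: i for i, ins in enumerate(irf)}
    out, run = [], []
    for ins in instructions:
        if ins in index:
            run.append(index[ins])
            if len(run) == max_pack:      # packed word is full
                out.append(("pack", tuple(run)))
                run = []
        else:
            if run:                       # flush the pending run first
                out.append(("pack", tuple(run)))
                run = []
            out.append(("plain", ins))    # not in the IRF: emit unchanged
    if run:
        out.append(("pack", tuple(run)))
    return out
```

For example, calling `pack(program, build_irf(program, size=2))` on a short list of mnemonics yields a shorter sequence in which each `("pack", ...)` tuple stands for one fetched word that expands to several IRF entries.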
Reducing instruction fetch cost by packing instructions into register windows
- In Proceedings of the 38th annual ACM/IEEE International Symposium on Microarchitecture (November 2005), IEEE Computer Society
Cited by 7 (3 self)
Instruction packing is a combination compiler/architectural approach that allows for decreased code size, reduced power consumption and improved performance. The packing is obtained by placing frequently occurring instructions into an Instruction Register File (IRF). Multiple IRF entries can then be accessed using special packed instructions. Previous IRF efforts focused on using a single 32-entry register file for the duration of an application. This paper presents software and hardware extensions to the IRF supporting multiple instruction register windows to allow a greater number of relevant instructions to be available for packing in each function. Windows are shared among similar functions to reduce the overall costs involved in such an approach. The results indicate that significant improvements in instruction fetch cost can be obtained by using this simple architectural enhancement. We also show that using an IRF with a loop cache, which is also used to reduce energy consumption, results in much less energy consumption than using either feature in isolation.
Adapting compilation techniques to enhance the packing of instructions into registers
- In Proceedings of the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems, 2006
Cited by 4 (3 self)
The architectural design of embedded systems is becoming increasingly idiosyncratic to meet varying constraints regarding energy consumption, code size, and execution time. Traditional compiler optimizations are often tuned for improving general architectural constraints, yet these heuristics may not be as beneficial to less conventional designs. Instruction packing is a recently developed compiler/architectural approach for reducing energy consumption, code size, and execution time by placing the frequently occurring instructions into an Instruction Register File (IRF). Multiple IRF instructions are made accessible via special packed instruction formats. This paper presents the design and analysis of a compilation framework and its associated optimizations for improving the efficiency of instruction packing. We show that several new heuristics can be developed for IRF promotion, instruction selection, register re-assignment and instruction scheduling, leading to significant reductions in energy consumption, code size, and/or execution time when compared to results using a standard optimizing compiler targeting the IRF.
Addressing Instruction Fetch Bottlenecks by Using an Instruction Register File
- 2007
Cited by 1 (1 self)
The Instruction Register File (IRF) is an architectural extension for providing improved access to frequently occurring instructions. An optimizing compiler can exploit an IRF by packing an application’s instructions, resulting in decreased code size, reduced energy consumption and improved execution time primarily due to a smaller footprint in the instruction cache. The nature of the IRF also allows the execution of packed instructions to overlap with instruction fetch, thus providing a means for tolerating increased fetch latencies, like those experienced by encrypted ICs as well as the presence of low-power L0 caches. Although previous research has focused on the direct benefits of instruction packing, this paper explores the use of increased fetch bandwidth provided by packed instructions. Small L0 caches improve energy efficiency but can increase execution time due to frequent cache misses. We show that this penalty can be significantly reduced by overlapping the execution of packed instructions with miss stalls. The IRF can also be used to supply additional instructions to a more aggressive execution engine, effectively reducing dependence on instruction cache bandwidth. This can improve energy efficiency, in addition to providing additional flexibility for evaluating various design tradeoffs in a pipeline with asymmetric instruction bandwidth. Thus, we show that the IRF is a complementary technique, operating as a buffer tolerating fetch bottlenecks, as well as providing additional fetch bandwidth for an aggressive pipeline backend.
Customization of Loop Caches for Embedded Systems Design
- 2003
Embedded system programs tend to spend much time in small loops. Introducing a very small loop cache into the instruction memory hierarchy has thus been shown to substantially reduce instruction fetch energy. However, loop caches come in many sizes and variations – using the configuration best on the average may actually result in worsened energy for a specific program. We therefore introduce a loop cache exploration tool that analyzes a particular program’s profile, explores the possible configurations, and generates the configuration with the greatest power savings. We introduce a simulation-based approach and show the good energy savings that a customized loop cache yields. Furthermore, we also introduce a fast estimation-based approach that obtains nearly the same results in seconds rather than tens of minutes or hours.
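The exploration step described in this abstract can be sketched as a search over candidate configurations. The code below is a toy estimate, not the authors' tool: the `Loop` record, the energy constants `e_main` and `e_loop_per_entry`, and the candidate sizes are all hypothetical, and the cost model simply assumes that a loop is served from the loop cache only if its body fits, and that larger loop caches cost more energy per access.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    size: int      # instructions in the loop body
    fetches: int   # dynamic instruction fetches spent inside the loop


def fetch_energy(loops, other_fetches, cache_size,
                 e_main=1.0, e_loop_per_entry=0.002):
    """Estimate fetch energy for one loop cache size.

    ASSUMPTION: both energy constants are made-up relative units.
    """
    e_loop = e_loop_per_entry * cache_size
    # Fetches served by the loop cache: loops whose body fits entirely.
    hits = sum(l.fetches for l in loops if l.size <= cache_size)
    total = other_fetches + sum(l.fetches for l in loops)
    return hits * e_loop + (total - hits) * e_main


def best_config(loops, other_fetches, sizes=(0, 16, 32, 64, 128, 256)):
    """Return the candidate size with the lowest estimated energy."""
    return min(sizes, key=lambda s: fetch_energy(loops, other_fetches, s))
```

Under this model, a profile dominated by one small hot loop favors a small loop cache, while a size that is best on average across programs can be worse for that specific profile, which is the point the abstract makes.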
A DYNAMIC CODE MAPPING TECHNIQUE FOR SCRATCHPAD MEMORIES IN EMBEDDED SYSTEMS
- 2008
Design of modern embedded systems has become extremely challenging due to multidimensional and stringent design constraints like performance, cost, weight, power, real-time behavior, time-to-market and size. Such systems typically feature low power processors coupled with fast on-chip scratchpad memories (SPMs). Scratchpads are more efficient than caches in terms of energy consumption, performance, area and timing predictability. However, unlike caches, which manage the program code and data in hardware, the efficient use of scratchpads requires them to be managed explicitly, usually by the programmer. This involves deciding which code or data objects should be mapped to SPM, when to bring them in, and where to place them within the SPM, termed the mapping process. The objective is to find a mapping which will minimize the energy consumption and maximize the performance. In this work, a fully automated, dynamic code mapping technique for SPMs based on compiler static analysis is presented, which relieves the programmer of this burden. The mapping problem is formulated as a binary integer linear programming problem and a

