Results 1 - 7 of 7
A Software Solution for Dynamic Stack Management on Scratch Pad Memory ∗
Cited by 7 (2 self)
Abstract — In an effort to make processors more power efficient, scratch pad memory (SPM) has been proposed as a replacement for caches, which can consume a majority of processor power. However, application mapping on SPMs remains a challenge. We propose a dynamic SPM management scheme for program stack data for processor power reduction. As opposed to previous efforts, our solution does not mandate any hardware changes, does not need profile information or the SPM size at compile time, and seamlessly integrates support for recursive functions. Our technique manages stack frames on SPM using a scratch pad memory manager (SPMM), integrated into the application binary by the compiler. Our experiments on benchmarks from MiBench [15] show average energy savings of 37% along with a performance improvement of 18%.
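The SPMM idea above, compiler-inserted calls that push frames onto the SPM and spill older frames to main memory when it fills, can be sketched roughly as below. All names, sizes, and the bottom-of-stack eviction policy are illustrative assumptions, not the paper's actual implementation; in particular, a real SPMM must not shift live frames the way this simplification does.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes and buffers; a real system would use the actual
 * SPM address range and DMA to transfer bytes to off-chip DRAM. */
#define SPM_SIZE   256
#define DRAM_SIZE 4096

static unsigned char spm[SPM_SIZE];    /* fast on-chip scratch pad      */
static unsigned char dram[DRAM_SIZE];  /* slow off-chip backing store   */
static size_t spm_top  = 0;            /* bytes of SPM currently in use */
static size_t dram_top = 0;            /* bytes spilled to DRAM         */

/* Conceptually called by compiler-inserted code before a function body
 * runs: make room for its frame, spilling the oldest bytes if needed. */
void *spmm_enter(size_t frame_size)
{
    if (spm_top + frame_size > SPM_SIZE) {
        /* Evict the oldest bytes at the bottom of the SPM stack.
         * (Simplification: this shifts the remaining frames.) */
        size_t spill = spm_top + frame_size - SPM_SIZE;
        memcpy(&dram[dram_top], spm, spill);
        memmove(spm, &spm[spill], spm_top - spill);
        spm_top  -= spill;
        dram_top += spill;
    }
    void *frame = &spm[spm_top];
    spm_top += frame_size;
    return frame;
}

/* Conceptually called on function return: release the frame. A full
 * manager would also refill previously spilled frames from DRAM here. */
void spmm_exit(size_t frame_size)
{
    spm_top -= frame_size;
}
```

Because the check runs at every call, the scheme needs neither profile data nor the SPM size at compile time, matching the properties the abstract claims.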
Optimal stack frame placement and transfer for energy reduction targeting embedded processors with scratch-pad memories
Cited by 1 (1 self)
Abstract — Memory accesses are a major cause of energy consumption for embedded systems, and the stack is a frequent target for data accesses. This paper presents a fully software technique which aims at reducing the energy consumption related to the stack by allocating and transferring frames, or parts of frames, between a scratch-pad memory and the main memory. The technique uses an integer linear formulation of the problem to find, at compile time, the optimal management for the frames. The technique is also extended to integrate existing methods which deal with static memory objects and others which deal with recursive functions. Experimental results show that our technique effectively exploits an available scratch-pad memory space of only one half of what the stack requires, reducing stack-related energy consumption by more than 90% for several applications, and by 84% on average, compared to the case where all the frames of the stack are placed in main memory.
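A generic compile-time ILP for frame placement, purely an illustrative skeleton (the paper's actual formulation also models partial frames and transfer points), might read:

\begin{align*}
\min_{x} \quad & \sum_{i} \Big( a_i E_{\mathrm{spm}}\, x_i + a_i E_{\mathrm{main}} (1 - x_i) + 2\, s_i E_{\mathrm{xfer}}\, x_i \Big) \\
\text{s.t.} \quad & \sum_{i \in \mathrm{live}(p)} s_i\, x_i \le S \qquad \text{for every program point } p, \\
& x_i \in \{0, 1\},
\end{align*}

where $x_i = 1$ places frame $i$ in the SPM, $a_i$ is its access count, $s_i$ its size, $S$ the SPM capacity, and $E_{\mathrm{spm}}$, $E_{\mathrm{main}}$, $E_{\mathrm{xfer}}$ are per-access and per-byte-transfer energy costs. The capacity constraint over live frames is what lets half-sized SPMs still capture most of the stack's accesses.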
Optimizing Local Memory Allocation and Assignment Through a Decoupled Approach
- THE 23RD INTERNATIONAL WORKSHOP ON LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING
, 2009
Cited by 1 (0 self)
Software-controlled local memories (LMs) are widely used to provide fast, scalable, power-efficient and predictable access to critical data. While many studies have addressed LM management, keeping hot data in the LM remains a major headache. This paper revisits LM management of arrays in light of recent progress in register allocation, supporting multiple live-range splitting schemes through a generic integer linear program. These schemes differ in the granularity of their decision points. The model can also be extended to address fragmentation, assigning live ranges to precise offsets. We show that the links between LM management and register allocation have been underexploited, leaving many fundamental questions open and effective applications to be explored.
A SOFTWARE-ONLY SOLUTION FOR STACK MANAGEMENT ON SYSTEMS WITH SCRATCH PAD MEMORY
, 2008
The pursuit of higher performance and higher power efficiency in computing has led to the evolution of multi-core processor architectures. Early multi-core processors primarily used the shared memory multi-processing paradigm. However, the conventional shared memory architecture, due to its limited scalability, becomes a performance bottleneck. Newer architectures like the IBM Cell with 10 cores have adopted new memory architectures to truly enable the peak computing performance available. In order to achieve higher performance, it is necessary to re-design not only the bus topology, but also the memory hierarchy. The distributed memory model used in non-uniform memory access (NUMA) architectures is becoming popular in these modern processors. Conventional on-chip memories like caches have been replaced by a low-power, low-area alternative called scratch pad memory (SPM). Caches perform data and code transfers in hardware, in an automated fashion. Unlike caches, transfers to and from the SPM must be explicitly managed by the compiler. To achieve power-efficient operation, it is important to map the most frequently used objects onto the SPM. In this thesis, a dynamic scratch pad memory management scheme is proposed for program stack data with the objective of processor power reduction. As opposed to previous efforts, this technique does not need the SPM size at compile time, does not mandate any hardware changes, does not need profile information, and seamlessly integrates support for recursive functions. This solution manages stack frames on SPM using a software scratch pad memory manager (SPMM), integrated into the application binary by the compiler. Experiments on benchmarks from the MiBench suite show average energy savings of 37% along with a performance improvement of 18%.
ISOS: Space Overlapping Based on Iteration Access Patterns for Dynamic Scratch-pad Memory Management in Embedded Systems
Scratch-pad memory (SPM), a small, fast, software-managed on-chip SRAM (static random access memory), is widely used in embedded systems. With the ever-widening performance gap between processors and main memory, it is very important to reduce the serious off-chip memory access overheads caused by transferring data between SPM and off-chip memory. In this paper, we propose a novel compiler-assisted iteration-access-pattern-based space overlapping technique for dynamic SPM management (ISOS) with DMA (direct memory access). In ISOS, we combine SPM and DMA for performance optimization by exploiting the chance to overlap SPM space, so as to further utilize the limited SPM space and reduce the number of DMA operations. We implement our technique on top of IMPACT and conduct experiments using a set of benchmarks from DSPstone and Mediabench on the cycle-accurate VLIW simulator of Trimaran. The experimental results show that our technique achieves significant run-time performance improvement compared with previous work.
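ISOS itself overlaps SPM space between data whose iteration access patterns are disjoint; as a simpler, generic illustration of the SPM-plus-DMA interplay it builds on, here is a double-buffered tiling sketch, with `memcpy` standing in for asynchronous DMA and all sizes hypothetical:

```c
#include <string.h>

#define TILE 8        /* elements per DMA transfer (hypothetical size) */

static int spm_buf[2][TILE];   /* two SPM buffers: compute on one,
                                * fill the other                        */

/* Sum an off-chip array by streaming it through the SPM in tiles.
 * On real hardware, the prefetch memcpy below would be an asynchronous
 * DMA started one tile ahead, overlapping transfer with computation;
 * n is assumed to be a multiple of TILE for brevity. */
long tiled_sum(const int *offchip, int n)
{
    long sum = 0;
    int cur = 0;
    memcpy(spm_buf[cur], offchip, TILE * sizeof(int)); /* DMA-in tile 0 */
    for (int t = 0; t < n / TILE; t++) {
        int nxt = 1 - cur;
        if ((t + 1) * TILE < n)                        /* prefetch t+1  */
            memcpy(spm_buf[nxt], offchip + (t + 1) * TILE,
                   TILE * sizeof(int));
        for (int i = 0; i < TILE; i++)                 /* compute on t  */
            sum += spm_buf[cur][i];
        cur = nxt;                                     /* swap buffers  */
    }
    return sum;
}
```

ISOS goes further than this sketch by letting buffers for different arrays share the same SPM bytes when their live iterations never overlap, which is what cuts the DMA count.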
Stack Caching Using Split Data Caches
Abstract — In most embedded and general-purpose architectures, stack data and non-stack data are cached together, meaning that writing to or loading from the stack may expel non-stack data from the data cache. Manipulation of the stack has a different memory access pattern than that of non-stack data, showing higher temporal and spatial locality. We propose caching stack and non-stack data separately and develop four different stack caches that allow this separation without requiring compiler support: the simple, window, and prefilling (with and without tag) stack caches. The performance of the stack cache architectures was evaluated using the SimpleScalar toolset, where the window and prefilling-without-tag stack caches resulted in an execution speedup of up to 3.5% for the MiBench benchmarks, executed on an out-of-order processor with the ARM instruction set. Keywords: cache memory; microprocessors; stack caching.
DYNAMIC BINARY TRANSLATION FOR EMBEDDED SYSTEMS WITH SCRATCHPAD MEMORY
, 2011
This dissertation was presented by José Américo Baiocchi Paredes. It was defended on