High-Level Power Modeling, Estimation, and Optimization
- IEEE Trans. On Computer Aided Design, 1998
- Cited by 74 (10 self)
Silicon area, performance, and testability have been, so far, the major design constraints to be met during the development of digital very-large-scale-integration (VLSI) systems. In recent years, however, things have changed; increasingly, power has been given weight comparable to the other design parameters. This is primarily due to the remarkable success of personal computing devices and wireless communication systems, which demand high-speed computations with low power consumption. In addition, there exists a strong pressure for manufacturers of high-end products to keep power under control, due to the increased costs of packaging and cooling this type of devices. Last, the need of ensuring high circuit reliability has turned out to be more stringent. The availability of tools for the automatic design of low-power VLSI systems has thus become necessary. More specifically, following a natural trend, the interests of the researchers have lately shifted to the investigation of power modeling, estimation, synthesis, and optimization techniques that account for power dissipation during the early stages of the design flow. This paper surveys representative contributions to this area that have appeared in the recent literature. Index Terms: Behavioral and logic synthesis, low power design, power management.
Memory Size Reduction through Storage Order Optimization for Embedded Parallel Multimedia Applications
- Parallel Computing, 1997
- Cited by 41 (14 self)
In this paper, we present some strategies that are capable of reducing the required memory sizes and power consumption for a large class of data-intensive multimedia applications. This class consists of static control programs with large multi-dimensional arrays and (piece-wise) affine storage and execution order. These strategies are equally well suited for parallel and mono-processing applications, and are particularly useful in an embedded application context, where memory size and power consumption usually are the main cost factors. The main objective of these strategies is to reuse memory as much as possible by obtaining an optimal storage order for each of the arrays present in a program through (the equivalent of) data-transformations. Although size reduction is the main objective, an added benefit is the fact that the power consumption is also reduced due to the decreased capacitive load of the memories. The memory size reduction task is part of an overall memory size and power...
Array Placement for Storage Size Reduction in Embedded Multimedia Systems
- Intl. Conf. on Application Specific Systems, Architectures, and Processors, 1997
- Cited by 27 (5 self)
In this paper, we present a two-phase strategy for reducing the required background memory sizes for a large class of data-intensive multimedia applications. This strategy is particularly useful in an embedded application context, where memory size and the corresponding power consumption are the main cost factors in combination with data transfers. Our strategy optimizes the storage order of arrays in memory by trying to improve the reuse of memory locations, as well for elements of the same array as for elements of different arrays. Although size reduction is the main objective, an added benefit is a reduced power consumption due to the decreased capacitive load of the memories. The memory size reduction task is part of an overall memory size and power reduction methodology called ATOMIUM, in which other tasks can increase its effectiveness (e.g. loop transformations), but it can also be used on a stand-alone base. The feasibility and effectiveness of our approach is demonstrated by e...
Overlay techniques for scratchpad memories in low power embedded processors
- IEEE Trans. VLSI Syst
- Cited by 13 (0 self)
Energy consumption is one of the important parameters to be optimized during the design of portable embedded systems. Thus, most of the contemporary portable devices feature low-power processors coupled with on-chip memories (e.g., caches, scratchpads). Scratchpads are better than traditional caches in terms of power, performance, area, and predictability. However, unlike caches, they depend upon software allocation techniques for their utilization. In this paper, we present scratchpad overlay techniques which analyze the application and insert instructions to dynamically copy both variables and code segments onto the scratchpad at runtime. We demonstrate that the problem of overlaying scratchpad is an extension of the Global Register Allocation problem. We present optimal and near-optimal approaches for solving the scratchpad overlay problem. The near-optimal scratchpad overlay approach achieves close to the optimal results and is significantly faster than the optimal approach. Our approaches improve upon the previously known static allocation technique for assigning both variables and code segments onto the scratchpad. The evaluation of the approaches for the ARM7 processor reports average energy and execution time reductions of 26% and 14% over the static approach, respectively. Additional experiments comparing the overlayed scratchpads against unified caches of the same size report average energy and execution time savings of 20% and 10%, respectively. We also report data memory energy reductions of 45%–57% due to the insertion of a 1024-byte scratchpad memory in the memory hierarchy of a digital signal processor (DSP). Index Terms: Code overlay, memory aware code optimization, scratchpad memory (SPM).
Power Exploration for Dynamic Data Types through Virtual Memory Management Refinement
- In Proceedings of the International Symposium on Low Power Electronics and Design, 1998
- Cited by 10 (2 self)
In this paper we present our novel power exploration methodology for applications with dynamic data types. Our methodology is crucial to obtain effective solutions in an embedded (HW or SW) processor context. The contributions are twofold. First we define the complete search space for Virtual Memory Management (VMM) mechanisms in a structured way with orthogonal decision trees. Secondly we present our systematic methodology for exploration of the maximal power that takes into account characteristics of the application to heavily prune the search space guiding the choices of a VMM mechanism. Finally we demonstrate for two industrial examples that power can vary considerably depending on the VMM chosen. Moreover these experiments show the effectiveness of our exploration methodology. 1 Introduction We target applications that require manipulation of large amounts of data that are dynamically created and destroyed at run time, such as protocol processing applications. These applications...
Strategy For Power Efficient Design Of Parallel Systems
- Cited by 8 (3 self)
Application studies in the areas of image and video processing indicate that between 50 and 80% of the power cost in these systems is due to data storage and transfers. This is especially true for multi-processor realizations, because conventional parallelization methods ignore the power cost and focus only on performance. However, also the power consumption depends heavily on the way a system is parallelized. To reduce this dominant cost, we propose to address the system-level storage organization for the multi-dimensional signals as a first step in mapping these applications, before the parallelization or partitioning decisions (in particular before the SW/HW partitioning which is traditionally done too early in the design trajectory). Our methodology is illustrated on a parallel QSDPCM video codec.
Code Transformations for Embedded Multimedia Applications: Impact on Power and Performance
- Power-Driven Microarchitecture Workshop In Conjunction With ISCA98, 1998
- Cited by 6 (1 self)
A number of code transformations for embedded multimedia applications is presented in this paper and their impact on both system power and performance is evaluated. In terms of power the transformations move the accesses from the large background memories to small buffers that can be kept foreground. This leads to reduction of the memory related power consumption that forms the dominant part of the total power budget of such systems. The transformations also affect the code size and the system's performance which is usually the overriding issue in embedded systems. The impact of the transformations to the performance is analyzed in detail. The code parameters related to the performance of the system and the way they are affected by the transformations are identified. This allows for the development of a systematic methodology for the application of code transformations that achieve an optimal balance between power and performance.
Reducing Power Consumption of Dedicated Processors through Instruction Set Encoding
- in 8th GLS, 1998
- Cited by 5 (1 self)
With the increased clock frequency of modern, high-performance processors (over 500 MHz, in some cases), limiting the power dissipation has become the most stringent design target. It is thus mandatory for processor engineers to resort to a large variety of optimization techniques to reduce the power requirements in the hot zones of the chip. In this paper, we focus on the power dissipated by the instruction fetch and decode logic, a portion of the processor architecture where a lot of capacitance switching normally takes place. We propose a methodology for determining an encoding of the instruction set that guarantees the minimization of the number of bit transitions occurring inside the registers of the pipeline stages involved in instruction fetching and decoding. The assignment of the binary patterns to the op-codes is driven by the statistics concerning instruction adjacency collected through instruction-level simulation of typical software applications; therefore, the technique is best exploited when applied to encode the instruction set of core processors and microcontrollers, since components of these types are commonly used to execute fixed portions of machine code within embedded systems. We illustrate the effectiveness of the methodology through the experimental data we have obtained on an existing microprocessor.
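The idea in the abstract above (assign op-code bit patterns so that frequently adjacent instructions differ in few bits) can be illustrated with a minimal sketch. This is not the authors' algorithm: the op-code names, the adjacency counts, and the brute-force search are all assumptions made up for illustration, and exhaustive search is only feasible for a handful of op-codes.

```python
from itertools import permutations


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two op-code patterns differ."""
    return bin(a ^ b).count("1")


def transition_cost(encoding: dict, adjacency: dict) -> int:
    """Total bit transitions: adjacency count times Hamming distance, summed over pairs."""
    return sum(
        count * hamming(encoding[first], encoding[second])
        for (first, second), count in adjacency.items()
    )


def best_encoding(opcodes: list, adjacency: dict, bits: int):
    """Exhaustively try every assignment of distinct bit patterns (toy sizes only)."""
    best = None
    for patterns in permutations(range(1 << bits), len(opcodes)):
        candidate = dict(zip(opcodes, patterns))
        cost = transition_cost(candidate, adjacency)
        if best is None or cost < best[1]:
            best = (candidate, cost)
    return best


# Hypothetical adjacency statistics: how often one op-code follows another.
ADJACENCY = {("LD", "ADD"): 50, ("ADD", "ST"): 40, ("ST", "LD"): 30, ("BR", "LD"): 5}
OPCODES = ["ADD", "LD", "ST", "BR"]

# An arbitrary starting encoding to compare against.
NAIVE = {"ADD": 0b00, "LD": 0b11, "ST": 0b01, "BR": 0b10}

naive_cost = transition_cost(NAIVE, ADJACENCY)
optimized, optimized_cost = best_encoding(OPCODES, ADJACENCY, bits=2)

if __name__ == "__main__":
    print("naive cost:", naive_cost)
    print("optimized cost:", optimized_cost, optimized)
```

In this toy setting the search places the most frequent pairs on patterns one bit apart. A realistic instruction set would need a heuristic rather than enumeration.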
A Loop Transformation Approach for Combined Parallelization and Data Transfer and Storage Optimization
- 2000
- Cited by 3 (0 self)
We show a new loop transformation approach to combine parallelization and data transfer and storage optimization for embedded multimedia applications. Our methodology makes use of an extended polytope model, with an exact mathematical description of all operations and dependencies. For the data transfer and storage exploration, we use a two step approach, consisting of a polytope placement step and an ordering step. We will show that an early parallelization has to be done between these two steps, in order to achieve a powerful combination of storage optimization and parallelization.
Hierarchy Exploration in High Level Memory Management
- in proc. of the 1997 International Symposium on Low Power Electronics and Design, 1997
- Cited by 2 (0 self)
Introducing an optimized memory hierarchy to exploit temporal locality in the memory accesses on array signals can have a very large impact on the power consumption in data dominated applications. In this paper the design freedom available for the basic problem is explored in-depth and the outline of a systematic solution methodology is proposed. The methodology is illustrated on a real-life motion estimation application. The results obtained for this application show power reductions of about 85% for the memory sub-system compared to the case without memory hierarchy. These large gains justify that memory hierarchy design should be done early in the global memory management script. 1 Introduction The idea of using memory hierarchy to minimize the power consumption, is based on the fact that memory power consumption depends primarily on the access frequency and the size of the memory. Power savings can be obtained by accessing heavily used data from smaller memories instead of from la...
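The dependence of memory power on access frequency and memory size, as described in the entry above, can be sketched with a toy model. Everything numeric here is an assumption: the square-root energy-per-access model, the memory sizes, and the access counts are hypothetical and are not taken from the paper, so the resulting figure says nothing about the reported 85% reduction.

```python
import math


def access_energy(size_words: int) -> float:
    """Hypothetical energy per access (arbitrary units), growing with memory size."""
    return math.sqrt(size_words)


def total_energy(accesses: list) -> float:
    """Sum energy over (access_count, memory_size_words) pairs."""
    return sum(count * access_energy(size) for count, size in accesses)


LARGE = 65536   # words in the background memory (assumed)
SMALL = 256     # words in the small buffer (assumed)
READS = 1_000_000   # total reads issued by the application (assumed)
UNIQUE = 10_000     # distinct words, each copied into the buffer once (assumed)

# Without hierarchy: every read goes to the large memory.
flat = total_energy([(READS, LARGE)])

# With hierarchy: copy each word once (read large, write small),
# then serve all reads from the small buffer.
hierarchical = total_energy([(UNIQUE, LARGE), (UNIQUE, SMALL), (READS, SMALL)])

saving = 1.0 - hierarchical / flat

if __name__ == "__main__":
    print(f"flat: {flat:.3e}, hierarchical: {hierarchical:.3e}, saving: {saving:.1%}")
```

The saving in this model comes entirely from temporal locality: the more often each copied word is reused, the larger the share of accesses served by the small memory.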

