Memory Size Reduction through Storage Order Optimization for Embedded Parallel Multimedia Applications
Parallel Computing, 1997
Cited by 41 (14 self)
In this paper, we present some strategies that are capable of reducing the required memory sizes and power consumption for a large class of data-intensive multimedia applications. This class consists of static control programs with large multi-dimensional arrays and (piece-wise) affine storage and execution order. These strategies are equally well suited for parallel and mono-processing applications, and are particularly useful in an embedded application context, where memory size and power consumption usually are the main cost factors. The main objective of these strategies is to reuse memory as much as possible by obtaining an optimal storage order for each of the arrays present in a program through (the equivalent of) data-transformations. Although size reduction is the main objective, an added benefit is the fact that the power consumption is also reduced due to the decreased capacitive load of the memories. The memory size reduction task is part of an overall memory size and power...
Array Placement for Storage Size Reduction in Embedded Multimedia Systems
Intl. Conf. on Application Specific Systems, Architectures, and Processors, 1997
Cited by 27 (5 self)
In this paper, we present a two-phase strategy for reducing the required background memory sizes for a large class of data-intensive multimedia applications. This strategy is particularly useful in an embedded application context, where memory size and the corresponding power consumption are the main cost factors in combination with data transfers. Our strategy optimizes the storage order of arrays in memory by trying to improve the reuse of memory locations, both for elements of the same array and for elements of different arrays. Although size reduction is the main objective, an added benefit is reduced power consumption due to the decreased capacitive load of the memories. The memory size reduction task is part of an overall memory size and power reduction methodology called ATOMIUM, in which other tasks can increase its effectiveness (e.g. loop transformations), but it can also be used on a stand-alone basis. The feasibility and effectiveness of our approach are demonstrated by e...
Storage Size Reduction by In-place Mapping of Arrays
Verification, Model Checking and Abstract Interpretation, Third Int. Workshop, VMCAI 2002, Revised Papers, volume 2294 of LNCS, 2002
Cited by 16 (4 self)
Programs for embedded multimedia applications typically manipulate several large multi-dimensional arrays. The energy consumption per access increases with their size; the access to these large arrays is responsible for a substantial part of the power consumption. In this paper, an analysis is developed to compute a bounding box for the elements in the array that are simultaneously in use. The size of the original array can be reduced to the size of the bounding box and accesses to it can be redirected using modulo operations on the original indices. This substantially reduces the size of the memories and the power consumption of accessing them.
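The modulo redirection described in this abstract can be illustrated with a small sketch. The recurrence, the function names, and the window size of 3 are assumptions chosen for illustration, not taken from the paper, and the window is fixed by hand rather than computed by the paper's bounding-box analysis.

```python
# Hypothetical example: a recurrence in which only a[i], a[i-1] and a[i-2]
# are in use at the same time, so the array can be folded onto 3 cells.

def recurrence_full(n):
    """Reference version: one storage cell per array element."""
    a = [0] * (n + 1)
    if n >= 1:
        a[1] = 1
    for i in range(2, n + 1):
        a[i] = a[i - 1] + a[i - 2]
    return a[n]


WINDOW = 3  # assumed bounding-box size for this recurrence


def recurrence_folded(n):
    """Same computation, with every original index taken modulo WINDOW."""
    a = [0] * WINDOW
    a[1 % WINDOW] = 1
    for i in range(2, n + 1):
        a[i % WINDOW] = a[(i - 1) % WINDOW] + a[(i - 2) % WINDOW]
    return a[n % WINDOW]
```

The folded version keeps the original index expressions and only wraps them in a modulo, which is the kind of access redirection the abstract mentions; storage drops from n + 1 cells to 3.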
A Unified Framework for Schedule and Storage Optimization
In International Conference on Programming Language Design and Implementation (PLDI’01), 2001
Cited by 11 (3 self)
We present a unified mathematical framework for analyzing the tradeoffs between parallelism and storage allocation within a parallelizing compiler. Using this framework, we show how to find a good storage mapping for a given schedule, a good schedule for a given storage mapping, and a good storage mapping that is valid for all legal schedules. We consider storage mappings that collapse one dimension of a multi-dimensional array, and programs that are in a single assignment form with a one-dimensional schedule. Our technique combines affine scheduling techniques with occupancy vector analysis and incorporates general affine dependences across statements and loop nests. We formulate the constraints imposed by the data dependences and storage mappings as a set of linear inequalities, and apply numerical programming techniques to efficiently solve for the shortest occupancy vector. We consider our method to be a first step towards automating a procedure that finds the optimal tradeoff between parallelism and storage space.
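As a rough illustration of a storage mapping that collapses one array dimension, the sketch below assumes a three-point stencil executed row by row and picks the occupancy vector (2, 0) by hand, so that cells (i, j) and (i + 2, j) share storage. The stencil, the schedule, and the function names are hypothetical; the paper derives the vector from linear inequalities, which this sketch does not attempt.

```python
def stencil_expanded(first_row, steps):
    """Single-assignment version: every (i, j) has its own cell."""
    width = len(first_row)
    a = [list(first_row)] + [[0] * width for _ in range(steps)]
    for i in range(1, steps + 1):
        for j in range(width):
            left = a[i - 1][j - 1] if j > 0 else 0
            right = a[i - 1][j + 1] if j < width - 1 else 0
            a[i][j] = left + a[i - 1][j] + right
    return a[steps]


def stencil_collapsed(first_row, steps):
    """Rows i and i + 2 share storage: assumed occupancy vector (2, 0)."""
    width = len(first_row)
    a = [list(first_row), [0] * width]
    for i in range(1, steps + 1):
        cur, prev = a[i % 2], a[(i - 1) % 2]
        for j in range(width):
            left = prev[j - 1] if j > 0 else 0
            right = prev[j + 1] if j < width - 1 else 0
            cur[j] = left + prev[j] + right  # overwrites row i - 2, which is dead
    return a[steps % 2]
```

Under the row-by-row schedule, row i reads only row i - 1, so reusing the cells of row i - 2 is safe. A different schedule could invalidate this mapping, which is the interplay between schedule and storage that the abstract analyzes.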
Optimizing Storage Size for Static Control Programs in Automatic Parallelizers
In Proc. EuroPar Conference, 1997
Cited by 8 (2 self)
This article deals with automatic parallelization of static control programs. During the parallelization process the removal of artificial dependences is usually realized by translating the original program into a single assignment form. This total data expansion has a very high memory cost. We present a technique of partial data expansion which leaves the performance of the parallelization process untouched, with the help of algebra techniques given by the polytope model.
1 Introduction
This article deals with the automatic parallelization technique based on the polytope model. This method can be applied provided that source programs are static control programs, i.e. are limited to do loops and assignment statements to arrays with affine subscripts. The first step is an array data flow analysis in order to extract exact dependences on memory cells. All artificial dependences, which are due to reuse of data, are deleted by a total data expansion. The transformed program has the sing...
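The difference between no expansion, total expansion, and partial expansion can be sketched with a simulated schedule. The loop body, the batch-parallel schedule, and the function name `run_schedule` are assumptions made for illustration; the paper's own method relies on the polytope model and is not reproduced here.

```python
def run_schedule(x, batch, cells):
    """Simulate a schedule that executes `batch` iterations at once.

    Each iteration i does t = 2 * x[i], then y[i] = t + 1. Within a batch,
    all writes to t happen before any read, as on synchronous parallel
    hardware. `cells` is the number of storage cells given to t.
    """
    t = [0] * cells
    y = [0] * len(x)
    for start in range(0, len(x), batch):
        block = range(start, min(start + batch, len(x)))
        for i in block:  # first statement of every iteration in the batch
            t[i % cells] = 2 * x[i]
        for i in block:  # second statement of every iteration in the batch
            y[i] = t[i % cells] + 1
    return y


x = [3, 1, 4, 1, 5, 9]
expected = [2 * v + 1 for v in x]

no_expansion = run_schedule(x, batch=2, cells=1)          # scalar t: wrong result
total_expansion = run_schedule(x, batch=2, cells=len(x))  # one cell per iteration
partial_expansion = run_schedule(x, batch=2, cells=2)     # only as many cells as the schedule needs
```

With a single cell, the artificial dependence on `t` corrupts the result under the parallel schedule. Total expansion fixes this at a cost of one cell per iteration, while two cells already suffice for a batch size of two.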
The Interplay of Expansion and Scheduling in PAF
1998
Cited by 3 (0 self)
This article presents an overview of our recent research on automatic parallelization of imperative programs. It describes our research on the extension of the polytope model to general programs, and the current status of the PAF prototype parallelizer. The main contributions are a general algorithm for single-assignment form transformation, an algorithm to compute parallel schedules, and a clean framework to simultaneously address parallelization by scheduling and expansion of data structures. We also recall general results on our array data-flow analysis technique, and advocate for its use at the core of parallelizing compilers. In addition, we discuss important design issues underlying the project, we compare our work with other compilation frameworks, and we propose several research perspectives for enhancing our parallelization scheme.
A constraint network based approach to memory layout optimization
In Proc. of the Conference on Design, Automation and Test in Europe, 2005
Cited by 2 (1 self)
While loop restructuring based code optimization for array intensive applications has been successful in the past, it has several problems such as the requirement of checking dependences (legality issues) and transformation of all of the array references within the loop body indiscriminately (while some of the references can benefit from the transformation, others may not). As a result, data transformations, i.e., transformations that modify memory layout of array data instead of loop structure have been proposed. One of the problems associated with data transformations is the difficulty of selecting a memory layout for an array that is acceptable to the entire program (not just to a single loop). In this paper, we formulate the problem of determining the memory layouts of arrays as a constraint network, and explore several methods of solution in a systematic way. Our experiments provide strong support in favor of employing constraint processing, and point out future research directions.
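A minimal sketch of layout selection as a constraint network, assuming each array chooses between two layouts and each loop nest contributes a set of acceptable layout combinations. The arrays, constraints, and the function `solve_layouts` are hypothetical, and plain exhaustive enumeration stands in for the solution methods the paper explores.

```python
from itertools import product

LAYOUTS = ("row-major", "column-major")


def solve_layouts(arrays, constraints):
    """Return every layout assignment that satisfies all constraints.

    `constraints` maps a tuple of array names to the set of layout tuples
    that some loop nest finds acceptable for those arrays.
    """
    solutions = []
    for choice in product(LAYOUTS, repeat=len(arrays)):
        assignment = dict(zip(arrays, choice))
        if all(tuple(assignment[name] for name in scope) in allowed
               for scope, allowed in constraints.items()):
            solutions.append(assignment)
    return solutions


# Hypothetical program with three arrays and two loop nests.
arrays = ["A", "B", "C"]
constraints = {
    # Loop nest 1 wants A and B stored the same way.
    ("A", "B"): {("row-major", "row-major"), ("column-major", "column-major")},
    # Loop nest 2 wants B column-major and C row-major.
    ("B", "C"): {("column-major", "row-major")},
}
solutions = solve_layouts(arrays, constraints)
```

Here the second loop nest fixes the layouts of B and C, and the first then forces A to match B, so a single program-wide assignment remains. This is the whole-program consistency that the abstract identifies as the hard part of data transformations.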
Interprocedural Optimisation of Regular Parallel Computations at Runtime
2001
Cited by 2 (1 self)
This thesis concerns techniques for efficient runtime optimisation of regular parallel programs that are built from separate software components. High-quality, high-performance parallel software is frequently built from separately-written reusable software components such as functions from a library of parallel routines. Apart from the strong case from the software engineering point-of-view for constructing software in such a way, there is often also a large performance benefit in hand-optimising individual, frequently used routines. Hitherto, a problem with such libraries of separate software components has been that there is a performance penalty, both because of invocation and indirection overheads, and because opportunities for cross-component optimisations are missed. The techniques we describe in this thesis aim to reverse this disadvantage by making use of high-level abstract information about the components for performing cross-component optimisation. The key is to specify, generate and make use of metadata which characterise both data and software components, and to take advantage of run-time information. We propose a delayed evaluation, self-optimising (DESO) library of data-parallel numerical routines. Delayed evaluation allows us to capture the control-flow of a user program from within the
A Review of Data Placement Optimisation for Data-Parallel Component Composition
of the University of Passau
Cited by 1 (1 self)
Constructive methods for parallel programming are characterised by the composition of optimised, parallel software components. This paper concerns data placement, a key cross-component optimisation for regular data-parallel programs.
Memory Cost due to Anticipated Broadcast
2000
To get efficient solutions, parallelization techniques mainly focus on data alignment or on communication minimization. The efficiency of a parallel solution not only depends on the communication cost, but also on the memory cost. This paper mainly focuses on a symbolic evaluation of the memory cost due to anticipated broadcast. This evaluation is conducted in the polytope model using Ehrhart polynomials, which express the number of integer points in a parameterized polytope.
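To make the role of Ehrhart polynomials concrete, the sketch below counts the integer points of a small parameterized polytope in two ways. The triangle 0 <= i <= j <= n and the function names are chosen for illustration and do not come from the paper. Its point count is the polynomial (n + 1)(n + 2) / 2, which could stand for the number of memory cells needed as a function of the size parameter n.

```python
def count_points(n):
    """Brute-force count of integer points (i, j) with 0 <= i <= j <= n."""
    return sum(1 for j in range(n + 1) for i in range(j + 1))


def ehrhart_triangle(n):
    """Closed-form count for the same triangle: (n + 1)(n + 2) / 2."""
    return (n + 1) * (n + 2) // 2
```

The closed form gives the count symbolically in n, without enumerating points, which is what makes a symbolic evaluation of memory cost possible.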

