Results 1 - 3 of 3
Optimal Loop Parallelization for Maximizing Iteration-Level Parallelism
Cited by 2 (1 self)
This paper solves the open problem of extracting the maximal number of iterations from a loop that can be executed in parallel on chip multiprocessors. Our algorithm solves it optimally by migrating the weights of parallelism-inhibiting dependences on dependence cycles in two phases. First, we model dependence migration with retiming and formulate this classic loop parallelization problem as a graph optimization problem, i.e., one of finding retiming values for its nodes so that the minimum nonzero edge weight in the graph is maximized. We present our algorithm in three stages, each built incrementally on the preceding one. Second, the optimal code for a loop is generated from the retimed graph of the loop found in the first phase. We demonstrate the effectiveness of our optimal algorithm by comparing it with a number of representative nonoptimal algorithms using a set of benchmarks frequently used in prior work.
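The retiming formulation in this abstract can be illustrated with a minimal sketch. It assumes the standard retiming convention, in which an edge u -> v with weight w becomes w + r(v) - r(u); the paper may use a different sign convention. The three-node cycle, its dependence distances, and the retiming values are hypothetical, and the code only applies a given retiming and measures the objective. It does not implement the paper's optimal algorithm.

```python
def retime(edges, r):
    """Apply retiming r to every edge: w_r(u, v) = w(u, v) + r[v] - r[u]."""
    return {(u, v): w + r[v] - r[u] for (u, v), w in edges.items()}


def is_legal(edges):
    """A retimed graph is legal only if no edge weight is negative."""
    return all(w >= 0 for w in edges.values())


def min_nonzero_weight(edges):
    """Objective named in the abstract: the smallest nonzero edge weight."""
    nonzero = [w for w in edges.values() if w != 0]
    return min(nonzero) if nonzero else None


# Hypothetical dependence cycle A -> B -> C -> A with distances 1, 1, 2.
edges = {("A", "B"): 1, ("B", "C"): 1, ("C", "A"): 2}

# Hypothetical retiming values that move all of the cycle's weight onto C -> A.
r = {"A": 0, "B": -1, "C": -2}

retimed = retime(edges, r)
print(min_nonzero_weight(edges), min_nonzero_weight(retimed))
```

Retiming leaves the total weight of the cycle unchanged (4 in this example), so the weight can only be redistributed among the cycle's edges, never created or removed.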
Optimally Maximizing Iteration-Level Loop Parallelism
 IEEE Trans. Parallel Distrib. Syst
Cited by 2 (0 self)
Loops are the main source of parallelism in many applications. This paper solves the open problem of extracting the maximal number of iterations from a loop to run in parallel on chip multiprocessors. Our algorithm solves it optimally by migrating the weights of parallelism-inhibiting dependences on dependence cycles in two phases. First, we model dependence migration with retiming and formulate this classic loop parallelization problem as a graph optimization problem, i.e., one of finding retiming values for its nodes so that the minimum nonzero edge weight in the graph is maximized. We present our algorithm in three stages, each built incrementally on the preceding one. Second, the optimal code for a loop is generated from the retimed graph of the loop found in the first phase. We demonstrate the effectiveness of our optimal algorithm by comparing it with a number of representative nonoptimal algorithms using a set of benchmarks frequently used in prior work and a set of graphs generated by TGFF.
Index Terms: loop parallelization, loop transformation, retiming, data dependence graph, iteration-level parallelism.
SoftExplorer: Estimating and Optimizing the Power and Energy Consumption of a C Program for DSP Applications
 EURASIP Journal on Applied Signal Processing 2005:16, 2641–2654. © 2005 Hindawi Publishing Corporation
, 2004
We present a method to estimate the power and energy consumption of an algorithm directly from the C program. Three models are involved: a model for the targeted processor (the power model), a model for the algorithm, and a model for the compiler (the prediction model). A functional-level power analysis is performed to obtain the power model. Five power models have been developed so far, for different architectures, from the simple RISC ARM7 to the very complex VLIW DSP TI C64. Important phenomena are taken into account, like cache misses, pipeline stalls, and internal/external memory accesses. The model for the algorithm expresses the algorithm's influence over the processor's activity. The prediction model represents the behavior of the compiler, and how it will allow the algorithm to use the processor's resources. The data mapping is considered at that stage. We have developed a tool, SoftExplorer, which performs estimation both at the C level and the assembly level. Estimations are performed on real-life digital signal processing applications with average errors of 4.2% at the C level and 1.8% at the assembly level. We present how SoftExplorer can be used to optimize the consumption of an application. We first show how to find the best data mapping for an algorithm. Then we demonstrate a method to choose the processor and its operating frequency in order to minimize the global energy consumption.
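The last step of this abstract, choosing a processor and operating frequency to minimize energy, rests on the relation energy = power x execution time. The sketch below illustrates only that relation. The candidate names, frequencies, power figures, and cycle counts are made-up placeholders, not SoftExplorer outputs, and the abstract's actual power models (which account for cache misses, pipeline stalls, and memory accesses) are not reproduced here.

```python
def energy_joules(power_w, cycles, freq_hz):
    """Energy = average power * execution time, with time = cycles / frequency."""
    return power_w * cycles / freq_hz


def best_operating_point(candidates):
    """Return the candidate with the lowest estimated energy."""
    return min(
        candidates,
        key=lambda c: energy_joules(c["power_w"], c["cycles"], c["freq_hz"]),
    )


# Hypothetical candidates: every number below is an assumption for illustration.
candidates = [
    {"name": "proc_A@100MHz", "freq_hz": 100e6, "power_w": 0.5, "cycles": 1_000_000},
    {"name": "proc_A@200MHz", "freq_hz": 200e6, "power_w": 1.2, "cycles": 1_000_000},
    {"name": "proc_B@300MHz", "freq_hz": 300e6, "power_w": 1.5, "cycles": 800_000},
]

best = best_operating_point(candidates)
print(best["name"])
```

In this toy data the fastest option also consumes the least energy, but only because its shorter run time outweighs its higher power draw; with different figures the ranking can reverse, which is why power and time need to be estimated together.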