Probabilistic source-level optimisation of embedded programs
- In Proceedings of the Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES), 2005
Cited by 22 (14 self)
Efficient implementation of DSP applications is critical for many embedded systems. Optimising C compilers for embedded processors largely focus on code generation and instruction scheduling which, with their growing maturity, are providing diminishing returns. This paper empirically evaluates another approach, namely source-level transformations and the probabilistic feedback-driven search for “good” transformation sequences within a large optimisation space. This novel approach combines two selection methods: one based on exploring the optimisation space, the other focused on localised search of good areas. This technique was applied to the UTDSP benchmark suite on two digital signal and multimedia processors (Analog Devices TigerSHARC TS-101, Philips TriMedia TM-1100) and an embedded processor derived from a popular general-purpose processor architecture (Intel Celeron 400). On average, our approach gave a speedup of 1.71 across all platforms, equivalent to an average 41% reduction in execution time, outperforming existing approaches. In certain cases a speedup of up to ≈ 7 was found for individual benchmarks.
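The abstract above describes a search that mixes exploration of the whole optimisation space with localised search around good candidates. The following is a minimal sketch of that general idea, not the authors' algorithm: the transformation names, the `toy_cost` function (a stand-in for measured execution time), and the `explore_prob` parameter are all assumptions made for illustration.

```python
import random

# Hypothetical transformation names; the abstract does not list the real set.
TRANSFORMS = ["unroll", "fuse", "tile", "interchange", "scalarize"]

def toy_cost(seq):
    """Illustrative stand-in for a measured execution time (lower is better)."""
    cost = 100.0
    for i, t in enumerate(seq):
        if t == "unroll":
            cost -= 5
        if t == "tile" and i > 0 and seq[i - 1] == "interchange":
            cost -= 15
        if t == "fuse":
            cost += 2
    return cost

def random_sequence(rng, length):
    """Exploration step: sample a sequence uniformly from the whole space."""
    return [rng.choice(TRANSFORMS) for _ in range(length)]

def mutate(rng, seq):
    """Local step: change one position of the best sequence found so far."""
    new = list(seq)
    new[rng.randrange(len(new))] = rng.choice(TRANSFORMS)
    return new

def search(cost, length=4, iterations=200, explore_prob=0.5, seed=0):
    """Feedback-driven search that alternates randomly between both steps."""
    rng = random.Random(seed)
    best = random_sequence(rng, length)
    best_cost = cost(best)
    for _ in range(iterations):
        if rng.random() < explore_prob:
            cand = random_sequence(rng, length)
        else:
            cand = mutate(rng, best)
        c = cost(cand)
        if c < best_cost:  # feedback: keep only improvements
            best, best_cost = cand, c
    return best, best_cost

def time_reduction(speedup):
    """Fraction of execution time saved for a given speedup factor."""
    return 1.0 - 1.0 / speedup

if __name__ == "__main__":
    print(search(toy_cost))
    print(round(time_reduction(1.71), 3))
```

The helper `time_reduction` shows how the two reported figures relate: a speedup of 1.71 corresponds to 1 - 1/1.71 ≈ 0.415, which is consistent with the quoted 41% reduction in execution time.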
Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming
- In Proc. of CODES/ISSS. ACM, 2010
Cited by 1 (1 self)
The last years have shown that the advantages of multiprocessor System-on-Chip (MPSoC) architectures in the embedded systems domain cannot be disregarded. Using multiple cores in a single system makes it possible to close the gap between energy consumption, problems concerning heat dissipation, and computational power. Nevertheless, these benefits do not come for free. New challenges arise if existing applications have to be ported to these multiprocessor platforms. One of the most ambitious tasks is to extract efficient parallelism from these existing sequential applications. Hence, many parallelization tools have been developed, most of which extract as much parallelism as possible, which is in general not the best choice for embedded systems with their limitations in hardware and software support. In contrast
An Opcode Encoding Method for Low-Power Instruction Fetch
Abstract. In designing today's mobile embedded systems such as cellular phones and PDAs, power consumption is an important design constraint. In a CMOS circuit, switching activity accounts for over 90% of total power dissipation. In this paper, we describe a method of encoding opcodes for low-power instruction fetch by reducing the switching activity of the instruction fetch logic. To reduce this switching activity, our method encodes opcodes so that instruction pairs that occur consecutively more often have a smaller Hamming distance between their opcodes. Our experiments show that a switching activity reduction of 36.4% to 66.7% is achievable over a naive encoding method. ... switching activity from the instruction fetch logic. Many redundant bit changes between consecutive instructions can be removed by encoding opcodes so that instruction pairs that occur consecutively more often have a smaller Hamming distance between their opcodes. In principle, our method is similar to Gray code addressing [4] and register relabeling [5], in that digital values are statically encoded to minimize the number of bit changes between values. However, we believe that this is the first attempt to apply a low-power encoding scheme to opcode encoding. For the benchmark programs we tested, we were able to reduce the switching activity by 36.4% to 66.7% over a naive encoding method. We explain the opcode encoding method in Section II and report the experimental results in Section III.
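The encoding criterion in this abstract can be illustrated with a small sketch. It scores an encoding by the frequency-weighted Hamming distance between consecutive opcodes and finds the cheapest assignment by brute force. The brute-force search is a swapped-in stand-in, since the abstract does not describe the authors' actual procedure, and the opcode names and pair frequencies below are made up for illustration.

```python
from itertools import permutations

def hamming(a, b):
    """Number of bit positions in which two integer codes differ."""
    return bin(a ^ b).count("1")

def switching_cost(encoding, pair_freq):
    """Frequency-weighted bit flips over all consecutive opcode pairs."""
    return sum(f * hamming(encoding[x], encoding[y])
               for (x, y), f in pair_freq.items())

def best_encoding(opcodes, pair_freq, bits):
    """Exhaustively try every assignment of distinct codes (tiny inputs only)."""
    best, best_cost = None, None
    for perm in permutations(range(2 ** bits), len(opcodes)):
        enc = dict(zip(opcodes, perm))
        c = switching_cost(enc, pair_freq)
        if best_cost is None or c < best_cost:
            best, best_cost = enc, c
    return best, best_cost

# Hypothetical opcodes and consecutive-pair frequencies (assumed data).
OPCODES = ["ld", "st", "add", "br"]
PAIR_FREQ = {("ld", "add"): 50, ("add", "st"): 30, ("st", "ld"): 5,
             ("add", "br"): 10, ("br", "ld"): 5}

if __name__ == "__main__":
    naive = {op: i for i, op in enumerate(OPCODES)}  # codes in listing order
    print("naive cost:", switching_cost(naive, PAIR_FREQ))
    print("best:", best_encoding(OPCODES, PAIR_FREQ, bits=2))
```

Exhaustive search grows factorially with the number of opcodes, so a realistic instruction set would need a heuristic; the sketch only makes the cost function concrete.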
Using Genetic Programming for Source-Level Data Assignment to Dual Memory Banks
Abstract. Due to their streaming nature, memory bandwidth is critical for most digital signal processing applications. To accommodate these bandwidth requirements, digital signal processors are typically equipped with dual memory banks that enable simultaneous access to two operands if the data is partitioned appropriately. Fully automated and compiler-integrated approaches to data partitioning and memory bank assignment have, however, found little acceptance by DSP software developers. This is partly due to the inflexibility of these approaches and their inability to cope with certain manual data pre-assignments, e.g. due to I/O constraints. In this paper we build upon a more flexible source-level approach where code generation targets DSP-C [1], using genetic programming to overcome the issues previously experienced with high-level memory bank assignment. We have evaluated our approach on an Analog Devices TigerSHARC DSP and achieved performance gains of up to 1.57 on 13 UTDSP benchmarks.
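To make the bank-assignment problem concrete, here is a minimal sketch that uses a plain genetic algorithm as a simpler stand-in for the genetic programming named in the abstract. It assumes a fitness equal to the total weight of operand pairs that end up in different banks, and it keeps manual pre-assignments fixed. The variable names, weights, and GA parameters are hypothetical.

```python
import random

def fitness(assign, pairs):
    """Total weight of operand pairs placed in different banks (higher is better)."""
    return sum(w for (a, b), w in pairs.items() if assign[a] != assign[b])

def evolve(variables, pairs, fixed, pop_size=20, generations=40, seed=0):
    """Evolve bank assignments (0 or 1 per variable); `fixed` entries never change."""
    rng = random.Random(seed)
    free = [v for v in variables if v not in fixed]

    def make():
        ind = dict(fixed)
        ind.update({v: rng.randint(0, 1) for v in free})
        return ind

    def crossover(p, q):
        # Uniform crossover over the free variables only.
        child = dict(fixed)
        child.update({v: (p if rng.random() < 0.5 else q)[v] for v in free})
        return child

    def mutate(ind):
        # Flip the bank of one randomly chosen free variable.
        if free:
            ind[rng.choice(free)] ^= 1
        return ind

    pop = [make() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ind: fitness(ind, pairs), reverse=True)
        elite = pop[: pop_size // 2]  # the better half survives unchanged
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=lambda ind: fitness(ind, pairs))

# Hypothetical data: in_buf is pinned to bank 0, e.g. by an I/O constraint.
VARIABLES = ["in_buf", "coeff", "state", "out_buf"]
PAIRS = {("in_buf", "coeff"): 8, ("coeff", "state"): 3, ("state", "out_buf"): 5}
FIXED = {"in_buf": 0}

if __name__ == "__main__":
    best = evolve(VARIABLES, PAIRS, FIXED)
    print(best, fitness(best, PAIRS))
```

Keeping the pre-assigned variables outside crossover and mutation is one simple way to honour manual placements; a real tool would also have to account for bank capacity, which this sketch ignores.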
U N I V E R S I
With today’s processing hardware being multicore, and development heading towards even more cores in every system, it is crucial to take advantage of all the cores available in a system. Traditionally, parallelization assumes the system is available exclusively to that one workload. This work proposes a cooperative strategy for OpenMP to avoid system overloading when running multiple parallelized programs. It allows multiple programs to optimize towards best system performance. This is done by communicating individual estimated speedups under workloads, and using that to compute an estimated optimal thread allocation. Experiments show it achieves more than 98% of the optimal performance on average. Additional enhancements make the strategy more resistant to exploitation. Depending on the optimization target and detection probability, this can make exploitation attempts unprofitable. Acknowledgements: Many thanks to Kousha Etessami for a chat that changed this project’s perspective, and made it much more interesting. Also, I would like to thank everyone who helped
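The allocation step described in this abstract can be sketched as follows. Each program reports an estimated speedup for each thread count, and a coordinator picks the allocation that fits the available cores. The objective used here (maximising the sum of estimated speedups) and the exhaustive search are assumptions for illustration; the abstract does not state the actual optimization target or algorithm, and the speedup tables are invented.

```python
from itertools import product

def best_allocation(speedups, cores):
    """Pick thread counts that maximise the summed estimated speedup.

    speedups[p][t - 1] is program p's estimated speedup with t threads.
    Every program gets at least one thread; returns (None, -inf) if the
    programs cannot all fit on the given number of cores.
    """
    best, best_score = None, float("-inf")
    ranges = [range(1, len(s) + 1) for s in speedups]
    for alloc in product(*ranges):
        if sum(alloc) > cores:
            continue  # would overload the system
        score = sum(s[t - 1] for s, t in zip(speedups, alloc))
        if score > best_score:
            best, best_score = alloc, score
    return best, best_score

# Hypothetical estimates: program 0 scales well, program 1 saturates early.
SPEEDUPS = [[1.0, 1.9, 2.7, 3.4],
            [1.0, 1.5, 1.7, 1.8]]

if __name__ == "__main__":
    print(best_allocation(SPEEDUPS, cores=4))
```

With these made-up tables the well-scaling program receives three of the four cores, which illustrates why sharing speedup estimates can beat each program independently claiming every core.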

