Code compaction of an operating system kernel
- In Proceedings of Code Generation and Optimization (CGO), 2007
Cited by 6 (3 self)
General-purpose operating systems, such as Linux, are increasingly being used in embedded systems. Computational resources are usually limited, and embedded processors often have a limited amount of memory. This makes code size especially important. This paper describes techniques for automatically reducing the memory footprint of general-purpose operating systems on embedded platforms. The problem is complicated by the fact that kernel code tends to be quite different from ordinary application code, including the presence of a significant amount of hand-written assembly code, multiple entry points, implicit control flow paths involving interrupt handlers, and frequent indirect control flow via function pointers. We use a novel “approximate decompilation” technique to apply source-level program analysis to hand-written assembly code. A prototype implementation of our ideas on an Intel x86 platform, applied to a Linux kernel that has been configured to exclude unnecessary code, obtains a code size reduction of close to 24%.
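To make the difficulties listed in this abstract more concrete, here is a minimal sketch of a conservative reachability analysis that copes with multiple entry points and indirect calls. It is not the paper's algorithm: the function `reachable_functions`, the call-graph representation, and all kernel-like names in the usage note are hypothetical, and the rule "any indirect call may reach any address-taken function" is a simplifying assumption.

```python
from typing import Dict, Iterable, Set


def reachable_functions(
    direct_calls: Dict[str, Set[str]],  # caller -> directly called functions
    indirect_callers: Set[str],         # functions that call through a pointer
    address_taken: Set[str],            # functions whose address is stored somewhere
    entry_points: Iterable[str],        # e.g. system calls and interrupt handlers
) -> Set[str]:
    """Return every function that may execute, starting from all entry points."""
    reached: Set[str] = set()
    worklist = list(entry_points)
    indirect_seen = False
    while worklist:
        fn = worklist.pop()
        if fn in reached:
            continue
        reached.add(fn)
        worklist.extend(direct_calls.get(fn, ()))
        # Assumption: the first reachable indirect call makes every
        # address-taken function a possible target.
        if fn in indirect_callers and not indirect_seen:
            indirect_seen = True
            worklist.extend(address_taken)
    return reached
```

For example, with entry points `sys_read` and `irq_timer`, a function that is called only from code that is itself never reached would be left out of the result and could be considered for removal.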
Bosschere. Automated reduction of the memory footprint of the Linux kernel
- Trans. on Embedded Computing Sys
Cited by 5 (1 self)
The limited built-in configurability of Linux can lead to expensive code size overhead when it is used in the embedded market. To overcome this problem, we propose the application of link-time compaction and specialization techniques that exploit the a priori known, fixed runtime environment of many embedded systems. In experimental setups based on the ARM XScale and i386 platforms, the proposed techniques are able to reduce the kernel memory footprint by over 16%. We also show how relatively simple additions to existing binary rewriters can implement the proposed techniques for a complex, very unconventional program, such as the Linux kernel. We note that even after specialization, a lot of seemingly unnecessary code remains in the kernel and propose to reduce the footprint of this code by applying code-compression techniques. This technique, combined with the previous ones, reduces the memory footprint by over 23% for the i386 platform and 28% for the ARM platform. Finally, we pinpoint an important code size growth problem when compaction and compression techniques are combined on the ARM platform.
Binary rewriting of an operating system kernel
- In Proc. Workshop on Binary Instrumentation and Applications, 2006
Cited by 5 (2 self)
This paper deals with some of the issues that arise in the context of binary rewriting and instrumentation of an operating system kernel. OS kernels are very different from ordinary application code in many ways, e.g., they contain a significant
JIT instrumentation: a novel approach to dynamically instrument operating systems
- In EuroSys ’07: Proceedings of the 2007 conference on EuroSys (New
Cited by 4 (0 self)
As modern operating systems become more complex, understanding their inner workings is increasingly difficult. Dynamic kernel instrumentation is a well-established method of obtaining insight into the workings of an OS, with applications including debugging, profiling and monitoring, and security auditing. To date, all dynamic instrumentation systems for operating systems follow the probe-based instrumentation paradigm. While efficient on fixed-length instruction set architectures, probes are extremely expensive on variable-length ISAs such as the popular Intel x86 and AMD x86-64. We propose using just-in-time (JIT) instrumentation to overcome this problem. While common in user space, JIT instrumentation has not until now been attempted in kernel space. In this work, we show the feasibility and desirability of kernel-based JIT instrumentation for operating systems with our novel prototype, implemented as a Linux kernel module. The prototype is fully SMP capable. We evaluate our prototype against the popular Kprobes Linux instrumentation tool. Our prototype outperforms Kprobes, at both micro and macro levels, by orders of magnitude when applying medium- and fine-grained instrumentation.
Link-time binary rewriting techniques for program compaction
- ACM Transactions on Programming Languages and Systems, 2005
Cited by 4 (4 self)
Small program size is an important requirement for embedded systems with limited amounts of memory. We describe how link-time compaction through binary rewriting can achieve code size reductions of up to 62% for statically bound languages such as C, C++, and Fortran, without compromising on performance. We demonstrate how the limited amount of information about a program at link time can be exploited to overcome overhead resulting from separate compilation. This is done with scalable, cost-effective, whole-program analyses, optimizations, and duplicate code and data elimination techniques. The discussed techniques are evaluated and their cost-effectiveness is quantified with SQUEEZE++, a prototype link-time compactor.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors - Code generation; compilers; optimization; E.4 [Coding and Information Theory]: Data compaction and compression
Dynamic code management: Improving whole program code locality in managed runtimes
- In VEE ’06: Proc. of the Intl. Conf. on Virtual Execution Environments, 2006
Cited by 2 (1 self)
Poor code locality degrades application performance by increasing memory stalls due to instruction cache and TLB misses. This problem is particularly an issue for large server applications written in languages such as Java and C# that provide just-in-time (JIT) compilation, dynamic class loading, and dynamic recompilation. However, managed runtimes also offer an opportunity to dynamically profile applications and adapt them to improve their performance. This paper describes a Dynamic Code Management system (DCM) in a managed runtime that performs whole program code layout optimizations to improve instruction locality. We begin by implementing the widely used Pettis-Hansen algorithm for method layout to improve code locality. Unfortunately, this algorithm is too costly for a dynamic optimization system, O(n^3) in time in the call graph. For example, Pettis-Hansen requires a prohibitively expensive 35 minutes to lay out MiniBean, which has 15,586 methods. We propose three new code placement algorithms that target ITLB misses, which typically have the greatest impact on performance. The best of these algorithms, Code Tiling, groups methods into page-sized tiles by performing a depth-first traversal of the call graph based on call frequency. Excluding overhead, experimental results show that DCM with Code Tiling improves performance by 6% on the large MiniBean benchmark over a baseline that orders methods based on invocation order, whereas Pettis-Hansen placement offers less improvement, 2%, over the same base. Furthermore, Code Tiling lays out MiniBean in just 0.35 seconds for 15,586 methods (6000 times faster than Pettis-Hansen), which makes it suitable for high-performance managed runtimes.
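As a rough illustration of the Code Tiling idea described in this abstract, the sketch below walks a call graph depth-first, following the most frequently called callee first, and packs methods into page-sized tiles in visit order. The graph representation, the method sizes, the 4096-byte page size, and the name `code_tiling` are assumptions made for illustration; the published algorithm may differ in its details.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # assumed page size in bytes


def code_tiling(
    call_graph: Dict[str, List[Tuple[str, int]]],  # caller -> [(callee, call count)]
    sizes: Dict[str, int],                         # method -> code size in bytes
    roots: List[str],
    page_size: int = PAGE_SIZE,
) -> List[List[str]]:
    """Pack methods into page-sized tiles in hottest-first depth-first order."""
    order: List[str] = []
    seen = set()
    stack = list(reversed(roots))
    while stack:
        method = stack.pop()
        if method in seen:
            continue
        seen.add(method)
        order.append(method)
        # Push the coldest callee first so the hottest one is popped next.
        for callee, _count in sorted(call_graph.get(method, []), key=lambda e: e[1]):
            if callee not in seen:
                stack.append(callee)

    tiles: List[List[str]] = []
    current: List[str] = []
    used = 0
    for method in order:
        size = sizes[method]
        # Start a new tile when the next method would overflow the page.
        if current and used + size > page_size:
            tiles.append(current)
            current, used = [], 0
        current.append(method)
        used += size
    if current:
        tiles.append(current)
    return tiles
```

The traversal and the packing each take time roughly linear in the size of the call graph (plus the cost of sorting each callee list), which is consistent with the abstract's point that a cheap layout pass is what makes the approach usable at run time.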
The Revenge of the Overlay: Automatic Compaction of OS Kernel Code via On-Demand Code Loading
There is increasing interest in using general-purpose operating systems, such as Linux, on embedded platforms. It is especially important in embedded systems to use memory efficiently because embedded processors often have limited physical memory. This paper describes an automatic technique for reducing the memory footprint of general-purpose operating systems on embedded platforms by keeping infrequently executed code on secondary storage and loading such code only if it is needed at run time. Our technique is based on an old idea, memory overlays, and it does not require hardware or operating system support for virtual memory. A prototype of the technique has been implemented for the Linux kernel. We evaluate our approach with two benchmark suites, MiBench and MediaBench, and a Web server application. The experimental results show that our approach reduces memory requirements for the Linux kernel code by about 53% with little degradation in performance.
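The overlay idea in this abstract can be sketched as a small simulation: code lives in a backing store and is brought into a fixed number of resident slots only when it is called. This is a toy model under stated assumptions, not the paper's implementation; the class `OverlayManager`, the dictionary standing in for secondary storage, and the least-recently-used eviction policy are all hypothetical choices.

```python
from collections import OrderedDict
from typing import Callable, Dict


class OverlayManager:
    """Toy model of on-demand code loading into a fixed-size resident area."""

    def __init__(self, storage: Dict[str, Callable[..., object]], capacity: int):
        self.storage = storage      # stands in for code kept on secondary storage
        self.capacity = capacity    # number of routines that fit in memory at once
        self.resident: "OrderedDict[str, Callable[..., object]]" = OrderedDict()
        self.loads = 0              # how many times code had to be loaded

    def call(self, name: str, *args):
        if name in self.resident:
            # Already in memory: just mark it as recently used.
            self.resident.move_to_end(name)
        else:
            if len(self.resident) >= self.capacity:
                # Assumed policy: evict the least recently used routine.
                self.resident.popitem(last=False)
            self.resident[name] = self.storage[name]
            self.loads += 1
        return self.resident[name](*args)
```

In this model, frequently called routines stay resident and cost nothing extra, while rarely used ones pay a load on each first use after eviction, which mirrors the trade-off between memory footprint and performance that the abstract reports.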

