Reducing the harmful effects of last-level cache polluters with an OS-level, software-only pollute buffer
- In MICRO, 2008
Cited by 16 (3 self)
It is well recognized that LRU cache-line replacement can be ineffective for applications with large working sets or non-localized memory access patterns. Specifically, in last-level processor caches, LRU can cause cache pollution by inserting non-reusable elements into the cache while evicting reusable ones. The work presented in this paper addresses last-level cache pollution through a dynamic operating system mechanism, called ROCS, requiring no change to underlying hardware and no change to applications. ROCS employs hardware performance counters on a commodity processor to characterize application cache behavior at run-time. Using this online profiling, cache-unfriendly pages are dynamically mapped to a pollute buffer in the cache, eliminating competition between reusable and non-reusable cache lines. The operating system implements the pollute buffer through a page-coloring-based technique, by dedicating a small slice of the last-level cache to store non-reusable pages. Measurements show that ROCS, implemented in the Linux 2.6.24 kernel and running on a 2.3GHz PowerPC 970FX, improves performance of memory-intensive SPEC CPU 2000 and NAS benchmarks by up to 34%, and 16% on average.
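The page-coloring idea behind a pollute buffer can be illustrated with a minimal sketch. The cache geometry, the choice of reserved color, and the names `page_color` and `pick_frame` are assumptions made for illustration; none of them come from the ROCS paper or the Linux kernel.

```python
# Sketch of page coloring with one color reserved as a "pollute buffer".
# All sizes below are illustrative assumptions, not values from the paper.

PAGE_SIZE = 4096              # bytes per page (assumed)
CACHE_SIZE = 1024 * 1024      # last-level cache size in bytes (assumed)
ASSOCIATIVITY = 8             # number of ways (assumed)

# Physical pages whose frame numbers are congruent modulo NUM_COLORS
# map to the same group of cache sets, i.e. they share a "color".
NUM_COLORS = CACHE_SIZE // (ASSOCIATIVITY * PAGE_SIZE)

# Hypothetical policy: dedicate the last color to cache-unfriendly pages.
POLLUTE_COLORS = {NUM_COLORS - 1}


def page_color(frame_number: int) -> int:
    """Return the cache color of a physical page frame."""
    return frame_number % NUM_COLORS


def pick_frame(free_frames, polluter: bool):
    """Return the first free frame suitable for the page class.

    Polluting pages only get frames inside the pollute buffer;
    all other pages only get frames outside it. Returns None when
    no suitable frame is free.
    """
    for frame in free_frames:
        in_buffer = page_color(frame) in POLLUTE_COLORS
        if in_buffer == polluter:
            return frame
    return None
```

With these assumed sizes there are 32 colors, so a page classified as a polluter would be remapped to a frame whose number is 31 modulo 32, and its cache lines could then only evict other polluters' lines.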
Efficient cache attacks on AES, and countermeasures
- Journal of Cryptology, available online, 2009
Cited by 11 (1 self)
We describe several software side-channel attacks based on inter-process leakage through the state of the CPU’s memory cache. This leakage reveals memory access patterns, which can be used for cryptanalysis of cryptographic primitives that employ data-dependent table lookups. The attacks allow an unprivileged process to attack other processes running in parallel on the same processor, despite partitioning methods such as memory protection, sandboxing and virtualization. Some of our methods require only the ability to trigger services that perform encryption or MAC using the unknown key, such as encrypted disk partitions or secure network links. Moreover, we demonstrate an extremely strong type of attack, which requires knowledge of neither the specific plaintexts nor ciphertexts, and works by merely monitoring the effect of the cryptographic process on the cache. We discuss in detail several attacks on AES, and experimentally demonstrate their applicability to real systems, such as OpenSSL and Linux’s dm-crypt encrypted partitions (in the latter case, the full key was recovered after just 800 writes to the partition, taking 65 milliseconds). Finally, we discuss a variety of countermeasures which can be used to mitigate such attacks.
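Why data-dependent table lookups leak information can be shown with a toy model. This sketch is pure arithmetic and performs no cache measurement. It assumes the simpler known-plaintext setting, in which a first-round lookup index is the XOR of a plaintext byte and a key byte. The entry size, line size, and function names are illustrative assumptions, not details from the paper.

```python
# Toy model: which key bits does a single observed cache line reveal?
# Sizes are illustrative assumptions; no real cache is measured here.

ENTRY_SIZE = 4                                # bytes per table entry (assumed)
LINE_SIZE = 64                                # bytes per cache line (assumed)
ENTRIES_PER_LINE = LINE_SIZE // ENTRY_SIZE    # 16 entries share one line


def touched_line(plain_byte: int, key_byte: int) -> int:
    """Cache line (within the table) touched by a lookup at index p XOR k."""
    index = plain_byte ^ key_byte
    return index // ENTRIES_PER_LINE


def candidate_key_bytes(plain_byte: int, observed_line: int) -> list:
    """All key bytes consistent with one observed line for a known plaintext byte."""
    return [k for k in range(256) if touched_line(plain_byte, k) == observed_line]
```

In this model, one observation narrows a key byte from 256 candidates to 16, because only the high four bits of the index determine the line. This is why countermeasures aim to make the access pattern independent of the key.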
Towards Real µ-Kernels
- 1996
Cited by 7 (1 self)
ions are costly and restrict flexibility. The µ-kernel should only multiplex hardware primitives in a secure way. The current exokernel is tailored to the Mips architecture and gets excellent performance for kernel primitives. It is based on the philosophy that a kernel should not provide abstractions but only a minimal set of primitives (although the Exokernel includes device drivers). Consequently, the Exokernel interface is architecture dependent, in particular dedicated to software-controlled TLBs. The basic communication primitive is the protected control transfer, which crosses address spaces but does not transfer arguments. A lightweight remote procedure call based on this primitive takes 10 µs on an R3000 while Mach RPC needs 95 µs. The open question: might the right abstractions perform better and lead to better structured and more efficient applications than Exokernel's primitives? L4 has been developed at GMD. It is based on the theses that • Efficiency and flexibility r...
A survey of virtualization technologies
- 2005
Cited by 7 (0 self)
Virtualization is a technology that combines or divides computing resources to present one or many operating environments using methodologies like hardware and software partitioning or aggregation, partial or complete machine simulation, emulation, time-sharing, and others. Virtualization technologies find important applications over a wide range of areas such as server consolidation, secure computing platforms, supporting multiple operating systems, kernel debugging and development, system migration, etc., resulting in widespread usage. Most of them present similar operating environments to the end user; however, they tend to vary widely in the levels of abstraction they operate at and in the underlying architecture. This paper surveys a wide range of virtualization technologies, analyzes their architecture and implementation, and proposes a taxonomy to categorize them on the basis of their abstraction levels. The paper identifies the following abstraction levels: instruction set level, hardware abstraction layer (HAL) level, operating system level, library level and application level virtual machines. It studies examples from each of the categories and provides relative comparisons. It also gives a broader perspective of the virtualization technologies and gives an insight that can be extended to accommodate future virtualization technologies under this taxonomy. The paper proposes the concept of an extremely lightweight technology, which we call the Featherweight Virtual Machine (FVM), that can be used to "try out" untrusted programs in a realistic environment without causing any permanent damage to the system. Finally, it demonstrates FVM’s effectiveness by applying it to two applications: secure mobile code execution and automatic clean uninstall of Windows programs.
Towards Practical Page Coloring-based Multi-core Cache Management
- 2009
Cited by 7 (4 self)
Modern multi-core processors present new resource management challenges due to the subtle interactions of simultaneously executing processes sharing on-chip resources (particularly the L2 cache). Recent research demonstrates that the operating system may use the page coloring mechanism to control cache partitioning, and consequently to achieve fair and efficient cache utilization. However, page coloring places additional constraints on memory space allocation, which may conflict with application memory needs. Further, adaptive adjustments of cache partitioning policies in a multi-programmed execution environment may incur substantial overhead for page recoloring (or copying). This paper proposes a hot-page coloring approach: enforcing coloring on only a small set of frequently accessed (or hot) pages for each process. The cost of identifying hot pages online is reduced by leveraging the knowledge of spatial locality during a page table scan of access bits. Our results demonstrate that hot page identification and selective coloring can significantly alleviate the coloring-induced adverse effects in practice. However, we also reach the somewhat negative conclusion that without additional hardware support, adaptive page coloring is only beneficial when recoloring is performed infrequently (meaning long scheduling time quanta in multi-programmed executions).
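The selective-coloring idea can be sketched as follows, assuming each page table scan yields the set of pages whose access bit was set. The function names, the ranking rule, and the round-robin color assignment are hypothetical. The locality-based shortcut that the abstract mentions for reducing scan cost is not modeled here.

```python
from collections import Counter


def hot_pages(scans, n_hot):
    """Rank pages by how many scans found their access bit set.

    `scans` is a list of sets of page numbers, one set per scan.
    Ties are broken by page number so the result is deterministic.
    """
    counts = Counter()
    for accessed in scans:
        counts.update(accessed)
    ranked = sorted(counts, key=lambda page: (-counts[page], page))
    return ranked[:n_hot]


def color_hot_pages(hot, allowed_colors):
    """Assign colors to hot pages only, cycling through the process's colors."""
    return {
        page: allowed_colors[i % len(allowed_colors)]
        for i, page in enumerate(hot)
    }


# Example: three scans, keep the two hottest pages, process owns colors 4 and 5.
scans = [{1, 2, 3}, {2, 3}, {3, 7}]
placement = color_hot_pages(hot_pages(scans, 2), [4, 5])
```

Pages outside the hot set keep whatever frames they already have, so only a small number of pages would ever need recoloring or copying.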
The Impact of Software Structure and Policy on CPU and Memory System Performance
- 1994
Cited by 4 (0 self)
Operating systems, when compared to application programs, have received disappointingly little benefit from the performance improvements of the most recent generation of microprocessors. This thesis used complete traces of software activity from a RISC-based uniprocessor to expose the dynamic behavior of operating system execution and explore the sources of poor performance. Traces from both Mach 3.0 and Ultrix implementations of UNIX permitted a study of performance differences between microkernel and monolithic implementations of the same operating system interface. The comparison showed that both system structure and policy implemented in the system have a significant impact on performance. Measurements of X11 workloads showed that memory system behavior for these large workloads differs significantly from the kinds of workloads traditionally used for performance analysis. Structural and behavioral similarities between large X11 workloads and the operating system are reflected in the...
Unshackle the Cloud!
Cited by 3 (1 self)
Infrastructure-as-a-Service (IaaS) clouds are evolving from offering simple on-demand resources to providing diverse sets of tightly-coupled monolithic services. Like OS kernels of the 1980’s and 1990’s, these monolithic offerings, albeit rich in features, are significantly constraining users’ freedom and control over the underlying (cloud) resources. For example, we are unaware of a true hybrid cloud, where its users can migrate virtual machines freely across clouds. This paper argues for a new type of IaaS cloud, an xCloud, that builds on ideas from extensible OSs to give users the flexibility to install custom cloud extensions, which can address the limitations outlined above. We describe the design space for xClouds, including a practical approach for transforming today’s public clouds into xClouds.
Implications of Emerging DRAM Technologies for the RAMpage Memory Hierarchy
- P Salverda
- Proc. SAICSIT’98, Gordon’s Bay, South Africa, 1998
Cited by 2 (2 self)
The RAMpage memory hierarchy is an attempt at devising a comprehensive strategy to address the growing DRAM-CPU speed gap. By moving the main memory up a level to the SRAM currently used to implement the lowest-level cache, a RAMpage system in effect implements a fully associative cache with no hit penalty (in the best case). Ordinary DRAM is relegated to a paging device. This paper shows that even with an aggressive SDRAM conventional main memory (or equivalently the new Direct Rambus design proposed for 1999), a RAMpage hierarchy is over 16% faster than a conventional 2-level cache design, with a high-end CPU of a speed likely to be delivered in 1998. Further optimizations of the RAMpage hierarchy, such as context switches on misses, are likely to further improve this result.
Configurable Platforms with Dynamic Platform Management: An Efficient Alternative to Application-Specific System-on-Chips
Cited by 1 (0 self)
Emerging trends in system design indicate that in the future, a large role will be played by System-on-Chip (SoC) platforms consisting of general-purpose, configurable components. Commercially available SoC platforms provide some degrees of configurability, most of which are limited to one-time (static) customization of platform hardware. However, with increasing application diversity, time-varying requirements, and the convergence of multiple applications on the same platform, there is a growing need for SoC platforms that can be dynamically configured in order to adapt to changing requirements.

