Practical implementation of rank and select queries
- In Poster Proceedings Volume of 4th Workshop on Efficient and Experimental Algorithms (WEA’05) (Greece), 2005
Cited by 30 (13 self)
Research on succinct data structures has made significant progress in recent years. An essential building block of many of those techniques is a data structure to perform rank and select operations over a bit array. The first operation tells how many bits are set up to some position, and the second gives the position of the i-th set bit. Although there exist constant-time solutions that require sublinear extra space, the practicality of those solutions against more naive ones has not been carefully studied. In this paper we show some results in this respect, which suggest that in many practical cases the simpler solutions are better in terms of time and extra space.
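To make the two operations concrete, here is a minimal Python sketch of one naive approach: sampled prefix counts plus a short scan. It is an illustration only, not the implementation evaluated in the paper. The class name `RankSelect`, the `block` parameter, and the conventions that `rank` is inclusive and `select` is 1-based are all assumptions.

```python
from bisect import bisect_left


class RankSelect:
    """Naive rank/select over a list of 0/1 values (hypothetical sketch)."""

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        # counts[k] = number of set bits in bits[0 : k * block]
        self.counts = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.counts.append(self.counts[-1] + ones)

    def rank(self, i):
        """Number of set bits in positions 0..i inclusive."""
        k = (i + 1) // self.block
        return self.counts[k] + sum(self.bits[k * self.block:i + 1])

    def select(self, j):
        """Position of the j-th set bit (j >= 1), or -1 if it does not exist."""
        if j < 1 or j > self.counts[-1]:
            return -1
        # Last block whose prefix count is still below j.
        k = bisect_left(self.counts, j) - 1
        remaining = j - self.counts[k]
        end = min((k + 1) * self.block, len(self.bits))
        for pos in range(k * self.block, end):
            remaining -= self.bits[pos]
            if remaining == 0:
                return pos
        return -1  # unreachable when counts are consistent


rs = RankSelect([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block=4)
print(rs.rank(3), rs.select(4))
```

The extra space here is one counter per block, so a larger `block` trades memory for longer scans.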
Planet-Sized Batched Dynamic Adaptive Meshes (P-BDAM)
Cited by 29 (7 self)
We describe an efficient technique for out-of-core management and interactive rendering of planet-sized textured terrain surfaces. The technique, called P-Batched Dynamic Adaptive Meshes (P-BDAM), extends the BDAM approach by using as basic primitive a general triangulation of points on a displaced triangle. The proposed framework introduces several advances with respect to the state of the art: thanks to a batched host-to-graphics communication model, we outperform current adaptive tessellation solutions in terms of rendering speed; we guarantee overall geometric continuity, exploiting programmable graphics hardware to cope with the accuracy issues introduced by single-precision floating point; we exploit a compressed out-of-core representation and speculative prefetching for hiding disk latency during rendering of out-of-core data; we efficiently construct high-quality simplified representations with a novel distributed out-of-core simplification algorithm working on a standard PC network.
DULO: An effective buffer cache management scheme to exploit both temporal and spatial localities
- In USENIX Conference on File and Storage Technologies (FAST), 2005
Cited by 23 (9 self)
Sequentiality of requested blocks on disks, or their spatial locality, is critical to the performance of disks, where the throughput of accesses to sequentially placed disk blocks can be an order of magnitude higher than that of accesses to randomly placed blocks. Unfortunately, spatial locality of cached blocks is largely ignored and only temporal locality is considered in system buffer cache management. Thus, disk performance for workloads without dominant sequential accesses can be seriously degraded. To address this problem, we propose a scheme called DULO (DUal LOcality), which exploits both temporal and spatial locality in buffer cache management. Leveraging the filtering effect of the buffer cache, DULO can influence the I/O request stream by making the requests passed to disk more sequential, significantly increasing the effectiveness of I/O scheduling and prefetching for disk performance improvements. DULO has been extensively evaluated by both trace-driven simulations and a prototype implementation in Linux 2.6.11. In the simulations and system measurements, various application workloads have been tested, including Web Server, TPC benchmarks, and scientific programs. Our experiments show that DULO can significantly increase system throughput and reduce program execution times.
Understanding the Linux 2.6.8.1 CPU Scheduler
- SGI, 2005. http://josh.trancesoftware.com/linux/linux_cpu_scheduler.pdf, accessed on August, 2005
Cited by 5 (0 self)
This paper on the Linux 2.6.8.1 scheduler was inspired by Mel Gorman's thesis on the Linux virtual memory (VM) system [6], which current Linux VM developers probably reference and value more than any other piece of documentation on the subject.
Eliminating the Threat of Kernel Stack Overflows
Cited by 3 (2 self)
The Linux kernel stack has a fixed size. There is no mechanism to prevent the kernel from overflowing the stack. Hackers can exploit this bug to put unwanted information in the memory of the operating system and gain control over the system. In order to prevent this problem, we introduce a dynamically sized kernel stack that can be integrated into the standard Linux kernel. The well-known paging mechanism is reused with some changes, in order to enable the kernel stack to grow.
Performance analysis of Linux networking packet receiving
- In Proceedings of Computing in High Energy and Nuclear Physics, 2006
Cited by 3 (0 self)
The computing models for High-Energy Physics experiments are becoming ever more globally distributed and grid-based, both for technical reasons (e.g., to place computational and data resources near each other and the demand) and for strategic reasons (e.g., to leverage equipment investments). To support such computing models, the network and end systems, computing and storage, face unprecedented challenges. One of the biggest challenges is to transfer scientific data sets – now in the multi-petabyte (10^15 bytes) range and expected to grow to exabytes within a decade – reliably and efficiently among facilities and computation centers scattered around the world. Both the network and end systems should be able to provide the capabilities to support high-bandwidth, sustained, end-to-end data transmission. Recent trends in technology are showing that although the raw transmission speeds used in networks are increasing rapidly, the rate of advancement of microprocessor technology has slowed down. Therefore, network protocol-processing overheads have risen sharply in comparison with the time spent in packet transmission, resulting in degraded throughput for networked applications. More and more, it is the network end system, instead of the network, that is responsible for degraded performance of network applications. In this paper, the Linux system’s packet receive process is studied from NIC to application. We develop a mathematical model to characterize the Linux packet receiving process. Key factors that affect Linux systems’ network performance are analyzed. Keywords: Linux, TCP/IP, protocol stack, process scheduling, performance analysis
Application Buffer-Cache Management for Performance: Running the World’s Largest MRTG
Cited by 1 (0 self)
An operating system’s readahead and buffer-cache behaviors can significantly impact application performance; most often these improve performance, but occasionally they worsen it. To avoid unintended I/O latencies, many database systems sidestep these OS features by minimizing or eliminating application file I/O. However, network traffic measurement applications are commonly built instead atop a high-performance file-based database: the Round Robin Database (RRD) Tool. While RRD is successful, experience has led the network operations community to believe that its scalability is limited to tens of thousands of, or perhaps one hundred thousand, RRD files on a single system, keeping it from being used to measure the largest managed networks today. We identify the bottleneck responsible for that experience and present two approaches to overcome it. In this paper, we provide a method and tools to expose the readahead and buffer-cache behaviors that are otherwise hidden from the user. We apply our method to a very large network traffic measurement system that experiences scalability problems and determine the performance bottleneck to be unnecessary disk reads, and page faults, due to the default readahead behavior. We develop both a simulation and an analytical model of the performance-limiting page fault rate for RRD file updates. We develop and evaluate two approaches that alleviate this problem: application advice to disable readahead and application-level caching. We demonstrate their effectiveness by configuring and operating the world’s largest Multi-Router Traffic Grapher (MRTG), with approximately 320,000 RRD files, and over half a million data points measured every five minutes. Conservatively, our techniques approximately triple the capacity of very large MRTG and other RRD-based measurement systems.
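As an illustration of what "application advice to disable readahead" can look like, the following Python sketch applies `posix_fadvise` with `POSIX_FADV_RANDOM` before a small in-place update; on Linux this hint is documented as disabling file readahead. Whether this matches the authors' actual change is an assumption, and the helper name `update_record` is hypothetical. The sketch assumes a Unix-like system.

```python
import os


def update_record(path, offset, payload):
    """Overwrite len(payload) bytes at offset and return what was stored.

    Hypothetical helper: it hints that access is random so the kernel
    does not read ahead around each small update.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # posix_fadvise is not available on every platform, so guard the call.
        if hasattr(os, "posix_fadvise"):
            # offset=0, len=0 applies the advice to the whole file.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        os.pwrite(fd, payload, offset)
        return os.pread(fd, len(payload), offset)
    finally:
        os.close(fd)
```

The advice is only a hint attached to the open file descriptor, so it has to be reissued each time the file is opened.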
Energy Efficient Dynamic Memory Bank and NV Swap Device Management
As demand for mobile devices increases, prolonging battery life has been a focus of mobile device manufacturers. While manufacturers support a partial self-refresh capability for MSDRAMs (Mobile SDRAM), most operating systems do not include this feature due to the complexity of the memory management. Utilizing this capability correctly could potentially reduce the amount of power consumed by MSDRAMs while a system is in a suspended mode. This project aims to implement the partial self-refresh capability on ARM Linux and to save the maximum amount of power while the system is in the suspended mode.
LINUX VM – COMPARING VIRTUAL MEMORY PERFORMANCE BETWEEN LINUX VERSION 2.4 AND 2.6 ON LOW MEMORY SYSTEM
- 2004
In general, computer systems have a limited amount of physical memory. Operating systems have to provide an illusion of having an unlimited amount of memory in order to service the demands of the processes that exist within the system. Hence, the concept of virtual memory was introduced into operating systems as a way of providing such an illusion. Since the Linux OS is gaining popularity in both the desktop and enterprise markets, it is crucial for us to determine the characteristics of the VM subsystem of the Linux OS. In this paper, we intend to compare and provide some benchmark results of the VM subsystem between versions 2.4 and 2.6 of the Linux kernel. © 2004 FSU
Virtual Memory-Induced Priority Inversion in Multi-Tasked Systems
- 2003
Virtual memory (VM) sub-systems in many widely adopted desktop and server operating systems rely on approximations of the least-recently-used (LRU) heuristic to select pages for replacement. These heuristics work well when memory is abundant, but they produce counter-intuitive behavior when applications' memory demands substantially exceed the available physical memory. This paper describes the results of preliminary experiments with a new instrumentation framework that observes Linux VM behavior in a controlled setting. Repeated experiments with a microbenchmark consistently reveal three types of misbehavior. First, the CPU scheduler's intended priorities can be inverted for an indefinite period of time when low-priority processes push higher-priority processes out of memory. Second, the VM heuristics can perpetually assign unequal amounts of memory to simultaneously running, identical processes. Finally, processes with modest memory requirements experience execution delays during periods of memory shortage.
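For readers unfamiliar with LRU approximations, the sketch below shows the textbook CLOCK (second-chance) policy in Python. It is a generic illustration and not the Linux VM's actual replacement code, which the abstract does not describe. The class name `ClockCache` and the choice to set the reference bit when a page is loaded are assumptions.

```python
class ClockCache:
    """Textbook CLOCK page replacement over a fixed number of frames."""

    def __init__(self, frames):
        self.frames = frames
        self.pages = []        # resident page ids, one per frame
        self.referenced = {}   # page id -> reference bit
        self.hand = 0          # index of the next frame to inspect

    def access(self, page):
        """Touch a page; return the evicted page id, or None if none was evicted."""
        if page in self.referenced:
            self.referenced[page] = True      # hit: just set the reference bit
            return None
        if len(self.pages) < self.frames:     # a free frame is still available
            self.pages.append(page)
            self.referenced[page] = True
            return None
        # Sweep: clear reference bits until an unreferenced page is found.
        while self.referenced[self.pages[self.hand]]:
            self.referenced[self.pages[self.hand]] = False
            self.hand = (self.hand + 1) % self.frames
        victim = self.pages[self.hand]
        del self.referenced[victim]
        self.pages[self.hand] = page
        self.referenced[page] = True
        self.hand = (self.hand + 1) % self.frames
        return victim
```

A page touched since the hand last passed it survives one sweep, which is how the policy approximates recency without keeping a fully ordered list.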

