Remote physical device fingerprinting
- Cited by 78 (7 self)
We introduce the area of remote physical device fingerprinting, or fingerprinting a physical device, as opposed to an operating system or class of devices, remotely, and without the fingerprinted device’s known cooperation. We accomplish this goal by exploiting small, microscopic deviations in device hardware: clock skews. Our techniques do not require any modification to the fingerprinted devices. Our techniques report consistent measurements when the measurer is thousands of miles, multiple hops, and tens of milliseconds away from the fingerprinted device, and when the fingerprinted device is connected to the Internet from different locations and via different access technologies. Further, one can apply our passive and semi-passive techniques when the fingerprinted device is behind a NAT or firewall, and also when the device’s system time is maintained via NTP or SNTP. One can use our techniques to obtain information about whether two devices on the Internet, possibly shifted in time or IP addresses, are actually the same physical device. Example applications include: computer forensics; tracking, with some probability, a physical device as it connects to the Internet from different public access points; counting the number of devices behind a NAT even when the devices use constant or random IP IDs; remotely probing a block of addresses to determine if the addresses correspond to virtual hosts, e.g., as part of a virtual honeynet; and unanonymizing anonymized network traces.
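As a rough illustration of the clock-skew idea in the abstract above, the sketch below fits a line to the drift between a device's reported timestamps and the measurer's own clock; the slope of that line is the skew. It uses an ordinary least-squares fit, chosen here for brevity and not necessarily the estimator used in the paper. The names `estimate_skew_ppm` and `same_device`, and the 1 ppm tolerance, are hypothetical.

```python
def estimate_skew_ppm(samples):
    """Estimate clock skew in parts per million.

    samples: list of (measurer_time_s, device_timestamp_s) pairs,
    e.g. packet arrival time and a timestamp carried in the packet.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0, r0 = samples[0]
    # x: elapsed time on the measurer's clock.
    xs = [t - t0 for t, _ in samples]
    # y: how far the device clock has drifted relative to the measurer.
    ys = [(r - r0) - (t - t0) for t, r in samples]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples must span more than one instant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Slope of drift versus elapsed time, scaled to ppm.
    return (sxy / sxx) * 1e6


def same_device(skew_a_ppm, skew_b_ppm, tolerance_ppm=1.0):
    """Hypothetical decision rule: similar skews suggest the same device."""
    return abs(skew_a_ppm - skew_b_ppm) <= tolerance_ppm
```

Real measurements also include network delay jitter, which a plain least-squares fit does not handle robustly; the sketch only conveys that a stable per-device slope is the fingerprint.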
Flexible CoScheduling: Mitigating Load Imbalance and Improving Utilization of Heterogeneous Resources
- Proc. Int. Parallel and Distributed Processing Symposium (IPDPS'03), 2002
- Cited by 18 (6 self)
Fine-grained parallel applications require all their processes to run simultaneously on distinct processors to achieve good efficiency. This is typically achieved by space slicing with variable partitioning, wherein nodes are dedicated for the duration of the run, or by gang scheduling, wherein time slicing is coordinated across processors. Both schemes suffer from fragmentation, where processors are left idle because jobs cannot be packed with 100% efficiency. Naturally, this leads to reduced utilization and sub-optimal performance. Flexible coscheduling (FCS) solves this problem by monitoring each job's granularity and communication activity, and using gang scheduling only for those jobs for which it is appropriate. Processes from other jobs, which can be scheduled without any constraints, are used as filler to reduce fragmentation. In addition, inefficiencies due to load imbalance and hardware heterogeneity are also reduced because the classification is done on a per-process basis. FCS has been fully implemented as part of the STORM resource manager, and shown to be competitive with gang scheduling and implicit coscheduling.
Desktop Scheduling: How Can We Know What The User Wants?
- 2004
- Cited by 16 (5 self)
Current desktop operating systems use CPU utilization (or lack thereof) to prioritize processes for scheduling. This was thought to be beneficial for interactive processes, under the assumption that they spend much of their time waiting for user input. This reasoning fails for modern multimedia applications. For example, playing a movie in parallel with a heavy background job usually leads to poor graphical results, as these jobs are indistinguishable in terms of CPU usage. Suggested solutions involve shifting the burden to the user or programmer, which we claim is unsatisfactory; instead, we seek an automatic solution. Our attempts using new metrics based on CPU usage failed. We therefore propose and implement a novel scheme of identifying interactive and multimedia applications by directly quantifying the I/O between an application and the user (keyboard, mouse, and screen activity). Preliminary results indicate that prioritizing processes according to this metric indeed solves the aforementioned problem, demonstrating that operating systems can indeed provide better support for multimedia and interactive applications. Additionally, once user I/O data is available, it opens intriguing new possibilities to system designers.
Fine grained kernel logging with klogger: Experience and insights
- In ACM EuroSys, 2007
- Cited by 10 (7 self)
Understanding the detailed behavior of an operating system is crucial for making informed design decisions. But such an understanding is very hard to achieve, due to the increasing complexity of such systems and the fact that they are implemented and maintained by large and diverse groups of developers. Tools like Klogger, presented in this paper, can help by enabling fine-grained logging of system events and the sharing of a logging infrastructure between multiple developers and researchers, facilitating a methodology where design evaluation can be an integral part of kernel development. We demonstrate the need for such methodology by a host of case studies, using Klogger to better understand various subsystems in the Linux kernel, and pinpointing overheads and problems therein.
Process prioritization using output production: scheduling for multimedia
- ACM Trans. on Multimedia Comput. Commun. & Appl. (TOMCCAP), 2006
- Cited by 8 (5 self)
Desktop operating systems such as Windows and Linux base scheduling decisions on CPU consumption — processes that consume fewer CPU cycles are prioritized, assuming that interactive processes gain from this as they spend most of their time waiting for user input. However, this doesn’t work for modern multimedia applications, which require significant CPU resources. We therefore suggest a new metric to identify interactive processes, by explicitly measuring interactions with the user, and use it to design and implement a process scheduler. Measurements using a variety of applications indicate that this scheduler is very effective in distinguishing between competing interactive and non-interactive processes.
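To make the contrast with CPU-based prioritization concrete, here is a toy sketch that ranks processes by how much they interact with the user, falling back to CPU share only to break ties. The function name, the event counts, and the tie-breaking rule are all assumptions made for illustration; the abstract does not specify how the metric is computed or used inside the scheduler.

```python
def rank_by_user_interaction(cpu_share, user_io_events):
    """Return process names from highest to lowest priority.

    cpu_share: dict mapping process name to fraction of CPU consumed.
    user_io_events: dict mapping process name to a count of observed
    user interactions (hypothetical unit: input and display events
    in some recent window). Missing processes count as zero.
    """
    return sorted(
        cpu_share,
        # More user interaction first; lower CPU share breaks ties.
        key=lambda name: (-user_io_events.get(name, 0), cpu_share[name]),
    )
```

Under this ordering, a movie player and a heavy background job with nearly equal CPU usage are separated by their interaction counts, which is the distinction the abstract says CPU consumption alone cannot make.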
Secretly Monopolizing the CPU Without Superuser Privileges
- Cited by 6 (0 self)
We describe a “cheat” attack, allowing an ordinary process to hijack any desirable percentage of the CPU cycles without requiring superuser/administrator privileges. Moreover, the nature of the attack is such that, at least in some systems, listing the active processes will erroneously show the cheating process as not using any CPU resources: the “missing” cycles would either be attributed to some other process or not be reported at all (if the machine is otherwise idle). Thus, certain malicious operations generally believed to have required overcoming the hardships of obtaining root access and installing a rootkit, can actually be launched by non-privileged users in a straightforward manner, thereby making the job of a malicious adversary that much easier. We show that most major general-purpose operating systems are vulnerable to the cheat attack, due to a combination of how they account for CPU usage and how they use this information to prioritize competing processes. Furthermore, recent scheduler changes attempting to better support interactive workloads increase the vulnerability to the attack, and naive steps taken by certain systems to reduce the danger are easily circumvented. We show that the attack can nevertheless be defeated, and we demonstrate this by implementing a patch for Linux that eliminates the problem with negligible overhead.
Missed Deadline Notification in Best-effort Schedulers
- 2004
- Cited by 5 (1 self)
It is common to run multimedia and other periodic, soft real-time applications on general-purpose computer systems. These systems use best-effort scheduling algorithms that cannot guarantee applications will receive responsive scheduling to meet deadline or timing requirements. We present a simple mechanism called Missed Deadline Notification (MDN) that allows applications to notify the system when they do not receive their desired level of responsiveness. Consisting of a single system call with no arguments, this simple interface allows the operating system to provide better support for soft real-time applications without any a priori information about their timing or resource needs. We implemented MDN in three different schedulers: Linux, BEST, and BeRate. We describe these implementations and their performance when running real-time applications and discuss policies to prevent applications from abusing MDN to gain extra resources.
The Context-Switch Overhead Inflicted by Hardware Interrupts (and the Enigma of Do-Nothing Loops)
- 2007
- Cited by 2 (0 self)
The overhead of a context switch is typically associated with multitasking, where several applications share a processor. But even if only one runnable application is present in the system and supposedly runs alone, it is still repeatedly preempted in favor of a different thread of execution, namely, the operating system that services periodic clock interrupts. We employ two complementing methodologies to measure the overhead incurred by such events and obtain contradictory results. The first methodology systematically changes the interrupt frequency and measures by how much this prolongs the duration of a program that sorts an array. The overall overhead is found to be 0.5-1.5% at 1000 Hz, linearly proportional to the tick rate, and steadily declining as the speed of processors increases. If the kernel is configured such that each tick is slowed down by an access to an external time source, then the direct overhead dominates. Otherwise, the relative weight of the indirect portion is steadily growing with processors’ speed, accounting for up to 85% of the total. The second methodology repeatedly executes a simplistic loop (calibrated to take 1ms), measures the actual execution time, and analyzes the perturbations. Some loop implementations yield results similar to the above, but others indicate that the overhead is actually an order of magnitude bigger, or worse. The phenomenon was observed on IA32, IA64, and Power processors, the latter being part of the ASC Purple supercomputer. Importantly, the effect is greatly amplified for parallel jobs, where one late thread holds up all its peers, causing a slowdown that is dominated by the per-node latency (numerator) and the job granularity (denominator). We trace the bizarre effect to an unexplained interrupt/loop interaction; the question of whether this hardware misfeature is experienced by real applications remains open.
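The second methodology described above (time a fixed loop many times and look at how far each run strays from the fastest one) can be sketched as follows. This is only an outline of the measurement idea: the function names and default parameters are hypothetical, the loop is not calibrated to 1 ms, and in Python the interpreter itself adds noise, so the numbers say little about interrupt overhead on their own.

```python
import time


def busy_loop(iterations):
    """A do-nothing loop with a fixed amount of work."""
    x = 0
    for i in range(iterations):
        x += i
    return x


def measure_perturbations(iterations=20000, repeats=200):
    """Time the same loop repeatedly.

    Returns (baseline_ns, excess_ns), where baseline_ns is the fastest
    observed run and excess_ns lists how much longer each run took.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        busy_loop(iterations)
        durations.append(time.perf_counter_ns() - start)
    # Treat the fastest run as the undisturbed execution time.
    baseline = min(durations)
    excess = [d - baseline for d in durations]
    return baseline, excess
```

The fastest run serves as an estimate of the unperturbed duration; the distribution of the excess values is what one would then analyze for periodic disturbances.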
Parallel Job Scheduling Under Dynamic Workloads
- 2003
Jobs that run on parallel systems that use gang scheduling for multiprogramming may interact with each other in various ways. These interactions are affected by system parameters such as the level of multiprogramming and the scheduling time quantum. A careful evaluation is therefore required in order to find parameter values that lead to optimal performance. We perform a detailed performance evaluation of three factors affecting scheduling systems running dynamic workloads (multiprogramming level, time quantum, and the use of backfilling for queue management) and how they depend on offered load. Our evaluation is based on synthetic MPI applications running on a real cluster that actually implements the various scheduling schemes. Our results demonstrate the importance of both components of the gang-scheduling plus backfilling combination: gang scheduling reduces response time and slowdown, and backfilling allows doing so with a limited multiprogramming level. This is further improved by using flexible coscheduling rather than strict gang scheduling, as this reduces the constraints and allows for a denser packing.
Adaptive Memory Allocations in Clusters
- IEEE Trans. Parallel and Distributed Systems, 2004
In a cluster system with dynamic load sharing support, a job submission or migration to a workstation is determined by the availability of CPU and memory resources of the workstation at the time [21]. In such a system, a small number of running jobs with unexpectedly large memory allocation requirements may significantly increase the queuing delay times of the rest of the jobs with normal memory requirements, slowing down execution of each individual job and decreasing the system throughput. We call this phenomenon the job blocking problem because the big jobs block the execution pace of the majority of jobs in the cluster. Since the memory demand of jobs may not be known in advance and may change dynamically, the possibility that unsuitable job submissions/migrations cause the blocking problem is high, and existing load sharing schemes are unable to effectively handle this problem. We propose two schemes to address this problem. The first scheme, Network RAM supported load sharing, combines job migrations with network RAM: it uses remote execution to initially allocate a job to the most lightly loaded workstation and, if necessary, network RAM to provide a global memory space for the job larger than would be available otherwise. This scheme has the merits of both job migrations and network RAM. Our experiments show its effectiveness and scalability. However, this scheme requires a network RAM facility in the cluster, which may cause additional overhead and increase cluster network traffic. In order to address this limitation, we propose a second scheme, memory reservation, incorporated with dynamic load sharing, which adaptively reserves a small set of workstations to provide special services to the jobs demanding large memory allocations. As soon as the blocking p...

