Measuring Interference Between Live Datacenter Applications
"... Abstract—Application interference is prevalent in datacenters due to contention over shared hardware resources. Unfortunately, understanding interference in live datacenters is more difficult than in controlled environments or on simpler architectures. Most approaches to mitigating interference rely ..."
Cited by 12 (1 self)
Abstract—Application interference is prevalent in datacenters due to contention over shared hardware resources. Unfortunately, understanding interference in live datacenters is more difficult than in controlled environments or on simpler architectures. Most approaches to mitigating interference rely on data that cannot be collected efficiently in a production environment. This work exposes eight specific complexities of live datacenters that constrain measurement of interference. It then introduces new, generic measurement techniques for analyzing interference in the face of these challenges and restrictions. We use the measurement techniques to conduct the first large-scale study of application interference in live production datacenter workloads. Data is measured across 1000 12-core Google servers observed to be running 1102 unique applications. Finally, our work identifies several opportunities to improve performance that use only the available data; these opportunities are applicable to any datacenter.
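For intuition about what production-only interference measurement can look like, here is a minimal Python sketch that estimates pairwise interference from passively collected per-sample CPI (cycles per instruction) readings and co-runner lists. The sample format, field names, and the 30-sample cutoff are assumptions for illustration, not the paper's methodology.

```python
# A hypothetical sketch: estimate pairwise interference from per-sample CPI data
# collected in production. Field names and the sample format are assumptions.
from collections import defaultdict
from statistics import median

def interference_scores(samples):
    """samples: iterable of dicts with keys 'app', 'cpi', and 'corunners' (a set of
    application names sharing the machine). Returns {(app, corunner): relative CPI}."""
    by_app = defaultdict(list)    # every CPI sample seen for an application
    by_pair = defaultdict(list)   # CPI samples taken while a given co-runner was present
    for s in samples:
        by_app[s["app"]].append(s["cpi"])
        for other in s["corunners"]:
            by_pair[(s["app"], other)].append(s["cpi"])
    scores = {}
    for (app, other), cpis in by_pair.items():
        baseline = median(by_app[app])
        if baseline > 0 and len(cpis) >= 30:                 # need enough samples to mean anything
            scores[(app, other)] = median(cpis) / baseline   # > 1.0 hints at a slowdown
    return scores
```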
Thread Reinforcer: Dynamically Determining Number of Threads via OS Level Monitoring
- in IEEE International Symposium on Workload Characterization (IISWC), 2011
"... Abstract—It is often assumed that to maximize the performance of a multithreaded application, the number of threads created should equal the number of cores. While this may be true for systems with four or eight cores, this is not true for systems with larger number of cores. Our experiments with PA ..."
Cited by 11 (3 self)
Abstract—It is often assumed that to maximize the performance of a multithreaded application, the number of threads created should equal the number of cores. While this may hold for systems with four or eight cores, it does not hold for systems with larger numbers of cores, as our experiments with PARSEC programs on a 24-core machine demonstrate. Dynamically determining the appropriate number of threads for a multithreaded application is therefore an important unsolved problem. In this paper we develop a simple technique for dynamically determining the appropriate number of threads without recompiling the application, using complex compilation techniques, or modifying operating system policies. We first present a scalability study of eight PARSEC programs, conducted on a 24-core Dell PowerEdge R905 server running OpenSolaris 2009.06, for thread counts ranging from a few threads to 128. The study shows that not only does the maximum speedup achieved by these programs vary widely (from 3.6x to 21.9x), but the number of threads that produces the maximum speedup also varies widely (from 16 to 63 threads). By examining the overall speedup behavior of these programs we identify the critical operating-system-level factors that explain why speedups vary with the number of threads. Building on these observations, we develop a framework called “Thread Reinforcer” that dynamically monitors a program’s execution to search for the number of threads likely to yield the best speedup. Thread Reinforcer identifies an optimal or near-optimal number of threads for most of the PARSEC programs studied, as well as for SPEC OMP and PBZIP2 programs.
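As a rough illustration of searching for a good thread count at run time rather than fixing it at the core count, the sketch below hill-climbs over doubling thread counts using a caller-supplied measurement callback. It is a simplified stand-in for the monitoring-driven search described above, not Thread Reinforcer's actual algorithm; `run_phase` and the 5% stopping margin are assumptions.

```python
# Hypothetical hill-climbing search over doubling thread counts; a simplified
# stand-in for the monitoring-driven search described in the abstract.
import os

def find_thread_count(run_phase, max_threads=None):
    """run_phase(n): run a short, representative phase with n threads and return
    its throughput (higher is better); supplied by the caller (an assumption)."""
    max_threads = max_threads or 2 * (os.cpu_count() or 4)
    best_n, best_score = 1, run_phase(1)
    n = 2
    while n <= max_threads:
        score = run_phase(n)
        if score <= best_score * 1.05:   # stop once doubling stops paying off (assumed 5% margin)
            break
        best_n, best_score = n, score
        n *= 2
    return best_n
```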
ADAPT: A Framework for Coscheduling Multithreaded Programs
"... Since multicore systems offer greater performance via parallelism, future computing is progressing towards use of multicore machines with large number of cores. However, the performance of emerging multithreaded programs often does not scale to fully utilize the available cores. Therefore, simultane ..."
Cited by 4 (1 self)
Since multicore systems offer greater performance via parallelism, future computing is progressing towards the use of multicore machines with large numbers of cores. However, the performance of emerging multithreaded programs often does not scale to fully utilize the available cores. Running multiple multithreaded applications simultaneously therefore becomes inevitable if such machines are to be fully exploited. However, multicore machines pose a challenge for the OS with respect to maximizing performance and throughput in the presence of multiple multithreaded programs. We have observed that state-of-the-art contention management algorithms fail to effectively coschedule multithreaded programs on multicore machines. To address this challenge, we present ADAPT, a scheduling framework that continuously monitors the resource usage of multithreaded programs and adaptively coschedules them so that they interfere with each other’s performance as little as possible. In addition, it adaptively selects appropriate memory allocation and scheduling policies according to the workload characteristics. We have implemented ADAPT on a 64-core Supermicro server running Solaris 11 and evaluated it using 26 multithreaded programs, including the TATP database application, SPECjbb2005, and programs from Phoenix, PARSEC, and SPEC OMP. The experimental results show that ADAPT substantially improves total turnaround time and system utilization relative to the default Solaris 11 scheduler.
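A toy version of contention-aware coscheduling can be sketched as a greedy pairing of programs by a single resource-usage metric, so that cache-hungry programs are matched with CPU-bound ones. This is only in the spirit of adaptive coscheduling; the metric and the pairing heuristic are illustrative assumptions, not ADAPT's published policy.

```python
# Hypothetical greedy pairing: co-schedule a cache-hungry program with a CPU-bound
# one so co-runners contend less. Not ADAPT's published algorithm.
def pair_programs(profiles):
    """profiles: {program_name: LLC misses per kilo-instruction (assumed metric)}.
    Returns a list of (program, program) pairs to run together."""
    ordered = sorted(profiles, key=profiles.get)       # least to most cache-hungry
    pairs = []
    while len(ordered) >= 2:
        light, heavy = ordered.pop(0), ordered.pop()   # match opposite extremes
        pairs.append((light, heavy))
    return pairs                                       # any leftover program runs alone
```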
Thread Tranquilizer: Dynamically Reducing Performance Variation
"... To realize the performance potential of multicore systems, we must effectively manage the interactions between memory reference behaviour and the operating system policies for thread scheduling and migration decisions. We observe that these interactions lead to significant variations in the performa ..."
Cited by 2 (1 self)
To realize the performance potential of multicore systems, we must effectively manage the interactions between memory reference behaviour and the operating system’s thread scheduling and migration decisions. We observe that these interactions lead to significant variations in the performance of a given application from one execution to the next, even when the program input remains unchanged and no other applications are running on the system. Our experiments with multithreaded programs, including the TATP database application, SPECjbb2005, and a subset of PARSEC and SPEC OMP programs, on a 24-core Dell PowerEdge R905 server running OpenSolaris confirm this observation. In this work we develop Thread Tranquilizer, an automatic technique for simultaneously reducing performance variation and improving performance by dynamically choosing appropriate memory allocation and process scheduling policies. Thread Tranquilizer uses simple utilities available on modern operating systems to monitor cache misses and thread context switches, and then uses the collected information to dynamically select appropriate memory allocation and scheduling policies. In our experiments, Thread Tranquilizer yields up to 98% (average 68%) reduction in performance variation and up to 43% (average 15%) improvement in performance over the default policies of OpenSolaris. We also demonstrate that Thread Tranquilizer simultaneously reduces performance variation and improves performance on Linux. Thread Tranquilizer is easy to use, as it does not require any changes to the application source code or the OS kernel.
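The kind of lightweight OS-level monitoring the abstract mentions can be approximated on Linux by sampling a process's context-switch counters from /proc, as in the sketch below. The threshold and the returned policy labels are illustrative assumptions; the paper's actual policy choices (and its OpenSolaris utilities) differ.

```python
# Hypothetical Linux monitor: sample a process's context-switch counters from /proc
# and flag when a less preemption-prone policy might help. Thresholds are assumptions.
import time

def ctxt_switches(pid):
    """Return (voluntary, nonvoluntary) context-switch counts from /proc/<pid>/status."""
    vol = nonvol = 0
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("voluntary_ctxt_switches:"):
                vol = int(line.split()[1])
            elif line.startswith("nonvoluntary_ctxt_switches:"):
                nonvol = int(line.split()[1])
    return vol, nonvol

def recommend_policy(pid, interval=1.0, preemption_threshold=1000):
    """Sample for `interval` seconds and suggest a coarse policy label."""
    _, before = ctxt_switches(pid)
    time.sleep(interval)
    _, after = ctxt_switches(pid)
    rate = (after - before) / interval          # involuntary context switches per second
    return "reduce-migrations" if rate > preemption_threshold else "default"
```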
Survey of Scheduling Techniques for Addressing Shared Resources in Multicore Processors
"... Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern comput-ing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contenti ..."
Chip multicore processors (CMPs) have emerged as the dominant architecture choice for modern computing platforms and will most likely continue to be dominant well into the foreseeable future. As with any system, CMPs offer a unique set of challenges. Chief among them is the shared resource contention that arises because CMP cores are not independent processors but instead share common resources, such as the last-level cache (LLC). Shared resource contention can have a severe and unpredictable performance impact on the threads running on the CMP. Conversely, CMPs offer tremendous opportunities for multithreaded applications, which can take advantage of simultaneous thread execution as well as fast inter-thread data sharing. Many solutions have been proposed to deal with the negative aspects of CMPs and to take advantage of the positive ones. This survey focuses on the subset of these solutions that exclusively make use of OS thread-level scheduling to achieve their goals. These solutions are particularly attractive as they require no changes to hardware and minimal or no changes to the OS. The OS scheduler has expanded well beyond its original role of time-multiplexing threads on a single core into a complex and effective resource manager. This article surveys a multitude of new and exciting work that explores the diverse new roles the OS scheduler can successfully take on.
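As a minimal example of the thread-level scheduling lever these surveyed techniques rely on, the following Linux-only sketch pins two cache-hungry processes to cores on different sockets so they no longer compete for the same LLC. The core-to-socket mapping is an assumption; real topologies should be read from /sys or a tool such as hwloc.

```python
# Minimal Linux-only illustration of thread-level scheduling as a contention lever:
# pin two processes to cores on different sockets so they stop sharing one LLC.
# The core-to-socket mapping below is an assumption about the machine's topology.
import os

SOCKET0_CORES = frozenset({0, 1, 2, 3})
SOCKET1_CORES = frozenset({4, 5, 6, 7})

def separate_on_sockets(pid_a, pid_b):
    """Restrict each process to a different socket's cores (requires Linux and permission)."""
    os.sched_setaffinity(pid_a, SOCKET0_CORES)
    os.sched_setaffinity(pid_b, SOCKET1_CORES)
```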
Runtime Support For Maximizing Performance on Multicore Systems
- Ph.D. dissertation, 2012
"... First and foremost I would like to sincerely thank my advisor, Dr. Rajiv Gupta, who was always there for me and shaped my research in many ways. His enthusiasm in research and hard working nature were instrumental in enabling my research to make the progress which it has made. I am particularly grat ..."
First and foremost, I would like to sincerely thank my advisor, Dr. Rajiv Gupta, who was always there for me and shaped my research in many ways. His enthusiasm for research and his hard-working nature were instrumental in the progress my research has made. I am particularly grateful for all the freedom he gave me in selecting research problems and for his seemingly never-ending trust in my potential. Next, I would like to thank the members of my dissertation committee, Dr. Laxmi N. Bhuyan and Dr. Walid Najjar, for reviewing this dissertation. Their extensive and constructive comments have been very helpful in improving it. I was fortunate enough to do various internships during the course of my Ph.D.