Informed Data Distribution Selection in a Self-Predicting Storage System
- In International Conference on Autonomic Computing, 2006
Cited by 17 (9 self)
Systems should be self-predicting. They should continuously monitor themselves and provide quantitative answers to What...if questions about hypothetical workload or resource changes. Self-prediction would significantly simplify administrators' decision making, such as acquisition planning and performance tuning, by reducing the detailed workload and internal system knowledge required. This paper describes and evaluates support for self-prediction in a cluster-based storage system and its application to What...if questions about data distribution selection.
Automatic virtual machine configuration for database workloads
- In SIGMOD, 2008
Cited by 17 (1 self)
Virtual machine monitors are becoming popular tools for the deployment of database management systems and other enterprise software applications. In this paper, we consider a common resource consolidation scenario, in which several database management system instances, each running in a virtual machine, are sharing a common pool of physical computing resources. We address the problem of optimizing the performance of these database management systems by controlling the configurations of the virtual machines in which they run. These virtual machine configurations determine how the shared physical resources will be allocated to the different database instances. We introduce a virtualization design advisor that uses information about the anticipated workloads of each of the database systems to recommend workload-specific configurations offline. Furthermore, runtime information collected after the deployment of the recommended configurations can be used to refine the recommendation. To estimate the effect of a particular resource allocation on workload performance, we use the query optimizer in a new what-if mode. We have implemented our approach using both PostgreSQL and DB2, and we have experimentally evaluated its effectiveness using DSS and OLTP workloads.
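The allocation idea in this abstract can be illustrated with a minimal sketch. It assumes a single resource whose shares sum to 1.0 and a hypothetical per-workload cost function standing in for the optimizer's what-if estimate; the greedy search, the names `greedy_allocate` and `cost_fns`, and the step size are illustrative assumptions, not the advisor's actual algorithm.

```python
from typing import Callable, List


def greedy_allocate(
    cost_fns: List[Callable[[float], float]],
    step: float = 0.05,
    max_iters: int = 1000,
) -> List[float]:
    """Split one resource (shares sum to 1.0) among workloads.

    cost_fns[i](share) is a hypothetical stand-in for a what-if cost
    estimate of workload i when it receives that share of the resource.
    """
    n = len(cost_fns)
    shares = [1.0 / n] * n  # start from an equal split
    for _ in range(max_iters):
        best_gain, best_move = 0.0, None
        for src in range(n):
            if shares[src] - step <= 0:
                continue  # never take a workload's share down to zero
            for dst in range(n):
                if src == dst:
                    continue
                before = cost_fns[src](shares[src]) + cost_fns[dst](shares[dst])
                after = (cost_fns[src](shares[src] - step)
                         + cost_fns[dst](shares[dst] + step))
                gain = before - after
                if gain > best_gain + 1e-12:
                    best_gain, best_move = gain, (src, dst)
        if best_move is None:
            break  # no single transfer lowers the total estimated cost
        src, dst = best_move
        shares[src] -= step
        shares[dst] += step
    return shares


# Hypothetical workloads: the first is ten times more resource-hungry.
shares = greedy_allocate([lambda s: 10.0 / s, lambda s: 1.0 / s])
```

In a real advisor the lambdas would be replaced by calls into the query optimizer running in what-if mode; the search loop itself does not depend on how the estimate is produced.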
IRONModel: robust performance models in the wild
- ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (Annapolis, MD, 02–06 June 2008), 2008
Cited by 12 (3 self)
Traditional performance models are too brittle to be relied on for continuous capacity planning and performance debugging in many computer systems. Simply put, a brittle model is often inaccurate and incorrect. We find two types of reasons why a model’s prediction might diverge from the reality: (1) the underlying system might be misconfigured or buggy or (2) the model’s assumptions might be incorrect. The extra effort of manually finding and fixing the source of these discrepancies, continuously, in both the system and model, is one reason why many system designers and administrators avoid using mathematical models altogether. Instead, they opt for simple, but often inaccurate, “rules-of-thumb”. This paper describes IRONModel, a robust performance modeling architecture. Through studying performance anomalies encountered in an experimental cluster-based storage system, we analyze why and how models and actual system implementations get out-of-sync. Lessons learned from that study are incorporated into IRONModel. IRONModel leverages the redundancy of high-level system specifications described through models and low-level system implementation to localize many types of system-model inconsistencies. IRONModel can guide designers to the potential source of the discrepancy, and, if appropriate, can semi-automatically evolve the models to handle unanticipated inputs.
Database virtualization: A new frontier for database tuning and physical design
- In Proceedings of ICDE Workshops (SMDB 2007), 2007
Cited by 5 (2 self)
Resource virtualization is currently being employed at all levels of the IT infrastructure to improve provisioning and manageability, with the goal of reducing total cost of ownership. This means that database systems will increasingly be run in virtualized environments, inside virtual machines. This has many benefits, but it also introduces new tuning and physical design problems that are of interest to the database research community. In this paper, we discuss how virtualization can benefit database systems, and we present the tuning problems it introduces, which relate to setting the new “tuning knobs” that control resource allocation to virtual machines in the virtualized environment. We present a formulation of the virtualization design problem, which focuses on setting resource allocation levels for different database workloads statically at deployment and configuration time. An important component of the solution to this problem is modeling the cost of a workload for a given resource allocation. We present an approach to this cost modeling that relies on using the query optimizer in a special virtualization-aware “what-if” mode. We also discuss the next steps in solving this problem, and present some long-term research directions.
Early Experiences on the Journey Towards Self-* Storage
- IEEE Data Eng. Bulletin
Self-* systems are self-organizing, self-configuring, self-healing, self-tuning and, in general, self-managing. Ursa Minor is a large-scale storage infrastructure being designed and deployed at Carnegie Mellon University, with the goal of taking steps towards the self-* ideal. This paper discusses our early experiences with one specific aspect of storage management: performance tuning and projection. Ursa Minor uses self-monitoring and rudimentary system modeling to support analysis of how system changes would affect performance, exposing simple What...if query interfaces to administrators and tuning agents. We find that most performance predictions are sufficiently accurate (within 10-20%) and that the associated performance overhead is less than 6%. Such embedded support for What...if queries simplifies tuning automation and reduces the administrator expertise needed to make acquisition decisions.
Microsoft Corporation and
Current businesses rely heavily on efficient access to their databases. Manual tuning of these database systems by performance experts is increasingly infeasible: For small companies, hiring an expert may be too expensive; for large enterprises, even an expert may not fully understand the interaction between a large system and its multiple changing workloads. This trend has led major vendors to offer tools that automatically and dynamically tune a database system. Many database tuning knobs concern the buffer pool for caching data and disk pages. Specifically, these knobs control the buffer allocation and thus the cache miss probability, which has direct impact on performance. Previous methods for automatic buffer tuning are based on simulation, black-box control, gradient descent, and empirical equations. This article presents a new approach, using calculations with an analytically-derived equation that relates miss probability to buffer allocation; this equation fits four buffer replacement policies, as well as twelve datasets from mainframes running commercial databases in large corporations. The equation identifies a buffer-size limit that is useful for buffer tuning and powering down idle buffers. It can also replace simulation in predicting I/O costs. Experiments with PostgreSQL
Sharing DBMS among Multiple Users while Providing Performance Isolation: Analysis and Implementation
, 2008
A Generic Auto-Provisioning Framework for Cloud Databases
We discuss the problem of resource provisioning for database management systems operating on top of an Infrastructure-As-A-Service (IaaS) cloud. To solve this problem, we describe an extensible framework that, given a target query workload, continually optimizes the system’s operational cost, estimated based on the IaaS provider’s pricing model, while satisfying QoS expectations. Specifically, we describe two different approaches, a “white-box” approach that uses a fine-grained estimation of the expected resource consumption for a workload, and a “black-box” approach that relies on coarse-grained profiling to characterize the workload’s end-to-end performance across various cloud resources. We formalize both approaches as a constraint programming problem and use a generic constraint solver to efficiently tackle them. We present preliminary experimental numbers, obtained by running TPC-H queries with PostgreSQL on Amazon’s EC2, that provide evidence of the feasibility and utility of our approaches. We also briefly discuss the pertinent challenges and directions of on-going research.
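As a rough illustration of the black-box idea described in this abstract, the sketch below picks the cheapest profiled configuration that still meets a deadline. It substitutes plain enumeration for the constraint solver the abstract mentions, and the `Profile` type, instance names, prices, and runtimes are all hypothetical values invented for the example, not measurements or real pricing.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Profile:
    """Coarse-grained profile of one workload on one instance type (hypothetical)."""
    name: str
    price_per_hour: float  # assumed provider price, arbitrary currency units
    runtime_hours: float   # assumed end-to-end runtime from profiling


def cheapest_feasible(profiles: List[Profile],
                      deadline_hours: float) -> Optional[Profile]:
    """Return the lowest-cost profile whose runtime meets the deadline, or None."""
    feasible = [p for p in profiles if p.runtime_hours <= deadline_hours]
    if not feasible:
        return None
    # Operational cost = hourly price multiplied by hours the workload runs.
    return min(feasible, key=lambda p: p.price_per_hour * p.runtime_hours)


# Hypothetical profiles, for illustration only.
profiles = [
    Profile("small", price_per_hour=0.1, runtime_hours=10.0),
    Profile("medium", price_per_hour=0.4, runtime_hours=3.0),
    Profile("large", price_per_hour=1.6, runtime_hours=1.0),
]
choice = cheapest_feasible(profiles, deadline_hours=4.0)
```

Enumeration is only workable for a handful of configurations; a constraint solver becomes attractive once several resources and workloads must be chosen jointly.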
A Bayesian Approach to Online Performance Modeling for Database Appliances using Gaussian Models
In order to meet service level agreements (SLAs) and to maintain peak performance for database management systems (DBMS), database administrators (DBAs) need to implement policies for effective workload scheduling, admission control, and resource provisioning. Accurately predicting response times of DBMS queries is necessary for a DBA to effectively achieve these goals. This task is particularly challenging due to the fact that a database workload typically consists of many concurrently running queries and an accurate model needs to capture their interactions. Additional challenges are introduced when DBMSes are run in dynamic cloud computing environments, where workload, data, and physical resources can change frequently, on-the-fly. Building an efficient and highly accurate online DBMS performance model that is robust in the face of changing workloads, data evolution, and physical resource allocations is still an unsolved problem. In this work, our goal is to build such an online performance model for database appliances using an experiment-driven modeling approach. We use a Bayesian approach and build novel Gaussian models that take into account the interaction among concurrently executing queries and predict response times of individual DBMS queries. A key feature of our modeling approach is that the models can be updated online in response to new queries or data, or changing resource allocations. We experimentally demonstrate that our models are accurate and effective – our best models have an average prediction error of 16.3% in the worst case.

