Automated Experiment-Driven Management of (Database) Systems
- In Proceedings of the 12th Workshop on Hot Topics in Operating Systems, 2009
Cited by 8 (3 self)
In this position paper, we argue that an important piece of the system administration puzzle has largely been left untouched by researchers. This piece involves mechanisms and policies to identify as well as collect missing instrumentation data; the missing data is essential to generate the knowledge required to address certain administrative tasks satisfactorily and efficiently. We introduce the paradigm of experiment-driven management which encapsulates such mechanisms and policies for a given administrative task. We outline the benefits that automated experiment-driven management brings to several long-standing problems in databases as well as other systems, and identify research challenges as well as initial solutions.
Query Interactions in Database Workloads
Cited by 4 (3 self)
Database workloads consist of mixes of queries that run concurrently and interact with each other. In this paper, we demonstrate that query interactions can have a significant impact on database system performance. Hence, we argue that it is important to take these interactions into account when characterizing workloads, designing test cases, or developing performance tuning algorithms for database systems. To capture and model query interactions, we propose using an experimental approach that is based on sampling the space of possible interactions and fitting statistical models to the sampled data. We discuss using such an approach for database testing and tuning, and we present some opportunities and research challenges.
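The sample-then-fit idea in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three query types, the synthetic `measure_mix` function (a stand-in for actually running a mix on a database and timing it), and the choice of a least-squares model with pairwise product terms.

```python
import random

QUERY_TYPES = ["Q1", "Q2", "Q3"]  # hypothetical query types


def measure_mix(mix):
    """Stand-in for running a query mix on a real system and timing it.

    Synthetic cost (an assumption): per-type base costs plus a penalty
    whenever Q1 and Q2 run together.
    """
    n1, n2, n3 = mix
    return 2.0 * n1 + 3.0 * n2 + 1.0 * n3 + 1.5 * n1 * n2


def features(mix):
    """Intercept, linear terms, and pairwise products (interaction terms)."""
    n1, n2, n3 = mix
    return [1.0, n1, n2, n3, n1 * n2, n1 * n3, n2 * n3]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(samples):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xs = [features(mix) for mix, _ in samples]
    ys = [y for _, y in samples]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, mix):
    return sum(w * f for w, f in zip(weights, features(mix)))


# Sample the space of mixes (0 to 5 concurrent instances per type),
# "measure" each sampled mix, then fit the model to the samples.
rng = random.Random(0)
sampled_mixes = [tuple(rng.randint(0, 5) for _ in QUERY_TYPES) for _ in range(40)]
samples = [(mix, measure_mix(mix)) for mix in sampled_mixes]
weights = fit(samples)
```

Because the synthetic measurements are noise-free and lie inside the model family, `predict(weights, mix)` should closely match `measure_mix(mix)` for unseen mixes; with real measurements the fit would only be approximate, and the choice of model family becomes the interesting question.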
Performance Prediction for Concurrent Database Workloads
Cited by 2 (0 self)
Current trends in data management systems, such as cloud and multi-tenant databases, are leading to data processing environments that concurrently execute heterogeneous query workloads. At the same time, these systems need to satisfy diverse performance expectations. In these newly-emerging settings, avoiding potential Quality-of-Service (QoS) violations heavily relies on performance predictability, i.e., the ability to estimate the impact of concurrent query execution on the performance of individual queries in a continuously evolving workload. This paper presents a modeling approach to estimate the impact of concurrency on query performance for analytical workloads. Our solution relies on the analysis of query behavior in isolation, pairwise query interactions and sampling techniques to predict resource contention under various query mixes and concurrency levels. We introduce a simple yet powerful metric that accurately captures the joint effects of disk and memory contention on query performance in a single value. We also discuss predicting the execution behavior of a time-varying query workload through query-interaction timelines, i.e., a fine-grained estimation of the time segments during which discrete mixes will be executed concurrently. Our experimental evaluation on top of PostgreSQL/TPC-H demonstrates that our models can provide query latency predictions within approximately 20% of the actual values in the average case.
Avoiding Bad Query Mixes to Minimize Unsuccessful Client Requests
- M.Math Thesis, 2009
Cited by 1 (1 self)
© Sean Tozer 2009. In three-tiered web applications, some form of admission control is required to ensure that throughput and response times are not significantly harmed during periods of heavy load. We propose Q-Cop, a prototype system for improving admission control decisions that computes measures of load on the system based on the actual mix of queries being executed. This measure of load is used to estimate execution times for incoming queries, which allows Q-Cop to make control decisions with the goal of minimizing the number of requests that are not serviced before the client, or their browser, times out. Using TPC-W queries, we show that the response times of different types of queries can vary significantly, in excess of 50% in our experiments, depending not just on the number of queries being processed but on the mix of other queries that are running simultaneously. The variation implies that admission control can benefit from taking into
Synergy-based Workload Management
Cited by 1 (0 self)
Workload management aims at the efficient execution of queries on a database. In this context, scheduling plays a crucial role. A vast number of scheduling approaches have been developed, most of them belonging to one of two categories: analysis and monitoring. However, most of them either focus on only one possible kind of impact of queries on each other’s execution time or require an offline phase for information gathering. In contrast, the approach we pursue does not require any offline phase and flexibly adapts to any database system or hardware configuration. It is based on the fact that multiple requests executed concurrently in a database for performance purposes may have a positive impact on each other’s execution, e.g. due to caching or complementary resource consumption, or may impede each other’s execution, e.g. in the case of resource contention. Both kinds of impact are reflected in the execution time of the workload. We apply a monitoring approach to derive these impacts, called synergies, between request types fully automatically at runtime from measured execution times. As a result, our approach works completely independently of changing synergies or configurations and easily handles new query types.
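As a rough illustration of deriving synergies from measured execution times at runtime, the sketch below keeps exponentially smoothed estimates per pair of request types. The class name `SynergyMonitor`, the ratio-based definition of a synergy, and the smoothing factor are all assumptions made for this example; the abstract does not specify how synergies are computed.

```python
from collections import defaultdict


class SynergyMonitor:
    """Online estimate of how request types affect each other's runtime.

    Assumed definition: synergy(a, b) is the smoothed ratio of a's observed
    time while b runs concurrently to a's smoothed time when running alone.
    Values below 1 suggest a benefit, values above 1 suggest contention.
    """

    def __init__(self, alpha=0.2):
        self.alpha = alpha  # smoothing factor (assumed)
        self.solo = {}  # request type -> smoothed solo execution time
        self.ratio = defaultdict(lambda: 1.0)  # (a, b) -> smoothed ratio

    def _smooth(self, old, new):
        if old is None:
            return new
        return (1 - self.alpha) * old + self.alpha * new

    def observe(self, req_type, elapsed, concurrent_types=()):
        """Record one finished request and the types that ran alongside it."""
        if not concurrent_types:
            self.solo[req_type] = self._smooth(self.solo.get(req_type), elapsed)
            return
        base = self.solo.get(req_type)
        if base is None:
            return  # no solo baseline yet, so the ratio is undefined
        for other in concurrent_types:
            key = (req_type, other)
            self.ratio[key] = self._smooth(self.ratio[key], elapsed / base)

    def synergy(self, a, b):
        """Current estimate; 1.0 (neutral) for pairs never observed."""
        return self.ratio.get((a, b), 1.0)


monitor = SynergyMonitor(alpha=0.5)
monitor.observe("scan", 10.0)               # ran alone
monitor.observe("scan", 14.0, ["join"])     # slower next to a join
monitor.observe("scan", 6.0, ["lookup"])    # faster next to a lookup
```

New request types need no registration in this sketch: they simply start at the neutral value 1.0 until observations arrive, and the smoothing lets estimates drift when the underlying behavior changes.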
Interaction-Aware Scheduling of Report Generation Workloads
Cited by 1 (1 self)
The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made on-line. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently out-
Learning-based Query Performance Modeling and Prediction
Accurate query performance prediction (QPP) is central to effective resource management, query optimization and query scheduling. Analytical cost models, used in the current generation of query optimizers, have been successful in comparing the costs of alternative query plans, but they are poor predictors of execution latency. As a more promising approach to QPP, this paper studies the practicality and utility of sophisticated learning-based models, which have recently been applied to a variety of predictive tasks with great success, in both static (i.e., fixed) and dynamic query workloads. We propose and evaluate predictive modeling techniques that learn query execution behavior at different granularities, ranging from coarse-grained plan-level models to fine-grained operator-level models. We demonstrate that these two extremes offer a tradeoff between high accuracy for static workload queries and generality to unforeseen queries in dynamic workloads, respectively, and introduce a hybrid approach that combines their respective strengths by selectively composing them in the process of QPP. We discuss how we can use a training workload to (i) pre-build and materialize such models offline, so that they are readily available for future predictions, and (ii) build new models online as new predictions are needed. All prediction models are built using only static features (available prior to query execution) and the performance values obtained from the offline execution of the training workload. We fully implemented all these techniques and extensions on top of PostgreSQL and evaluated them experimentally by quantifying their effectiveness over analytical workloads, represented by well-established TPC-H data and queries. The results provide quantitative evidence that learning-based modeling for QPP is both feasible and effective for both static and dynamic workload scenarios.
A Generic Auto-Provisioning Framework for Cloud Databases
We discuss the problem of resource provisioning for database management systems operating on top of an Infrastructure-As-A-Service (IaaS) cloud. To solve this problem, we describe an extensible framework that, given a target query workload, continually optimizes the system’s operational cost, estimated based on the IaaS provider’s pricing model, while satisfying QoS expectations. Specifically, we describe two different approaches, a “white-box” approach that uses a fine-grained estimation of the expected resource consumption for a workload, and a “black-box” approach that relies on coarse-grained profiling to characterize the workload’s end-to-end performance across various cloud resources. We formalize both approaches as a constraint programming problem and use a generic constraint solver to efficiently tackle them. We present preliminary experimental numbers, obtained by running TPC-H queries with PostgreSQL on Amazon’s EC2, that provide evidence of the feasibility and utility of our approaches. We also briefly discuss the pertinent challenges and directions of on-going research.
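The shape of the optimization described here (minimize operational cost subject to a QoS constraint) can be sketched on a toy instance. Note the substitutions: exhaustive enumeration stands in for the generic constraint solver mentioned in the abstract, and the instance catalog, prices, speed factors, and latency estimate are all hypothetical values invented for this example.

```python
from itertools import product

# Hypothetical instance catalog: name -> (hourly price, relative speed).
INSTANCE_TYPES = {
    "small": (0.10, 1.0),
    "medium": (0.20, 2.2),
    "large": (0.40, 4.0),
}


def estimated_latency(work_units, counts):
    """Coarse estimate (assumed): total work divided by aggregate speed."""
    speed = sum(INSTANCE_TYPES[name][1] * n for name, n in counts.items())
    return float("inf") if speed == 0 else work_units / speed


def hourly_cost(counts):
    """Cost under the (hypothetical) per-hour pricing model."""
    return sum(INSTANCE_TYPES[name][0] * n for name, n in counts.items())


def cheapest_plan(work_units, max_latency, max_per_type=4):
    """Return (cost, counts) for the cheapest feasible plan, or None.

    Enumerates every allocation of 0..max_per_type instances per type and
    keeps the cheapest one whose estimated latency meets the QoS bound.
    """
    best = None
    names = list(INSTANCE_TYPES)
    for combo in product(range(max_per_type + 1), repeat=len(names)):
        counts = dict(zip(names, combo))
        if estimated_latency(work_units, counts) > max_latency:
            continue  # violates the QoS constraint
        cost = hourly_cost(counts)
        if best is None or cost < best[0]:
            best = (cost, counts)
    return best


plan = cheapest_plan(work_units=100, max_latency=25)
```

Enumeration is only workable because the search space here has 125 candidate allocations; a constraint solver earns its place once the catalog, the workload, and the constraints grow beyond what brute force can cover.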
The Case for Predictive Database Systems: Opportunities and Challenges
This paper argues that next generation database management systems should incorporate a predictive model management component to effectively support both inward-facing applications, such as self management, and user-facing applications such as data-driven predictive analytics. We draw an analogy between model management and data management functionality and discuss how model management can leverage profiling, physical design and query optimization techniques, as well as the pertinent challenges. We then describe the early design and architecture of Longview, a predictive DBMS prototype that we are building at Brown, along with a case study of how models can be used to predict query execution performance.

