Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems
- In VLDB, 2004
Cited by 77 (3 self)
We consider the problem of horizontally partitioning a dynamic relation across a large number of disks/nodes by the use of range partitioning. Such partitioning is often desirable in large-scale parallel databases, as well as in peer-to-peer (P2P) systems. As tuples are inserted and deleted...
Integrating vertical and horizontal partitioning into automated physical database design
- In Proceedings of ACM SIGMOD, 2004
Cited by 48 (6 self)
In addition to indexes and materialized views, horizontal and vertical partitioning are important aspects of physical design in a relational database system that significantly impact performance. Horizontal partitioning also provides manageability; database administrators often require indexes and their underlying tables partitioned identically so as to make common operations such as backup/restore easier. While partitioning is important, incorporating partitioning makes the problem of automating physical design much harder since: (a) The choices of partitioning can strongly interact with choices of indexes and materialized views. (b) A large new space of physical design alternatives must be considered. (c) Manageability requirements impose a new constraint on the problem. In this paper, we present novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account. We have implemented our techniques and evaluated them on Microsoft SQL Server. Our experiments highlight: (a) the importance of taking an integrated approach to automated physical design and (b) the scalability of our techniques.
Database Tuning Advisor for Microsoft SQL Server 2005
- In Proceedings of VLDB, 2004
Cited by 36 (4 self)
The Database Tuning Advisor (DTA) that is part of Microsoft SQL Server 2005 is an automated physical database design tool that significantly advances the state-of-the-art in several ways. First, DTA is capable of providing an integrated physical design recommendation for horizontal partitioning, indexes, and materialized views. Second, unlike today’s physical design tools that focus solely on performance, DTA also supports the capability for a database administrator (DBA) to specify manageability requirements while optimizing for performance. Third, DTA is able to scale to large databases and workloads using several novel techniques including: (a) workload compression, (b) reduced statistics creation, and (c) exploiting a test server to reduce load on the production server. Finally, DTA greatly enhances scriptability and customization through the use of a public XML schema for input and output. This paper provides an overview of DTA’s novel functionality, explains the rationale for its architecture, and demonstrates DTA’s quality and scalability on large customer workloads.
Automatic physical design tuning: workload as a sequence
- In Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2006
Cited by 21 (0 self)
The area of automatic selection of physical database design to optimize the performance of a relational database system based on a workload of SQL queries and updates has gained prominence in recent years. Major database vendors have released automated physical database design tools with the goal of reducing the total cost of ownership. An important assumption underlying these tools is that the workload is a set of SQL statements. In this paper, we show that being able to treat the workload as a sequence, i.e., exploiting the ordering of statements, can significantly broaden the usage of such tools. We present scenarios where exploiting sequence information in the workload is crucial for performance tuning. We also propose techniques for addressing the technical challenges arising from treating the workload as a sequence. We evaluate the effectiveness of our techniques through experiments on Microsoft SQL Server.
Schism: a Workload-Driven Approach to Database Replication and Partitioning
Cited by 14 (4 self)
We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner). The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions by up to 30%.
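The first phase described in the abstract above (a node per tuple, an edge between tuples accessed by the same transaction) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy workload, and the two candidate assignments are all hypothetical, and no graph partitioner is included. The sketch only builds the co-access graph and scores a given tuple-to-partition assignment.

```python
from collections import defaultdict
from itertools import combinations


def build_coaccess_graph(transactions):
    """Return {(u, v): weight}: one edge per pair of tuples touched by the
    same transaction, weighted by how many transactions touch both."""
    edges = defaultdict(int)
    for txn in transactions:
        # sorted(set(...)) gives each unordered pair a single canonical key
        for u, v in combinations(sorted(set(txn)), 2):
            edges[(u, v)] += 1
    return dict(edges)


def count_distributed(transactions, assignment):
    """Count transactions whose tuples span more than one partition."""
    return sum(
        1 for txn in transactions
        if len({assignment[t] for t in txn}) > 1
    )


def cut_weight(edges, assignment):
    """Total weight of edges whose endpoints sit in different partitions."""
    return sum(
        w for (u, v), w in edges.items()
        if assignment[u] != assignment[v]
    )


# Hypothetical toy workload: each transaction is the set of tuple ids it accesses.
transactions = [("a", "b"), ("a", "b"), ("c", "d"), ("b", "c")]
edges = build_coaccess_graph(transactions)

# Two hypothetical balanced assignments of tuples to k = 2 partitions.
co_located = {"a": 0, "b": 0, "c": 1, "d": 1}
round_robin = {"a": 0, "b": 1, "c": 0, "d": 1}

for name, assignment in [("co_located", co_located), ("round_robin", round_robin)]:
    print(name, count_distributed(transactions, assignment), cut_weight(edges, assignment))
```

A graph partitioner would search for the balanced assignment with the smallest cut. Here the two assignments are supplied by hand so that the objective itself, fewer cross-partition transactions, is easy to inspect.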
Toward Autonomic Computing with DB2 Universal Database
- SIGMOD Record, 2002
Cited by 11 (0 self)
As the cost of both hardware and software falls due to technological advancements and economies of scale, the cost of ownership for database applications is increasingly dominated by the cost of people to manage them. Databases are growing rapidly in scale and complexity, while skilled database administrators (DBAs) are becoming rarer and more expensive. This paper describes the self-managing or autonomic technology in IBM’s DB2 Universal Database® for UNIX and Windows to illustrate how self-managing technology can reduce complexity, helping to reduce the total cost of ownership (TCO) of DBMSs and improve system performance.
Today’s DBMSs: How Autonomic Are They
- Proc. First International Autonomic Systems Workshop, DEXA 2003, 2003
Cited by 8 (0 self)
Database Management Systems (DBMSs) are complex systems whose manageability is increasingly becoming a real concern. Realizing that expert Database Administrators (DBAs) are scarce and that the cost of hiring them is a major part of the Total Cost of Ownership (TCO) makes an urgent call for an Autonomic DBMS (ADBMS) that is capable of managing and maintaining itself. In this paper, we examine the characteristics that a DBMS should have in order to be considered autonomic. We assess the position of today’s DBMSs by drawing example features from popular, commercial database products, such as DB2 UDB, SQL Server, and Oracle. We argue that today’s DBMSs are still far from being autonomic. We highlight the source of difficulties towards achieving that goal, and sketch the most important research terrains that need investigation in order to have ADBMSs one day.
Efficient use of the query optimizer for automated physical design
- In Proceedings of the International Conference on Very Large Databases (VLDB), 2007
Cited by 7 (4 self)
State-of-the-art database design tools rely on the query optimizer for comparing physical design alternatives. Although it provides an appropriate cost model for physical design, query optimization is a computationally expensive process. The significant time consumed by optimizer invocations poses serious performance limitations for physical design tools, causing long running times, especially for large problem instances. So far it has been impossible to remove query optimization overhead without sacrificing cost estimation precision. Inaccuracies in query cost estimation are detrimental to the quality of physical design algorithms, as they increase the chances of “missing” good designs and consequently selecting sub-optimal ones. Precision loss and the resulting reduction in solution quality are particularly undesirable, and avoiding them is the reason the query optimizer is used in the first place. In this paper, we eliminate the tradeoff between query cost estimation accuracy and performance. We introduce the INdex Usage Model (INUM), a cost estimation technique that returns the same values that would have been returned by the optimizer, while being three orders of magnitude faster. Integrating INUM with existing index selection algorithms dramatically improves their running times without precision compromises.
Efficient and Robust Database Support for Data-Intensive Applications in Dynamic Environments
Cited by 3 (1 self)
Requirements from new types of applications call for new database system solutions. Computational science applications performing distributed computations on Grid networks with requirements for efficient storage and query solutions are now emerging. For this purpose we have developed DASCOSA-DB, a P2P-based distributed database system, which in addition to providing location-transparent storage and querying, also includes novel features like efficient partial restart of queries and redistribution of query operators in the context of failure, dynamic refragmentation and allocation, and distributed semantic caching. In this demo, the novel features will be demonstrated, combined with a more general description of the architecture and demonstration of the distributed query processing capabilities.
SMART: Making DB2 (More) Autonomic
- In VLDB 2002, 28th International Conference on Very Large Data Bases, Kowloon Shangri-La Hotel, Hong Kong, 2002
Cited by 3 (0 self)
IBM's SMART (Self-Managing And Resource Tuning) project aims to make DB2 self-managing, i.e., autonomic, to decrease the total cost of ownership and penetrate new markets.

