Results 1 - 2 of 2
Autopart: Automating schema design for large scientific databases using data partitioning
- In Proceedings of the 16th International Conference on Scientific and Statistical Database Management, 2004
Cited by 21 (4 self)
Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however, the data volume and the continuous insertion of new data allow for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper we propose AutoPart, an algorithm that automatically partitions database tables to optimize sequential access assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart, we build an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments corroborate the benefits of partitioning for large-scale systems: partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20% (for queries) and a factor of five (for updates), while using only half the original index space.
Query reformulation with constraints
- SIGMOD Record
Cited by 20 (4 self)
Let Σ1, Σ2 be two schemas, which may overlap, C be a set of constraints on the joint schema Σ1 ∪ Σ2, and q1 be a Σ1-query. An (equivalent) reformulation of q1 in the presence of C is a Σ2-query, q2, such that q2 gives the same answers as q1 on

