Results 1 - 6 of 6
Iterative-Improvement-Based Declustering Heuristics For Multi-Disk Databases
2005
Cited by 5 (1 self)
Data declustering is an important issue for reducing query response times in multi-disk database systems. In this paper, we propose a declustering method that utilizes the available information on query distribution, data distribution, data-item sizes, and disk capacity constraints. The proposed method exploits the natural correspondence between a data set with a given query distribution and a hypergraph. We define an objective function that exactly represents the aggregate parallel query-response time for the declustering problem and adapt the iterative-improvement-based heuristics successfully used in hypergraph partitioning to this objective function. We propose a two-phase algorithm that first obtains an initial K-way declustering by recursively bipartitioning the data set, then applies multi-way refinement on this declustering. We provide effective gain models and efficient implementation schemes for both phases. The experimental results on a wide range of realistic data sets show that the proposed method provides a significant performance improvement compared with the state-of-the-art declustering strategy based on similarity-graph partitioning.
Author Keywords: Parallel database systems
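As a rough illustration of the kind of objective the abstract refers to, the sketch below scores a declustering by summing, over all queries, the load on the busiest disk that each query touches. The abstract does not give the exact formula, so this cost model (per-query cost = maximum per-disk load, weighted by query frequency) is an assumption, and the function names `query_response_time` and `aggregate_response_time` are hypothetical.

```python
from collections import Counter

def query_response_time(items, disk_of, size=None):
    """Assumed cost of one query: load on the busiest disk it touches."""
    load = Counter()
    for item in items:
        # Each item adds its size (or 1 if sizes are not given) to its disk.
        load[disk_of[item]] += 1 if size is None else size[item]
    return max(load.values(), default=0)

def aggregate_response_time(queries, disk_of, size=None):
    """Frequency-weighted sum of per-query costs; queries = [(items, freq)]."""
    return sum(freq * query_response_time(items, disk_of, size)
               for items, freq in queries)

# Toy example: four data items, two disks, three queries with frequencies.
queries = [({"a", "b", "c", "d"}, 3), ({"a", "b"}, 1), ({"c", "d"}, 2)]
clustered = {"a": 0, "b": 0, "c": 1, "d": 1}    # co-queried items share a disk
declustered = {"a": 0, "b": 1, "c": 0, "d": 1}  # co-queried items are spread out

print(aggregate_response_time(queries, clustered))    # 12
print(aggregate_response_time(queries, declustered))  # 9
```

In this toy case, spreading co-queried items across disks lowers the aggregate cost from 12 to 9. An iterative-improvement heuristic of the kind the abstract mentions would move items between disks whenever a move reduces such an objective.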
Efficient and Robust Node-Partitioned Data Warehouses (Chapter IX)
Cited by 3 (2 self)
Running large data warehouses (DWs) efficiently over low-cost platforms places special requirements on the design of system architecture. The idea is to have the DW on a set of low-cost nodes in a nondedicated local area network (LAN). Nodes can run any relational database engine, and the system relies on a partitioning strategy and query processing middle layer. These characteristics are in contrast with typical parallel database systems, which rely on fast dedicated interconnects and hardware, as well as a specialized parallel query optimizer for a specific database engine. This chapter describes the architecture of the node-partitioned data warehouse (NPDW), designed to run on the low-cost environment, focusing on the design for partitioning, efficient parallel join and query transformations. Given the low reliability of the target environment, we also show how replicas are incorporated in the design of a robust NPDW strategy with availability guarantees and how the replicas are used for always-on, always-efficient behavior in the presence of periodic load and maintenance tasks.
Copyright © 2007, Idea Group Inc. Copying or distributing in print or electronic forms without written permission of Idea Group Inc. is prohibited.
Furtado
Decision Support for Management of Parallel Database Systems
- In Proceedings of HPCN Europe-96, 1996
Cited by 2 (2 self)
Parallel database systems are generally recognised as one of the most important application areas for commercial parallel systems. However, the task of managing the performance of a parallel database system is exceedingly complex. The initial choice of hardware configuration to support a particular DBMS application and the subsequent task of tuning the DBMS to improve performance rely not only on the way in which the data is structured, but also on how it is fragmented, replicated and distributed across the processing elements of the system. To understand the behaviour of a particular application requires the study of large volumes of performance data. To simplify this process it is essential to provide some means of presenting performance data in a comprehensible form which will aid visualisation. This paper explores some of the issues relating to decision support for the performance management of parallel database systems and describes an analytical capacity planning tool to assist...
An Analytical Tool for Predicting the Performance Of Parallel . . .
1999
Cited by 2 (2 self)
... This paper describes an analytical tool which determines the performance characteristics (in terms of throughput, resource utilisation and response time) of relational database transactions executing on particular machine configurations and provides simple graphical visualisations of these to enable users to obtain rapid insight into particular scenarios. The problems of handling different parallel DBMSs are illustrated with reference to three systems: Ingres, Informix and Oracle. A brief description is also given of two different approaches used to confirm the validity of the analytical approach on which the tool is based.
A Tool For Supporting The Teaching Of Parallel Database Systems
Parallel database systems are complex entities. As part of a course on a limited time scale, it is difficult to provide useful practical experience on these systems that provides deep insight into their behaviour and operation. This paper describes a tool for performance prediction which has been developed to aid the visualisation of parallel database systems and which is currently being used to support teaching. It enables students to experiment with different hardware and software configurations and to view the effects of changes on the performance of the system. It provides insight into how data can be placed among the nodes of a parallel machine according to predefined strategies, as well as manually, and provides feedback on the effect of these on throughput and response time. It is able to provide a good appreciation of the concepts in a relatively short period of time.
Warehouses
Some businesses generate gigabytes or even terabytes of historical data that can be organized and analyzed for better decision making. This poses issues concerning systems and software for efficient processing over such data. While the traditional solution to this problem involves costly hardware and software, we focus on strategies for running large data warehouses over low-cost, non-dedicated nodes in a local-area network (LAN) and non-proprietary software. Once such a technology is in place, every data warehouse will be able to run in a low-cost environment, but the system must be able to choose its placement and processing for maximum efficiency. We discuss the basic system architecture and the design of the data placement and processing strategy. We compare the shortcomings of a basic horizontal partitioning for the environment with a simple design that produces efficient placements. Our discussion and results provide important insight into how low-cost efficient data warehouse systems can be obtained.
Copyright © 2007, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Furtado

