Results 1 - 10 of 15
Parallel R-trees
1992
Cited by 66 (1 self)
We consider the problem of exploiting parallelism to accelerate the performance of spatial access methods and specifically, R-trees [11]. Our goal is to design a server for spatial data so as to maximize the throughput of range queries. This can be achieved by (a) maximizing parallelism for large range queries, and (b) engaging as few disks as possible on point queries [22]. We propose a simple hardware architecture consisting of one processor with several disks attached to it. On this architecture, we propose to distribute the nodes of a traditional R-tree, with cross-disk pointers ('Multiplexed' R-tree). The R-tree code is identical to the one for a single-disk R-tree, with the only addition that we have to decide which disk a newly created R-tree node should be stored in. We propose and examine several criteria to choose a disk for a new node. The most successful one, termed 'proximity index' or PI, estimates the similarity of the new node with the other R-tree nodes already o...
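The disk-selection step described in this abstract can be illustrated with a small Python sketch. It is not the paper's proximity index, whose definition is cut off in the listing. As a stand-in, it scores similarity by the overlap area of 2-D bounding rectangles and assumes that a new node should go to the disk whose resident nodes are least similar to it, so that a large range query tends to touch several disks. The names `Rect`, `overlap_area`, and `choose_disk` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle of an R-tree node."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles; 0.0 when they do not intersect."""
    width = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    height = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return max(width, 0.0) * max(height, 0.0)


def choose_disk(new_node: Rect, disks: list[list[Rect]]) -> int:
    """Return the index of the disk for a newly created node.

    Assumption (not from the source): similarity is approximated by
    overlap area, and the chosen disk is the one whose nodes have the
    smallest total overlap with the new node. Ties are broken in favor
    of the disk holding fewer nodes, then the lower index.
    """
    def cost(i: int) -> tuple[float, int]:
        total = sum(overlap_area(new_node, r) for r in disks[i])
        return (total, len(disks[i]))

    return min(range(len(disks)), key=cost)
```

For example, with one node covering (0, 0) to (2, 2) on disk 0 and one covering (10, 10) to (12, 12) on disk 1, a new node covering (1, 1) to (3, 3) overlaps only the first and is therefore sent to disk 1.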
Partitioning Similarity Graphs: A Framework for Declustering Problems
Information Systems Journal, 1996
Cited by 29 (3 self)
Declustering problems are well known in databases for parallel computing environments. In this paper, we propose a new similarity-based technique for declustering data. The proposed method can adapt to the available information about query distribution (e.g. size, shape and frequency) and can work with alternative atomic data-types. Furthermore, the proposed method is flexible and can work with alternative data distributions, data sizes and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of atomic data-items that are frequently accessed together by queries are allocated to distinct disks. We describe the application of the proposed method to parallelizing Grid Files at the data page level. Detailed experiments in this context show that the proposed method adapts to query distribution and data distribution, and tha...
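The idea of max-cut partitioning under partition-size constraints can be sketched in Python as follows. This is a simple greedy heuristic chosen for illustration, not the algorithm from the paper: each item is placed on the disk, among those with free capacity, where it adds the least similarity weight, which is equivalent to greedily increasing the weight of edges cut between disks. The function name `decluster` and its parameters (`similarity`, `num_disks`, `capacity`) are hypothetical.

```python
from typing import Callable, Hashable, Sequence


def decluster(
    items: Sequence[Hashable],
    similarity: Callable[[Hashable, Hashable], float],
    num_disks: int,
    capacity: int,
) -> dict[Hashable, int]:
    """Greedily assign items to disks so that similar items are separated.

    similarity(a, b) is assumed to be symmetric and to grow with how
    often a and b are accessed together. capacity is the maximum number
    of items per disk (the partition-size constraint).
    """
    if num_disks * capacity < len(items):
        raise ValueError("not enough total capacity for all items")

    disks: list[list[Hashable]] = [[] for _ in range(num_disks)]
    assignment: dict[Hashable, int] = {}

    for item in items:
        # Only disks that still have room are candidates.
        candidates = [d for d in range(num_disks) if len(disks[d]) < capacity]

        def cost(d: int) -> tuple[float, int]:
            # Similarity weight kept inside disk d if item is placed there;
            # ties go to the emptier disk, then to the lower index.
            return (sum(similarity(item, other) for other in disks[d]), len(disks[d]))

        best = min(candidates, key=cost)
        disks[best].append(item)
        assignment[item] = best

    return assignment
```

A greedy pass depends on the order in which items arrive and gives no optimality guarantee; it only shows how the similarity graph and the size constraint interact.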
A Similarity Graph-Based Approach to Declustering Problems and Its Application towards Parallelizing Grid Files
In the 11th International Conference on Data Engineering, 1995
Cited by 19 (8 self)
We propose a new similarity-based technique for declustering data. The proposed method can adapt to available information about query distributions, data distributions, data sizes and partition-size constraints. The method is based on max-cut partitioning of a similarity graph defined over the given set of data, under constraints on the partition sizes. It maximizes the chances that a pair of data-items that are to be accessed together by queries are allocated to distinct disks. We show that the proposed method can achieve optimal speed-up for a query-set, if there exists any other declustering method which will achieve the optimal speed-up. Experiments in parallelizing Grid Files show that the proposed method outperforms mapping-function-based methods for interesting query distributions as well as for non-uniform data distributions. 1 Introduction With an increasing performance gap between processors and I/O systems, parallelizing I/O operations by declustering data [8, 7, 21] is becom...
Physical Database Design Decision Algorithms and Concurrent Reorganization for Parallel Database Systems
1997
Cited by 9 (1 self)
Stringent performance requirements in DB applications have led to the use of parallelism for database processing. To allow the database system to take advantage of the performance of parallel shared-nothing systems, the physical DB design must be appropriate for the DB structure and the workload. We develop decision algorithms that will select a good physical DB design both when the DB is first loaded into the system (static decision) and while the DB is being used by the workload (dynamic decision). Our decision algorithms take the database structure, workload, and system characteristics as inputs. The static (or initial) physical DB design decision algorithm involves:
• selecting a partitioning attribute for each relation that determines how the relation is fragmented across the nodes (allowing for high I/O bandwidth);
• selecting indexes on the relation attributes to allow faster accesses compared to sequential file scans;
• selecting the attributes by which to cluster a relation in order to take advantage of the prefetching and caching involved in I/O access;
• grouping of relations to allow DB operations (joins) on relation pairs to be executed locally
Performance Evaluation of Parallel S-trees
2000
Cited by 4 (2 self)
The S-tree is a dynamic height-balanced tree similar in structure to B+trees. S-trees store fixed-length bit-strings, which are called signatures. Signatures are used for indexing textbases, relational, object-oriented and extensible databases as well as in data mining. In this article, methods of designing multi-disk B-trees are adapted to S-trees and new methods of parallelizing S-trees are developed. The resulting structures aim at achieving performance gain by accessing two or more disks simultaneously. In addition, two different searching techniques that exploit parallel disk accessing are devised. Performance results of experiments based on the new structures and searching techniques are also presented and discussed.
µDatabase: A Toolkit for Constructing Memory Mapped Databases
In Persistent Object Systems, 1992
Cited by 1 (1 self)
The main objective of this work was an efficient methodology for constructing low-level database tools that are built around a single-level store implemented using memory mapping. The methodology allowed normal programming pointers to be stored directly onto secondary storage, and subsequently retrieved and manipulated by other programs without the need for relocation, pointer swizzling or reading all the data. File structures for a database, e.g. a BTree, built using this approach are significantly simpler to build, test, and maintain than traditional file structures. All access methods to the file structure are statically type-safe and file structure definitions can be generic in the type of the record and possibly key(s) stored in the file structure, which affords significant code reuse. An additional design requirement is that multiple file structures may be simultaneously accessible by an application. Concurrency at both the front end (multiple accessors) and the back end (file st...
R-tree Indexing by Multiple Processors
This work investigates the performance of parallel R-trees where the spatial data are high-dimensional objects. Due to the large number of bounding values for each bounding box in the R-tree, one disk page might contain one bounding box only. The Parallel Binary R-tree (PBR-tree) is proposed to facilitate the costly node access. An analytical performance evaluation of the PBR-tree will be given. An improved Multiplexed R-tree for such data is proposed to achieve high throughput and short response time for concurrent small range queries. 1 Introduction It is foreseen that the Database Management System (DBMS) of the future will be able to handle many new data representations. Multimedia data (spatial data) are often represented by vectors in a multidimensional space. Typical operations on spatial data involve range queries, which retrieve all the data objects intersecting the user-defined hyper-rectangle (search rectangle) [4]. Visual data such as images are one kind of multimedia data. Features...
Exploiting Advanced Database Optimization Features for Large-Scale SAP R/3 Installations
In the 28th International Conference on Very Large Data Bases (VLDB 2002), Hong Kong, 2002
The database volumes of enterprise resource planning (ERP) systems like SAP R/3 are growing at a tremendous rate and some of them have already reached a size of several terabytes. OLTP (Online Transaction Processing) databases of this size are hard to maintain and tend to perform poorly. Therefore most database vendors have implemented new features like horizontal partitioning to optimize such mission-critical applications. Horizontal partitioning was already investigated in detail in the context of shared-nothing distributed database systems, but today's ERP systems mostly use a centralized database with a shared-everything architecture. In this work, we therefore investigate how an SAP R/3 system performs when the data in the underlying database is partitioned horizontally.
University of Arizona
Overview and Topics: CSc560 is a graduate-level course in database systems that will emphasize the DBMS architecture and implementation issues as well as recent research problems. The topics to be covered include
• Storage structures (disk, records, pages and files) and buffer management strategies,
• Access methods: B+-tree, multidimensional indexes (k-d trees, Grid files, R-trees),
• Query evaluation and optimization,
• Transaction management: concurrency control and recovery,
• Database security and authorization,
• Parallel database systems, decision support and OLAP,
• Advanced search: k-NN similarity search, string databases,
• XML: labeling schemes, path joins, selective dissemination (if time permits).
Prerequisites: CSc460 (Database Design), or the instructor’s permission. Recommended texts:
Modelization and simulation of parallel relational query execution plans using DPL graphs and High-level Petri nets
1996
This report presents a novel representation model of parallel relational query execution plans, called DPL graphs. This model can handle any kind of parallel architecture and any kind of parallel execution strategy. Based on an analysis of execution dependencies between operators, this model makes it possible to represent precisely communications, run-time control mechanisms, scheduling constraints or specific processing strategies (e.g. bucket processing). This report focuses especially on the modeling and simulation of the data and control flows, which are realized using high-level Petri nets.

