Query evaluation techniques for large databases
- ACM COMPUTING SURVEYS
, 1993
Cited by 592 (7 self)
Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate it: In order to manipulate large sets of complex objects as efficiently as today’s database systems manipulate simple records, query processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and post-relational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Autopart: Automating schema design for large scientific databases using data partitioning
- In Proceedings of the 16th International Conference on Scientific and Statistical Database Management
, 2004
Cited by 21 (4 self)
Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however, the data volume and the continuous insertion of new data allow for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper we propose AutoPart, an algorithm that automatically partitions database tables to optimize sequential access assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart, we build an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments corroborate the benefits of partitioning for large-scale systems: Partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20% (for queries) and a factor of five (for updates), while using only half the original index space.
A Mixed Fragmentation Methodology for Initial Distributed Database Design
, 1995
Cited by 20 (2 self)
We define mixed fragmentation as a process of simultaneously applying the horizontal and vertical fragmentation on a relation. It can be achieved in one of two ways: by performing horizontal fragmentation followed by vertical fragmentation or by performing vertical fragmentation followed by horizontal fragmentation. The need for mixed fragmentation arises in distributed databases because database users usually access subsets of data which are vertical and horizontal fragments of global relations and there is a need to process queries or transactions that would access these fragments optimally. We present algorithms for generating candidate vertical and horizontal fragmentation schemes and propose a methodology for distributed database design using these fragmentation schemes. When applied together these schemes form a grid. This grid consisting of cells is then merged to form mixed fragments so as to minimize the number of disk accesses required to process the distributed transactions....
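The first of the two orderings named in this abstract (horizontal fragmentation followed by vertical fragmentation) can be illustrated with a minimal sketch. It is not the paper's algorithm: the relation, the predicates, the attribute groups, and all function names below are hypothetical, and the later step that merges grid cells to minimize disk accesses is not shown.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]


def horizontal_fragments(
    rows: List[Row], predicates: Dict[str, Callable[[Row], bool]]
) -> Dict[str, List[Row]]:
    """Split rows into one fragment per named selection predicate."""
    return {name: [r for r in rows if pred(r)] for name, pred in predicates.items()}


def vertical_fragments(
    rows: List[Row], key: str, attribute_groups: Dict[str, Sequence[str]]
) -> Dict[str, List[Row]]:
    """Project rows onto each attribute group; the key is kept in every fragment."""
    return {
        name: [{a: r[a] for a in (key, *attrs)} for r in rows]
        for name, attrs in attribute_groups.items()
    }


def grid_cells(
    rows: List[Row],
    key: str,
    predicates: Dict[str, Callable[[Row], bool]],
    attribute_groups: Dict[str, Sequence[str]],
) -> Dict[Tuple[str, str], List[Row]]:
    """Apply horizontal then vertical fragmentation; each pair is one grid cell."""
    cells: Dict[Tuple[str, str], List[Row]] = {}
    for h_name, h_rows in horizontal_fragments(rows, predicates).items():
        for v_name, v_rows in vertical_fragments(h_rows, key, attribute_groups).items():
            cells[(h_name, v_name)] = v_rows
    return cells


# Hypothetical relation and fragmentation schemes, for illustration only.
employees = [
    {"id": 1, "name": "Ada", "site": "east", "salary": 120},
    {"id": 2, "name": "Bob", "site": "west", "salary": 95},
    {"id": 3, "name": "Cy", "site": "east", "salary": 70},
]
predicates = {
    "east": lambda r: r["site"] == "east",
    "west": lambda r: r["site"] == "west",
}
attribute_groups = {"profile": ("name", "site"), "payroll": ("salary",)}

cells = grid_cells(employees, "id", predicates, attribute_groups)
```

With two predicates and two attribute groups the grid has four cells, each holding the key plus one attribute group for one horizontal slice of the relation.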
Index merging
- In Proceedings of the International Conference on Data Engineering (ICDE)
, 1999
Cited by 19 (4 self)
Indexes play a vital role in decision support systems by reducing the cost of answering complex queries. A popular methodology for choosing indexes that is adopted by database administrators as well as automatic tools is: (a) Consider poorly performing queries in the workload. (b) For each query, propose a set of candidate indexes that potentially benefits the query. (c) Choose a subset from the candidate indexes in (b). Unfortunately, such a strategy can result in significant storage and index maintenance cost. In this paper, we present a novel technique called index merging to address the above shortcoming. Index merging can take an existing set of indexes (perhaps optimized for individual queries in the workload), and produce a new set of indexes with significantly lower storage and maintenance overhead, while retaining almost all the querying benefits of the initial set of indexes. We present an efficient algorithm for index merging, and demonstrate significant savings in index storage and maintenance by virtue of index merging, through experiments on Microsoft SQL Server 7.0.
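The core idea of replacing two indexes with one can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: it assumes a merged index keeps the column order of one parent as its prefix and appends the other parent's remaining columns, and the column names and byte widths are hypothetical. How the paper chooses which indexes to merge is not shown.

```python
from typing import Dict, List, Sequence


def merge_indexes(leading: Sequence[str], other: Sequence[str]) -> List[str]:
    """Return one column list covering both indexes.

    The column order of `leading` is preserved as a prefix, so lookups that
    relied on `leading` can still seek on the merged index. Columns that
    appear only in `other` are appended in their original order.
    """
    merged = list(leading)
    merged.extend(c for c in other if c not in leading)
    return merged


def storage_width(columns: Sequence[str], widths: Dict[str, int]) -> int:
    """Bytes per index entry, ignoring row locators and page overhead."""
    return sum(widths[c] for c in columns)


# Hypothetical indexes on an orders table, each tuned for a different query.
i1 = ["order_date", "customer_id"]
i2 = ["customer_id", "total", "order_date"]
widths = {"order_date": 8, "customer_id": 4, "total": 8}  # assumed byte widths

merged = merge_indexes(i1, i2)
before = storage_width(i1, widths) + storage_width(i2, widths)
after = storage_width(merged, widths)
```

In this example the merged index still covers every column of both parents but stores each column once per entry. Queries that sought on the leading column of `i2` lose that seek order, which is one way merging can give up some of the original query benefit in exchange for lower storage and maintenance cost.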
Distributed Object Based Design: Vertical Fragmentation of Classes
, 1998
Cited by 13 (5 self)
Processing costs in distributed environments are most often dominated by the network communications required for interprocess communication. It is well-known from distributed relational database design research that careful placement of data "near" the users or processors where it is used is mandatory or system performance will suffer greatly. Data placement in relational database systems is comparatively simple because the data is flat, structured, and passive. Objects are characterized by an inheritance hierarchy (other hierarchies could also be considered, including class composition and execution), are unstructured (possibly dynamic data), and contain a behavioral component that defines how the "data" is accessed by encapsulating it within the object per se. Algorithms currently exist for fragmenting relations, but the fragmentation and allocation of objects is still a relatively untouched field of study. Similar to relations, objects can be fragmented both horizontally and ve...
A Formal Approach to the Vertical Partitioning Problem in Distributed Database Design
- In Technical Report. CIS Dept, Univ. of
, 1993
Cited by 10 (2 self)
The design of distributed databases is an optimization problem requiring solutions to several interrelated problems: data fragmentation, allocation, and local optimization. Each problem can be solved with several different approaches, thereby making distributed database design a very difficult task. Although there is a large body of work on the design of data fragmentation, most of it consists of either ad hoc solutions or formal solutions for special cases (e.g., binary vertical partitioning). In this paper, we address the n-ary vertical partitioning problem and derive an objective function that generalizes and subsumes earlier work. The objective function derived in this paper is being used for developing heuristic algorithms that can be shown to satisfy the objective function. The objective function is also being used for comparing previously proposed algorithms for vertical partitioning. We first derive an objective function that is suited to distributed transaction proces...
Fragmentation of XML Documents
- In: Proc. of SBBD (2003)
, 2003
Cited by 10 (2 self)
The world-wide web (WWW) is often considered to be the world’s largest database and the eXtensible Markup Language (XML) is then considered to provide its datamodel. Adopting this view we have to deal with a distributed database. This raises the question of how to obtain a suitable distribution design for XML documents. In this paper horizontal and vertical fragmentation techniques are generalised from the relational datamodel to XML. Furthermore, splitting will be introduced as a third kind of fragmentation. Then it is shown how relational techniques for defining reasonable fragments can be applied to the case of XML.
An Overview of Vertical Partitioning in Object Oriented Databases
- The Computer Journal
, 1999
Cited by 4 (0 self)
In this paper, some interesting issues related to vertical partitioning in object oriented database systems are presented. A review of existing research is given with an identification of some open problems. A taxonomy of various possible partitioning schemes and a unified view of the vertical partitioning problem are also presented. Existing vertical partitioning algorithms have been studied for their use in both parallel and distributed object-oriented databases.
The Use of a Combined Text/Relational Database System to Support Document Management
, 1996
Cited by 2 (0 self)
In this thesis, we study the problem of representing and manipulating a document to facilitate browsing, editing, string/content searches and document assembly. Two major data models in which documents are represented and stored are: (1) a relational data model, where all text contents in a document are represented in relations, each with several attributes, or (2) a text data model, where documents are represented as contiguous characters, typically interspersed with tags to capture their various logical, semantic, and presentational features and relationships. Each approach has its own strengths and limitations. In our work, we study how a hybrid system based on a combined text/relational model can support document management. We describe database design trade-offs involving the appropriate placement of information in the text and relational database components. With an appropriate design, the advantages of both models can be exploited, while the shortcomings of using them individua...
On Data Fragmentation and Allocation in Distributed Object Oriented Databases
, 1997
Cited by 2 (0 self)
The objective of object oriented databases is to respond to the needs of new applications such as engineering and multimedia, which manipulate complex data. Furthermore, most of these applications need to execute in a distributed environment, making it necessary to distribute data across different sites. This must guarantee a minimum cost of inter-site communication and minimum access to irrelevant data. Data distribution has largely been studied in the relational model, but the complex structure of objects and their relationships makes it difficult for object oriented databases. In this paper, both fragmentation and allocation problems are tackled. Hybrid fragments are defined by applying horizontal fragmentation followed by vertical fragmentation to each database class. Resulting fragments are disjoint with regard to data but overlapping with regard to methods. An optimisation model for a nonreplicated data allocation in the form of a nonlinear integer zero-one programming problem is also develo...

