Data allocation in distributed database systems
- ACM Transactions on Database Systems, 1988
Cited by 61 (1 self)
The problem of allocating the data of a database to the sites of a communication network is investigated. This problem deviates from the well-known file allocation problem in several aspects. First, the objects to be allocated are not known a priori; second, these objects are accessed by schedules that contain transmissions between objects to produce the result. A model that makes it possible to compare the cost of allocations is presented; the cost can be computed for different cost functions and for processing schedules produced by arbitrary query processing algorithms. For minimizing the total transmission cost, a method is proposed to determine the fragments to be allocated from the relations in the conceptual schema and the queries and updates executed by the users. For the same cost function, the complexity of the data allocation problem is investigated. Methods for obtaining optimal and heuristic solutions under various ways of computing the cost of an allocation are presented and compared. Two different approaches to the allocation management problem are presented and their merits are discussed.
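As a rough illustration of what comparing the cost of allocations can mean, the sketch below scores every placement of fragments on sites by total transmission cost and keeps the cheapest. It is a toy model, not the paper's: the names `transmission_cost`, `best_allocation`, `accesses` and `link_cost` are hypothetical, each access is assumed to ship a fixed volume from the fragment's site to the requesting site, and transmissions between objects inside a schedule are ignored.

```python
from itertools import product


def transmission_cost(allocation, accesses, link_cost):
    """Total cost of shipping data for every access under one allocation.

    allocation: fragment -> site holding it
    accesses:   list of (requesting_site, fragment, volume)  (assumed shape)
    link_cost:  link_cost[a][b] is the per-unit cost between sites a and b
    """
    total = 0
    for requesting_site, fragment, volume in accesses:
        total += volume * link_cost[requesting_site][allocation[fragment]]
    return total


def best_allocation(fragments, sites, accesses, link_cost):
    """Exhaustively try every placement; only feasible for tiny inputs."""
    best, best_cost = None, float("inf")
    for placement in product(sites, repeat=len(fragments)):
        allocation = dict(zip(fragments, placement))
        cost = transmission_cost(allocation, accesses, link_cost)
        if cost < best_cost:
            best, best_cost = allocation, cost
    return best, best_cost


# Two sites, two fragments; local access is free, remote access costs 1 per unit.
link_cost = {"A": {"A": 0, "B": 1}, "B": {"A": 1, "B": 0}}
accesses = [("A", "f1", 10), ("B", "f1", 3), ("B", "f2", 5)]
allocation, cost = best_allocation(["f1", "f2"], ["A", "B"], accesses, link_cost)
print(allocation, cost)
```

The exhaustive search grows as the number of sites raised to the number of fragments, which is one reason heuristic solutions are of interest for this problem.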
Integrating vertical and horizontal partitioning into automated physical database design
- In Proceedings of ACM SIGMOD, 2004
Cited by 48 (6 self)
In addition to indexes and materialized views, horizontal and vertical partitioning are important aspects of physical design in a relational database system that significantly impact performance. Horizontal partitioning also provides manageability; database administrators often require indexes and their underlying tables partitioned identically so as to make common operations such as backup/restore easier. While partitioning is important, incorporating partitioning makes the problem of automating physical design much harder since: (a) The choices of partitioning can strongly interact with choices of indexes and materialized views. (b) A large new space of physical design alternatives must be considered. (c) Manageability requirements impose a new constraint on the problem. In this paper, we present novel techniques for designing a scalable solution to this integrated physical design problem that takes both performance and manageability into account. We have implemented our techniques and evaluated them on Microsoft SQL Server. Our experiments highlight: (a) the importance of taking an integrated approach to automated physical design and (b) the scalability of our techniques.
Automating physical database design in a parallel database
- Proc. 2002 ACM SIGMOD, 2002
Cited by 36 (2 self)
LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has
Data Placement in Parallel Database Systems
- 1996
Cited by 9 (3 self)
The way in which data is distributed across the processing elements of a parallel shared-nothing architecture can have a significant effect on the performance of a parallel DBMS. Data placement strategies provide a mechanical approach to determining a data distribution which will provide good performance. However, there is considerable variation in the results produced by different strategies and no simple way of determining which strategy will provide the best results for any particular database application. This paper considers five different data placement strategies and illustrates some of the problems associated with the placement of data by studying the sensitivity of the results produced by these different strategies to changes in a number of environmental factors, such as the number of processing elements participating in database activities and the size of the database. The study was conducted by using an analytical performance estimator for parallel database systems, in the co...
Low Overhead Concurrency Control for Partitioned Main Memory Databases
Cited by 9 (1 self)
Database partitioning is a technique for improving the performance of distributed OLTP databases, since “single partition” transactions that access data on one partition do not need coordination with other partitions. For workloads that are amenable to partitioning, some argue that transactions should be executed serially on each partition without any concurrency at all. This strategy makes sense for a main memory database where there are no disk or user stalls, since the CPU can be fully utilized and the overhead of traditional concurrency control, such as two-phase locking, can be avoided. Unfortunately, many OLTP applications have some transactions which access multiple partitions. This introduces network stalls in order to coordinate distributed transactions, which will limit the performance of a database that does not allow concurrency. In this paper, we compare two low overhead concurrency control schemes that allow partitions to work on other transactions during network stalls, yet have little cost in the common case when concurrency is not needed. The first is a light-weight locking scheme, and the second is an even lighter-weight type of speculative concurrency control that avoids the overhead of tracking reads and writes, but sometimes performs work that eventually must be undone. We quantify the range of workloads over which each technique is beneficial, showing that speculative concurrency control generally outperforms locking as long as there are few aborts or few distributed transactions that involve multiple rounds of communication. On a modified TPC-C benchmark, speculative concurrency control can improve throughput relative to the other schemes by up to a factor of two.
Practical Throughput Estimation for Parallel Databases
- Software Engineering Journal, 1996
Cited by 8 (6 self)
... This paper describes an approach to performance estimation for shared-nothing parallel database systems. It estimates system throughput for a given benchmark or set of queries, and can exercise different data placement schemes to determine the data layout which provides the best throughput value.
On Complex Object Distribution Technique for Distributed Computing Systems
- in Proceedings of the 6th International Conference on Computing and Information (ICCI'94)
Cited by 4 (0 self)
Object partitioning is an essential mechanism for improving the performance of object-based systems. However, in most of the work to date, emphasis has been confined to the optimization of disk rotation and seek time. Advanced information systems are increasingly being developed for distributed environments. In a distributed environment, multiple loosely-coupled (or even independent) systems are interconnected through communication networks, each possibly with its own local disks. An essential step in developing an information system in such a distributed environment is to efficiently distribute data to disks attached to the sites. Therefore, network communication overhead becomes an important concern during object partitioning. In this paper, we exploit the Mincut algorithm and introduce the concept of the binding strength to interpret the relationships between objects in order to develop a linear time partitioning algorithm. The algorithm partitions the ...
An iterative method for distributed database design
- In Proc. of the Conf. on Very Large Data Bases (VLDB), 1991
Cited by 2 (0 self)
The development of a distributed database system requires effective solutions to many complex and interrelated design problems. The cost dependencies between query optimization and data allocation on distributed systems are well recognized but little understood. We investigate these dependencies by proposing and analyzing an iterative heuristic which provides an integrated solution to the query optimization and data allocation problems. The optimization heuristic iterates between finding minimum cost query strategies and minimum cost data allocations until a local minimum for the combined problem is found. A search from convergence efficiently scans the optimization search space for lower cost solutions. Parametric studies within a simple query environment demonstrate near-optimal performance for the iterative method when minimizing total time and response cost of queries. The iterative method provides clear improvements over alternative solution methods. The paper concludes with the practical implications of this research and its future directions.
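The alternation described in this abstract can be sketched as a simple coordinate-descent loop. The code below is only an illustration under assumed definitions, not the paper's heuristic: a "query strategy" is reduced to the choice of the site where a query runs, the cost is the size of every fragment that must be shipped to that site plus the result size when the query runs away from its origin, and the names `total_cost`, `iterate`, `queries` and `sizes` are hypothetical. The paper's follow-up search from the converged point is not modelled.

```python
def total_cost(alloc, plan, queries, sizes):
    """Cost of all queries given fragment sites (alloc) and execution sites (plan)."""
    cost = 0
    for name, (origin, frags, result_size) in queries.items():
        run_at = plan[name]
        # Ship every fragment that is not stored where the query runs.
        cost += sum(sizes[f] for f in frags if alloc[f] != run_at)
        # Ship the result back if the query runs away from its origin.
        if run_at != origin:
            cost += result_size
    return cost


def iterate(sites, queries, sizes, alloc):
    """Alternate between the two subproblems until the cost stops improving."""
    alloc = dict(alloc)
    plan = {q: origin for q, (origin, _, _) in queries.items()}
    best = total_cost(alloc, plan, queries, sizes)
    while True:
        # Step 1: allocation fixed, choose the cheapest execution site per query.
        for q in queries:
            plan[q] = min(
                sites, key=lambda s: total_cost(alloc, {**plan, q: s}, queries, sizes)
            )
        # Step 2: strategies fixed, choose the cheapest site per fragment.
        for f in sizes:
            alloc[f] = min(
                sites, key=lambda s: total_cost({**alloc, f: s}, plan, queries, sizes)
            )
        cost = total_cost(alloc, plan, queries, sizes)
        if cost >= best:  # neither step can raise the cost, so this means "no change"
            return alloc, plan, cost
        best = cost


sites = ["A", "B"]
sizes = {"f1": 4, "f2": 6}
# query name -> (origin site, fragments read, result size); all values are made up
queries = {"q1": ("A", ["f1", "f2"], 1), "q2": ("B", ["f2"], 2)}
alloc, plan, cost = iterate(sites, queries, sizes, {"f1": "B", "f2": "A"})
print(alloc, plan, cost)
```

On this toy input the loop settles on everything at site A with cost 2, although placing both fragments and both queries at site B would cost 1. That gap is the kind of local minimum the abstract mentions.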
Data Engineering
- in Handbook of Software Engineering, Les Belady and
, 1994
Cited by 2 (2 self)
data type pertaining to instances referred to by the identifier
Part of: Member of higher level abstract data type
Composed of: Nested data elements included
Range: Limits, or a list of permissible values
Representation: Format of data values, i.e., REAL, CHAR, etc.
Size: Number of bits or characters, or limit, if variable
Marking: Method of denoting variable length (count, terminator, ...)
Cardinality: Expected number of elements
Specifier: Designer of this data element specification
Owner: Individual responsible for data values
(Accessors: access types+): Users having access privileges
Sources+: Names of programs which create new data values
Modifiers: Names of programs which modify the data values
Uses+: Names of programs using the data
Load estimates: Frequencies of update and retrieval
Storage node and file: Persistent storage for the data values
Backup files: Storage nodes and files for recovery of the data
Secondary files: Other files containing the data
Comments: For anything not categorized abo...
A Two-Phase Approach to Data Allocation in Distributed Databases
- 1995
Cited by 1 (0 self)
In this paper, we propose a two-phase approach to the problem of optimal allocation of data objects (fragments) on a network in a distributed database system. In the first phase, we perform fragment clustering, in which we form groupings of fragments that tend to be accessed together. In the second phase, we use a "divide and conquer" search technique to allocate clusters to the computing nodes (sites) in the network. We show, via complexity analysis, that the combined process of clustering and data allocation takes time that is polynomial with respect to the number of objects and sites. We also show, via experimental analysis, that our approach produces solutions that are close to optimal for a wide range of fragmentations, queries and network structures.
1 Introduction
Data allocation is a critical aspect of distributed database systems: a poorly-designed data allocation can lead to inefficient computation, high access costs, and high network loads [15, 16] whereas a well-desig...
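
The two phases named in this abstract can be pictured with a small sketch. It is an assumption-laden toy, not the paper's method: phase one merges fragments whose co-access frequency reaches a made-up `threshold` (using union-find), and phase two swaps the paper's "divide and conquer" search for a plain greedy rule that places each cluster at the site that accesses it most. The function names and the `(site, fragments, frequency)` query shape are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_fragments(queries, threshold):
    """Phase 1: group fragments that are accessed together often enough."""
    co_access = Counter()
    for _site, frags, freq in queries:
        for pair in combinations(sorted(frags), 2):
            co_access[pair] += freq

    parent = {f: f for _site, frags, _freq in queries for f in frags}

    def find(f):
        # Union-find lookup with path halving.
        while parent[f] != f:
            parent[f] = parent[parent[f]]
            f = parent[f]
        return f

    for (a, b), weight in co_access.items():
        if weight >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for f in parent:
        clusters.setdefault(find(f), set()).add(f)
    return list(clusters.values())


def allocate_clusters(clusters, queries):
    """Phase 2 (greedy stand-in): put each cluster where it is accessed most."""
    placement = {}
    for cluster in clusters:
        demand = Counter()
        for site, frags, freq in queries:
            demand[site] += freq * len(cluster & set(frags))
        site = demand.most_common(1)[0][0]
        for f in cluster:
            placement[f] = site
    return placement


# (issuing site, fragments read, frequency); all values are made up
queries = [("A", ["f1", "f2"], 8), ("B", ["f2", "f3"], 1), ("B", ["f3"], 9)]
clusters = cluster_fragments(queries, threshold=5)
placement = allocate_clusters(clusters, queries)
print(clusters, placement)
```

Clustering first shrinks the number of objects the allocation step has to place, which is the intuition behind splitting the problem into two phases.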

