## Vertical partitioning of relational OLTP databases using integer programming

### BibTeX

@MISC{Amossen_verticalpartitioning,

author = {Rasmus Resen Amossen},

title = {Vertical partitioning of relational OLTP databases using integer programming},

year = {}

}

### OpenURL

### Abstract

Abstract — A way to optimize performance of relational row store databases is to reduce the row widths by vertically partitioning tables into table fractions in order to minimize the number of irrelevant columns/attributes read by each transaction. This paper considers vertical partitioning algorithms for relational rowstore OLTP databases with an H-store-like architecture, meaning that we would like to maximize the number of single-sited transactions. We present a model for the vertical partitioning problem that, given a schema together with a vertical partitioning and a workload, estimates the costs (bytes read/written by storage layer access methods and bytes transferred between sites) of evaluating the workload on the given partitioning. The cost model allows for arbitrarily prioritizing load balancing of sites vs. total cost minimization. We show that finding a minimum-cost vertical partitioning in this model is NP-hard and therefore the problem should obviously not be solved manually by a human DBA. We present two algorithms returning solutions in which singlesitedness of read queries is preserved while allowing column replication (which may allow a drastically reduced cost compared to disjoint partitioning). The first algorithm is a quadratic integer program that finds optimal minimum-cost solutions with respect to the model, and the second algorithm is a more scalable heuristic based on simulated annealing. Experiments show that the algorithms can reduce the cost of the model objective by 37 % when applied to the TPC-C benchmark and the heuristic is shown to obtain solutions with costs close to the ones found using the quadratic program. I.

### Citations

2152 |
Algorithms for Clustering Data
- Jain, Dubes
- 1988
(Show Context)
Citation Context ...Chakravarthy et al. [3] did not present an algorithm but gave an interesting objective function for evaluating vertical partitionings. The function was based on the square-error criterion as given in =-=[8]-=- for data clustering, but did not cover placement of transactions which, in our case, has a large influence on the expected costs. Navathe et al. [13] considered the vertical partitioning problem for ... |

160 | A tree projection algorithm for generation of frequent itemsets
- Agarwal, Aggarwal, et al.
- 2000
(Show Context)
Citation Context ... [2] also constructed a disjoint partitioning with non-remote partition placement. They used a two-phase strategy where the first phase generated all relevant attribute groups using association rules =-=[1]-=- considering only one query at a time, and the second phase merged the attribute groups that were useful across queries. Son and Kim [11] presented an algorithm for generating disjoint partitioning by... |

108 | The end of an architectural era: (it’s time for a complete rewrite
- Stonebraker, Madden, et al.
- 2007
(Show Context)
Citation Context ...TPC-C benchmark and the heuristic is shown to obtain solutions with costs close to the ones found using the quadratic program. I. INTRODUCTION In this paper we consider OLTP databases with an H-store =-=[12]-=- like architecture in which we would aim for maximizing the number of single-sited transactions (i.e. transactions that can be run to completion on a single site). The full version of the paper can be... |

85 | Vertical partitioning algorithms for database design
- Navathe, Ceri, et al.
- 1984
(Show Context)
Citation Context ...nd method were presented with varying complexity and accuracy. The partitionings were formed by recursively applying the binary partitioning algorithms on the set of “reasonable cuts”. Navathe et al. =-=[10]-=- considered the vertical partitioning problem for three different environments: a) single site with one memory level, b) single site with several memory levels, and c) multiple sites. The partitions c... |

62 | Integrating vertical and horizontal partitioning into automated physical database design
- Agrawal, Narasayya, et al.
- 2004
(Show Context)
Citation Context ... binary partitioning. The partitioning decisions were based on an integer program and with strong assumptions on a System-R like architecture when estimating the amount of disk access. Agrawal et al. =-=[2]-=- also constructed a disjoint partitioning with non-remote partition placement. They used a two-phase strategy where the first phase generated all relevant attribute groups using association rules [1] ... |

40 | H-store: a high-performance, distributed main memory transaction processing system
- Kallman, Kimura, et al.
(Show Context)
Citation Context ...isms needed for non-single-sited or parallel queries (e.g. undo and redo logs) are assumed to be bottlenecks in situations with short transaction durations. Stonebraker et al. [12] and Kallman et al. =-=[8]-=- discuss the benefits of single-sitedness in high-throughput OLTP databases in more details. This paper presents a cost model together with two algorithms that find either optimal or close-to-optimal ... |

28 | Database partitioning in a cluster of processors
- Sacca, Widerhold
- 1983
(Show Context)
Citation Context ...h transaction. By connecting attribute and transaction nodes with a weighted edge according to their affinity, a min-cut algorithm could be applied to construct the partitioning. Sacca and Wiederhold =-=[15]-=- assumed a set of horizontal and vertical fragments of a database was known in advance and produced a disjoint distribution of 3these fragments onto a set of network-connected processors using a gree... |

27 |
Mathematical techniques for efficient record segmentation in large shared databases
- Eisner, Severance
- 1976
(Show Context)
Citation Context ...otal cost minimization. We therefore order the references below by increasing estimated problem similarity and do not mention work dedicated on vertical partitioning of OLAP databases. In 1976 Eisner =-=[6]-=- reduced the cost of information retrieval by vertically partitioning records into a primary and a secondary record segment. This was done by constructing a bi-partite graph with two node sets: one se... |

22 |
An Effective Approach to Vertical Partitioning for Physical Design of Relational Databases
- Cornell, Yu
- 1990
(Show Context)
Citation Context ...ng estimated problem similarity and do not mention work dedicated on vertical partitioning of OLAP databases. The full version of this paper [3] describes some additional related work. Cornell and Yu =-=[5]-=- generated a non-remote, disjoint partitioning minimizing the amount of disk access by recursively applying a binary partitioning. The partitioning decisions were based on an integer program and with ... |

21 |
A transactionbased approach to vertical partitioning for relational database systems
- Chu, Ieong
- 1993
(Show Context)
Citation Context ... super-partition existed. The k-way partitioning was found at the iteration having exactly k partitions and the lowest-cost partitioning was found at the iteration with the lowest cost. Chu and Ieong =-=[4]-=- minimized the amount of disk access by constructing a non-remote and non-disjoint vertical partitioning. Two binary partitioning algorithms based on the branchand-bound method were presented with var... |

17 | An objective function for vertically partitioning relations in distributed databases and its .... Distributed and parallel databases
- Chakravarthy, Muthuraj, et al.
- 1994
(Show Context)
Citation Context ...thod were presented with varying complexity and accuracy. The partitionings were formed by recursively applying the binary partitioning algorithms on the set of “reasonable cuts”. Chakravarthy et al. =-=[3]-=- did not present an algorithm but gave an interesting objective function for evaluating vertical partitionings. The function was based on the square-error criterion as given in [8] for data clustering... |

12 |
Problem decomposition and data reorganization by a clustering technique
- Jr, Schweitzer, et al.
- 1972
(Show Context)
Citation Context ...sites. The partitions could be both disjoint and non-disjoint. A clustering algorithm grouped attributes with high affinity by using an attribute affinity matrix together with a bond energy algorithm =-=[7]-=-. Three basic algorithms for generating partitions were presented which, depending on the desired environment, used different prioritization of four access and transfer cost classes. D. Outline of pap... |

2 |
On the complexity of finding stationary points of nonconvex quadratic programs
- Murty, Júdice
- 1996
(Show Context)
Citation Context ...r-right quadrant equal zero and letting the upper-right and lower-left submatrices be defined by c1(a, t). Q is indefinite and the cost function (6) therefore not convex. As shown by Marty and Judice =-=[9]-=- finding optimum when Q is indefinite is NP-hard. III. THE SA SOLVER – A HEURISTIC APPROACH We develop a heuristic based on simulated annealing (see [13]) and will refer to it as the SA-solver from no... |

2 |
An adaptable vertical partitioning method in distributed systems
- Son, Kim
- 2004
(Show Context)
Citation Context ...generated all relevant attribute groups using association rules [1] considering only one query at a time, and the second phase merged the attribute groups that were useful across queries. Son and Kim =-=[11]-=- presented an algorithm for generating disjoint partitioning by either minimizing costs or by ensuring that exactly k vertical fragments were produced. Intersite transfer costs were not considered. Th... |

2 |
A constrained nonlinear 0–1 program for data allocation
- Sarathy, Shetty, et al.
- 1997
(Show Context)
Citation Context ...eedy firstfit bin packing heuristic. Similarly, Menon [12] distributed a set of predefined fragments to a set of sites, but used a linearized quadratic program to compute the solution. Sarathy et al. =-=[16]-=- took as input a geographically distributed database together with statistics for a query pattern on this database and produced as output a non-disjoint distribution of whole database tables to the ph... |

1 |
ISBN 0-471-28366-5. APPENDIX This section describes how to extend the algorithms to also estimate costs of network latency for queries accessing attributes on remote sites. We assume, that all remote access (if any) for queries are done in parallel and wi
- WileyInterscience
- 1998
(Show Context)
Citation Context ...refore not convex. As shown by Marty and Judice [9] finding optimum when Q is indefinite is NP-hard. III. THE SA SOLVER – A HEURISTIC APPROACH We develop a heuristic based on simulated annealing (see =-=[13]-=-) and will refer to it as the SA-solver from now on. The base idea is to alternately fix x and y and only optimize the not-fixed vector, thereby simplifying the problem. In each iteration we search in... |

1 |
Allocating fragments in distributed databases. IEEE transactions on parallel and distributed systems
- Menon
- 2005
(Show Context)
Citation Context ...of a database was known in advance and produced a disjoint distribution of 3these fragments onto a set of network-connected processors using a greedy firstfit bin packing heuristic. Similarly, Menon =-=[12]-=- distributed a set of predefined fragments to a set of sites, but used a linearized quadratic program to compute the solution. Sarathy et al. [16] took as input a geographically distributed database t... |

1 |
ISBN 0-471-28366-5. 18 Latency This section describes how to extend the algorithms to also estimate costs of network latency for queries accessing attributes on remote sites. We assume, that all remote access (if any) for queries are done in parallel and
- Wiley-Interscience
- 1998
(Show Context)
Citation Context ...re not convex. As shown by Marty and Judice [11] finding optimum when Q is indefinite is NP-hard. (7) 83 The SA solver – a heuristic approach We develop a heuristic based on simulated annealing (see =-=[19]-=-) and will refer to it as the SA-solver from now on. The base idea is to alternately fix x and y and only optimize the not-fixed vector, thereby simplifying the problem. In each iteration we search in... |