Buffering and read-ahead strategies for external mergesort (1998)

by W Zhang, P-A Larson
Venue: VLDB

Results 1 - 5 of 5

Implementing Sorting in Database Systems

by Goetz Graefe - ACM Comput. Surv., 2006
Cited by 12 (3 self)
Most commercial database systems do (or should) exploit many sorting techniques that are publicly known, but not readily available in the research literature. These techniques improve both sort performance on modern computer systems and the ability to adapt gracefully to resource fluctuations in multiuser operations. This survey collects many of these techniques for easy reference by students, researchers, and product developers. It covers in-memory sorting, disk-based external sorting, and considerations that apply specifically to sorting in database systems.

Efficient bundle sorting

by Yossi Matias, Eran Segal, Jeffrey Scott Vitter - In Proceedings of the 11th Annual SIAM/ACM Symposium on Discrete Algorithms, 2000
Cited by 9 (1 self)
Many data sets to be sorted consist of a limited number of distinct keys. Sorting such data sets can be thought of as bundling together identical keys and having the bundles placed in order; we therefore denote this as bundle sorting. We describe an efficient algorithm for bundle sorting in external memory, which requires at most $c\,(N/B)\log_{M/B} k$ disk accesses, where $N$ is the number of keys, $M$ is the size of internal memory, $k$ is the number of distinct keys, $B$ is the transfer block size, and $2 < c < 4$. For moderately sized $k$, this bound circumvents the $\Theta((N/B)\log_{M/B}(N/B))$ I/O lower bound known for general sorting. We show that our algorithm is optimal by proving a matching lower bound for bundle sorting. The improved running time of bundle sorting over general sorting can be significant in practice, as demonstrated by experimentation. An important feature of the new algorithm is that it is executed "in-place," requiring no additional disk space.
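
To get a feel for the two bounds quoted in this abstract, the following minimal Python sketch evaluates them for given parameters. It is only an illustration, not code from the paper: the function names are hypothetical, the constant hidden by the Θ-notation in the general sorting bound is assumed to be 1, and `c` defaults to 4, the upper end of the range stated in the abstract.

```python
import math


def general_sort_ios(n, m, b):
    """Evaluate (N/B) * log_{M/B}(N/B), the general external sorting bound.

    Assumption: the constant factor hidden by Theta is taken to be 1.
    """
    blocks = n / b
    return blocks * math.log(blocks, m / b)


def bundle_sort_ios(n, m, b, k, c=4.0):
    """Evaluate c * (N/B) * log_{M/B}(k), the bundle sorting upper bound.

    Assumption: c defaults to 4, the upper end of the stated range 2 < c < 4.
    """
    return c * (n / b) * math.log(k, m / b)


if __name__ == "__main__":
    # Hypothetical sizes, all measured in keys: N = 2^30, M = 2^20, B = 2^10.
    n, m, b = 2**30, 2**20, 2**10
    for k in (16, 1024, 2**20):
        print(k, bundle_sort_ios(n, m, b, k), general_sort_ios(n, m, b))
```

Because the general bound omits its constant factor, the printed numbers only indicate how the two expressions scale with $k$; they are not predicted disk access counts.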

A Simple and Efficient Parallel Disk Mergesort

by R. D. Barve, J. S. Vitter, 2002
Cited by 7 (0 self)
External sorting—the process of sorting a file that is too large to fit into the computer’s internal memory and must be stored externally on disks—is a fundamental subroutine in database systems [G], [IBM]. Of prime importance are techniques that use multiple disks in parallel in order to speed up the performance of external sorting. The simple randomized merging (SRM) mergesort algorithm proposed by Barve et al. [BGV] is the first parallel disk sorting algorithm that requires a provably optimal number of passes and that is fast in practice. Knuth [K, Section 5.4.9] recently identified SRM (which he calls “randomized striping”) as the method of choice for sorting with parallel disks. In this paper we present an efficient implementation of SRM, based upon novel and elegant data structures. We give a new implementation for SRM’s lookahead forecasting technique for parallel prefetching and its forecast and flush technique for buffer management. Our techniques amount to a significant improvement in the way SRM carries out the parallel, independent disk accesses necessary to read blocks of input runs efficiently during external merging. Our implementation is

Efficient data organization and management on heterogeneous storage hierarchies

by Minglong Shao, Greg Ganger, Todd Mowry, 2008

Optimized Query Evaluation Using Cooperative Sorts

by Yu Cao, Ramadhana Bram, Chee-yong Chan, Kian-lee Tan
Many applications require sorting a table over multiple sort orders: generation of multiple reports from a table, evaluation of a complex query that involves multiple instances of a relation, and batch processing of a set of queries. In this paper, we study how multiple sortings of a table can be efficiently performed. We introduce a new evaluation technique, called cooperative sort, that exploits the relationships among the input set of sort orders to minimize I/O operations for the collection of sort operations. To demonstrate the efficiency of the proposed scheme, we implemented it in PostgreSQL and evaluated its performance using both the TPC-DS benchmark and synthetic data. Our experimental results show significant performance improvement over the traditional non-cooperative sorting scheme.