Results 1 -
4 of
4
Speeding up External Mergesort
- IEEE Transactions on Knowledge and Data Engineering
"... External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Int ..."
Abstract
-
Cited by 18 (0 self)
- Add to MetaCart
External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, inte...
MCSTL: The Multi-Core Standard Template Library
"... Abstract. 1 Future gain in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily parallelizable sections of the code, most applications will soon have to support parallelism explicitly. The ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Abstract. 1 Future gain in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily parallelizable sections of the code, most applications will soon have to support parallelism explicitly. The Multi-Core Standard Template Library (MCSTL) simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation will provide partial parallelization of applications that make consistent use of the STL. We present performance measurements on several architectures. For example, our sorter achieves a speedup of 21 on an 8-core 32-thread SUN T1. 1
Towards Optimal Range Medians ⋆
"... Abstract. We consider the following problem: Given an unsorted array of n elements, and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(n log k + k log n) time to answer k such median queries. Th ..."
Abstract
- Add to MetaCart
Abstract. We consider the following problem: Given an unsorted array of n elements, and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(n log k + k log n) time to answer k such median queries. This improves previous algorithms by a logarithmic factor and matches a comparison lower bound for k = O(n). The space complexity of our simple algorithm is O(n log n) in the pointer-machine model, and O(n) in the RAM model. In the latter model, a more involved O(n) space data structure can be constructed in O(n log n) time where the time per query is reduced to O(log n / log log n). We also give efficient dynamic variants of both data structures, achieving O(log 2 n) query time using O(n log n) space in the comparison model and O((log n / log log n) 2) query time using O(n log n / log log n) space in the RAM model, and show that in the cell-probe model, any data structure which supports updates in O(log O(1) n) time must have Ω(log n / log log n) query time.
Sune Tougaard-Andersen Efficient Chip Multi Processor Programming Programming a Multi-Core Processor
"... In this work a realistic machine model, the CMP-model, is investigated. This model captures the cache hierarchies on mainstream CMPs as well as the ways that these caches interacts. A parallel programming library for benchmarking is presented. The presented library introduces policy based scheduling ..."
Abstract
- Add to MetaCart
In this work a realistic machine model, the CMP-model, is investigated. This model captures the cache hierarchies on mainstream CMPs as well as the ways that these caches interacts. A parallel programming library for benchmarking is presented. The presented library introduces policy based scheduling allowing fair evaluation of scheduling algorithms. The parallel-depth-first scheduler theoretically perform better than the widely used work-stealing scheduler, in the CMP-cache model. Both schedulers are implemented in the benchmarking library. Efficient parallelizations of the well-known sequential sorting algorithms quicksort and multi-way mergesort are analysed based on the CMP-cache model. The parallel quicksort algorithm is based on a parallelization of the in-place sequential partitioning algorithm. The parallel multi-way mergesort is based on a f-way partitioning algorithm. The presented library is used to evaluate the two analysed parallel sorting algorithms using both the implemented schedulers. We find that efficient parallel

