Results 1-9 of 9
MCSTL: The Multi-Core Standard Template Library
Abstract

Cited by 26 (9 self)
Abstract. Future gains in computing performance will not stem from increased clock rates, but from even more cores in a processor. Since automatic parallelization is still limited to easily parallelizable sections of the code, most applications will soon have to support parallelism explicitly. The Multi-Core Standard Template Library (MCSTL) simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation will provide partial parallelization of applications that make consistent use of the STL. We present performance measurements on several architectures. For example, our sorter achieves a speedup of 21 on an 8-core, 32-thread Sun T1.
Speeding up External Mergesort
 IEEE Transactions on Knowledge and Data Engineering
Abstract

Cited by 21 (0 self)
External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses. This is done in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies. The effects of using multiple work disks were also investigated. We found that, in most cases, inte...
Towards Optimal Range Medians
Abstract

Cited by 11 (0 self)
Abstract. We consider the following problem: given an unsorted array of n elements, and a sequence of intervals in the array, compute the median in each of the subarrays defined by the intervals. We describe a simple algorithm which needs O(n log k + k log n) time to answer k such median queries. This improves previous algorithms by a logarithmic factor and matches a comparison lower bound for k = O(n). The space complexity of our simple algorithm is O(n log n) in the pointer-machine model, and O(n) in the RAM model. In the latter model, a more involved O(n)-space data structure can be constructed in O(n log n) time, where the time per query is reduced to O(log n / log log n). We also give efficient dynamic variants of both data structures, achieving O(log^2 n) query time using O(n log n) space in the comparison model and O((log n / log log n)^2) query time using O(n log n / log log n) space in the RAM model, and show that in the cell-probe model, any data structure which supports updates in O(log^{O(1)} n) time must have Ω(log n / log log n) query time.
MCSTL: the multicore standard template library
 In PPoPP '07: Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
, 2007
Abstract

Cited by 7 (0 self)
To benefit from the increased power of current and upcoming multicore processors, programs have to exploit parallelism. This now becomes mandatory not just for a selected number of specialized programs, but for all nontrivial applications. This is a problem
ARPN Journal of Systems and Software
Abstract
In recent times, developments across a wide range of knowledge sectors have produced an unprecedented growth of data and information, creating an increasing demand for mechanisms that can process these high volumes of data. Such mechanisms often employ sorting algorithms, and the literature offers numerous implementations of sorting. Choosing among these techniques when implementing such mechanisms has become an ongoing research issue. The aim of this study is therefore to evaluate sorting techniques using CPU time and memory consumption as performance indices. To this end, we carried out an extensive review of related work; the knowledge acquired from the literature was used to formulate an architectural model. We implemented the architecture in the C language, and the performance of the bubble sort, insertion sort, and selection sort techniques was evaluated using the GNU profiler. Experimental results show that insertion sort is the most efficient technique, while bubble sort is the least efficient, in all test cases for both CPU time and memory consumption.
Synchronization, Coherence, and Consistency for High Performance Shared-Memory Multiprocessing
, 1992
Abstract
Although improved device technology has increased the performance of computer systems, fundamental hardware limitations and the need to build faster systems using existing technology have led many computer system designers to consider parallel designs with multiple computing elements. Unfortunately, the design of efficient and scalable multiprocessors has proven to be an elusive goal. This dissertation describes a hierarchical bus-based multiprocessor architecture, an adaptive cache coherence protocol, and efficient and simple synchronization support that together meet this challenge. We have also developed an execution-driven tool for the simulation of shared-memory multiprocessors, which we use to evaluate the proposed architectural enhancements. Our simulator offers substantial advantages in terms of reduced time and space overheads when compared to instruction-driven or trace-driven simulation techniques, without significant loss of accuracy. The simulator generates correctly inter...
Efficient Chip Multi-Processor Programming - Programming a Multi-Core Processor
, 2011
Abstract
In this work a realistic machine model, the CMP model, is investigated. This model captures the cache hierarchies on mainstream CMPs as well as the ways these caches interact. A parallel programming library for benchmarking is presented; it introduces policy-based scheduling, allowing fair evaluation of scheduling algorithms. In the CMP-cache model, the parallel-depth-first scheduler theoretically performs better than the widely used work-stealing scheduler. Both schedulers are implemented in the benchmarking library. Efficient parallelizations of the well-known sequential sorting algorithms quicksort and multiway mergesort are analysed based on the CMP-cache model. The parallel quicksort algorithm is based on a parallelization of the in-place sequential partitioning algorithm; the parallel multiway mergesort is based on an f-way partitioning algorithm. The presented library is used to evaluate the two analysed parallel sorting algorithms using both of the implemented schedulers. We find that efficient parallel
A Comparative Study of Sorting Algorithms
Abstract
This study presents a comparison of several sorting algorithms with the aim of identifying the most efficient one. The methodology was to evaluate the performance of the median, heap, and quick sort techniques using CPU time and memory space as performance indices. This was achieved by reviewing the literature of relevant works. We also formulated an architectural model that serves as a guideline for implementing and evaluating the sorting techniques. The techniques were implemented in the C language, and the profile of each technique was obtained with the GNU profiler. The results show that in the majority of the cases considered, the heap sort technique is faster and requires less space than the median and quick sort algorithms when sorting data of any input size. The results also show that the slowest of the three techniques is median sort, while quick sort is faster and requires less memory than median sort but is slower and requires more memory than heap sort. The number of sorting algorithms considered in this study for complexity measurement is limited to bubble, insertion, and selection sorting. Future work will investigate the complexities of other sorting techniques in the literature based on CPU time and memory space, with the goal of adopting the most efficient sorting technique in the development of a job scheduler for the grid computing community.