Results 1 -
5 of
5
Visualization and performance prediction of multithreaded Solaris programs by tracing kernel threads
- Proc. 13th Int’l Parallel Processing Symp
, 1999
"... Efficient performance tuning of parallel programs is often hard. We present a performance prediction and visualization tool called VPPB. Based on a monitored uni-processor execution, VPPB shows the (predicted) behaviour of a multithreaded program using any number of processors and the program behavi ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
Efficient performance tuning of parallel programs is often hard. We present a performance prediction and visualization tool called VPPB. Based on a monitored uni-processor execution, VPPB shows the (predicted) behaviour of a multithreaded program using any number of processors and the program behaviour is visualized as a graph. The first version of VPPB was unable to handle I/O operations. This version has, by an improved tracing technique, added the possibility to trace activities at the kernel level as well. Thus, VPPB is now able to trace various I/O activities, e.g., manipulation of OS internal buffers, physical disk I/O, socket I/O, and RPC. VPPB allows flexible performance tuning of parallel programs developed for shared memory multiprocessors using a standardized environment; C/C++ programs that uses the thread package in Solaris 2.X. 1.
Performance Optimization Using Extended Critical Path Analysis in Multithreaded Programs on Multiprocessors
- Journal of Parallel and Distributed Computing
, 2000
"... this paper we present an approach to ..."
Multicore computing—the state of the art
, 2008
"... This document presents the current state of the art in multicore computing, in hardware and software, as well as ongoing activities, especially in Sweden. To a large extent, it draws on the presentations given at the Multicore Days 2008 organized by SICS, Swedish Multicore Initiative and Ericsson So ..."
Abstract
- Add to MetaCart
This document presents the current state of the art in multicore computing, in hardware and software, as well as ongoing activities, especially in Sweden. To a large extent, it draws on the presentations given at the Multicore Days 2008 organized by SICS, Swedish Multicore Initiative and Ericsson Software Research but the published literature and the experience of the authors has been equally important sources. It is clear that multicore processors will be with us for the foreseeable future; there seems to be no alternative way to provide substantial increases of microprocessor performance in the coming years. While processors with a few (2–8) cores are common today, this number is projected to grow as we enter the era of manycore computing. The road ahead for multicore and manycore hardware seems relatively clear, although some issues like the organization of the on-chip memory hierarchy remain to be settled. Multicore software is however much less mature, with fundamental questions of programming models, languages, tools and methodologies still outstanding. 1
A Tool for Binding Threads to Processors
"... Abstract. Many multiprocessor systems are based on distributed shared memory. It is often important to statically bind threads to processors in order to avoid remote memory access, due to performance. Finding a good allocation takes long time and it is hard to know when to stop searching for a bette ..."
Abstract
- Add to MetaCart
Abstract. Many multiprocessor systems are based on distributed shared memory. It is often important to statically bind threads to processors in order to avoid remote memory access, due to performance. Finding a good allocation takes long time and it is hard to know when to stop searching for a better one. It is sometimes impossible to run the application on the target machine. The developer needs a tool that finds the good allocations without the target multiprocessor. We present a tool that uses a greedy algorithm and produces allocations that are more than 40 % faster (in average) than when using a binpacking algorithm. The number of allocations to be evaluated can be reduced by 38 % with a 2 % performance loss. Finally, an algorithm is proposed that is promising in avoiding local maxima. 1

