Results 1 – 7 of 7
On the architectural requirements for efficient execution of graph algorithms
In Proc. 34th Int’l Conf. on Parallel Processing (ICPP), 2005
Abstract

Cited by 27 (10 self)
Combinatorial problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous, concurrent accesses to global data structures with low degrees of locality. The hierarchical memory systems of symmetric multiprocessor (SMP) clusters optimize for local, contiguous memory accesses, and so are inefficient platforms for such algorithms. Few parallel graph algorithms outperform their best sequential implementation on SMP clusters due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two graph algorithms, list ranking and connected components, on two classes of shared-memory computers: symmetric multiprocessors such as the Sun Enterprise servers and multithreaded architectures such as the Cray MTA-2.
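List ranking, one of the two kernels this abstract studies, is classically solved by pointer jumping. The sketch below simulates the synchronous pointer-jumping rounds sequentially; it illustrates the underlying technique only and is not the paper's SMP/MTA implementation.

```python
import math

def list_rank(succ):
    """List ranking by pointer jumping, simulated as synchronous rounds.

    succ[i] is the successor of node i in a linked list; the tail points
    to itself. Returns rank[i] = number of hops from i to the tail.
    In a real parallel version, every i would update concurrently per round.
    """
    n = len(succ)
    # invariant: rank[i] is the distance from i to its pointer target nxt[i]
    rank = [0 if succ[i] == i else 1 for i in range(n)]
    nxt = list(succ)
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(rounds):
        # read old arrays, write fresh ones (one synchronous PRAM step);
        # each round doubles the distance covered by every pointer
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For succ = [1, 2, 3, 3] (the chain 0→1→2→3) this yields [3, 2, 1, 0] after ⌈log₂ n⌉ rounds.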
Computational grand challenges in assembling the tree of life: Problems & solutions
In The IEEE and ACM Supercomputing Conference 2005 (SC2005) Tutorial, 2005
Abstract

Cited by 11 (1 self)
The computation of ever larger as well as more accurate phylogenetic (evolutionary) trees, with the ultimate goal of computing the tree of life, represents one of the grand challenges in High Performance Computing (HPC) Bioinformatics. Unfortunately, the size of trees which can be computed in reasonable time based on elaborate evolutionary models is limited by the severe computational cost inherent to these methods. There exist two orthogonal research directions to overcome this challenging computational burden: first, the development of novel, faster, and more accurate heuristic algorithms; and second, the application of high performance computing techniques. The goal of this chapter is to provide a comprehensive introduction to the field of computational evolutionary biology to an audience with a computing background, interested in participating in research and/or commercial applications of this field. Moreover, we will cover leading-edge technical and algorithmic developments in the field and discuss open problems and potential solutions.
An experimental study of parallel biconnected components algorithms on symmetric multiprocessors (SMPs)
In Proceedings of the 19th International Parallel and Distributed Processing Symposium (IPDPS 2005), 2005
Abstract

Cited by 4 (3 self)
We present an experimental study of parallel biconnected components algorithms employing several fundamental parallel primitives, e.g., prefix sum, list ranking, sorting, connectivity, spanning tree, and tree computations. Previous experimental studies of these primitives demonstrate reasonable parallel speedups. However, when these algorithms are used as subroutines to solve higher-level problems, there are two factors that hinder fast parallel implementations. One is parallel overhead, i.e., the large constant factors hidden in the asymptotic bounds; the other is the discrepancy among the data structures used in the primitives that brings non-negligible conversion cost. We present various optimization techniques and a new parallel algorithm that significantly improve the performance of finding biconnected components of a graph on symmetric multiprocessors (SMPs). Finding biconnected components has applications in fault-tolerant network design, and is also used in graph planarity testing. Our parallel implementation achieves speedups of up to 4 using 12 processors on a Sun E4500 for large, sparse graphs, and the source code is freely available at our web site.
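Since the paper measures its parallel implementation against the best sequential one, a compact sketch of that sequential baseline, the classic Hopcroft–Tarjan DFS algorithm, is useful orientation. This is only the conventional serial method; the paper's parallel algorithm is built differently, from spanning trees and the primitives listed above.

```python
def biconnected_components(n, edges):
    """Sequential Hopcroft–Tarjan biconnected components.

    Vertices are 0..n-1; edges is a list of (u, v) pairs of a simple
    undirected graph. Returns a list of components, each a set of edges.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc = [0] * n      # DFS discovery time, 0 = unvisited
    low = [0] * n       # earliest discovery time reachable via one back edge
    timer = [1]
    edge_stack, comps = [], []

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if not disc[v]:                          # tree edge
                edge_stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:                # u separates v's subtree
                    comp = set()
                    while True:
                        e = edge_stack.pop()
                        comp.add(e)
                        if e == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge
                edge_stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for s in range(n):
        if not disc[s]:
            dfs(s, -1)
    return comps
```

Two triangles sharing a vertex, e.g. edges (0,1), (1,2), (0,2) and (2,3), (3,4), (2,4), split into two components of three edges each; every bridge forms a component by itself.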
The STAPL pList
In 22nd International Workshop on Languages and Compilers for Parallel Computing (LCPC 2009), LNCS 5898
, 2009
Abstract

Cited by 4 (3 self)
We present the design and implementation of the Standard Template Adaptive Parallel Library (stapl) pList, a parallel container that has the properties of a sequential list, but allows for scalable concurrent access when used in a parallel program. stapl is a parallel programming library that extends C++ with support for parallelism. stapl provides a collection of distributed data structures (pContainers) and parallel algorithms (pAlgorithms) and a generic methodology for extending them to provide customized functionality. stapl pContainers are thread-safe, concurrent objects, providing appropriate interfaces (pViews) that can be used by generic pAlgorithms.
Abstract
, 2004
Abstract
Irregular problems such as those from graph theory pose serious challenges for parallel machines due to noncontiguous accesses to global data structures with low degrees of locality. Few parallel graph algorithms on distributed or shared-memory machines can outperform their best sequential implementation due to long memory latencies and high synchronization costs. In this paper, we consider the performance and scalability of two important combinatorial algorithms, list ranking and connected components, on two types of shared-memory computers: symmetric multiprocessors (SMP) such as the Sun Enterprise servers and multithreaded architectures (MTA) such as the Cray MTA-2. Previous studies show that for SMPs performance is primarily a function of noncontiguous memory accesses, whereas for the MTA, it is primarily a function of the number of concurrent operations. We present a performance model for each machine, and use it to analyze the performance of the two algorithms. We compare the models for SMPs and the MTA and discuss how the difference affects algorithm development, ease of programming, performance, and scalability.
Chapter 5: Parallel Algorithm Design for Branch and Bound
Abstract
Large and/or computationally expensive optimization problems sometimes require parallel or high-performance computing systems to achieve reasonable running times. This chapter gives an introduction to parallel computing for those familiar with serial optimization. We present techniques to assist the porting of serial optimization codes to parallel systems and discuss more fundamentally parallel approaches to optimization. We survey the state of the art in distributed- and shared-memory architectures and give an overview of the programming models appropriate for efficient algorithms on these platforms. As concrete examples, we discuss the design of parallel branch-and-bound algorithms for mixed-integer programming on a distributed-memory system, the quadratic assignment problem on a grid architecture, and maximum parsimony in evolutionary trees on a shared-memory system.
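To make the branch-and-bound structure concrete, here is a minimal serial depth-first branch and bound for the 0/1 knapsack problem with an LP-relaxation bound. The knapsack instance is our illustrative example, not one from the chapter; a parallel port of the kind the chapter discusses would distribute the subproblems created at each branch across workers.

```python
def knapsack_bb(values, weights, capacity):
    """Depth-first branch and bound for 0/1 knapsack (weights > 0).

    Branches on take/skip per item; prunes a subtree whenever the
    fractional (LP relaxation) bound cannot beat the incumbent.
    """
    # greedy order by value density makes the fractional bound tight
    items = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    best = 0

    def bound(k, cap, val):
        # optimistic value: fill remaining capacity greedily, allowing
        # a fraction of the first item that does not fit entirely
        b = val
        for i in items[k:]:
            if weights[i] <= cap:
                cap -= weights[i]
                b += values[i]
            else:
                b += values[i] * cap / weights[i]
                break
        return b

    def dfs(k, cap, val):
        nonlocal best
        if k == len(items):
            best = max(best, val)
            return
        if bound(k, cap, val) <= best:
            return                                # prune this subtree
        i = items[k]
        if weights[i] <= cap:
            dfs(k + 1, cap - weights[i], val + values[i])  # take item i
        dfs(k + 1, cap, val)                               # skip item i

    dfs(0, capacity, 0)
    return best
```

On the classic instance with values (60, 100, 120), weights (10, 20, 30), and capacity 50, this returns 220. Because pruning only discards subtrees whose relaxation bound is no better than the incumbent, the search remains exact.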
CERTIFICATE
, 2010
Abstract
It is certified that the work contained in this thesis, titled “Exploring Irregular Memory Access Applications on the GPU” by Mohammed Suhail Rehman, has been carried out under our supervision and is not submitted elsewhere for a degree. Date Advisor: Dr. P. J. Narayanan Advisor: Dr. Kishore Kothapalli
To my parents, for supporting my (un)healthy obsession with computers
“High above all is God, the King, the Truth! Be not in haste with the Qur’an before its revelation to thee is completed, but say, ‘O my Lord! advance me in knowledge’” (Qur’an 20:114)