GloMoSim: A Library for Parallel Simulation of Large-scale Wireless Networks
in Workshop on Parallel and Distributed Simulation, 1998
Cited by 429 (24 self)
A number of library-based parallel and sequential network simulators have been designed. This paper describes a library, called GloMoSim (for Global Mobile system Simulator), for parallel simulation of wireless networks. GloMoSim has been designed to be extensible and composable: the communication protocol stack for wireless networks is divided into a set of layers, each with its own API. Models of protocols at one layer interact with those at a lower (or higher) layer only via these APIs. The modular implementation enables consistent comparison of multiple protocols at a given layer. The parallel implementation of GloMoSim can be executed using a variety of conservative synchronization protocols, which include the null message and conditional event algorithms. This paper describes the GloMoSim library, addresses a number of issues relevant to its parallelization, and presents a set of experimental results on the IBM 9076 SP, a distributed memory multicomputer. These experiments use mo...
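The layering rule described above (protocol models at one layer interact with adjacent layers only through that layer's API) can be illustrated with a small sketch. GloMoSim is not a Python library, and the class `Layer`, the helper `build_stack`, and the layer names used below are hypothetical; the sketch only shows why a layer that touches its neighbours solely through `send_down` and `deliver_up` can be swapped without changing the others.

```python
from __future__ import annotations

from typing import Optional


class Layer:
    """One hypothetical protocol layer; it only ever calls its direct neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.lower: Optional[Layer] = None
        self.upper: Optional[Layer] = None

    def send_down(self, packet: list[str]) -> list[str]:
        # Prepend this layer's header, then hand off via the lower layer's API only.
        packet = [self.name] + packet
        return self.lower.send_down(packet) if self.lower else packet

    def deliver_up(self, packet: list[str]) -> list[str]:
        # Strip this layer's header and pass the remainder to the upper layer.
        assert packet[0] == self.name, "header does not belong to this layer"
        packet = packet[1:]
        return self.upper.deliver_up(packet) if self.upper else packet


def build_stack(names: list[str]) -> list[Layer]:
    """Wire layers top to bottom; replacing one entry leaves the rest untouched."""
    layers = [Layer(n) for n in names]
    for above, below in zip(layers, layers[1:]):
        above.lower, below.upper = below, above
    return layers


stack = build_stack(["app", "transport", "network", "mac"])
frame = stack[0].send_down(["data"])
print(frame)
print(stack[-1].deliver_up(frame))
```

Comparing two protocols at one layer then amounts to building two stacks that differ in a single entry.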
LogGP: Incorporating Long Messages into the LogP Model - One step closer towards a realistic model for parallel computation
1995
Cited by 204 (1 self)
We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably the single-node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP+93], which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5) [CKP+93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
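The "linear model for long messages" can be made concrete. In LogGP, as we understand it, a per-byte gap G is added, so a k-byte message costs o + (k - 1)G + L + o: send overhead, injection of the remaining bytes at G per byte, latency, then receive overhead. The function name and the parameter values below are illustrative assumptions, not taken from the paper.

```python
def loggp_message_time(k: int, L: float, o: float, G: float) -> float:
    """End-to-end time of one k-byte message under LogGP (k >= 1).

    o           : send overhead on the source processor
    (k - 1) * G : injecting the remaining bytes at a per-byte gap G
    L           : network latency
    o           : receive overhead on the destination processor
    """
    if k < 1:
        raise ValueError("a message carries at least one byte")
    return o + (k - 1) * G + L + o


# Hypothetical parameters, chosen only to show cost growing linearly in k.
for k in (1, 11, 101):
    print(k, loggp_message_time(k, L=5.0, o=2.0, G=0.5))
```

Because the per-byte term is G rather than the short-message gap g, a machine with fast long-message support shows up simply as a small G.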
Scalable Parallel Data Mining for Association Rules
1997
Cited by 134 (11 self)
One of the important problems in data mining is discovering association rules from databases of transactions, where each transaction consists of a set of items. The most time-consuming operation in this discovery process is computing the frequency of occurrence of interesting subsets of items (called candidates) in the database of transactions. To prune the exponentially large space of candidates, most existing algorithms consider only those candidates that have a user-defined minimum support. Even with this pruning, finding all association rules requires a great deal of computation power and time. Parallel computers offer a potential solution to the computational requirements of this task, provided efficient and scalable parallel algorithms can be designed. In this paper, we present two new parallel algorithms for mining association rules. The Intelligent Data Distribution algorithm efficiently uses the aggregate memory of the parallel computer by employing intelligent candi...
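The costly step named above, counting how many transactions contain each candidate and keeping those that meet a minimum support, is sketched below. `parallel_style_count` imitates the simple data-parallel idea of giving each processor a slice of the transactions and summing the local counts; this is a generic illustration, not the paper's Intelligent Data Distribution algorithm, which the abstract indicates instead distributes the candidates. All function names are hypothetical.

```python
from collections import Counter


def count_support(transactions, candidates):
    """Count how many transactions contain each candidate itemset."""
    counts = Counter({c: 0 for c in candidates})
    for t in transactions:
        items = set(t)
        for c in candidates:
            if items.issuperset(c):
                counts[c] += 1
    return counts


def parallel_style_count(transactions, candidates, n_parts):
    """Each 'processor' counts its own slice; summing stands in for a reduction."""
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    total = Counter({c: 0 for c in candidates})
    for part in parts:
        total.update(count_support(part, candidates))  # Counter.update adds counts
    return total


def frequent(counts, min_support):
    """Candidates whose support meets the user-defined minimum."""
    return {c for c, n in counts.items() if n >= min_support}
```

Note that every slice scans the full candidate list, which is exactly the memory and work pressure that motivates partitioning candidates instead.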
Matrix Multiplication on Heterogeneous Platforms
2001
Cited by 35 (19 self)
In this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to balance the load across resources of different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: we derive a theoretical performance guarantee for the heuristic and assess its practical usefulness through MPI experiments.
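The geometric framing can be sketched as follows, under our own assumptions. Normalize the result matrix to a unit square and give each processor a rectangle whose area equals its share of the total speed; a processor owning a w-by-h rectangle needs roughly w + h worth of operand data, so communication is taken as proportional to the sum of half-perimeters. In a column-based layout, a column of width w holding k rectangles costs k*w + 1 (their heights sum to 1). The dynamic program below picks contiguous groups of sorted speeds as columns; it is a hypothetical illustration, not necessarily the authors' exact heuristic.

```python
from functools import lru_cache


def column_partition(speeds):
    """Return (cost, columns): minimal sum of half-perimeters over column layouts.

    Assumes speeds are sorted ascending and grouped contiguously into columns.
    """
    total = sum(speeds)
    s = sorted(x / total for x in speeds)  # normalized rectangle areas
    n = len(s)
    prefix = [0.0]
    for x in s:
        prefix.append(prefix[-1] + x)

    @lru_cache(maxsize=None)
    def best(i):
        # Cheapest way to lay out processors s[i:] in contiguous columns.
        if i == n:
            return 0.0, ()
        options = []
        for j in range(i + 1, n + 1):
            width = prefix[j] - prefix[i]  # column width = total area it holds
            col_cost = (j - i) * width + 1.0  # k widths plus heights summing to 1
            rest_cost, rest_bounds = best(j)
            options.append((col_cost + rest_cost, ((i, j),) + rest_bounds))
        return min(options)

    cost, bounds = best(0)
    return cost, [s[a:b] for a, b in bounds]


print(column_partition([1, 1, 1, 1]))
```

With four equal processors the search settles on a 2-by-2 layout, which matches the intuition that near-square rectangles minimize perimeter.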
Algorithmic redistribution methods for block cyclic decompositions
IEEE Trans. Parallel and Distributed Systems, 1996
Cited by 22 (2 self)
To my parents. Acknowledgments: The writer expresses gratitude and appreciation to the members of his dissertation committee, Michael Berry, Charles Collins, Jack Dongarra, Mark Jones and David Walker, for their encouragement and participation throughout my doctoral experience. Special appreciation is due to Professor Jack Dongarra, Chairman, who provided sound guidance, support and appropriate commentary during the course of my graduate study. I would also like to thank Yves Robert and R. Clint Whaley for many useful and instructive discussions on general parallel algorithms and message-passing software libraries. Many valuable comments for improving the presentation of this document were received from L. Susan Blackford. Finally, I am grateful to the Department of Computer Science at the University of Tennessee for allowing me to do this doctoral research work here. A special debt of gratitude is owed to Joanne Martin, IBM POWERparallel Division, for awarding me an IBM Corporation Fellowship covering the tuition as well as a stipend for the 1994-96 academic years. This work was also supported
A System for Fault-Tolerant Execution of Data and Compute Intensive Programs over a Network of Workstations
1996
Cited by 15 (1 self)
The bag of tasks structure permits dynamic partitioning for a wide class of parallel applications. This paper describes a fault-tolerant implementation of this structure using atomic actions (atomic transactions) to operate on persistent objects, which are accessed in a distributed setting via Remote Procedure Call (RPC). The system is suited to parallel execution of data- and compute-intensive programs that require persistent storage and fault tolerance, and it runs on stock hardware and software platforms (Unix, C++). Its suitability is examined in the context of the measured performance of three applications: ray tracing, matrix multiplication and Cholesky factorization.

1 Introduction

Many computations manipulate very large amounts of data. Matrix calculations represent one example class. In a Massively Parallel Processor (MPP) such a vast data set is typically partitioned statically between the very many distributed processing elements and moved amongst them as necessary to perform ...
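The essence of a fault-tolerant bag of tasks can be shown without RPC or persistent storage: a task leaves the bag for good only when a worker commits its result, and a failed worker's task is returned for someone else to take. In the sketch below, `commit` merely stands in for committing an atomic action on a persistent object; the class, the `run` driver, and the simulated failures are hypothetical and single-process.

```python
from collections import deque


class BagOfTasks:
    """Tasks are removed permanently only when a result is committed."""

    def __init__(self, tasks):
        self.pending = deque(tasks)
        self.in_progress = set()
        self.results = {}

    def take(self):
        # Dynamic partitioning: whichever worker is free grabs the next task.
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.in_progress.add(task)
        return task

    def commit(self, task, result):
        # Stands in for committing an atomic action on a persistent object.
        self.in_progress.discard(task)
        self.results[task] = result

    def abort(self, task):
        # A crashed worker's action is undone, so its task goes back in the bag.
        self.in_progress.discard(task)
        self.pending.append(task)

    def done(self):
        return not self.pending and not self.in_progress


def run(bag, work, failing_calls):
    """Drive the bag sequentially; call indices in failing_calls simulate crashes."""
    call = 0
    while not bag.done():
        task = bag.take()
        if task is None:
            break
        if call in failing_calls:
            bag.abort(task)
        else:
            bag.commit(task, work(task))
        call += 1
    return bag.results


print(run(BagOfTasks(range(5)), lambda x: x * x, failing_calls={0, 3}))
```

Even though two "workers" fail, every task eventually produces exactly one committed result.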
Efficiency of Shared-Memory Multiprocessors for a Genetic Sequence Similarity Search Algorithm
1997
Cited by 15 (2 self)
Molecular biologists who conduct large-scale genetic sequencing projects are producing an ever-increasing amount of sequence data. GenBank, the primary repository for DNA sequence data, is doubling in size every 1.3 years. Keeping pace with the analysis of this data is a difficult task. One of the most successful techniques for analyzing genetic data is sequence similarity analysis: the comparison of unknown sequences against known sequences kept in databases. As biologists gather more sequence data, sequence similarity algorithms become more and more useful, but they also take longer and longer to run. BLAST is one of the most popular sequence similarity algorithms in use today, but its running time is proportional to the size of the database, so sequence similarity analysis using BLAST is becoming a bottleneck. Shared-Memory Multiprocessors (SMPs) may offer performance that scales with the growth of the genetic databases. This paper analyzes the performance of BLAST on SMPs to improve our theoretic...
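Because the search time grows linearly with database size, a natural way to use an SMP is to split the database into slices that processors scan independently, then merge the hits. The sketch below shows only that structure: its scoring function (count of shared k-mers) is a toy stand-in, not BLAST's scoring, and all names are hypothetical. In CPython the threads do not run the scoring in parallel because of the global interpreter lock, so this illustrates the decomposition rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def score_chunk(query_kmers, chunk, k):
    # Toy similarity: number of k-mers shared with the query (not BLAST scoring).
    return [(name, len(query_kmers & kmers(seq, k))) for name, seq in chunk]


def search(query, database, k=3, workers=4):
    """Split the database into slices, scan each slice independently, merge hits."""
    qk = kmers(query, k)
    items = list(database.items())
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(score_chunk, [qk] * workers, chunks, [k] * workers))
    hits = [hit for part in parts for hit in part]
    return sorted(hits, key=lambda h: (-h[1], h[0]))


db = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "ACGTTT"}
print(search("ACGTAC", db))
```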
Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms
IEEE Trans. Parallel and Distributed Systems, 1998
Cited by 15 (0 self)
We show that deadlocks due to dependencies on consumption channels are a fundamental problem in wormhole multicast routing. This type of resource deadlock has not been addressed in many previously proposed wormhole multicast algorithms. We also show that deadlocks on consumption channels can be avoided by using multiple classes of consumption channels and restricting the use of consumption channels by multicast messages. We provide upper bounds on the number of consumption channels required to avoid deadlocks. In addition, we present a new multicast routing algorithm, column-path, which is based on the well-known dimension-order routing used in many multicomputers and multiprocessors. Therefore, this algorithm could be implemented in existing multicomputers with simple changes to the hardware. Using simulations, we compare the performance of the proposed column-path algorithm with the previously proposed Hamiltonian-path-based multipath and an e-cube-based multicast routing a...
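Dimension-order (XY) routing on a 2-D mesh, which column-path builds on according to the abstract, first corrects the x coordinate and then the y coordinate. The sketch implements that rule and, as an assumption of ours rather than the paper's exact algorithm, groups multicast destinations by column and vertical direction so that one XY path per group passes every destination in it. It does not model channels, flits, or the consumption-channel deadlocks the paper studies.

```python
def xy_route(src, dst):
    """Dimension-order route on a 2-D mesh: move along x first, then along y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def column_worms(src, dests):
    """Hypothetical grouping: one XY path per (column, up/down) destination group.

    Each path ends at the group's farthest destination, so it passes
    through every other destination of that group on the way.
    """
    groups = {}
    for d in dests:
        if d == src:
            continue
        direction = "up" if d[1] >= src[1] else "down"
        groups.setdefault((d[0], direction), []).append(d)
    return {
        key: xy_route(src, max(members, key=lambda d: abs(d[1] - src[1])))
        for key, members in groups.items()
    }


print(column_worms((1, 1), [(3, 2), (3, 4), (2, 1), (1, 0)]))
```

Splitting each column into an upward and a downward group keeps every path monotone in y, as dimension-order routing requires.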
Determining the Execution Time Distribution for a Data Parallel Program in a Heterogeneous Computing Environment
1997
Cited by 12 (10 self)
... Section 2 presents the basic assumptions and a brief overview of the proposed approach. Methods for computing the execution time distribution of a single code block in either SIMD or SPMD mode are discussed in Section 3. Methods for computing the execution time distribution of the entire program executed in SPMD, SIMD, and mixed mode are introduced in Sections 4, 5, and 6, respectively. Section 7 presents a hypothetical numerical example and an application study to demonstrate the effect of mode selection on the distribution of total execution time. The Appendix reviews the basic probability theory and notation used here.
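Two basic probability tools behind this kind of analysis are sketched below for discrete execution times. If processing elements run a block independently and synchronize at its end, the block's time is the maximum over PEs, whose CDF is the product of the individual CDFs; consecutive blocks add, so their distributions convolve. The independence and end-of-block synchronization assumptions, and the function names, are ours; the paper's SIMD, SPMD, and mixed-mode treatment is more detailed.

```python
def cdf_at(pmf, t):
    """P(T <= t) for a discrete pmf given as {time: probability}."""
    return sum(q for v, q in pmf.items() if v <= t)


def pmf_max(pmfs):
    """Distribution of max(T_1, ..., T_p) for independent discrete times.

    Uses P(max <= t) = product over i of P(T_i <= t).
    """
    support = sorted({t for pmf in pmfs for t in pmf})
    out, prev = {}, 0.0
    for t in support:
        cdf = 1.0
        for pmf in pmfs:
            cdf *= cdf_at(pmf, t)
        if cdf > prev:
            out[t] = cdf - prev
        prev = cdf
    return out


def pmf_sum(a, b):
    """Distribution of A + B for independent A and B (discrete convolution)."""
    out = {}
    for x, p in a.items():
        for y, q in b.items():
            out[x + y] = out.get(x + y, 0.0) + p * q
    return out


# Hypothetical block time on each PE: 1 or 2 time units with equal probability.
u = {1: 0.5, 2: 0.5}
block = pmf_max([u, u])          # one block on two synchronizing PEs
program = pmf_sum(block, block)  # two such blocks executed back to back
print(block, program)
```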
Predicting Multiprocessor Memory Access Patterns with Learning Models
1997
Cited by 10 (0 self)
Machine learning techniques are applicable to computer system optimization. We show that shared-memory multiprocessors can successfully use machine learning algorithms for memory access pattern prediction. In particular, three different on-line machine learning prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications (the 2-D relaxation algorithm, matrix multiply, and the Fast Fourier Transform) on a shared-memory multiprocessor. The predictions were then used by a routing control algorithm to reduce control latency in the interconnection network by configuring the network to provide needed memory access paths before they were requested. The three trainable prediction techniques tested were (1) a Markov predictor, (2) a linear predictor, and (3) a time delay neural network (TDNN) predictor. Different predictors performed best on different applications, but the TDNN produced uniformly go...
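The simplest of the three predictors, a Markov predictor, can be sketched as an on-line table of observed transitions: after each access it predicts the most frequent successor seen so far. Here accesses are abstract labels (for example, memory-module identifiers), which is our assumption; the class and the `replay` helper are hypothetical and say nothing about how the paper encoded accesses or fed predictions to the routing controller.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """First-order Markov predictor over a stream of access labels."""

    def __init__(self):
        self.transitions = defaultdict(Counter)  # previous access -> successor counts
        self.last = None

    def predict(self):
        # Most frequently observed successor of the last access, or None if unseen.
        if self.last is None or not self.transitions[self.last]:
            return None
        return self.transitions[self.last].most_common(1)[0][0]

    def observe(self, access):
        # On-line update: record the transition last -> access.
        if self.last is not None:
            self.transitions[self.last][access] += 1
        self.last = access


def replay(predictor, stream):
    """Predict each access before observing it; return the number of correct guesses."""
    hits = 0
    for access in stream:
        if predictor.predict() == access:
            hits += 1
        predictor.observe(access)
    return hits


print(replay(MarkovPredictor(), list("ABCD") * 5))
```

On a strictly repeating pattern the predictor misses only while it is still learning the cycle, which is why such patterns suit this kind of prediction.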

