Results 1-10 of 66
GloMoSim: A Library for Parallel Simulation of Large-scale Wireless Networks
in Workshop on Parallel and Distributed Simulation, 1998
Abstract

Cited by 512 (25 self)
A number of library-based parallel and sequential network simulators have been designed. This paper describes a library, called GloMoSim (for Global Mobile system Simulator), for parallel simulation of wireless networks. GloMoSim has been designed to be extensible and composable: the communication protocol stack for wireless networks is divided into a set of layers, each with its own API. Models of protocols at one layer interact with those at a lower (or higher) layer only via these APIs. The modular implementation enables consistent comparison of multiple protocols at a given layer. The parallel implementation of GloMoSim can be executed using a variety of conservative synchronization protocols, which include the null message and conditional event algorithms. This paper describes the GloMoSim library, addresses a number of issues relevant to its parallelization, and presents a set of experimental results on the IBM 9076 SP, a distributed memory multicomputer. These experiments use mo...
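The layered-stack idea in this abstract can be illustrated with a minimal sketch (hypothetical class and method names, not GloMoSim's actual API): each layer talks only to the layer directly above or below it through a small interface, so a model at one layer can be swapped without touching the others.

```python
# Hypothetical sketch of a composable protocol stack: each layer only
# calls the API of its immediate neighbor, as in GloMoSim's design.
class Layer:
    def __init__(self, name):
        self.name = name
        self.lower = None   # layer beneath this one
        self.upper = None   # layer above this one

    def send_down(self, packet):
        # Hand the packet to the lower layer via its API only.
        return self.lower.receive_from_upper(packet)

    def receive_from_upper(self, packet):
        # Default behavior: wrap (tag) the packet and keep pushing it down.
        packet = f"{self.name}({packet})"
        if self.lower is None:
            return packet            # bottom layer: "transmit"
        return self.lower.receive_from_upper(packet)

def build_stack(names):
    layers = [Layer(n) for n in names]
    for upper, lower in zip(layers, layers[1:]):
        upper.lower, lower.upper = lower, upper
    return layers

stack = build_stack(["app", "transport", "network", "mac", "radio"])
print(stack[0].send_down("data"))   # → radio(mac(network(transport(data))))
```

Because each layer sees only its neighbors' APIs, replacing, say, the "mac" model changes nothing above or below it, which is what enables consistent comparison of multiple protocols at one layer.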
LogGP: Incorporating Long Messages into the LogP Model: One step closer towards a realistic model for parallel computation
, 1995
Abstract

Cited by 235 (1 self)
We present a new model of parallel computation, the LogGP model, and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation [CKP+93] which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5) [CKP+93, CDMS94]. However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has o...
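The standard LogGP point-to-point cost, T(k) = o + (k - 1)G + L + o for a k-byte message (G being the per-byte gap for long messages), can be compared against fragmenting the payload into short LogP messages. The parameter values below are made up for illustration; only the formulas follow the model.

```python
# LogGP estimate for one k-byte long message: o + (k-1)*G + L + o.
def loggp_time(k, L, o, G):
    return o + (k - 1) * G + L + o

# LogP estimate when the payload is split into ceil(k/w) fixed-size
# w-byte messages, one injected every g time units (assumes g >= o).
def logp_time_fragmented(k, L, o, g, w):
    n = -(-k // w)                      # ceil(k / w) short messages
    return (n - 1) * g + o + L + o      # last message's o + L + o

long_msg = loggp_time(4096, L=10, o=2, G=0.05)
frag = logp_time_fragmented(4096, L=10, o=2, g=4, w=16)
print(long_msg, frag)
```

With these illustrative parameters the single long message is far cheaper than the fragmented transfer, which is exactly the gap the LogGP extension is meant to capture.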
Scalable Parallel Data Mining for Association Rules
, 1997
Abstract

Cited by 153 (14 self)
One of the important problems in data mining is discovering association rules from databases of transactions where each transaction consists of a set of items. The most time consuming operation in this discovery process is the computation of the frequency of the occurrences of interesting subsets of items (called candidates) in the database of transactions. To prune the exponentially large space of candidates, most existing algorithms consider only those candidates that have a user-defined minimum support. Even with the pruning, the task of finding all association rules requires a lot of computation power and time. Parallel computers offer a potential solution to the computation requirement of this task, provided efficient and scalable parallel algorithms can be designed. In this paper, we present two new parallel algorithms for mining association rules. The Intelligent Data Distribution algorithm efficiently uses aggregate memory of the parallel computer by employing intelligent candi...
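The support-counting step the abstract calls the most time-consuming can be sketched serially in a few lines (the transactions and threshold below are a toy example, not the paper's parallel algorithms):

```python
# Count how often each candidate itemset of a given size occurs in the
# transaction database, keeping those that meet a minimum support.
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

def frequent_itemsets(transactions, size, min_support):
    items = sorted(set().union(*transactions))
    counts = {}
    for cand in combinations(items, size):
        cand = frozenset(cand)
        # a transaction "supports" the candidate if it contains all its items
        counts[cand] = sum(cand <= t for t in transactions)
    return {c: n for c, n in counts.items() if n >= min_support}

freq = frequent_itemsets(transactions, 2, 2)
print(freq)   # each 2-itemset here appears in exactly 2 transactions
```

Every candidate is checked against every transaction, so the cost grows with both the candidate space and the database size; the parallel algorithms in the paper attack exactly this loop by distributing candidates and data across processors.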
Matrix Multiplication on Heterogeneous Platforms
, 2001
Abstract

Cited by 36 (16 self)
this paper, we address the issue of implementing matrix multiplication on heterogeneous platforms. We target two different classes of heterogeneous computing resources: heterogeneous networks of workstations and collections of heterogeneous clusters. Intuitively, the problem is to load balance the work across resources with different speeds while minimizing the communication volume. We formally state this problem in a geometric framework and prove its NP-completeness. Next, we introduce a (polynomial) column-based heuristic, which turns out to be very satisfactory: We derive a theoretical performance guarantee for the heuristic and we assess its practical usefulness through MPI experiments.
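The load-balancing intuition can be made concrete with a much simpler rule than the paper's geometric column-based heuristic: assign each processor a number of matrix columns proportional to its relative speed (speeds below are illustrative).

```python
# Proportional column allocation for heterogeneous processors:
# each processor i of speed s_i gets roughly n_cols * s_i / sum(s) columns.
def split_columns(n_cols, speeds):
    total = sum(speeds)
    shares = [n_cols * s // total for s in speeds]
    # hand the leftover columns (from integer truncation) to the fastest processors
    leftover = n_cols - sum(shares)
    order = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in order[:leftover]:
        shares[i] += 1
    return shares

print(split_columns(100, [1, 2, 5]))   # → [12, 25, 63]
```

This balances computation but ignores communication volume; the hard part the paper addresses, and proves NP-complete in its geometric framework, is balancing both at once.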
Pipelining broadcasts on heterogeneous platforms
, 2005
Abstract

Cited by 33 (17 self)
In this paper, we consider the communications involved in the execution of a complex application, deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example, to broadcast data items. Rather than aiming at minimizing the execution time of a single broadcast, we focus on the steady-state operation. We assume that there is a large number of messages to be broadcast in pipeline fashion, and we aim at maximizing the throughput, i.e., the (rational) number of messages which can be broadcast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform is used in totality: we show that neither spanning trees nor DAGs are as powerful as general graphs. We show how to compute the best throughput using linear programming, and how to exhibit a periodic schedule, first when restricting to a DAG, and then when using a general graph. The polynomial compactness of the description comes from the decomposition of the schedule into several broadcast trees that are used concurrently to reach the best throughput. It is important to point out that a concrete scheduling algorithm based upon the steady-state operation is asymptotically optimal, in the class of all possible schedules (not only periodic solutions).
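A toy example shows why several concurrent broadcast trees can beat any single tree. The real rates come from linear programming, as in the paper; this sketch (made-up graph, edge capacities in messages per time-step) merely checks one fixed allocation against link capacities.

```python
# Total broadcast rate of several trees used concurrently, verifying that
# the combined per-link usage stays within each link's capacity.
def combined_throughput(trees, rates, capacity):
    usage = {}
    for tree, r in zip(trees, rates):
        for e in tree:
            usage[e] = usage.get(e, 0.0) + r
    assert all(usage[e] <= capacity[e] + 1e-9 for e in usage), "capacity violated"
    return sum(rates)

# Source s broadcasts to a and b; every link carries 1 message per time-step.
capacity = {("s", "a"): 1.0, ("s", "b"): 1.0, ("a", "b"): 1.0, ("b", "a"): 1.0}
tree_via_a = [("s", "a"), ("a", "b")]   # s -> a -> b
tree_via_b = [("s", "b"), ("b", "a")]   # s -> b -> a
trees, rates = [tree_via_a, tree_via_b], [1.0, 1.0]
print(combined_throughput(trees, rates, capacity))   # → 2.0
```

Either tree alone is capped at rate 1.0 by its slowest edge, but the two together use disjoint links and sustain rate 2.0, illustrating the steady-state advantage of general graphs over a single spanning tree.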
Determining the Execution Time Distribution for a Data Parallel Program in a Heterogeneous Computing Environment
, 1997
Abstract

Cited by 17 (12 self)
this paper. Section 2 presents the basic assumptions and a brief overview of the proposed approach. Methods for computing the execution time distribution of a single code block in either SIMD or SPMD mode are discussed in Section 3. The methods for computing the execution time distribution for the entire program executed in SPMD, SIMD, and mixed-mode are introduced in Sections 4, 5, and 6, respectively. Section 7 presents a hypothetical numerical example and an application study to demonstrate the effect of mode selections on the distribution of total execution time. The Appendix reviews the basic probability theory and notation used here.
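One building block in this kind of analysis can be sketched directly: if P processors execute a code block and synchronize at the end, the block finishes when the slowest processor does, so its CDF is the product of the per-processor CDFs. The independence assumption and the uniform per-processor times below are this sketch's simplifications, not necessarily the paper's model.

```python
# CDF of the completion time of a barrier-synchronized block:
# time = max of per-processor times, so F_max(t) = prod_i F_i(t)
# (assuming the per-processor times are independent).
def max_cdf(cdfs, t):
    p = 1.0
    for F in cdfs:
        p *= F(t)
    return p

# Toy per-processor model: time uniform on [0, b].
def uniform_cdf(b):
    return lambda t: min(max(t / b, 0.0), 1.0)

F = [uniform_cdf(10.0)] * 4            # 4 processors, times uniform on [0, 10]
print(max_cdf(F, 5.0))                 # → 0.0625, i.e. 0.5 ** 4
```

Note how quickly the tail dominates: each processor finishes by t = 5 half the time, but all four do so only 6.25% of the time, which is why execution-time distributions, not just means, matter for mode selection.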
A System for Fault-Tolerant Execution of Data and Compute Intensive Programs over a Network of Workstations
, 1996
Abstract

Cited by 16 (1 self)
The bag of tasks structure permits dynamic partitioning for a wide class of parallel applications. This paper describes a fault-tolerant implementation of this structure using atomic actions (atomic transactions) to operate on persistent objects, which are accessed in a distributed setting via a Remote Procedure Call (RPC). The system is suited to parallel execution of data- and compute-intensive programs that require persistent storage and fault tolerance, and runs on stock hardware and software platforms (Unix, C++). Its suitability is examined in the context of the measured performance of three applications: ray tracing, matrix multiplication, and Cholesky factorization.

1 Introduction

Many computations manipulate very large amounts of data. Matrix calculations represent one example class. In a Massively Parallel Processor (MPP) such a vast data set is typically partitioned statically between the very many distributed processing elements and moved amongst them as necessary to perform ...
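The bag-of-tasks structure itself is simple to sketch: workers repeatedly pull a task from a shared bag until it is empty. The paper's contribution, wrapping each take-task/store-result step in an atomic transaction on persistent objects so a crashed worker's task is not lost, is omitted from this in-memory sketch.

```python
# Minimal in-memory bag of tasks: workers drain a shared queue.
# (No fault tolerance or persistence here, unlike the paper's system.)
import queue
import threading

def run_bag_of_tasks(tasks, worker_fn, n_workers=4):
    bag = queue.Queue()
    for t in tasks:
        bag.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = bag.get_nowait()
            except queue.Empty:
                return                      # bag drained: worker exits
            r = worker_fn(task)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

print(sorted(run_bag_of_tasks(range(10), lambda x: x * x)))
```

Dynamic partitioning falls out for free: fast workers simply take more tasks from the bag, which is why the structure suits heterogeneous networks of workstations.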
Efficiency of Shared-Memory Multiprocessors for a Genetic Sequence Similarity Search Algorithm
, 1997
Abstract

Cited by 15 (2 self)
Molecular biologists who conduct large-scale genetic sequencing projects are producing an ever-increasing amount of sequence data. GenBank, the primary repository for DNA sequence data, is doubling in size every 1.3 years. Keeping pace with the analysis of this data is a difficult task. One of the most successful techniques for analyzing genetic data is sequence similarity analysis: the comparison of unknown sequences against known sequences kept in databases. As biologists gather more sequence data, sequence similarity algorithms become more and more useful, but take longer and longer to run. BLAST is one of the most popular sequence similarity algorithms in use today, but its running time is proportional to the size of the database. Sequence similarity analysis using BLAST is becoming a bottleneck. Shared-Memory Multiprocessors (SMPs) may offer performance that scales with the growth of the genetic databases. This paper analyzes the performance of BLAST on SMPs, to improve our theoretic...
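The standard way to parallelize a database scan like this is to partition the database across threads and merge the per-chunk best hits. The sketch below uses a stand-in similarity score, not BLAST's actual scoring, and note that in CPython the GIL would limit the speedup of pure-Python scoring; it only illustrates the partitioning structure.

```python
# Partition a sequence database across threads and keep the best hit.
# score() is a toy stand-in (longest common prefix), not BLAST scoring.
from concurrent.futures import ThreadPoolExecutor

def score(query, seq):
    n = 0
    for a, b in zip(query, seq):
        if a != b:
            break
        n += 1
    return n

def parallel_search(query, database, n_threads=4):
    def scan(chunk):
        # best (score, sequence) pair within one chunk
        return max((score(query, s), s) for s in chunk)
    chunks = [database[i::n_threads] for i in range(n_threads)]
    chunks = [c for c in chunks if c]          # drop empty chunks
    with ThreadPoolExecutor(n_threads) as pool:
        return max(pool.map(scan, chunks))

db = ["GATTACA", "GATCCA", "TTAGGC", "GATTTT"]
print(parallel_search("GATTAG", db))   # → (5, 'GATTACA')
```

Because each chunk is scanned independently, the work scales with database size per thread, which is the property that makes SMPs attractive as the databases grow.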
Resource Deadlocks and Performance of Wormhole Multicast Routing Algorithms
IEEE Trans. Parallel and Distributed Systems, 1998
Abstract

Cited by 15 (0 self)
We show that deadlocks due to dependencies on consumption channels are a fundamental problem in wormhole multicast routing. This type of resource deadlock has not been addressed in many previously proposed wormhole multicast algorithms. We also show that deadlocks on consumption channels can be avoided by using multiple classes of consumption channels and restricting the use of consumption channels by multicast messages. We provide upper bounds for the number of consumption channels required to avoid deadlocks. In addition, we present a new multicast routing algorithm, column-path, which is based on the well-known dimension-order routing used in many multicomputers and multiprocessors. Therefore, this algorithm could be implemented in existing multicomputers with simple changes to the hardware. Using simulations, we compare the performance of the proposed column-path algorithm with the previously proposed Hamiltonian-path-based multipath and an e-cube-based multicast routing a...
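The well-known dimension-order (XY) routing that column-path builds on is easy to state: on a 2-D mesh, a message first travels along the x dimension, then along y. A small sketch (node coordinates only, no channel model):

```python
# Dimension-order (XY) routing on a 2-D mesh: correct x first, then y.
def xy_route(src, dst):
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    step = lambda a, b: a + (1 if b > a else -1)   # move one hop toward b
    while x != dx:
        x = step(x, dx)
        path.append((x, y))
    while y != dy:
        y = step(y, dy)
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 1)))
# → [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because every route turns at most once (x-leg then y-leg), dimension-order routing is deadlock-free on the data channels; the paper's point is that multicast consumption channels need separate treatment, which this unicast sketch does not capture.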