Results 1 - 10 of 10
Performance Models for the Processor Farm Paradigm
- IEEE Transactions on Parallel and Distributed Systems
, 1997
Cited by 16 (0 self)
In this paper, we describe the design, implementation, and modeling of a runtime kernel to support the processor farm paradigm on multicomputers. We present a general topology-independent framework for obtaining performance models to predict the performance of the start-up, steady-state, and wind-down phases of a processor farm. An algorithm is described, which for any interconnection network determines a tree-structured subnetwork that optimizes farm performance. The analysis technique is applied to the important case of k-ary tree topologies. The models are compared with the measured performance on a variety of topologies using both constant and varied task sizes. Index Terms: Parallel programming paradigms, performance evaluation, processor farm, tree networks, message passing architecture, network flow, master-slave. 1 Introduction. The major problems in parallel computation revolve around questions of ease of...
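As a rough illustration of the processor farm (master-slave) paradigm named in this abstract, the following minimal sketch has a master feed tasks to workers through a shared queue and collect the results. It is not the runtime kernel or the performance model from the paper: the function name `run_farm`, the use of threads in place of multicomputer nodes, and the `None` sentinel used for wind-down are all assumptions made for this example.

```python
import queue
import threading


def run_farm(tasks, work, num_workers=4):
    """Hypothetical farm: a master distributes tasks, workers return results."""
    tasks = list(tasks)
    task_q = queue.Queue()
    result_q = queue.Queue()

    def worker():
        while True:
            item = task_q.get()
            if item is None:  # sentinel: no more tasks, worker winds down
                break
            index, payload = item
            result_q.put((index, work(payload)))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Master side: hand out every task, then one sentinel per worker.
    for index, payload in enumerate(tasks):
        task_q.put((index, payload))
    for _ in threads:
        task_q.put(None)
    for t in threads:
        t.join()

    # Gather results back into task order.
    results = [None] * len(tasks)
    while not result_q.empty():
        index, value = result_q.get()
        results[index] = value
    return results
```

A call such as `run_farm([1, 2, 3], lambda x: x * x, num_workers=2)` returns the results in task order regardless of which worker handled each task.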
Performance Prediction and Scheduling for Parallel Applications on Multi-User Clusters
, 1998
Babylon V2.0: Support for Distributed, Parallel and Mobile Java Applications
Cited by 6 (5 self)
This thesis describes the design and implementation of Babylon v2.0. Babylon v2.0 is a 100% Java compatible framework for building parallel, distributed and mobile applications in Java. Babylon v2.0 incorporates features like object migration, asynchronous method invocation and remote class loading while providing an easy-to-use interface that enables seamless interaction with remote objects and hides the complexities of remote messaging protocols that are normally a large part of distributed systems programming. The potential cluster computing benefits of Babylon v2.0 are demonstrated by the evaluation results, which show that sequential Java applications can achieve significant performance gains by using Babylon v2.0 to parallelize their work across a cluster of workstations. Intuitive interfaces, ease of use, support for multiple simultaneous users, and services and features that facilitate the development and administration of distributed systems make Babylon v2.0 a unique and powerful system for distributed systems programmers.
(in press) Parallel computation in econometrics: a simplified approach
- Handbook on Parallel Computing and Statistics
, 2005
Cited by 3 (0 self)
Parallel computation has a long history in econometric computing, but is not at all widespread. We believe that a major impediment is the labour cost of coding for parallel architectures. Moreover, programs for specific hardware often become obsolete quite quickly. Our approach is to take a popular matrix programming language (Ox), and implement a message-passing interface using MPI. Next, object-oriented programming allows us to hide the specific parallelization code, so that a program does not need to be rewritten when it is ported from the desktop to a distributed network of computers. Our focus is on so-called embarrassingly parallel computations, and we address the issue of parallel random number generation. Keywords: Code optimization; Econometrics; High-performance computing; Matrix-programming language; Monte Carlo; MPI; Ox; Parallel computing; Random number generation.
Refinement of herpesvirus B-capsid structure on parallel supercomputers
- Biophys. J
, 1998
Cited by 3 (0 self)
Electron cryomicroscopy and icosahedral reconstruction are used to obtain the three-dimensional structure of the 1250-Å-diameter herpesvirus B-capsid. The centers and orientations of particles in focal pairs of 400-kV, spot-scan micrographs are determined and iteratively refined by common-lines-based local and global refinement procedures. We describe the rationale behind choosing shared-memory multiprocessor computers for executing the global refinement, which is the most computationally intensive step in the reconstruction procedure. This refinement has been implemented on three different shared-memory supercomputers. The speedup and efficiency are evaluated by using test data sets with different numbers of particles and processors. Using this parallel refinement program, we refine the herpesvirus B-capsid from 355-particle images to 13-Å resolution. The map shows new structural features and interactions of the protein subunits in the three distinct morphological units: penton, hexon, and triplex of this T = 16 icosahedral particle.
Implementing Scoped Behaviour for Flexible Distributed Data Sharing
- IEEE Concurrency
, 2000
Cited by 2 (1 self)
Distributed-memory hardware platforms, such as a network of workstations, are attractive because of their ubiquitousness and good price-performance. However, there are high communications overheads associated with sharing data between distributed memories. While message-passing programming systems provide the greatest low-level flexibility to optimize the overheads, shared-data systems provide a higher level of abstraction. Ideally, one would like to have both a high level of abstraction and the flexibility to optimize a program for each data-sharing pattern and for each portion of the source code (i.e., context), such as a particular loop or phase. A novel technique to support this form of optimization flexibility is scoped behaviour. In the Aurora distributed shared data system, the programmer instantiates shared-data objects and uses scoped behaviour to incrementally tune applications on a per-object and per-context basis. We detail how a class library implements shared-da...
Threshold Counters with Increments and Decrements
, 1999
Cited by 2 (2 self)
A threshold counter is a shared data structure that assumes integer values. It provides two operations: Increment changes the current counter value from $v$ to $v + 1$, while Read returns the value $\lfloor v/w \rfloor$, where $v$ is the current counter value and $w$ is a fixed constant. Thus, the Read operation returns the "approximate" value of the counter to within the constant $w$. Threshold counters have many potential uses, including software barrier synchronization. Threshold networks are a class of distributed data structures that can be used to construct highly-concurrent, low-contention implementations of shared threshold counters. In this paper, we give the first proof that a threshold network construction of a threshold counter can be extended to support a Decrement operation that changes the counter value from $v$ to $v - 1$. Keywords: Distributed Computing, Threshold Counters, Threshold and Weak Threshold Networks, Increments, Decrements. A preliminary version of this work appears in the Proceedings of the 6th International Colloquium on Structural Information and Communication Complexity, pp. 47-61, Proceedings in Informatics, Carleton Scientific, Lacanau, France, June/July 1999. This work has been supported by DMS-9505949. Accepted to Theoretical Computer Science.
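The sequential behaviour of the three operations described in this abstract can be sketched as follows. This is only a lock-based reference model of the interface (Increment, Decrement, and a Read that returns the floor of the counter value divided by w); it is not the threshold network construction the paper analyzes, and the class and method names are assumptions chosen for the example.

```python
import threading


class ThresholdCounter:
    """Hypothetical centralized model of a threshold counter with constant w."""

    def __init__(self, w):
        if w <= 0:
            raise ValueError("w must be a positive integer")
        self._w = w
        self._v = 0
        self._lock = threading.Lock()

    def increment(self):
        # Changes the counter value from v to v + 1.
        with self._lock:
            self._v += 1

    def decrement(self):
        # Changes the counter value from v to v - 1.
        with self._lock:
            self._v -= 1

    def read(self):
        # Returns floor(v / w), the counter value to within the constant w.
        with self._lock:
            return self._v // self._w
```

With `w = 4`, nine increments give a Read of 2, and two subsequent decrements bring it back to 1. A single lock serializes every operation, which is exactly the contention that a distributed construction is meant to avoid.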
Tuple Counting Data Flow Analysis and its Use in Communication Optimization
Tuplespace provides parallel programmers with an abstraction that hides the specific underlying architecture, allowing the architecture to be any number of platforms ranging from shared or distributed memory to a cluster of workstations. Unfortunately, any abstraction of this kind necessarily introduces a trade-off for the application programmer between ease-of-use and control over performance. This paper presents a data flow analysis framework which plays a key role in identifying opportunities for communication optimization in tuplespace parallel programs. The enabled optimizations are particularly important for implementations on distributed memory multiprocessors and cluster environments where tuplespace acts as a structured distributed shared memory abstraction and communication overhead is high.
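For readers unfamiliar with the tuplespace abstraction this abstract builds on, here is a toy, single-process sketch of Linda-style operations. It does not show the paper's data flow analysis. The class name, the use of `None` as a wildcard field, and the method name `take` (standing in for Linda's `in`, which is a reserved word in Python) are assumptions for this example, and both lookups are non-blocking, unlike the classic blocking forms.

```python
class TupleSpace:
    """Hypothetical in-memory tuplespace with non-blocking lookups."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        # Deposit a tuple into the space.
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        # A pattern field of None matches any value in that position.
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        # Return the first matching tuple without removing it, or None.
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def take(self, *pattern):
        # Return and remove the first matching tuple, or None.
        found = self.rd(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found
```

On a distributed-memory machine each of these calls may imply a message exchange, which is why the abstract describes communication overhead as high in that setting.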
Automatic Generation of Parallelized Programs for Stateless Parallel Processing
, 2003
"... This paper presents two topics: ..."
Ubiquitous Multicore (UM) Methodology for Multimedia
For more than a decade now, multimedia developers have usually been able to "ride the waves", so to speak, of each new generation of microprocessors, which usually allows their applications, designs, and programs to run more proficiently, efficiently, and effectively. With the new multi-core systems, this so-called 'free' ride seems to be coming to an end, given the effects of increased clock speeds, the widening gap between processor and memory performance, and the tradeoffs needed to address those two points. In this paper, we build upon our previous work on multi-core systems by proposing a ubiquitous multi-core (UM) design. The goal of such a framework is to help researchers plan and implement their multimedia applications so that they can take advantage of the faster computation of multi-core systems and support real-time multimedia. As our experiments show, our UM system increases performance speeds by an average of 100%, with an average execution cost of 1.4 ms, showing that multimedia can use multi-core resources efficiently and effectively. Keywords: Multi-core, Multimedia, Ubiquitous Multi-core Framework

