Results 11-20 of 105
Refinement of herpesvirus B-capsid structure on parallel supercomputers
Biophys. J., 1998
Cited by 18 (10 self)
Electron cryomicroscopy and icosahedral reconstruction are used to obtain the three-dimensional structure of the 1250-Å-diameter herpesvirus B-capsid. The centers and orientations of particles in focal pairs of 400-kV, spot-scan micrographs are determined and iteratively refined by common-lines-based local and global refinement procedures. We describe the rationale behind choosing shared-memory multiprocessor computers for executing the global refinement, which is the most computationally intensive step in the reconstruction procedure. This refinement has been implemented on three different shared-memory supercomputers. The speedup and efficiency are evaluated by using test data sets with different numbers of particles and processors. Using this parallel refinement program, we refine the herpesvirus B-capsid from 355 particle images to 13-Å resolution. The map shows new structural features and interactions of the protein subunits in the three distinct morphological units: penton, hexon, and triplex of this T = 16 icosahedral particle.
A System for Fault-Tolerant Execution of Data- and Compute-Intensive Programs over a Network of Workstations
1996
Cited by 16 (1 self)
The bag-of-tasks structure permits dynamic partitioning for a wide class of parallel applications. This paper describes a fault-tolerant implementation of this structure using atomic actions (atomic transactions) to operate on persistent objects, which are accessed in a distributed setting via Remote Procedure Call (RPC). The system is suited to parallel execution of data- and compute-intensive programs that require persistent storage and fault tolerance, and runs on stock hardware and software platforms (Unix, C++). Its suitability is examined in the context of the measured performance of three applications: ray tracing, matrix multiplication, and Cholesky factorization.

1 Introduction

Many computations manipulate very large amounts of data. Matrix calculations represent one example class. In a Massively Parallel Processor (MPP) such a vast data set is typically partitioned statically between the very many distributed processing elements and moved amongst them as necessary to perform ...
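The bag-of-tasks structure itself (without the atomic-action and persistence machinery this paper adds for fault tolerance) reduces to workers repeatedly pulling the next task from a shared queue. A minimal threaded sketch, with invented names, not the paper's implementation:

```python
from queue import Queue, Empty
from threading import Thread

def run_bag_of_tasks(tasks, worker_fn, num_workers=4):
    """Dynamic partitioning: each worker grabs the next task as soon as it
    finishes the previous one, so faster workers automatically take on more
    work instead of relying on a static split."""
    bag = Queue()
    for t in tasks:
        bag.put(t)
    results = []  # list.append is atomic in CPython, so no explicit lock here
    def worker():
        while True:
            try:
                task = bag.get_nowait()
            except Empty:
                return  # the bag is empty, this worker is done
            results.append(worker_fn(task))
    workers = [Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

The fault-tolerant version described in the abstract would additionally make each task removal and result write an atomic transaction on persistent objects, so a crashed worker's task can be re-executed.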
Predicting Multiprocessor Memory Access Patterns with Learning Models
1997
Cited by 16 (0 self)
Machine learning techniques are applicable to computer system optimization. We show that shared-memory multiprocessors can successfully utilize machine learning algorithms for memory access pattern prediction. In particular, three different online machine learning prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications (the 2D relaxation algorithm, matrix multiplication, and the fast Fourier transform) on a shared-memory multiprocessor. The predictions were then used by a routing control algorithm to reduce control latency in the interconnection network by configuring the interconnection network to provide needed memory access paths before they were requested. Three trainable prediction techniques were used and tested: (1) a Markov predictor, (2) a linear predictor, and (3) a time-delay neural network (TDNN) predictor. Different predictors performed best on different applications, but the TDNN produced uniformly go...
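The first of the three techniques, a first-order Markov predictor, can be sketched in a few lines: remember which address tends to follow each address, and predict the most frequent successor of the one just seen. This is a simplified illustration with invented names, not the paper's online implementation:

```python
from collections import defaultdict, Counter

class MarkovPredictor:
    """First-order Markov predictor for repetitive access streams: for each
    observed address, count its successors; predict the most frequent
    successor of the most recently observed address."""
    def __init__(self):
        self.successors = defaultdict(Counter)
        self.last = None
    def observe(self, addr):
        if self.last is not None:
            self.successors[self.last][addr] += 1
        self.last = addr
    def predict(self):
        counts = self.successors.get(self.last)
        if not counts:
            return None  # no history yet for this address
        return counts.most_common(1)[0][0]
```

On a repetitive stream such as 1, 2, 3, 1, 2, 3, ... the predictor quickly learns each transition, which is exactly the behavior the routing controller in the abstract exploits to pre-configure access paths.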
Automated Performance Prediction for Scalable Parallel Computing
PARALLEL COMPUTING, 1997
Cited by 14 (0 self)
Performance prediction is necessary in order to deal with multidimensional performance effects on parallel systems. The compiler-generated analytical model developed in this paper accounts for the effects of cache behavior, CPU execution time, and message-passing overhead for real programs written in high-level data-parallel languages. The performance prediction technique is shown to be effective in analyzing several nontrivial data-parallel applications as the problem size and number of processors vary. We leverage technology from the Maple symbolic manipulation system and the S-PLUS statistical package in order to present users with critical performance information necessary for performance debugging, architectural enhancement, and procurement of parallel systems. The usability of these results is improved through specifying confidence intervals as well as predicted execution times for data-parallel applications.
Performance evaluation of a multi-robot search & retrieval system: Experiences with MinDART
Journal of Intelligent and Robotic Systems, 2003
Cited by 12 (0 self)
Swarm techniques, where many simple robots are used instead of complex ones for performing a task, promise to reduce the cost of developing robot teams for many application domains. The challenge lies in selecting an appropriate control strategy for the individual units. This work explores the effect of different control strategies of varying complexity, and of various environmental factors, on the performance of a team of physical robots (the Minnesota Distributed Autonomous Robotic Team) at a foraging task. Specifically, we study the effect of localization and of simple communication techniques on task completion time using two sets of foraging experiments. We also present results for task performance with varying team sizes and target distributions. As indicated by the results, control strategies of increasing complexity reduce the variance in performance but do not always reduce the time to complete the task.
MSP: a class of parallel multistep successive sparse approximate inverse preconditioning strategies
SIAM J. Sci. Comput., 2002
Cited by 11 (5 self)
We develop a class of parallel multistep successive preconditioning strategies to enhance the efficiency and robustness of standard sparse approximate inverse preconditioning techniques. The key idea is to compute a series of simple sparse matrices to approximate the inverse of the original matrix. Studies are conducted to show the advantages of such an approach in terms of both improving preconditioning accuracy and reducing computational cost, compared to standard sparse approximate inverse preconditioners. Numerical experiments using one prototype implementation to solve a few sparse linear systems on a distributed-memory parallel computer are reported.
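The "series of simple matrices" idea can be illustrated with a dense toy: start from an inverse-diagonal (Jacobi) factor and multiply on further simple corrective factors, each computed from the already-preconditioned matrix. The Newton-Schulz-style correction used below is a stand-in for the paper's sparse approximate inverse factors, not the authors' MSP algorithm:

```python
import numpy as np

def multistep_preconditioner(A, steps=3):
    """Build a right preconditioner M = M_1 M_2 ... M_steps with A @ M ~ I.

    M_1 is the inverse-diagonal (Jacobi) approximation of A^{-1}; each later
    factor M_k = 2I - A_{k-1} is a one-term correction for the already
    preconditioned matrix A_{k-1} = A @ M_1 @ ... @ M_{k-1}. A dense toy
    stand-in for the paper's sparse factors."""
    n = A.shape[0]
    I = np.eye(n)
    M = np.diag(1.0 / np.diag(A))   # M_1: Jacobi approximate inverse
    Ak = A @ M
    for _ in range(steps - 1):
        Mk = 2.0 * I - Ak           # simple corrective factor
        M = M @ Mk
        Ak = Ak @ Mk
    return M
```

For a diagonally dominant matrix, the residual ||I - A M|| shrinks as factors are added, mirroring the accuracy-versus-cost trade-off the abstract studies; in the paper each factor is itself kept sparse so the product stays cheap to apply in parallel.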
Efficient Matrix Chain Ordering in Polylog Time
IN PROC. OF INT'L PARALLEL PROCESSING SYMP., 1998
Cited by 11 (4 self)
The matrix chain ordering problem is to find the cheapest way to multiply a chain of n matrices, where the matrices are pairwise compatible but of varying dimensions. Here we give several new parallel algorithms, including O(lg^3 n)-time, n/lg n-processor algorithms for solving the matrix chain ordering problem and for solving an optimal triangulation problem of convex polygons on the common CRCW PRAM model. Next, by using efficient algorithms for computing row minima of totally monotone matrices, this complexity is improved to O(lg^2 n) time with n processors on the EREW PRAM and to O(lg^2 n lg lg n) time with n/lg lg n processors on a common CRCW PRAM. A new algorithm for computing the row minima of totally monotone matrices improves our parallel MCOP algorithm to O(n lg^1.5 n) work and polylog time on a CREW PRAM. Optimal log-time algorithms for computing row minima of totally monotone matrices would improve our algorithm and enable it to have the same work as the sequential algorithm of Hu and
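For reference, this is the standard sequential O(n^3) dynamic program that these parallel algorithms accelerate; `dims` lists the chain's dimensions, so matrix i has shape dims[i] x dims[i+1]:

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications needed to multiply a chain
    of matrices whose dimensions are dims[0] x dims[1], dims[1] x dims[2], ...
    Classic O(n^3) interval dynamic program."""
    n = len(dims) - 1  # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the subchain
        for i in range(n - length + 1):
            j = i + length - 1
            # try every split point k between matrices i..j
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)
            )
    return cost[0][n - 1]
```

For dims = [10, 30, 5, 60], parenthesizing as (AB)C costs 10*30*5 + 10*5*60 = 4500 multiplications versus 27000 for A(BC); the dynamic program finds the former. The parallel algorithms in the paper attack the dependency structure of exactly this table.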
A comparison of parallel solvers for diagonally dominant and general narrow-banded linear systems
PARALLEL AND DISTRIBUTED COMPUTING PRACTICES (PCDP) 2000, 1999
Cited by 10 (1 self)
We continue the comparison of parallel algorithms for solving diagonally dominant and general narrow-banded linear systems of equations that we started in [2]. The solvers compared are the banded system solvers of ScaLAPACK [6] and those investigated by Arbenz and Hegland [1, 5]. We present the numerical experiments that we conducted on the IBM SP/2.
Scalable Massively Parallel Artificial Neural Networks
AIAA Paper No. 2005-7168, AIAA InfoTech@Aerospace Conference, 2005
Cited by 10 (7 self)
There is renewed interest in computational intelligence due to advances in algorithms, neuroscience, and computer hardware. In addition, there is enormous interest in autonomous vehicles (air, ground, and sea) and robotics, which need significant onboard intelligence. Work in this area could not only lead to a better understanding of the human brain but also to very useful engineering applications. The functioning of the human brain is not well understood, but enormous progress has been made in understanding it and, in particular, the neocortex. There are many reasons to develop models of the brain. Artificial Neural Networks (ANNs), one type of model, can be very effective for pattern recognition, function approximation, scientific classification, control, and the analysis of time-series data. ANNs often use the backpropagation algorithm for training, which can require long training times, especially for large networks; there are many other types of ANNs as well. Once the network is trained for a particular problem, however, it can produce results in a very short time. Parallelization of ANNs could drastically reduce the training time. An object-oriented, massively parallel ANN software package, SPANN (Scalable Parallel Artificial Neural Network), has been developed and is described here. MPI was used
Complexity results for collective communications on heterogeneous platforms
INT. JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2006
Cited by 9 (2 self)
In this paper, we consider the communications involved in the execution of a complex application deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example to broadcast data items either to all resources (broadcast) or to a restricted set of targets (multicast). Rather than aiming at minimizing the execution time of a single collective communication, we focus on steady-state operation. We assume that there is a large number of messages to be broadcast or multicast in pipelined fashion, and we aim at maximizing the throughput, i.e., the (rational) number of messages which can be broadcast or multicast every time-step. We target heterogeneous platforms, modeled by a graph where resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform be used in totality: different messages may need to be transferred along different paths. The main focus of the paper is on complexity results. We aim at presenting a unified framework for analyzing the complexity of collective communication schemes. We concentrate on the classification (whether maximizing the throughput is a polynomial or NP-hard problem) rather than on actually providing efficient polynomial algorithms (when such algorithms are known, we refer to bibliographical pointers).