Results 11–20 of 89
Predicting Multiprocessor Memory Access Patterns with Learning Models
1997
Abstract

Cited by 11 (0 self)
Machine learning techniques are applicable to computer system optimization. We show that shared-memory multiprocessors can successfully utilize machine learning algorithms for memory access pattern prediction. In particular, three different online machine learning prediction techniques were tested to learn and predict repetitive memory access patterns for three typical parallel processing applications on a shared-memory multiprocessor: the 2D relaxation algorithm, matrix multiply, and the Fast Fourier Transform. The predictions were then used by a routing control algorithm to reduce control latency in the interconnection network by configuring the network to provide needed memory access paths before they were requested. Three trainable prediction techniques were tested: (1) a Markov predictor, (2) a linear predictor, and (3) a time-delay neural network (TDNN) predictor. Different predictors performed best on different applications, but the TDNN produced uniformly go...
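Of the three predictors, the Markov predictor is the simplest to sketch. Below is a generic first-order transition-count predictor over a toy, hypothetical address stream; it is not the paper's implementation, only an illustration of the technique:

```python
from collections import Counter, defaultdict

class MarkovPredictor:
    """First-order Markov predictor: predicts the address that has most
    often followed the current address in the history seen so far."""

    def __init__(self):
        self.transitions = defaultdict(Counter)
        self.prev = None

    def observe(self, addr):
        if self.prev is not None:
            self.transitions[self.prev][addr] += 1
        self.prev = addr

    def predict(self):
        counts = self.transitions.get(self.prev)
        return counts.most_common(1)[0][0] if counts else None

# A repetitive access pattern (like a sweep in 2D relaxation) is
# learned after one pass; afterwards every prediction is correct.
pattern = [0, 4, 8, 12] * 5
predictor, hits = MarkovPredictor(), 0
for addr in pattern:
    hits += predictor.predict() == addr
    predictor.observe(addr)
```

After the first cycle (five misses) the remaining fifteen accesses are all predicted correctly; a routing controller could use `predict()` to pre-configure an interconnect path before the request arrives.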
Efficient Matrix Chain Ordering in Polylog Time
IN PROC. OF INT'L PARALLEL PROCESSING SYMP, 1998
Abstract

Cited by 10 (3 self)
The matrix chain ordering problem (MCOP) is to find the cheapest way to multiply a chain of n matrices, where the matrices are pairwise compatible but of varying dimensions. Here we give several new parallel algorithms, including O(lg^3 n)-time, n/lg n-processor algorithms for solving the matrix chain ordering problem and for solving an optimal triangulation problem of convex polygons on the common CRCW PRAM model. Next, by using efficient algorithms for computing row minima of totally monotone matrices, this complexity is improved to O(lg^2 n) time with n processors on the EREW PRAM, and to O(lg^2 n lg lg n) time with n/lg lg n processors on a common CRCW PRAM. A new algorithm for computing the row minima of totally monotone matrices improves our parallel MCOP algorithm to O(n lg^1.5 n) work and polylog time on a CREW PRAM. Optimal log-time algorithms for computing row minima of totally monotone matrices would improve our algorithm and enable it to have the same work as the sequential algorithm of Hu and
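For context, the sequential baseline is the classic dynamic program over increasing chain lengths; a minimal sketch (the example dimensions are made up):

```python
def matrix_chain_cost(dims):
    """Minimum number of scalar multiplications needed to multiply a
    chain of matrices, where matrix i has shape dims[i] x dims[i+1].
    Classic O(n^3)-time, O(n^2)-space dynamic program: the sequential
    baseline that the parallel algorithms above speed up."""
    n = len(dims) - 1  # number of matrices in the chain
    cost = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):          # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = min(
                cost[i][k] + cost[k + 1][j]
                + dims[i] * dims[k + 1] * dims[j + 1]
                for k in range(i, j)        # position of the final split
            )
    return cost[0][n - 1]

# 10x30 * 30x5 * 5x60: ((A B) C) costs 1500 + 3000 = 4500 multiplications
```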
Automated Performance Prediction for Scalable Parallel Computing
PARALLEL COMPUTING, 1997
Abstract

Cited by 10 (0 self)
Performance prediction is necessary in order to deal with multidimensional performance effects on parallel systems. The compiler-generated analytical model developed in this paper accounts for the effects of cache behavior, CPU execution time, and message-passing overhead for real programs written in high-level data-parallel languages. The performance prediction technique is shown to be effective in analyzing several non-trivial data-parallel applications as the problem size and number of processors vary. We leverage technology from the Maple symbolic manipulation system and the S-PLUS statistical package in order to present users with the critical performance information necessary for performance debugging, architectural enhancement, and procurement of parallel systems. The usability of these results is improved by specifying confidence intervals as well as predicted execution times for data-parallel applications.
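The flavor of such an analytical model can be sketched as a closed-form cost function. Every coefficient below is hypothetical (a real model's coefficients come from the compiler and from measurement), but the structure, per-processor compute plus cache-miss penalty plus message-passing overhead, follows the description above:

```python
import math

def predicted_time(n, p, t_flop=2e-9, t_miss=8e-8, miss_rate=0.02,
                   t_msg=5e-6, msgs_per_phase=2):
    """Illustrative analytical execution-time model (all coefficients
    hypothetical): per-processor computation, plus a cache-miss penalty
    proportional to the local work, plus message-passing overhead that
    grows logarithmically with the processor count."""
    work = n / p                                  # operations per processor
    compute = work * t_flop
    cache = work * miss_rate * t_miss
    comm = msgs_per_phase * t_msg * math.log2(p) if p > 1 else 0.0
    return compute + cache + comm
```

Evaluating such a model as n and p vary is what lets a tool report predicted execution times (and, with fitted coefficients, confidence intervals) before a run is ever made; note that for small problems the communication term eventually dominates and adding processors hurts.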
Secure File Transfer: A Computational Analog to the Furniture Moving Paradigm
PARALLEL AND DISTRIBUTED COMPUTING PRACTICES, 1999
Abstract

Cited by 9 (8 self)
One of the most compelling illustrations of the power of parallelism is the furniture-moving paradigm. In it, a large item of furniture needs to be moved from one place to another. A single mover, working alone, must take the item apart, move each piece separately, and then reassemble the item at the new location, taking a long time to complete the job. By contrast, four movers can simply lift the item and quickly move it to its new location. Thus, the time required to accomplish the task is reduced by a factor significantly larger than four. This paper describes a computational analog to the furniture-moving paradigm. The computation in question is concerned with transferring a computer file from one computer system to another over an insecure communications channel. The file contains private or sensitive information whose secrecy and integrity need to be maintained. Cryptography is used to obtain a digital signature of the file, thereby protecting its integrity, and the...
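A minimal sketch of the digest-then-sign idea, with the file split into pieces that could travel over separate channels in parallel (the computational "four movers"). HMAC-SHA256 stands in here for the public-key signature a real deployment would use, and the key and payload are made up:

```python
import hashlib
import hmac

def sign_bytes(data: bytes, key: bytes) -> bytes:
    """Hash the whole file, then sign the digest (HMAC as a stand-in
    for an RSA/DSA signature, which the stdlib lacks)."""
    digest = hashlib.sha256(data).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()

def split(data: bytes, parts: int):
    """Cut the file into roughly equal pieces for parallel transfer."""
    step = -(-len(data) // parts)  # ceiling division
    return [data[i:i + step] for i in range(0, len(data), step)]

key = b"shared-secret"                  # hypothetical pre-shared key
original = b"sensitive payload " * 100
tag = sign_bytes(original, key)

# Receiver: reassemble the independently transferred pieces, then
# verify that the signature still matches, proving integrity.
reassembled = b"".join(split(original, 4))
ok = hmac.compare_digest(tag, sign_bytes(reassembled, key))
```

Any tampering with a piece in transit changes the digest, so the receiver's verification fails.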
Complexity results for collective communications on heterogeneous platforms
INT. JOURNAL OF HIGH PERFORMANCE COMPUTING APPLICATIONS, 2006
Abstract

Cited by 9 (2 self)
In this paper, we consider the communications involved in the execution of a complex application deployed on a heterogeneous platform. Such applications extensively use macro-communication schemes, for example to broadcast data items, either to all resources (broadcast) or to a restricted set of targets (multicast). Rather than aiming at minimizing the execution time of a single collective communication, we focus on steady-state operation. We assume that there is a large number of messages to be broadcast or multicast in pipelined fashion, and we aim at maximizing the throughput, i.e., the (rational) number of messages which can be broadcast or multicast every time-step. We target heterogeneous platforms, modeled by a graph whose resources have different communication and computation speeds. Achieving the best throughput may well require that the target platform be used in its entirety: different messages may need to be transferred along different paths. The main focus of the paper is on complexity results. We aim at presenting a unified framework for analyzing the complexity of collective communication schemes. We concentrate on classification (whether maximizing the throughput is a polynomial or NP-hard problem) rather than on providing efficient polynomial algorithms; when such algorithms are known, we give bibliographical pointers.
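As a concrete special case, consider pipelined broadcast along one fixed spanning tree under the one-port model (a node sends to its children sequentially). The achievable throughput is then easy to compute; the complexity results above concern the much harder general case where several trees may be used concurrently. A sketch with made-up bandwidths:

```python
from collections import defaultdict

def tree_broadcast_throughput(edges):
    """Steady-state throughput (messages per time-step) of a pipelined
    broadcast along one fixed tree under the one-port model: node u
    serializes its sends, so its outgoing rate is 1 / sum(1/b) over its
    child links, and the tree is limited by its slowest node.
    `edges` is a list of (parent, child, bandwidth) triples."""
    children = defaultdict(list)
    for parent, child, bandwidth in edges:
        children[parent].append(bandwidth)
    return min(1.0 / sum(1.0 / b for b in bws) for bws in children.values())

# A star: the root serializes three unit-bandwidth sends -> 1/3 msg/step.
star = tree_broadcast_throughput([(0, 1, 1.0), (0, 2, 1.0), (0, 3, 1.0)])
# A chain: every node forwards to one child at bandwidth 2 -> 2 msg/step.
chain = tree_broadcast_throughput([(0, 1, 2.0), (1, 2, 2.0)])
```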
Parallel Two Level Block ILU Preconditioning Techniques for Solving Large Sparse Linear Systems
Paral. Comput., 2000
Abstract

Cited by 8 (4 self)
We discuss issues related to domain decomposition and multilevel preconditioning techniques, which are often employed for solving large sparse linear systems in parallel computations. We introduce a class of parallel preconditioning techniques for general sparse linear systems based on a two-level block ILU factorization strategy. We give some new data structures and strategies for constructing the local coefficient matrix and the local Schur complement matrix on each processor. The preconditioner constructed is fast and robust for solving certain large sparse matrices. Numerical experiments show that our domain-based two-level block ILU preconditioners are more robust and more efficient than some published ILU preconditioners based on Schur complement techniques for parallel sparse matrix solutions.
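The central object in such preconditioners is the local Schur complement: each processor eliminates its interior unknowns B and keeps only its contribution S = C - E B^{-1} F to the interface system. A dense, pure-Python sketch on a tiny, hypothetical partition (a real code uses sparse data structures and an incomplete, not exact, factorization):

```python
def solve(M, b):
    """Tiny dense Gaussian elimination with partial pivoting (for
    illustration only; the paper works with sparse ILU factors)."""
    n = len(M)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

def schur_complement(B, F, E, C):
    """S = C - E B^{-1} F for the block partition [[B, F], [E, C]]."""
    nb, nc = len(B), len(C)
    # Column j of B^{-1} F, obtained by solving B x = F[:, j].
    binv_f = [solve(B, [F[i][j] for i in range(nb)]) for j in range(len(F[0]))]
    return [[C[i][j] - sum(E[i][k] * binv_f[j][k] for k in range(nb))
             for j in range(len(F[0]))] for i in range(nc)]

# Interior block B = diag(2, 2), one interface unknown: S = [[2.0]].
S = schur_complement([[2.0, 0.0], [0.0, 2.0]], [[1.0], [1.0]],
                     [[1.0, 1.0]], [[3.0]])
```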
MSP: a class of parallel multistep successive sparse approximate inverse preconditioning strategies
SIAM J. Sci. Comput., 2002
Abstract

Cited by 7 (4 self)
We develop a class of parallel multistep successive preconditioning strategies to enhance the efficiency and robustness of standard sparse approximate inverse preconditioning techniques. The key idea is to compute a series of simple sparse matrices to approximate the inverse of the original matrix. Studies are conducted to show the advantages of such an approach in terms of both improving preconditioning accuracy and reducing computational cost, compared to standard sparse approximate inverse preconditioners. Numerical experiments using one prototype implementation to solve a few sparse matrices on a distributed-memory parallel computer are reported.
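The "series of simple matrices" idea can be illustrated with a Newton-Schulz refinement step, X <- X(2I - AX), which squares the residual I - AX at each step. This is only a dense stand-in for the paper's sparse MSP construction, applied to a made-up 2x2 matrix:

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_error(A, X):
    """Max-norm of the residual I - A X."""
    AX = matmul(A, X)
    n = len(A)
    return max(abs((1.0 if i == j else 0.0) - AX[i][j])
               for i in range(n) for j in range(n))

A = [[4.0, 1.0], [1.0, 3.0]]
X = [[0.25, 0.0], [0.0, 1.0 / 3.0]]   # step 1: a crude diagonal inverse
err_start = inverse_error(A, X)

for _ in range(4):                     # successive refinement steps
    AX = matmul(A, X)
    correction = [[(2.0 if i == j else 0.0) - AX[i][j]
                   for j in range(len(A))] for i in range(len(A))]
    X = matmul(X, correction)          # X <- X (2I - A X)

err_end = inverse_error(A, X)
```

Each step multiplies in one more "simple" factor; here the residual norm drops from 1/3 to below 1e-8 in four steps, which is the sense in which successive steps improve preconditioning accuracy.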
Refinement of herpesvirus Bcapsid structure on parallel supercomputers
Biophys. J., 1998
Abstract

Cited by 6 (2 self)
Electron cryomicroscopy and icosahedral reconstruction are used to obtain the three-dimensional structure of the 1250-Å-diameter herpesvirus B-capsid. The centers and orientations of particles in focal pairs of 400-kV, spot-scan micrographs are determined and iteratively refined by common-lines-based local and global refinement procedures. We describe the rationale behind choosing shared-memory multiprocessor computers for executing the global refinement, which is the most computationally intensive step in the reconstruction procedure. This refinement has been implemented on three different shared-memory supercomputers. The speedup and efficiency are evaluated using test data sets with different numbers of particles and processors. Using this parallel refinement program, we refine the herpesvirus B-capsid from 355 particle images to 13-Å resolution. The map shows new structural features and interactions of the protein subunits in the three distinct morphological units of this T = 16 icosahedral particle: penton, hexon, and triplex.
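The reported speedup and efficiency follow the standard definitions S = T1/Tp and E = S/p; an Amdahl's-law sketch with a hypothetical 5% serial fraction shows why efficiency falls as processors are added:

```python
def amdahl_time(t1, p, serial_frac):
    """Predicted parallel runtime when a fraction of the work (e.g. I/O
    and merging) stays serial; serial_frac here is hypothetical."""
    return t1 * (serial_frac + (1.0 - serial_frac) / p)

def speedup_efficiency(t1, tp, p):
    """Speedup S = T1/Tp and efficiency E = S/p."""
    s = t1 / tp
    return s, s / p

# With a 5% serial fraction, 16 processors yield S ~ 9.1 and E ~ 0.57.
s16, e16 = speedup_efficiency(1.0, amdahl_time(1.0, 16, 0.05), 16)
```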
Randomized Motion Planning on Parallel and Distributed Architectures
In Euromicro Workshop on Parallel and Distributed Processing, 1999
Abstract

Cited by 6 (3 self)
Motion planning is a fundamental problem in a number of application areas, including robotics, automation, and virtual reality. This paper describes a parallel implementation of a motion planning algorithm particularly suited for complex systems characterized by many degrees of freedom. The implementation is based on concurrent exploration of the search space by a randomized planner replicated on each node of the parallel architecture. All processing elements compete to obtain a solution over the entire search space in an OR-parallel fashion. Reported results refer to a low-cost cluster of PCs and an SGI Onyx2 parallel machine. The experiments emphasize the effectiveness of the approach for complex, high-dimensionality planning problems. We believe that the approach may be useful in other complex search problems, especially when the parallel architecture exhibits relatively high communication latency.
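The OR-parallel competition scheme can be sketched with threads standing in for cluster nodes: every replica samples the whole search space independently with its own random seed, and the first solution found stops all replicas. The toy "planning" problem below is hypothetical:

```python
import random
import threading

def replica(goal_test, sample, stop, found, seed):
    """One randomized-planner replica: sample candidates until this
    replica, or any competitor, finds a solution."""
    rng = random.Random(seed)
    while not stop.is_set():
        candidate = sample(rng)
        if goal_test(candidate):
            found.append(candidate)
            stop.set()               # tell the other replicas to quit

def or_parallel_search(goal_test, sample, replicas=4):
    """All replicas compete over the entire search space; whoever
    succeeds first supplies the answer (OR-parallelism)."""
    stop, found = threading.Event(), []
    threads = [threading.Thread(target=replica,
                                args=(goal_test, sample, stop, found, seed))
               for seed in range(replicas)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found[0] if found else None

# Toy goal: find an integer whose square ends in ...21.
solution = or_parallel_search(lambda x: (x * x) % 100 == 21,
                              lambda rng: rng.randrange(1, 1000))
```

Because each replica can solve the whole problem alone, no work partitioning or result merging is needed, which is why the scheme tolerates high communication latency.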
Performance evaluation of a multirobot search & retrieval system: Experiences with MinDART
Journal of Intelligent and Robotic Systems, 2003
Abstract

Cited by 6 (0 self)
Swarm techniques, where many simple robots are used instead of complex ones for performing a task, promise to reduce the cost of developing robot teams for many application domains. The challenge lies in selecting an appropriate control strategy for the individual units. This work explores the effect of different control strategies of varying complexity, and of various environmental factors, on the performance of a team of robots at a foraging task when using physical robots (the Minnesota Distributed Autonomous Robotic Team). Specifically, we study the effect of localization and of simple communication techniques on task completion time using two sets of foraging experiments. We also present results for task performance with varying team sizes and target distributions. As indicated by the results, control strategies of increasing complexity reduce the variance in performance, but do not always reduce the time to complete the task.