Results 1-9 of 9
Design guidance in the power dimension
Proceedings of International Conference on Acoustics, Speech and Signal Processing, 1995
Abstract

Cited by 11 (1 self)
This work proposes an approach for high-level design guidance for low power using properties of given algorithms and architectures. Several relevant properties (operation count, the ratio of critical path to available time, spatial locality, and regularity) are identified and discussed, with quantitative measures being proposed for the latter two. Significant emphasis is placed on exploiting the regularity and spatial locality properties of algorithms to optimize interconnect power. Examples illustrate the large savings that can be attained through property-based guidance of algorithm selection and architecture composition. Though demonstrated for ASIC designs, this approach is extensible to other hardware platforms and performance metrics (e.g., speed, area).
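One of the listed properties, the ratio of critical path to available time, can be illustrated in a few lines. The sketch below is a rough illustration only, not the paper's formulation: it takes a small hypothetical data-flow graph for y = (a*b + c*d) * e, assumes a unit delay per operation and a hypothetical budget of 6 time steps, and reports the longest dependency chain and its ratio to that budget. A ratio well below 1 indicates slack in the schedule.

```python
from functools import lru_cache

# Hypothetical data-flow graph for y = (a*b + c*d) * e:
# each operation maps to the operations whose results it consumes.
DFG = {
    "mul1": [],                # a * b
    "mul2": [],                # c * d
    "add1": ["mul1", "mul2"],  # a*b + c*d
    "mul3": ["add1"],          # (...) * e
}
DELAY = {"mul1": 1, "mul2": 1, "add1": 1, "mul3": 1}  # assumed unit delays


def critical_path(dfg, delay):
    """Length of the longest dependency chain (sum of delays) in an acyclic graph."""
    @lru_cache(maxsize=None)
    def finish(op):
        # An operation can finish only after its slowest predecessor, plus its own delay.
        return max((finish(p) for p in dfg[op]), default=0) + delay[op]
    return max(finish(op) for op in dfg)


def critical_path_ratio(dfg, delay, available_time):
    """Ratio of critical path to available time (<= 1 means the schedule fits)."""
    return critical_path(dfg, delay) / available_time


if __name__ == "__main__":
    print(critical_path(DFG, DELAY))           # 3
    print(critical_path_ratio(DFG, DELAY, 6))  # 0.5
```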
Data Management and Control-Flow Aspects of an SIMD/SPMD Parallel Language/Compiler
IEEE Transactions on Parallel and Distributed Systems, 1993
Abstract

Cited by 9 (3 self)
Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (Single Program, Multiple Data) mode of parallelism is a subset of the MIMD mode in which all processors execute the same program. By providing every aspect of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control flow, and PE-address-dependent control flow are presented. These constructs are based on experience gained from programming a parallel machine prototype, and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines.
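Data-dependent control flow is where the two modes differ most. In a typical SIMD machine all PEs follow one instruction stream, so both arms of an if/else are broadcast and each PE uses an activity mask to decide whether to commit a result; in SPMD mode each PE runs its own copy of the program and simply takes one arm. The sketch below simulates both styles on hypothetical per-PE values to show that they compute the same result. It models the general idea only; the function names and the example computation are assumptions, not the language's actual constructs or syntax.

```python
def simd_if_else(values):
    """SIMD style: one instruction stream, both branches executed under an activity mask."""
    results = list(values)                 # element i lives on processing element i
    mask = [v % 2 == 0 for v in values]    # condition evaluated on every PE
    # "then" branch is broadcast to all PEs; only PEs whose mask is set commit
    for pe, v in enumerate(values):
        if mask[pe]:
            results[pe] = v // 2
    # "else" branch is broadcast next; PEs whose mask is clear commit
    for pe, v in enumerate(values):
        if not mask[pe]:
            results[pe] = 3 * v + 1
    return results


def spmd_if_else(values):
    """SPMD style: every PE runs the same program independently and takes one branch."""
    def program(v):                        # the single program each PE executes
        return v // 2 if v % 2 == 0 else 3 * v + 1
    return [program(v) for v in values]


if __name__ == "__main__":
    data = [6, 7, 8, 9]
    print(simd_if_else(data))  # [3, 22, 4, 28]
    print(spmd_if_else(data))  # [3, 22, 4, 28]
```

Note that the SIMD version steps through both branches regardless of the data, whereas each SPMD processor executes only the branch it needs.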
A Methodology for Guided Behavioral-Level Optimization
Proc. 35th ACM Design Automation Conf. (DAC), 1998
Abstract

Cited by 6 (0 self)
Optimization at the early stages of design is crucial. However, due to an overwhelming number of design and optimization options, design exploration is often conducted in a qualitative, ad hoc manner. This paper presents a methodology and interactive environment for guiding the exploration process. A prototype targeting behavioral-level optimization for datapath-intensive ASIC implementations has been developed. The key to the approach is encapsulated knowledge about the various optimizations and a set of techniques to automatically extract the "essence" of a design description. At each stage in the exploration process, the system suggests and ranks potential optimizations in terms of both immediate and longer-term impact. It also provides evaluations of the design and of the likely effects each optimization will have on metrics such as power and performance. In the new approach, the designer is responsible for making the actual optimization selections. However, using the provided guidance, designers can make decisions in a more informed manner and therefore explore the design solution space more effectively. The effectiveness of the approach is demonstrated on a number of designs.
A Method For The Embedding Of Arbitrary Communication Topologies Into Configurable Parallel Computers
ACM Symposium on Applied Computing, 1998
Abstract

Cited by 1 (1 self)
This paper presents a method for embedding arbitrary communication topologies into crossbar interconnection networks. The embedding problem is divided into two parts: first, the placement of the program modules onto the processors, solved by means of quadratic assignment; and second, the assignment between logical and physical data channels, solved by means of linear programming. Our method differs from those in other papers in that it is a general approach, not restricted to a special class of communication or network topologies. We prove that in some cases our method is optimal; that is, the number of links occupied by the embedding is minimized. Furthermore, the assignment of the communication links is always optimal for a given placement of the program modules. Our technique has been applied to a crossbar network of a parallel signal processor system consisting of 256 modules. We have embedded classical as well as nonstandard topologies, such as hypercube, 3-dimensional t...
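The placement step is cast as a quadratic assignment problem (QAP): given the traffic between program modules and the distance between processors, choose the one-to-one mapping that minimizes the sum of traffic times distance. The sketch below states that objective and solves a tiny hypothetical instance (a 4-module ring placed on a 2x2 grid of processors with hop distances) by brute force. This is not the paper's solution method, the distance model is an assumption, and the linear-programming channel assignment is omitted.

```python
from itertools import permutations

# Hypothetical instance: 4 program modules communicating in a ring (0-1-2-3-0),
# and 4 processors laid out as a 2x2 grid with hop (Manhattan) distances.
TRAFFIC = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]
POSITIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]
DIST = [[abs(a[0] - b[0]) + abs(a[1] - b[1]) for b in POSITIONS] for a in POSITIONS]


def qap_cost(traffic, dist, placement):
    """QAP objective: sum over module pairs of traffic(i, j) * dist(p(i), p(j))."""
    n = len(traffic)
    return sum(traffic[i][j] * dist[placement[i]][placement[j]]
               for i in range(n) for j in range(n))


def best_placement(traffic, dist):
    """Exhaustive search over all module-to-processor permutations (tiny n only)."""
    n = len(traffic)
    best = min(permutations(range(n)), key=lambda p: qap_cost(traffic, dist, p))
    return best, qap_cost(traffic, dist, best)


if __name__ == "__main__":
    placement, cost = best_placement(TRAFFIC, DIST)
    print(placement, cost)  # cost 8: every ring edge lands on neighboring processors
```

Exhaustive search grows as n!, so realistic placements such as the 256-module system mentioned above would need QAP heuristics or dedicated solvers instead.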
Analysis of algorithmic structures with heterogeneous tasks
International Phoenix Conference on Computers and Communications, 1996
Abstract

Cited by 1 (0 self)
Developing efficient programs for distributed systems is difficult because computations must be efficiently distributed and managed on multiple processors. In particular, the programmer must partition functions and data in an attempt to find a reasonable balance between parallelism and overhead. Furthermore, it is very expensive to code an algorithm only to find out that the implementation is not efficient. As a result, it is often necessary to determine and examine those characteristics of an algorithm that can be used to predict its suitability for a distributed computing system. In earlier work [7, 8], we presented a framework for the study of synchronization and communication effects on the theoretical performance of common homogeneous algorithmic structures. In particular, we examined the synchronous, asynchronous, nearest-neighbor, and asynchronous master-slave structures in terms of expected execution times. In this paper, we examine the effects of synchronization and communication on the expected execution times of heterogeneous algorithmic structures. Specifically, we consider structures containing two different types of tasks, where the execution times of the tasks follow one of two different uniform distributions or one of two different normal distributions. Furthermore, we compare the expected execution times of the heterogeneous algorithmic structures with times for corresponding homogeneous structures. Finally, we develop bounds for the expected execution times of the heterogeneous structures and compare those bounds to simulated execution times.
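In a synchronous structure each step ends only when its slowest task finishes, so the expected step time is the expected maximum of the task times. The sketch below works under simplifying assumptions of our own (independent tasks, uniform distributions with nonnegative lower bounds, communication cost ignored): it evaluates that expectation for a mix of two task types by numerically integrating 1 - F(t), uses the known closed form a + (b - a)n/(n + 1) for the homogeneous case, and checks the result against a simple simulation. It is not the paper's model, and it does not reproduce the paper's bounds.

```python
import random


def uniform_cdf(t, a, b):
    """CDF of a Uniform(a, b) task execution time."""
    if t <= a:
        return 0.0
    if t >= b:
        return 1.0
    return (t - a) / (b - a)


def expected_max_homogeneous(n, a, b):
    """Closed form for the expected maximum of n i.i.d. Uniform(a, b) times."""
    return a + (b - a) * n / (n + 1)


def expected_max_mixed(n1, a1, b1, n2, a2, b2, steps=100_000):
    """E[max] for n1 Uniform(a1, b1) tasks and n2 Uniform(a2, b2) tasks (a1, a2 >= 0).

    Uses E[M] = integral from 0 to max(b1, b2) of (1 - F_M(t)) dt, where
    F_M(t) = F1(t)**n1 * F2(t)**n2, evaluated with the midpoint rule.
    """
    upper = max(b1, b2)
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += 1.0 - uniform_cdf(t, a1, b1) ** n1 * uniform_cdf(t, a2, b2) ** n2
    return total * h


def simulate_max(n1, a1, b1, n2, a2, b2, trials=50_000, seed=1):
    """Monte Carlo estimate of the same expected synchronous step time."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        times = [rng.uniform(a1, b1) for _ in range(n1)]
        times += [rng.uniform(a2, b2) for _ in range(n2)]
        acc += max(times)  # the step waits for the slowest task
    return acc / trials


if __name__ == "__main__":
    print(expected_max_homogeneous(4, 1.0, 2.0))        # 1.8
    print(expected_max_mixed(4, 1.0, 2.0, 4, 0.0, 3.0))  # numerical E[max], mixed tasks
    print(simulate_max(4, 1.0, 2.0, 4, 0.0, 3.0))        # should agree closely
```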
Application-Specific Configuration of Heterogeneous Crossbar Networks
Abstract
This paper presents a method for embedding arbitrary communication topologies into heterogeneous crossbar interconnection networks. Our method extends and improves the method presented in [1], which is based on mathematical programming. It supports not only homogeneous but also heterogeneous networks with different interfaces. It has been applied to the crossbar network of a parallel signal processor system consisting of 256 modules. It is shown that optimizing the embedding algorithm leads to a considerable performance improvement.
Keywords: Configuration, embedding, parallel computing, heterogeneous networks.
1 INTRODUCTION
The time to compute large-scale applications on massively parallel computers depends crucially on the communication overhead caused by data exchange between processors [2]. Thus a good match between the data dependencies defined by the application and the topology of the processor interconnection network is essential for an effective parallel proce...
Space Administration, 1989
Abstract
This is Volume I of two volumes of the Proceedings of the 3rd Annual Conference on Aerospace Computational Control. The term Computational Control was coined this year to encompass that range of computer-based tools and capabilities needed by aerospace control systems engineers for design, analysis, and testing of current and future missions. This year's conference furthered the dialogue in this area begun
Mapping Conjugate Gradient Algorithms for Neutron Diffusion Applications onto SIMD, MIMD, and
Abstract
The performance of conjugate gradient (CG) algorithms for the solution of the system of linear equations that results from the finite-differencing of the neutron diffusion equation was analyzed on SIMD, MIMD, and mixed-mode parallel machines. A block preconditioner based on the incomplete Cholesky factorization was used to accelerate the conjugate gradient search. The issues involved in mapping both the unpreconditioned and preconditioned conjugate gradient algorithms onto the mixed-mode PASM prototype, the SIMD MasPar MP-1, and the MIMD Intel Paragon XP/S are discussed. On PASM, the mixed-mode implementation outperformed either SIMD or MIMD alone. Theoretical performance predictions were analyzed and compared with the experimental results on the MasPar MP-1 and the Paragon XP/S. Other issues addressed include the impact on execution time of the number of processors used, the effect of the interprocessor communication network on performance, and the relationship of the number of processors to the quality of the preconditioning.
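The abstract pairs CG with an incomplete Cholesky preconditioner. The sketch below shows the general technique serially on a small hypothetical test problem: a 5-point finite-difference discretization of a one-group diffusion operator, -D∇²φ + Σ_a φ, on an n x n grid with φ = 0 on the boundary, stored densely for brevity. It uses a point IC(0) factorization (no fill-in) rather than the block preconditioner described above, and it says nothing about the SIMD, MIMD, or mixed-mode mappings that are the paper's subject. The grid size, coefficients, and function names are assumptions.

```python
import math


def diffusion_matrix(n, d=1.0, sigma_a=0.1, h=1.0):
    """Hypothetical 5-point matrix for -d*lap(phi) + sigma_a*phi, phi = 0 on boundary."""
    N = n * n
    A = [[0.0] * N for _ in range(N)]
    c = d / (h * h)
    for i in range(n):
        for j in range(n):
            k = i * n + j
            A[k][k] = 4 * c + sigma_a
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    A[k][ii * n + jj] = -c
    return A


def incomplete_cholesky0(A):
    """IC(0): Cholesky factor L restricted to the sparsity pattern of A (no fill-in)."""
    N = len(A)
    L = [[0.0] * N for _ in range(N)]
    for k in range(N):
        L[k][k] = math.sqrt(A[k][k] - sum(L[k][j] ** 2 for j in range(k)))
        for i in range(k + 1, N):
            if A[i][k] != 0.0:  # keep only entries already present in A
                s = sum(L[i][j] * L[k][j] for j in range(k))
                L[i][k] = (A[i][k] - s) / L[k][k]
    return L


def apply_preconditioner(L, r):
    """Solve (L L^T) z = r by forward, then backward, substitution."""
    N = len(r)
    y = [0.0] * N
    for i in range(N):
        y[i] = (r[i] - sum(L[i][j] * y[j] for j in range(i))) / L[i][i]
    z = [0.0] * N
    for i in reversed(range(N)):
        z[i] = (y[i] - sum(L[j][i] * z[j] for j in range(i + 1, N))) / L[i][i]
    return z


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def pcg(A, b, L=None, tol=1e-10, max_iter=500):
    """Conjugate gradient; preconditioned with (L L^T)^-1 when L is given."""
    if L is not None:
        precond = lambda r: apply_preconditioner(L, r)
    else:
        precond = lambda r: list(r)  # plain CG
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    A = diffusion_matrix(6)
    b = [float(k % 5) + 1.0 for k in range(len(A))]  # hypothetical source term
    print(pcg(A, b)[1], pcg(A, b, incomplete_cholesky0(A))[1])  # iteration counts
```

On realistic grids IC(0) usually lowers the CG iteration count; on a toy problem this small the difference may be modest.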