Models of Machines and Computation for Mapping in Multicomputers
, 1993
Cited by 76 (1 self)
It is now more than a quarter of a century since researchers started publishing papers on mapping strategies for distributing computation across the computation resource of multiprocessor systems. There exists a large body of literature on the subject, but there is no commonly-accepted framework whereby results in the field can be compared. Nor is it always easy to assess the relevance of a new result to a particular problem. Furthermore, changes in parallel computing technology have made some of the earlier work of less relevance to current multiprocessor systems. Versions of the mapping problem are classified, and research in the field is considered in terms of its relevance to the problem of programming currently available hardware in the form of a distributed memory multiple instruction stream multiple data stream computer: a multicomputer.
MULTIPROCESSOR SCHEDULING TO ACCOUNT FOR INTERPROCESSOR COMMUNICATION
, 1991
Cited by 64 (11 self)
Interprocessor communication (IPC) overheads have emerged as the major performance limitation in parallel processing systems, due to the transmission delays, synchronization overheads, and conflicts for shared communication resources created by data exchange. Accounting for these overheads is essential for attaining efficient hardware utilization. This thesis introduces two new compile-time heuristics for scheduling precedence graphs onto multiprocessor architectures, which account for interprocessor communication overheads and interconnection constraints in the architecture. These algorithms perform scheduling and routing simultaneously to account for irregular interprocessor interconnections, and schedule all communications as well as all computations to eliminate shared resource contention. The first technique, called dynamic-level scheduling, modifies the classical HLFET list scheduling strategy to account for IPC and synchronization overheads. By using dynamically changing priorities to match nodes and processors at each step, this technique attains an equitable tradeoff between load balancing and interprocessor communication cost. This method is fast, flexible, widely targetable, and displays promising performance. The second technique, called declustering, establishes a parallelism hierarchy upon the precedence graph using graph-analysis techniques which explicitly address the tradeoff between exploiting parallelism and incurring communication cost. By systematically decomposing this hierarchy, the declustering process exposes parallelism instances in order of importance, assuring efficient use of the available processing resources. In contrast with traditional clustering schemes, this technique can adjust the level of cluster granularity to suit the characteristics of the specified architecture, leading to a more effective solution.
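For orientation, the classical HLFET (Highest Level First with Estimated Times) strategy that the abstract takes as its starting point can be sketched roughly as below. This is a minimal illustration of plain HLFET only: it ignores communication and synchronization costs, so it is not the thesis's dynamic-level scheduling. The function name `hlfet_schedule`, the dictionary-based graph encoding, and the tie-breaking rule are assumptions made for this sketch.

```python
def hlfet_schedule(cost, succ, num_processors):
    """Classical HLFET list scheduling, with no communication costs.

    cost: dict mapping node -> estimated execution time
    succ: dict mapping node -> list of successor nodes (precedence graph)
    Returns (placed, makespan), where placed maps node -> (processor, start).
    """
    # Build predecessor lists from the successor lists.
    pred = {n: [] for n in cost}
    for n, children in succ.items():
        for c in children:
            pred[c].append(n)

    # Static level: longest execution-time path from a node to an exit node.
    level = {}

    def static_level(n):
        if n not in level:
            level[n] = cost[n] + max(
                (static_level(c) for c in succ.get(n, [])), default=0
            )
        return level[n]

    for n in cost:
        static_level(n)

    proc_free = [0] * num_processors  # time at which each processor is idle
    finish = {}                       # node -> finish time
    placed = {}                       # node -> (processor, start time)
    remaining = set(cost)
    while remaining:
        # A node is ready once all of its predecessors have been scheduled.
        ready = [n for n in remaining if all(p in finish for p in pred[n])]
        # Highest static level first; ties broken by node name (assumption).
        node = max(ready, key=lambda n: (level[n], n))
        data_ready = max((finish[p] for p in pred[node]), default=0)
        # Choose the processor that allows the earliest start time.
        proc = min(
            range(num_processors),
            key=lambda q: max(proc_free[q], data_ready),
        )
        start = max(proc_free[proc], data_ready)
        finish[node] = start + cost[node]
        proc_free[proc] = finish[node]
        placed[node] = (proc, start)
        remaining.remove(node)
    return placed, max(finish.values())


# Hypothetical four-node diamond graph: a -> {b, c} -> d.
cost = {"a": 2, "b": 3, "c": 1, "d": 2}
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
placed, makespan = hlfet_schedule(cost, succ, 2)
print(placed, makespan)
```

The priorities here are fixed before scheduling starts. According to the abstract, dynamic-level scheduling instead recomputes priorities at each step so that node and processor are matched with IPC cost taken into account.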
A partitioning advisory system for networked data-parallel processing
- Concurrency: Practice and Experience
, 1995
Cited by 18 (1 self)
With the increased performance capabilities of desktop computers, networked computing has become a popular vehicle for using parallelism to solve a variety of computationally intense problems. However, node heterogeneity and high communication costs may limit performance unless the problem space is carefully partitioned across the network in a way that considers both the capabilities of the machines and the high network communication costs. We describe an advisory system that is designed to help the programmer, compiler, or run-time environment choose the best decomposition strategy for partitioning specific data-parallel applications across a given collection of machines. The system includes provisions for assessing the capabilities of the participating machines and the network in light of the current workload. Given information about the problem space, the machine speeds, and the network, the system provides a ranking of three standard partitioning methods. We test the validity of our system by comparing the observed relative performance with predicted relative performance of different data decompositions on a program with a variable number of floating point operations and a 5-point stencil communication pattern.
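As a rough illustration of partitioning that takes machine capability into account, the sketch below splits the rows of a data-parallel problem across machines in proportion to their relative speeds. It is not the advisory system described in the abstract and it ignores network cost entirely. The function name `partition_rows`, the speed values, and the largest-remainder rounding rule are assumptions made for this example.

```python
def partition_rows(num_rows, speeds):
    """Split num_rows among machines in proportion to their relative speeds.

    speeds: list of positive relative speed estimates, one per machine.
    Returns a list of row counts that sums to num_rows.
    """
    total = sum(speeds)
    exact = [num_rows * s / total for s in speeds]  # ideal fractional shares
    counts = [int(x) for x in exact]                # round each share down
    leftover = num_rows - sum(counts)
    # Give the remaining rows to the machines with the largest fractions.
    order = sorted(
        range(len(speeds)),
        key=lambda i: exact[i] - counts[i],
        reverse=True,
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Hypothetical example: the second machine is twice as fast as the first.
print(partition_rows(10, [1, 2]))  # [3, 7]
```

A real decomposition would also weigh the stencil communication along partition boundaries, which is the kind of tradeoff the abstract says the advisory system is designed to rank.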
Assignment of Independent Tasks to Minimize Completion Time
- Software–Practice and Experience
, 1992
Cited by 11 (0 self)
this paper. Each task of the application is assumed to (1) require execution on a single processor, (2) have an estimate of its maximum execution time, and (3) not wait on communications with other tasks. The objective of the studied schedulers is to map an application's tasks onto the underlying hardware in such a way that the application's completion time is minimized. Experimental evaluation of the schedulers indicates that in many situations, a more sophisticated scheduler fails to outperform simpler schedulers.
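One simple scheduler for this setting (independent tasks with known execution-time estimates, no communication) is the greedy longest-processing-time-first rule. The sketch below is a generic textbook heuristic offered for illustration; the abstract does not say which schedulers the paper studies, and the name `lpt_schedule` and the sample task times are assumptions.

```python
import heapq


def lpt_schedule(task_times, num_processors):
    """Greedy longest-processing-time-first assignment of independent tasks.

    task_times: list of estimated execution times, indexed by task id.
    Returns (assignment, makespan), where assignment[p] lists the task ids
    placed on processor p.
    """
    # Min-heap of (current load, processor index).
    heap = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_processors)]
    # Place the longest remaining task on the least loaded processor.
    for task in sorted(range(len(task_times)), key=lambda t: -task_times[t]):
        load, p = heapq.heappop(heap)
        assignment[p].append(task)
        heapq.heappush(heap, (load + task_times[task], p))
    makespan = max(load for load, _ in heap)
    return assignment, makespan


# Hypothetical execution-time estimates for five tasks on two processors.
assignment, makespan = lpt_schedule([7, 5, 4, 3, 1], 2)
print(assignment, makespan)  # [[0, 3], [1, 2, 4]] 10.0
```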
A Heuristic model for task allocation in heterogeneous distributed computing systems
, 1999
Cited by 4 (0 self)
Keywords: Heterogeneous Computing, Distributed Processing, Task Allocation, Simulated Annealing
In heterogeneous distributed computing systems, partitioning of the application software into modules and the proper allocation of these modules among dissimilar processors are important factors which determine the efficient utilization of resources. This paper presents a new heuristic model, the HMLM/SA, which performs static allocation of such program modules in a heterogeneous distributed computing system in a manner that is designed to minimize the application program's parallel execution time. The new methodology augments the Maximally Linked Module concept by using stochastic techniques and by adding constructs which take into account the limited and uneven distribution of hardware resources often associated with heterogeneous systems. The execution time of the resulting HMLM/SA algorithm and the quality of the allocations produced are shown to be superior to those of the base HMLM algorithm, pure simulated annealing, and the randomized algorithm when they were applied to randomly generated systems and synthetic structures derived from real-world problems.
FOR DISTRIBUTED REAL-TIME APPLICATIONS
permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania’s products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to
A Methodology For Statically Clustering Active Objects In Distributed Systems
, 1994
The complexity inherent in large-scale distributed systems accounts for the difficulty in engineering applications on such platforms. The difficulty primarily arises from having to partition an application to exploit concurrency and minimize the overhead due to communication across the network. The challenge in building a powerful development environment which simplifies the task of application development therefore lies in the degree to which application partitioning and distribution can be automated or made transparent. Transparency requires powerful static semantic analysis that can massage any given application into a state which an appropriate run-time model can then execute efficiently on a distributed architecture. The object model, with its software engineering advantages, presents an attractive alternative to the task/process-based computational model for distributed systems. A fine-grained active object model presents a powerful computational paradigm to engineer a wide variet...

