Results 1–10 of 20
Evaluation of a semistatic approach to mapping dynamic iterative tasks onto heterogeneous computing systems
 J. Parallel Distrib. Comput.
, 1999
Cited by 11 (5 self)
Abstract—To minimize the execution time of an iterative application in a heterogeneous parallel computing environment, an appropriate mapping scheme is needed for matching and scheduling the subtasks of the application onto the processors. When some of the characteristics of the application subtasks are unknown a priori and will change from iteration to iteration during execution time, a semistatic methodology can be employed that starts with an initial mapping but dynamically decides whether to perform a remapping between iterations of the application, by observing the effects of these dynamic parameters on the application's execution time. The objective of this study is to implement and evaluate such a semistatic methodology. To analyze the effectiveness of the proposed scheme, it is compared with two extreme approaches: a completely dynamic approach using a fast mapping heuristic and an ideal approach that uses a genetic algorithm online but ignores the time for remapping. Experimental results indicate that the semistatic approach outperforms the dynamic approach and is reasonably close to the ideal but infeasible approach.
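The remap-or-not decision described in this abstract can be illustrated with a small sketch. This is not the paper's algorithm: the greedy mapper, the `observe` callback, and the `drift` threshold are all illustrative assumptions standing in for the paper's mapping heuristic and remapping criterion.

```python
# Hypothetical sketch of a semistatic remapping loop: start from an initial
# mapping, watch per-iteration subtask costs, and remap only when the
# observed makespan drifts past a threshold (since remapping itself costs
# time). All names (compute_mapping, observe, drift) are illustrative.

def compute_mapping(subtask_times, n_procs):
    """Greedy static mapping: assign each subtask to the least-loaded processor."""
    loads = [0.0] * n_procs
    mapping = {}
    for task, c in sorted(subtask_times.items(), key=lambda kv: -kv[1]):
        p = loads.index(min(loads))
        mapping[task] = p
        loads[p] += c
    return mapping, max(loads)

def semistatic_schedule(iterations, n_procs, observe, drift=1.25):
    """Run `iterations` iterations; return how many remappings were triggered.
    `observe(it)` yields the per-subtask costs seen at iteration `it`."""
    times = observe(0)                      # initial per-subtask estimates
    mapping, predicted = compute_mapping(times, n_procs)
    remaps = 0
    for it in range(1, iterations):
        times = observe(it)                 # dynamic parameters change per iteration
        observed = max(sum(c for t, c in times.items() if mapping[t] == p)
                       for p in range(n_procs))
        if observed > drift * predicted:    # only then is remapping worth its cost
            mapping, predicted = compute_mapping(times, n_procs)
            remaps += 1
    return remaps
```

A fully dynamic approach would remap every iteration; the threshold is what makes the scheme "semistatic".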
Discovery and Application of Network Information
, 2000
Cited by 5 (0 self)
USAF, under agreement number F306029610287. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Advanced Research
A Multi-Agent System for Parallelizing Image Analysis Tasks
 in Proceedings of the International Conference on Intelligent Autonomous Systems 5 (IAS-5)
, 1998
Cited by 4 (1 self)
To exploit the full capacity of distributed systems for image analysis tasks, they must be processed in parallel. However, developing parallel programs is complicated and often results in architecture-dependent code that is difficult to port to different machines. There is thus a need for more flexible, architecture-independent methods for the automatic parallelization of tasks. This paper introduces such a method and describes a multi-agent system for the automatic parallelization of image analysis tasks. The user provides a specification of the task, which the agents use to plan and control its parallel processing within a distributed system. In doing so, they make use of different methods of parallel processing and consider the specific qualification and the actual load of processors when deciding on the scheduling and mapping of tasks and data.
Low-Cost Task Scheduling for Distributed-Memory Machines
 IEEE Transactions on Parallel and Distributed Systems
, 2002
Cited by 4 (0 self)
In this paper, we show that list scheduling with statically computed priorities can be performed at a significantly lower cost than existing approaches, without sacrificing performance. Our approach is general, i.e., it can be applied to any list scheduling algorithm with static priorities. The low complexity is achieved by using low-complexity methods for the most time-consuming parts of list scheduling algorithms, i.e., processor selection and task selection, while preserving the criteria used in the original algorithms. We exemplify our method by applying it to the MCP algorithm. Using an extension of this method, we can also reduce the time complexity of a particular class of list scheduling algorithms with dynamic priorities (including algorithms such as DLS, ETF, and ERT). Our results confirm that the modified versions of the list scheduling algorithms obtain performance comparable to their original versions, yet at a significantly lower cost. We also show that the modified versions of the list scheduling algorithms consistently outperform multi-step algorithms, such as DSC-LLB, which also have higher complexity, and clearly outperform algorithms in the same complexity class, such as CPM.
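For orientation, a minimal static-priority list scheduler is sketched below: tasks are ordered once by bottom level (longest computation path to an exit node) and each is placed on the processor giving the earliest finish time. This is only the baseline scheme the paper optimizes, not its low-cost selection methods; the homogeneous processors and zero communication costs are simplifying assumptions.

```python
# Baseline static-priority list scheduling on a DAG, for illustration only.
# succ maps each task to its successors; cost maps each task to its
# (positive) computation time; processors are identical, communication free.

def bottom_levels(succ, cost):
    """Longest path from each task to an exit node (computation costs only)."""
    bl = {}
    def bl_of(t):
        if t not in bl:
            bl[t] = cost[t] + max((bl_of(s) for s in succ.get(t, [])), default=0)
        return bl[t]
    for t in cost:
        bl_of(t)
    return bl

def list_schedule(succ, cost, n_procs):
    """Schedule the DAG; return the makespan."""
    bl = bottom_levels(succ, cost)
    pred = {t: [] for t in cost}
    for t, ss in succ.items():
        for s in ss:
            pred[s].append(t)
    order = sorted(cost, key=lambda t: -bl[t])   # static priority: fixed once
    proc_free = [0.0] * n_procs
    finish = {}
    for t in order:
        ready = max((finish[p] for p in pred[t]), default=0.0)
        p = min(range(n_procs),                  # processor selection: earliest finish
                key=lambda i: max(proc_free[i], ready) + cost[t])
        start = max(proc_free[p], ready)
        finish[t] = start + cost[t]
        proc_free[p] = finish[t]
    return max(finish.values())
```

With positive costs, descending bottom level is always a valid topological order, which is why the task order can be fixed before scheduling begins.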
Revisiting communication code generation algorithms for message-passing systems
 International Journal of Parallel, Emergent and Distributed Systems (JPEDS)
, 2006
Cited by 3 (2 self)
Abstract—In this paper, we investigate algorithms for generating communication code to run on distributed-memory systems. We modify algorithms from previously published work and prove that the algorithms produce correct code. We then extend these algorithms to incorporate the mapping of virtual processors to physical processors and prove the correctness of this extension. This technique can reduce the number of interprocessor messages. In the examples we show, the total number of messages was reduced from O(N²) to O(P²), where N is the input size and P is the number of physical processors. The reason it is important to revisit communication code generation, and to introduce a formal specification of the incorporation of mapping into it, is so that we can make use of the many scheduling heuristics proposed in the literature. We need a generalized mapping function so that we can apply different mapping and scheduling heuristics from the literature to each input program, thereby improving the average performance.
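The O(N²)-to-O(P²) reduction mentioned above can be seen in a small sketch: messages between virtual processors mapped onto the same physical processor vanish, and the remainder are grouped per physical pair, leaving at most P·(P−1) distinct messages. The round-robin mapping function here is an illustrative assumption standing in for the paper's generalized mapping function.

```python
# Coalesce virtual-processor messages under a virtual->physical mapping.
# virtual_msgs is a list of ((src, dst), payload) pairs between virtual
# processors; the result keys messages by (physical src, physical dst).

def physical_messages(n_virtual, n_physical, virtual_msgs):
    phys = lambda v: v % n_physical         # assumed round-robin mapping
    coalesced = {}
    for (src, dst), data in virtual_msgs:
        ps, pd = phys(src), phys(dst)
        if ps == pd:
            continue                        # local copy: no interprocessor message
        coalesced.setdefault((ps, pd), []).append(data)
    return coalesced
```

For an all-to-all pattern among N virtual processors, the N·(N−1) logical messages collapse to at most P·(P−1) physical ones, matching the complexity reduction claimed in the abstract.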
Stochastic DFS for multiprocessor scheduling of cyclic task graphs
 In Proceedings of Parallel, Distributed Computing, Applications and Technologies PDCAT 2004
, 2004
Cited by 2 (1 self)
Abstract. DFS has previously been shown to be a simple and efficient strategy for removing cycles in graphs, allowing the resulting DAGs to be scheduled using one of the many well-established DAG multiprocessor scheduling algorithms. In this paper, we investigate the inefficiencies of schedules acquired using DFS cycle removal. Further, an improved randomised DFS cycle-removal algorithm is proposed that produces significantly improved results with acceptable computational overhead.
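The underlying mechanism can be sketched briefly: a depth-first traversal drops back edges (edges into a vertex still on the DFS stack), which is exactly the set of edges whose removal leaves a DAG. Randomising start vertex and neighbour order and keeping the best of several passes mirrors the randomised idea of the paper; the best-of criterion used here (fewest edges removed) is an illustrative stand-in for the schedule-quality criterion the paper actually evaluates.

```python
# Randomised DFS cycle removal: every key of `adj` is a vertex, mapped to
# its list of out-neighbours (all of which must also appear as keys).
import random

def dfs_remove_cycles(adj, rng):
    """Return (kept_edges, removed_edges) for one randomised DFS pass."""
    colour = {v: 0 for v in adj}          # 0 = white, 1 = on stack, 2 = done
    kept, removed = [], []
    def visit(v):
        colour[v] = 1
        nbrs = list(adj[v])
        rng.shuffle(nbrs)
        for w in nbrs:
            if colour[w] == 1:
                removed.append((v, w))    # back edge: would close a cycle
            else:
                kept.append((v, w))
                if colour[w] == 0:
                    visit(w)
        colour[v] = 2
    starts = list(adj)
    rng.shuffle(starts)
    for v in starts:
        if colour[v] == 0:
            visit(v)
    return kept, removed

def best_dag(adj, runs=10, seed=0):
    """Keep the pass that removed fewest edges (illustrative criterion)."""
    rng = random.Random(seed)
    return min((dfs_remove_cycles(adj, rng) for _ in range(runs)),
               key=lambda kr: len(kr[1]))
```

The kept edges form an acyclic graph, so any standard DAG scheduling algorithm can be applied to the result, as the abstract describes.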
Heterogeneous Parallel Computing in Remote Sensing Applications: Current Trends and Future Perspectives
, 2006
Cited by 2 (0 self)
Heterogeneous networks of computers have rapidly become a very promising commodity computing solution, expected to play a major role in the design of high-performance computing systems for remote sensing missions. Currently, only a few parallel processing strategies are available in this research area, and most of them assume homogeneity in the underlying computing platform. This paper develops several highly innovative heterogeneous parallel algorithms for information extraction from high-dimensional remotely sensed images, with particular emphasis on target detection and land-cover mapping applications. Experimental results are presented in the context of a realistic application, using real data collected by NASA's Jet Propulsion Laboratory over the World Trade Center complex in New York City after September 11th, 2001. Parallel performance of the proposed algorithms is discussed using several (fully and partially) heterogeneous networks at the University of Maryland, and a massively parallel Beowulf cluster at NASA's Goddard Space Flight Center. Combined, these parts deliver a snapshot of the state of the art in those areas, and a thoughtful perspective on the potential and challenges of applying heterogeneous computing practices to remote sensing problems.
Value prediction in HLS allocation problems using intellectual properties
 Applied Artificial Intelligence
Cited by 2 (0 self)
A value-approximation-based global search algorithm is suggested to solve resource-constrained allocation in high-level synthesis problems. Value approximation is preferred because it can start by using expert heuristics, can estimate the global structure of the search problem, and can optimize heuristics. We are concerned with those allocation problems that have a hidden global structure that value approximation may unravel. The value approximation applied here computes the cost of the actual solution and estimates the cost of the solution that could be achieved by performing a global search on the hidden structure starting from the actual solution. We transcribed the allocation problem into a special form of weighted CNF formulae to suit our approach. We also extended the formalism to pipeline operations. Comparisons are made with expert heuristics, and the scaling of computation time and performance is compared. We discuss a search method that can be applied to a broad set of problems in resource-constrained allocation. The design problem is a high-level synthesis (HLS) task: a register-transfer-level (RTL) hardware description is to be produced from a task formulated at a higher level. Different heuristics exist to solve this task (Arató et al. 1999; Arató, Visegrády, and Jankovits ...)
A graph-transformational approach to the multiprocessor scheduling of iterative computations
 In Proceedings of Proceedings of the Fourth International Conference of Parallel and Distributed Computing, Applications and Techniques
, 2003
Cited by 1 (0 self)
Abstract—A modular strategy for scheduling iterative computations is proposed. An iterative computation is represented using a cyclic task graph, which is transformed into an acyclic task graph. This acyclic task graph is subsequently scheduled using one of the many well-known, high-quality static scheduling strategies from the literature. Graph unfolding is not employed, and the generated schedules therefore require less memory than schedules generated through graph unfolding. Further, the number of iterations does not need to be known at compile time. The effectiveness of the approach is compared to other methods, including a graph unfolding strategy. In addition, the paper experimentally quantifies how the task transformation affects the makespan of the schedules.
Optimal scheduling of iterative dataflow programs onto multiprocessors with non-negligible interprocessor communication
 Proceedings of HPCN Europe
, 1999