A Resource Query Interface for Network-Aware Applications
- Cluster Computing, 1999
Cited by 55 (15 self)
Development of portable network-aware applications demands an interface to the network that allows an application to obtain information about its execution environment. This paper motivates and describes the design of Remos, an API that allows network-aware applications to obtain relevant information. The major challenges in defining a uniform interface are network heterogeneity, diversity in traffic requirements, variability of the information, and resource sharing in the network. Remos addresses these issues with two abstraction levels, explicit management of resource sharing, and statistical measurements. The flows abstraction captures the communication between nodes, and the topologies abstraction provides a logical view of network connectivity. Remos measurements are made at network level, and therefore information to manage sharing of resources is available. Remos is designed to deliver best effort information to applications, and it explicitly adds statistical reliability and va...
ReMoS: A Resource Monitoring System for Network-Aware Applications
- 1997
Cited by 41 (8 self)
Development of portable network-aware applications demands an interface to the network that allows an application to obtain information about its execution environment. This paper motivates and describes the design of Remos, an API that allows network-aware applications to obtain relevant information. The major challenges in defining a uniform interface are network heterogeneity, diversity in traffic requirements, variability of the information, and resource sharing in the network. Remos addresses these issues with two abstraction levels, explicit management of resource sharing, and statistical measurements. The flows abstraction captures the communication between nodes, and the topologies abstraction provides a logical view of network connectivity. Remos measurements are made at network level, and therefore information to manage sharing of resources is available. Remos is designed to deliver best effort information to applications, and it explicitly adds statistical reliability and variability measures to the core information. The paper also presents preliminary results and experience with a prototype Remos implementation for a high speed IP-based network testbed.
Decentralizing execution of composite web services
- In OOPSLA ’04: Proceedings of the 19th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, 2004
Cited by 27 (0 self)
Distributed enterprise applications today are increasingly being built from services available over the web. A unit of functionality in this framework is a web service, a software application that exposes a set of “typed” connections that can be accessed over the web using standard protocols. These units can then be composed into a composite web service. BPEL (Business Process Execution Language) is a high-level distributed programming language for creating composite web services. Although a BPEL program invokes services distributed over several servers, the orchestration of these services is typically under centralized control. Because performance and throughput are major concerns in enterprise applications, it is important to remove the inefficiencies introduced by the centralized control. In a distributed, or decentralized
Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflow
- Algorithmica, 2007
Cited by 15 (12 self)
Mapping applications onto parallel platforms is a challenging problem, even for simple application patterns such as pipeline or fork graphs. Several antagonistic criteria should be optimized for workflow applications, such as throughput and latency (or a combination). In this paper, we consider a simplified model with no communication cost, and we provide an exhaustive list of complexity results for different problem instances. Pipeline or fork stages can be replicated in order to increase the throughput by sending consecutive data sets onto different processors. In some cases, stages can also be data-parallelized, i.e., the computation of one single data set is shared between several processors. This leads to a decrease of the latency and an increase of the throughput. Some instances of this simple model are shown to be NP-hard, thereby exposing the inherent complexity of the mapping problem. We provide polynomial algorithms for other problem instances. Altogether, we provide solid theoretical foundations for the study of mono-criterion or bi-criteria mapping optimization problems.
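The contrast between replication and data-parallelism in the abstract above can be illustrated with a small Python sketch. It covers a single stage on identical processors with no communication cost. The function names and the restriction to one stage on homogeneous processors are assumptions made for illustration; they are not taken from the paper.

```python
from fractions import Fraction


def replicated(work, speed, k):
    """Hypothetical helper: one stage replicated on k identical processors.

    Consecutive data sets are sent to different processors, so results
    come out k times as often, but each data set is still computed by a
    single processor and its latency is unchanged.
    """
    latency = Fraction(work, speed)
    period = latency / k  # period is the inverse of throughput
    return period, latency


def data_parallel(work, speed, k):
    """Hypothetical helper: one data set shared among k identical processors.

    The k processors cooperate on each data set, so both the latency and
    the period shrink by a factor of k.
    """
    latency = Fraction(work, speed * k)
    period = latency
    return period, latency
```

Under these assumptions both options give the same period for a given k, and only data-parallelism lowers the latency, which matches the abstract's statement that it decreases latency and increases throughput.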
A Mapping Methodology for Designing Software Task Pipelines for Embedded Signal Processing
- In the 3rd International Workshop on Embedded HPC Systems and Applications (EHPC’98) at the 12th International Parallel Processing Symposium (IPPS’98), 1998
Cited by 9 (7 self)
In this paper, we present a methodology for mapping an Embedded Signal Processing (ESP) application onto HPC platforms such that the throughput performance is maximized. Previous approaches used a linear pipelined execution model which restricts the mapping choices. We show that the "optimal" solution obtained under that model can be improved, using the proposed execution model. Based on the new model, a three-step task mapping methodology is developed. The methodology is demonstrated by designing Software Task Pipelines for modern radar and sonar signal processing applications. Experimental results show improved performance using our approach over those obtained by previous approaches.
1 Introduction
In this paper, we address the problem of maximizing the throughput of an ESP application on a given number of processors of a High Performance Computing (HPC) platform. ESP applications are typically composed of a sequence of computation stages with varying computational comp...
COLT_HPF, a Run-Time Support for the High-Level Coordination of HPF Tasks
- Concurrency: Practice and Experience, 1999
Cited by 7 (3 self)
ions (SDAs), using a syntax similar to that of HPF. Each instance of an SDA encapsulates distributed data and methods, where methods have exclusive access to encapsulated data. Data parallel tasks are thus started by creating instances of specific SDAs, while the inter-task co-operation takes place by means of remote synchronous (or asynchronous) method invocations. Note that SDA instances are started dynamically by a so-called coordination task, so that the run-time that implements inter-task communication has to control passing distributed data structures from one task to another, including any possible remapping that might be needed. The run-time accomplishes this through a handshaking protocol, which exchanges the distribution information about the actual argument (on the caller SDA) and the formal one (on the callee SDA) of a given method. Note that this handshaking protocol is very similar to the COLT_HPF protocol to create a channel between two tasks. Finally, even though in th...
Optimizing Latency and Reliability of Pipeline Workflow Applications
- 2008
Cited by 7 (6 self)
Mapping applications onto heterogeneous platforms is a difficult challenge, even for simple application patterns such as pipeline graphs. The problem is even more complex when processors are subject to failure during the execution of the application. In this paper, we study the complexity of a bi-criteria mapping which aims at optimizing the latency (i.e., the response time) and the reliability (i.e., the probability that the computation will be successful) of the application. Latency is minimized by using faster processors, while reliability is increased by replicating computations on a set of processors. However, replication increases latency (additional communications, slower processors). The application fails to be executed only if all the processors fail during execution. While simple polynomial algorithms can be found for fully homogeneous platforms, the problem becomes NP-hard when tackling heterogeneous platforms. This is yet another illustration of the additional complexity added by heterogeneity.
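The reliability side of this trade-off can be sketched in a few lines of Python. The abstract states that a replicated computation fails only if all of its processors fail. The sketch additionally assumes that processor failures are independent, which the abstract does not state, and the function name is hypothetical.

```python
import math


def replicated_reliability(failure_probs):
    """Probability that at least one replica finishes successfully.

    failure_probs holds one failure probability per processor that runs
    a copy of the computation. Assumption: failures are independent, so
    the chance that every replica fails is the product of the entries.
    """
    return 1.0 - math.prod(failure_probs)
```

Adding a replica can only raise this value, while the abstract notes that it also tends to raise latency through additional communications and slower processors.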
Multi-criteria scheduling of pipeline workflows
- In HeteroPar’07, the 6th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, 2007
Cited by 7 (7 self)
Rapport de recherche, ISSN 0249-6399, ISRN INRIA/RR--6232--FR+ENG
Mapping Linear Workflows with Computation/Communication Overlap
Cited by 5 (3 self)
This paper presents theoretical results related to mapping and scheduling linear workflows onto heterogeneous platforms. We use a realistic architectural model with bounded communication capabilities and full computation/communication overlap. This model is representative of current multi-threaded systems. In these workflow applications, the goal is often to maximize throughput or to minimize latency. We present several complexity results related to both these criteria. To be precise, we prove that maximizing the throughput is NP-complete even for homogeneous platforms and minimizing the latency is NP-complete for heterogeneous platforms. Moreover, we present an approximation algorithm for throughput maximization for linear chain applications on homogeneous platforms, and an approximation algorithm for latency minimization for linear chain applications on all platforms where communication is homogeneous (the processor speeds can differ). In addition, we present algorithms for several important special cases for linear chain applications. Finally, we consider the implications of adding feedback loops to linear chain applications.
Performance Prediction for Simple CPU and Network Sharing
- In LACSI Symposium, 2002
Cited by 4 (3 self)
Performance of virtually all parallel and distributed applications deteriorates when a CPU or a communication link has to be shared, but the extent of the slowdown is application dependent. In our experiments with the NAS benchmarks, the slowdown due to congestion on a single link varied from negligible to 120 percent. Estimation of performance of an application under given network conditions is of central importance for resource selection and resource management in shared computing environments. This paper develops a framework to model the performance of applications with CPU and link sharing. The methodology is based on monitoring the application behavior and resource usage on a controlled testbed. The procedure does not require access to the source code or the libraries. We demonstrate that the performance of applications in simple scenarios of network and CPU sharing can be predicted fairly accurately. For the NAS benchmark suite, we observed that the average error in predicting the execution time in different resource sharing scenarios was in the range of 2-6% and the maximum error was below 12%.

