Utopia: a Load Sharing Facility for Large, Heterogeneous Distributed Computer Systems
1993
WebWave: Globally Load Balanced Fully Distributed Caching of Hot Published Documents
In Proceedings of the 17th International Conference on Distributed Computing Systems, 1997
Cited by 45 (1 self)
Document publication service over such a large network as the Internet challenges us to harness available server and network resources to meet fast-growing demand. In this paper, we show that large-scale dynamic caching can be employed to globally minimize server idle time, and hence maximize the aggregate server throughput of the whole service. To be efficient, scalable and robust, a successful caching mechanism must have three properties: (1) maximize the global throughput of the system, (2) find cache copies without recourse to a directory service or a discovery protocol, and (3) be completely distributed in the sense of operating only on the basis of local information. In this paper, we develop a precise definition, which we call tree load-balance (TLB), of what it means for a mechanism to satisfy these three goals. We present an algorithm that computes TLB off-line, and a distributed protocol that induces a load distribution that converges quickly to a TLB one. Both algorithms...
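The abstract stresses a protocol that uses only local information yet converges toward a balanced load. As a hedged illustration of that general idea (not the paper's TLB protocol, whose definition is not given here), the sketch below runs plain nearest-neighbour diffusion on a tree: every edge moves a fraction `alpha` of the load difference between its two endpoints. The function name `diffuse`, the `alpha` value and the example tree are all hypothetical.

```python
from typing import Dict, List, Tuple


def diffuse(loads: Dict[str, float], edges: List[Tuple[str, str]],
            alpha: float = 0.25, rounds: int = 200) -> Dict[str, float]:
    """Hypothetical nearest-neighbour diffusion on a tree (not the TLB protocol).

    Each round, every edge (a, b) shifts alpha * (load[a] - load[b]) from the
    busier endpoint to the idler one. Total load is conserved. For stability,
    alpha should not exceed 1 / (max_degree + 1).
    """
    loads = dict(loads)
    for _ in range(rounds):
        # Compute all flows from the same snapshot before applying any, so each
        # node acts only on its own load and its neighbours' loads.
        delta = {node: 0.0 for node in loads}
        for a, b in edges:
            flow = alpha * (loads[a] - loads[b])  # positive means a -> b
            delta[a] -= flow
            delta[b] += flow
        for node in loads:
            loads[node] += delta[node]
    return loads
```

For example, starting with 12 units at the root of a four-node tree whose maximum degree is 2, `alpha=0.25` is within the stability bound and the loads approach the mean of 3 per node.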
NPSI Adaptive Synchronization Algorithms for PDES
In 1995 Winter Simulation Proceedings, 1995
Cited by 24 (5 self)
Research in parallel discrete event simulation indicates that neither purely conservative nor purely optimistic synchronization algorithms will perform well consistently. We survey several new approaches that attempt to improve performance by limiting optimistic execution. In most of these, the criterion for limiting optimism is static or based on local information, which conflicts with the dynamic nature of discrete event simulations. We contend that an adaptive approach based on low-cost, near-perfect system state information is the most likely to yield a consistently efficient synchronization algorithm. We suggest a framework by which NPSI (near-perfect state information) adaptive protocols could be designed and describe the first such protocol, the Elastic Time Algorithm. We present performance results from an implementation of this algorithm which show that adaptive protocols based on the use of NPSI are promising. In particular, we show that NPSI adaptive protocols have th...
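To make "limiting optimism using global state information" concrete, here is a minimal sketch under stated assumptions. It is not the Elastic Time Algorithm itself. Each logical process is delayed in proportion to how far its local virtual time (LVT) runs ahead of the global minimum. The minimum LVT stands in for global virtual time (GVT) and ignores in-transit messages. The gain `k` and the name `throttle_delays` are hypothetical.

```python
from typing import Dict


def throttle_delays(lvt: Dict[str, float], k: float = 0.1) -> Dict[str, float]:
    """Hypothetical proportional throttle for optimistic PDES (not the paper's ETA).

    Assumes near-perfect knowledge of every logical process's LVT and uses the
    minimum LVT as an approximation of GVT (in-transit messages are ignored).
    Processes further ahead of GVT, and so more likely to roll back, wait longer.
    """
    gvt = min(lvt.values())
    return {lp: k * (t - gvt) for lp, t in lvt.items()}
```

The slowest process is never delayed, so the simulation keeps advancing while runaway processes are slowed. A real protocol would also bound the delay and adapt `k` at run time.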
History, an Intelligent Load Sharing Filter
In Proceedings of the 10th International Conference on Distributed Computing Systems, 1990
Cited by 17 (0 self)
Load sharing can improve performance in distributed systems by transferring work from heavily loaded nodes to lightly loaded nodes. We propose a filter component to be included in a load sharing algorithm to detect short-lived jobs not worth considering for remote execution. One filter, called History, detects short-lived jobs by using job names and statistics based on previous executions. Job traces were collected from diskless workstations connected by a local area network and supported by a distributed file system. Trace-driven simulation was then used to evaluate History with respect to other filters. Two load sharing algorithms showed significant improvement in the mean job response ratio when the History filter was added.
1 Introduction
In a computing environment that consists of workstations, a high-speed interprocessor communication network, and shared resources such as file servers and printers, users often observe very unsatisfactory performance. This is often due to the imb...
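The History filter described above keys on job names and statistics from previous runs. A minimal sketch of that idea follows, with several assumptions that are not from the paper: the statistic is the mean past runtime, the 1-second `threshold_s` is arbitrary, and jobs with no history are kept local. The class and method names are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


class HistoryFilter:
    """Hypothetical name-based filter: skip remote execution for short-lived jobs."""

    def __init__(self, threshold_s: float = 1.0) -> None:
        self.threshold_s = threshold_s  # assumed cut-off between short and long jobs
        self.history: Dict[str, List[float]] = defaultdict(list)

    def record(self, name: str, runtime_s: float) -> None:
        """Store the observed runtime of a finished job under its name."""
        self.history[name].append(runtime_s)

    def worth_remote(self, name: str) -> bool:
        """True if past runs of this job name were long enough to justify transfer."""
        runs = self.history.get(name)  # .get() does not create an empty entry
        if not runs:
            # Assumption: with no history, treat the job as short-lived and run locally.
            return False
        return mean(runs) >= self.threshold_s
```

Only jobs passing `worth_remote` would be handed to the load sharing algorithm, which avoids paying transfer costs for commands that finish almost immediately.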
Regular Versus Irregular Problems and Algorithms
In Proc. of IRREGULAR'95, 1995
Cited by 14 (5 self)
Viewing a parallel execution as a set of tasks that execute on a set of processors, a main problem is to find a schedule of the tasks that provides an efficient execution. This usually leads to dividing algorithms into two classes, static and dynamic, depending on whether or not the schedule depends on the input data. To refine this rough classification, we study, on some key applications of the Stratagème project [21, 22], the different ways schedules can be obtained and the associated overheads. This leads us to propose a classification based on regularity criteria, i.e. measures of how regular (or irregular) an algorithm is. For a given algorithm, this expresses the quality of the schedules that can be found (irregular versus regular) rather than the way the schedules are obtained (dynamic versus static). These studies reveal some paradigms of parallel programming for irregular algorithms. Thus, in a second part we study a parallel programming model that takes i...
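The static/dynamic distinction above can be illustrated with two toy schedulers. This is a sketch under assumptions, not the paper's method. The static schedule is fixed before run time as contiguous equal-count blocks. The dynamic one is greedy list scheduling: each task goes to whichever processor becomes free first, so the schedule depends on the actual task durations. The function names and the task lists are hypothetical.

```python
import heapq
from typing import List


def static_block_makespan(tasks: List[float], p: int) -> float:
    """Static schedule: split tasks into p contiguous blocks of equal count."""
    if not tasks:
        return 0.0
    size = -(-len(tasks) // p)  # ceiling division
    return max(sum(tasks[i:i + size]) for i in range(0, len(tasks), size))


def dynamic_greedy_makespan(tasks: List[float], p: int) -> float:
    """Dynamic schedule: each task goes to the processor that frees up first."""
    finish = [0.0] * p  # min-heap of processor finish times
    heapq.heapify(finish)
    for t in tasks:
        heapq.heappush(finish, heapq.heappop(finish) + t)
    return max(finish)
```

On a regular workload such as eight unit tasks on two processors, both schedules give a makespan of 4. On an irregular one such as `[8, 1, 1, 1, 1, 1, 1, 2]`, the static blocks give 11 while the greedy dynamic schedule gives 8. This echoes the abstract's point that regularity, more than how the schedule is obtained, determines schedule quality.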
GATOSTAR: A Fault Tolerant Load Sharing Facility for Parallel Applications
1994
Cited by 11 (5 self)
This paper presents how and why to unify load sharing and fault tolerance facilities. A realization of a fault-tolerant load sharing facility, GATOSTAR, is presented and discussed. It is based on the integration of two applications developed on top of Unix: GATOS and STAR. GATOS is a load sharing manager that automatically distributes parallel applications among heterogeneous hosts according to multicriteria allocation algorithms. STAR is a software fault tolerance manager that automatically recovers processes of faulty machines based on checkpointing and message logging. The main advantage of this approach is to increase fault tolerance performance by taking advantage of the load sharing policies when allocating or recovering processes. This unification not only improves the efficiency of both facilities but also avoids many redundant mechanisms between them. Indeed, each facility needs to manage at least three common features: global knowledge of the running processors, a crash dete...
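One way to read "taking advantage of the load sharing policies when ... recovering processes" is that processes from a crashed host are restarted on lightly loaded survivors rather than on an arbitrary spare. The sketch below shows only that placement step. Restoring from checkpoints and replaying message logs are not modelled. The load metric (process count), the tie-break by host name and the function name `recover` are hypothetical choices, not GATOSTAR's actual multicriteria policy.

```python
from typing import Dict


def recover(placement: Dict[str, str], load: Dict[str, int],
            crashed: str) -> Dict[str, str]:
    """Hypothetical recovery placement: move each process off the crashed host
    onto the currently least-loaded surviving host.

    placement maps process -> host; load maps host -> number of processes.
    """
    load = dict(load)
    load.pop(crashed, None)  # the crashed host can no longer receive work
    new_placement = dict(placement)
    for proc, host in sorted(placement.items()):
        if host != crashed:
            continue
        # Least loaded survivor; ties broken by host name for determinism.
        target = min(load, key=lambda h: (load[h], h))
        new_placement[proc] = target
        load[target] += 1  # account for the restarted process
    return new_placement
```

Because the load table is updated after each placement, several processes from the same crashed host are spread across survivors instead of all landing on one idle machine.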
An Experimental Study of Load Balancing on Amoeba
In Proc. Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, 1995
Cited by 6 (1 self)
This paper presents the results of an experimental study of load balancing using job initiation and process migration, carried out on Amoeba. The results indicate the need for a load balancing facility in a distributed system to improve system performance, e.g., the average response time of processes. A number of load balancing algorithms, including the bidding and neighbouring algorithms, have been studied in this work. A comparison between these algorithms under various conditions is presented, which indicates that in a system with 10 to 20 computers a centralized algorithm outperforms a distributed one and that job initiation plays an important role in a load balancing scheme. On the basis of our experience, we also point out some requirements an operating system must meet to support an efficient load balancing facility. We conclude with a summary of our experiences and suggestions for further work.
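The abstract contrasts a centralized algorithm with distributed ones such as bidding. Below is a minimal sketch of one decision round for each, under assumptions that are not taken from the paper. Load is a single integer per host. The centralized coordinator sees a complete load table. In bidding, only hosts below a hypothetical `threshold` answer the broadcast request.

```python
from typing import Dict, Optional


def centralized_select(loads: Dict[str, int]) -> str:
    """Hypothetical coordinator with a global load table: pick the least-loaded host."""
    return min(loads, key=lambda h: (loads[h], h))


def bidding_select(loads: Dict[str, int], origin: str,
                   threshold: int = 2) -> Optional[str]:
    """Hypothetical bidding round: the origin broadcasts a request, every other
    host whose load is below `threshold` replies with its load as a bid, and the
    origin takes the lowest bid. Returns None (run locally) if nobody bids."""
    bids = {h: load for h, load in loads.items() if h != origin and load < threshold}
    if not bids:
        return None
    return min(bids, key=lambda h: (bids[h], h))
```

In a real system the centralized table is only as fresh as the last load report, and each bidding round costs a broadcast plus replies. Trade-offs of this kind are what the experimental comparison on 10 to 20 machines measures.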
Workload characteristics for Process Migration and Load Balancing
In International Conference on Distributed Computing Systems, 1997
Cited by 5 (0 self)
Is process migration useful for load balancing? We present experimental results indicating that the answer to this question depends largely on the characteristics of the applied workload. Experiments with our Shiva system, which supports remote execution and process migration, show that only those CPU-bound workloads which were generated using an unrealistic exponential distribution for execution times show improvements for dynamic load balancing. (We use the term 'dynamic' to indicate remote execution determined at, and not prior to, run time. The latter is known as 'static' load balancing.) Using a more realistic workload distribution and adding a number of short-lived tasks prevents dynamic algorithms from working. Migration is only useful with heterogeneous workloads. We find the migration of executing tasks to remote data to be effective for balancing I/O-bound workloads, and indicate the region of 'workload variable space' for which this migrate-to-data approach is useful. Keywo...
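The migrate-to-data idea amounts to a cost comparison: move the running task to the node holding its data when shipping the task is cheaper than reading the data remotely. The sketch below uses a deliberately simple, hypothetical cost model: one shared bandwidth figure in bytes per second, and a fixed migration overhead plus transfer of the process image. It is not Shiva's actual decision rule.

```python
def should_migrate_to_data(data_bytes: float, process_image_bytes: float,
                           migration_overhead_s: float,
                           bandwidth_bytes_per_s: float) -> bool:
    """Hypothetical cost model for migrate-to-data.

    Remote I/O cost: time to pull the remaining data over the network.
    Migration cost: fixed overhead plus time to ship the process image.
    Migrate only if moving the computation is cheaper than moving the data.
    """
    remote_io_s = data_bytes / bandwidth_bytes_per_s
    migrate_s = migration_overhead_s + process_image_bytes / bandwidth_bytes_per_s
    return migrate_s < remote_io_s
```

For example, with 10 MB/s of bandwidth, a 2 MB process image and 0.5 s of fixed overhead, migration pays off for 100 MB of remaining data but not for 1 MB. That matches the abstract's observation that the approach helps only in part of the workload space.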
Instrumentation, Modeling and Analysis of Dynamic, Distributed Real-time Systems
Tech. Rep., CSE Dept., UTA, 1998
Cited by 3 (2 self)
This paper considers time-constrained systems that must operate in dynamic environments. Systems that operate in such environments may have unknown worst-case scenarios, may have large variances in the sizes of the data and event sets that they process (hence, they may have large variances in execution latencies and resource requirements), and may not be statically characterizable, even by time-invariant statistical distributions. To enable the engineering of such systems, we present an abstract model that combines statically specified system information with the dynamically instrumented state of environment-dependent features. The model is also used to define techniques for QoS (quality-of-service) monitoring and forecasting, QoS diagnosis, and resource allocation analysis. Experimental results show the effectiveness of the approach for detection, prediction and diagnosis of QoS failures, and for restoration of acceptable QoS via reallocation.
1 Sponsored in part by DARPA/NCCOSC co...
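QoS forecasting, as mentioned above, can be as simple as extrapolating a trend in instrumented latencies and checking it against a deadline. The sketch below fits an ordinary least-squares line to recent samples and flags a predicted violation. It is a stand-in chosen for illustration, not the model from the report. The function names, the one-step horizon and the deadline semantics are assumptions.

```python
from typing import List


def forecast_latency(samples: List[float], steps_ahead: int = 1) -> float:
    """Hypothetical forecaster: least-squares linear trend over the samples,
    extrapolated steps_ahead intervals past the last one. Needs >= 1 sample."""
    n = len(samples)
    if n < 2:
        return samples[-1]  # no trend can be estimated from a single point
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), samples))
    slope = sxy / sxx
    return y_mean + slope * (n - 1 + steps_ahead - x_mean)


def qos_violation_predicted(samples: List[float], deadline: float,
                            steps_ahead: int = 1) -> bool:
    """Flag a QoS failure before it happens if the forecast exceeds the deadline."""
    return forecast_latency(samples, steps_ahead) > deadline
```

With latencies rising 10, 12, 14, 16 the next-step forecast is 18, so a deadline of 17 would trigger a reallocation check early, while a deadline of 20 would not. Real systems with large, non-stationary variances would need a more robust estimator than a straight line.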

