On honey bees and dynamic server allocation in internet hosting centers
 Adaptive Behavior
, 2004
Metacomputing with MILAN
 In Proceedings of the 8th Heterogeneous Computing Workshop
, 1999
Cited by 12 (2 self)
The MILAN project, a joint effort involving Arizona State University and New York University, has produced and validated fundamental techniques for the realization of efficient, reliable, predictable virtual machines on top of metacomputing environments that consist of an unreliable and dynamically changing set of machines. In addition to the techniques, the principal outcomes of the project include three parallel programming systems, Calypso, Chime, and Charlotte, which enable applications developed for ideal, shared memory, parallel machines to execute on distributed platforms that are subject to failures, slowdowns, and changing resource availability. The lessons learnt from the MILAN project are being used to design Computing Communities, a metacomputing framework for general computations. 1. Motivation MILAN (Metacomputing In Large Asynchronous Networks) is a joint project of Arizona State University and New York University. The primary objective of the MILAN project is to p...
Fault Tolerant Scheduling in Distributed Networks
, 1996
Cited by 12 (3 self)
We present a model for application-level fault tolerance for parallel applications. The objective is to achieve high reliability with minimal impact on the application. Our approach is based on a full replication of all parallel application components in a distributed wide-area environment in which each replica is independently scheduled in a different site. A system architecture for coordinating the replicas is described. The fault tolerance mechanism is being added to a wide-area scheduler prototype in the Legion parallel processing system. A performance evaluation of the fault-tolerant scheduler and a comparison to the traditional means of fault tolerance, checkpoint-recovery, is planned. 1.0 Introduction Distributed networks of heterogeneous workstations have the potential to become the dominant platform for high-performance computing. One obstacle to realizing this potential is reliability or fault tolerance. Reliability becomes increasingly important as the computing environ...
The Complexity of Synchronous Iterative Do-All with Crashes
, 2001
Cited by 11 (3 self)
Do-All is the problem of performing N tasks in a distributed system of P failure-prone processors [9]. Many distributed and parallel algorithms have been developed for this basic problem, and several algorithm simulations have been developed by iterating Do-All algorithms. The efficiency of the solutions for Do-All is measured in terms of work complexity, where all processing steps taken by the processors are counted. Work is ideally expressed as a function of N, P, and f, the number of processor crashes. However, the known lower bounds and the upper bounds for extant algorithms do not adequately show how work depends on f. We present the first non-trivial lower bounds for Do-All that capture the dependence of work on N, P, and f. For the model of computation where processors are able to make perfect load-balancing decisions locally, we also present matching upper bounds. Thus we give the first complete analysis of Do-All for this model. We define the r-iterative Do-All problem that abstracts the repeated use of Do-All such as found in algorithm simulations. Our f-sensitive analysis enables us to derive a tight bound for r-iterative Do-All work (that is stronger than the r-fold work complexity of a single Do-All). Our approach that models perfect load-balancing allows for the analysis of specific algorithms to be divided into two parts: (i) the analysis of the cost of tolerating failures while performing work, and (ii) the analysis of the cost of implementing load-balancing. We demonstrate the utility and generality of this approach by improving the analysis of two known efficient algorithms. We give an improved analysis of an efficient message-passing algorithm (algorithm AN [5]). We also derive a new and complete analysis of the best known Do-All algorithm for...
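To make the work measure in the abstract above concrete, here is a toy Python sketch of synchronous Do-All under an idealized load-balancing assumption. It is not the paper's algorithm or analysis. The function name `do_all_rounds`, the `crashes_per_round` parameter, and the rule that surviving processors always complete distinct tasks are assumptions made for illustration.

```python
def do_all_rounds(n_tasks, n_procs, crashes_per_round):
    """Toy model of synchronous Do-All with crashes.

    Assumption (not from the paper): in every round the processors that
    survive the round complete distinct remaining tasks, which stands in
    for "perfect load balancing". Processors that crash in a round still
    cost one step of work but complete nothing.

    Returns (work, rounds), where work counts one step per processor
    that was alive at the start of each round.
    """
    remaining = n_tasks
    live = n_procs
    work = 0
    rounds = 0
    while remaining > 0:
        # Crashes scheduled for this round (0 once the list runs out).
        crash = crashes_per_round[rounds] if rounds < len(crashes_per_round) else 0
        crash = min(crash, live - 1)  # keep at least one processor alive
        work += live                  # every live processor takes a step
        live -= crash                 # crashed processors finish nothing
        remaining -= min(remaining, live)
        rounds += 1
    return work, rounds


if __name__ == "__main__":
    print(do_all_rounds(10, 4, []))    # no crashes
    print(do_all_rounds(10, 4, [3]))   # three crashes in the first round
```

In this toy model, crashing three of four processors in the first round stretches 10 tasks from 3 rounds to 10 rounds, while total work grows only from 12 to 13 steps, which illustrates why work is tracked as a function of N, P, and f rather than time alone.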
Java on Networks of Workstations (JavaNOW): A Parallel Computing Framework Inspired by Linda and the Message Passing Interface (MPI)
Cited by 11 (2 self)
Networks of workstations are a dominant force in the distributed computing arena, due primarily to the excellent price/performance ratio of such systems when compared to traditional massively parallel architectures. It is therefore critical to develop programming languages and environments that can potentially harness the raw computational power available on these systems. In this article, we present JavaNOW (Java on Networks of Workstations), a Java-based framework for parallel programming on networks of workstations. It creates a virtual parallel machine similar to the MPI (Message Passing Interface) model, and provides distributed associative shared memory similar to the Linda memory model but with a flexible set of primitive operations. JavaNOW provides a simple yet powerful framework for performing computation on networks of workstations. In addition to the Linda memory model, it provides for shared objects, implicit multithreading, implicit synchronization, object dataflow, and collective communications similar to those defined in MPI. JavaNOW is also a component of the Computational Neighborhood [63], a Java-enabled suite of services for desktop computational sharing. The intent of JavaNOW is to present an environment for parallel computing that is both expressive and reliable and ultimately can deliver good to excellent performance. As JavaNOW is a work in progress, this article emphasizes the expressive potential of the JavaNOW environment and presents preliminary performance results only.
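For readers unfamiliar with the Linda memory model mentioned above, the following is a minimal single-process Python sketch of an associative tuple space. It is not JavaNOW's API. The class name `TupleSpace`, the method names, and the use of `None` as a wildcard are assumptions for illustration, and the operations are non-blocking, unlike classic Linda `in` and `rd`.

```python
class TupleSpace:
    """Toy associative memory in the style of Linda (illustrative only)."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Deposit a tuple into the space."""
        self._tuples.append(tuple(fields))

    def _find(self, pattern):
        # A pattern matches a tuple of the same length when every
        # non-None pattern field equals the corresponding tuple field.
        for index, tup in enumerate(self._tuples):
            if len(tup) == len(pattern) and all(
                p is None or p == f for p, f in zip(pattern, tup)
            ):
                return index
        return None

    def rd(self, *pattern):
        """Return a matching tuple without removing it, or None."""
        index = self._find(pattern)
        return None if index is None else self._tuples[index]

    def in_(self, *pattern):
        """Remove and return a matching tuple, or None."""
        index = self._find(pattern)
        return None if index is None else self._tuples.pop(index)


if __name__ == "__main__":
    space = TupleSpace()
    space.out("task", 1, "render")
    space.out("task", 2, "encode")
    print(space.rd("task", None, "encode"))  # look up by content
    print(space.in_("task", 1, None))        # claim a task
```

Retrieval by content rather than by address is what makes the memory "associative": a worker asks for any tuple shaped like a task instead of naming a location.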
Javelin: Parallel Computing on the Internet
 Future Generation Computer Systems
, 1999
Cited by 11 (0 self)
Java offers the basic infrastructure needed to integrate computers connected to the Internet into a seamless distributed computational resource: an infrastructure for running coarse-grained parallel applications on numerous, anonymous machines. First, we sketch such a resource's essential technical properties. Then, we present a prototype of Javelin, an infrastructure for global computing. The system is based on Internet software that is interoperable, increasingly secure, and ubiquitous: Java-enabled Web technology. Ease of participation is seen as a key property for such a resource to realize the vision of a multiprocessing environment comprising thousands of computers. Javelin's architecture and implementation require participants to have access to only a Java-enabled Web browser. Experimental results are given in the form of a Mersenne Prime application and a ray-tracing application that run on a heterogeneous network of several parallel machines, workstations, and PCs. Tw...
A Framework for Automatic Adaptation of Tunable Distributed Applications
 Cluster Computing 4(1
, 2001
Exploiting Application Tunability for Efficient, Predictable Resource Management in Parallel and Distributed Systems
 In Proc. 13th Intl. Parallel Processing Symposium
, 1999
"... this paper, we propose a novel approach ..."
Towards Practical Deterministic Write-All Algorithms
 In Proc. 13th ACM Symp. on Parallel Algorithms and Architectures
, 2001
Cited by 10 (5 self)
The problem of performing t tasks on n asynchronous or undependable processors is a basic problem in parallel and distributed computing. We consider an abstraction of this problem called the Write-All problem: using n processors, write 1's into all locations of an array of size t. The most efficient known deterministic asynchronous algorithms for this problem are due to Anderson and Woll. The first class of algorithms has work complexity of $O(t \cdot n^{\varepsilon})$, for $n \le t$ and any $\varepsilon > 0$, and they are the best known for the full range of processors ($n = t$). To schedule the work of the processors, the algorithms use lists of q permutations on $[q]$ ($q \le n$) that have certain combinatorial properties. Instantiating such an algorithm for a specific $\varepsilon$ either requires substantial preprocessing (exponential in $1/\varepsilon$) to find the requisite permutations, or imposes a prohibitive constant (exponential in $1/\varepsilon$) hidden by the asymptotic analysis. The second class deals with the specific case of $t = n^2$, and these algorithms have work complexity of $O(t \log t)$. They also use lists of permutations with the same combinatorial properties. However, instantiating these algorithms requires preprocessing exponential in n to find the permutations. To alleviate this costly instantiation, Kanellakis and Shvartsman proposed a simple way of computing the permutations. They conjectured that their construction has the desired properties but they provided no analysis. In this paper
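The Write-All problem stated above can be illustrated with a small Python simulation. This is a naive baseline, not the Anderson and Woll algorithms or the Kanellakis and Shvartsman construction: each processor simply walks its own permutation of the array. The names `write_all` and `rotation_schedules`, the rotated schedules, and the seeded random interleaving used to imitate asynchrony are all assumptions for illustration.

```python
import random


def rotation_schedules(t, n):
    """Hypothetical schedules: processor p starts at offset p*t//n and wraps."""
    return [[(p * t // n + k) % t for k in range(t)] for p in range(n)]


def write_all(t, schedules, seed=0):
    """Simulate processors writing 1's into an array of size t.

    Each schedule is a permutation of range(t). A seeded random choice of
    which processor moves next stands in for asynchrony. Work counts every
    step taken, including writes to cells that are already set.
    """
    rng = random.Random(seed)
    array = [0] * t
    position = [0] * len(schedules)
    work = 0
    # If any processor had exhausted its permutation, every cell would be
    # set, so while the loop runs every processor still has steps left.
    while not all(array):
        p = rng.randrange(len(schedules))
        cell = schedules[p][position[p]]
        position[p] += 1
        array[cell] = 1
        work += 1
    return array, work


if __name__ == "__main__":
    filled, steps = write_all(12, rotation_schedules(12, 3), seed=1)
    print(filled, steps)
```

Because processors here never check what others have done, work can approach n times t in bad interleavings. The permutation lists discussed in the abstract are chosen precisely to limit that kind of redundant work.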
Optimal Scheduling for Disconnected Cooperation
, 2001
Cited by 8 (3 self)
We consider a distributed environment consisting of n processors that need to perform t tasks. We assume that communication is initially unavailable and that processors begin work in isolation. At some unknown point of time an unknown collection of processors may establish communication. Before processors begin communication they execute tasks in the order given by their schedules. Our goal is to schedule work of isolated processors so that when communication is established for the first time, the number of redundantly executed tasks is controlled. We quantify worst case redundancy as a function of processor advancements through their schedules. In this work we refine and simplify an extant deterministic construction for schedules with $n \le t$, and we develop a new analysis of its waste. The new analysis shows that for any pair of schedules, the number of redundant tasks can be controlled for the entire range of t tasks. Our new result is asymptotically optimal: the tails of these schedules are within a $1 + O(n^{-1/4})$ factor of the lower bound. We also present two new deterministic constructions, one for $t \ge n$, and the other for $t \ge n^{3/2}$, which substantially improve pairwise waste for all prefixes of length $t/\sqrt{n}$, and offer near optimal waste for the tails of the schedules. Finally, we present bounds for waste of any collection of $k \ge 2$ processors for both deterministic and randomized constructions.
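The notion of redundancy as a function of processor advancements can be made concrete with a short Python sketch. It computes pairwise waste as the number of tasks two isolated processors have both executed after advancing a and b steps through their schedules. The function names and the two example schedules are assumptions for illustration and are not the constructions from the paper.

```python
def pairwise_waste(schedule_a, schedule_b, a, b):
    """Tasks executed by both processors after a and b steps, respectively."""
    return len(set(schedule_a[:a]) & set(schedule_b[:b]))


def worst_pair_waste(schedules, a, b):
    """Largest pairwise waste over all ordered pairs of distinct processors."""
    return max(
        pairwise_waste(p, q, a, b)
        for i, p in enumerate(schedules)
        for j, q in enumerate(schedules)
        if i != j
    )


if __name__ == "__main__":
    t = 8
    forward = list(range(t))   # tasks 0, 1, ..., 7
    backward = forward[::-1]   # tasks 7, 6, ..., 0
    print(pairwise_waste(forward, backward, 4, 4))  # disjoint halves
    print(pairwise_waste(forward, backward, 5, 5))  # prefixes now overlap
```

For two processors, opposite orders keep waste at zero until the advancements together exceed t, after which overlap is unavoidable. The difficulty addressed in the abstract is keeping waste low for every pair among n schedules at once.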