Parallel Cluster Labeling on a Network of Workstations
- In Proceedings of the Thirteenth Brazilian Symposium on Computer Networks, 1995
Cited by 2 (1 self)
In recent years, encouraged by today's fast workstations and by software systems designed to transform workstation clusters into parallel programming environments, networks of workstations have been increasingly used as computational engines. Networked workstations, however, are not ideal replacements for supercomputers, because of the low interconnection capacity provided by current local area networks. In this paper, we present an application using the EcliPSe toolkit, a system for replication-based parallel processing in heterogeneous environments. Although primarily designed for replicative applications (which generally do not require large amounts of communication), EcliPSe can be used for more general forms of message-passing parallel processing. We describe the use of the toolkit in the parallelization of a cluster labeling algorithm. The algorithm is designed so that it uses some of EcliPSe's features to reduce communication overhead, making the algorithm suitable for execution o...
Fail-Safe Concurrency in the EcliPSe System
- 1996
Local or wide-area heterogeneous workstation clusters are relatively cheap and highly effective, though inherently unstable, operating environments for long-running distributed computations. We found this to be the case in early experiments with a prototype of the EcliPSe system, a software toolkit for replicative applications on heterogeneous workstation clusters. Hardware or network failures in computations that executed for over a day were not uncommon. In this work, a variety of features for the incorporation of failure resilience in the EcliPSe system are described. Key characteristics of this fault-tolerant system are ease of use, low state-saving cost, system scalability, and good performance. We present results of some experiments demonstrating low state-saving overheads and small system recovery times, as a function of the amount of state saved. Research supported in part by NATO-CRG900108, ONR-9310233, ONR-9310278, and ARO-93G0045. Research supported by CNPq-Brazil, proces...

