Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments
- In Euro-Par’05
, 2003
Cited by 51 (7 self)
In this paper, we consider the problem of modeling machine availability in enterprise-area and wide-area distributed computing settings. Using availability data gathered from three different environments, we detail the suitability of four potential statistical distributions for each data set: exponential, Pareto, Weibull, and hyperexponential. In each case, we use software we have developed to determine the necessary parameters automatically from each data collection.
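As a rough illustration of fitting candidate distributions to availability durations, the sketch below computes closed-form maximum-likelihood estimates for two of the four families named above (exponential and Pareto) and compares their log-likelihoods. It is not the authors' software: the function names and the sample durations are hypothetical, the Pareto scale is assumed to be the sample minimum, and the Weibull and hyperexponential fits (which need iterative solvers) are left out.

```python
import math

def fit_exponential(durations):
    """MLE for the exponential rate: lambda = n / sum(x)."""
    rate = len(durations) / sum(durations)
    loglik = sum(math.log(rate) - rate * x for x in durations)
    return rate, loglik

def fit_pareto(durations):
    """MLE for a Pareto with scale x_m = min(sample): alpha = n / sum(ln(x / x_m)).

    Assumes the sample contains at least two distinct positive values.
    """
    x_m = min(durations)
    log_sum = sum(math.log(x / x_m) for x in durations)
    alpha = len(durations) / log_sum
    loglik = sum(
        math.log(alpha) + alpha * math.log(x_m) - (alpha + 1) * math.log(x)
        for x in durations
    )
    return x_m, alpha, loglik

# Hypothetical availability durations in hours (illustrative only, not trace data).
durations = [1.0, 2.0, 4.0, 8.0]
rate, ll_exp = fit_exponential(durations)
x_m, alpha, ll_par = fit_pareto(durations)
print(f"exponential: rate={rate:.4f} loglik={ll_exp:.3f}")
print(f"pareto: x_m={x_m} alpha={alpha:.4f} loglik={ll_par:.3f}")
```

A higher log-likelihood suggests a better fit on this sample, though a real comparison would also use goodness-of-fit tests and account for the differing number of parameters.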
Exploiting Availability Prediction in Distributed Systems
, 2006
Cited by 38 (2 self)
Loosely-coupled distributed systems have significant scale and cost advantages over more traditional architectures, but the availability of the nodes in these systems varies widely. Availability modeling is crucial for predicting per-machine resource burdens and understanding emergent, system-wide phenomena. We present new techniques for predicting availability and test them using traces taken from three distributed systems. We then describe three applications of availability prediction. The first, availability-guided replica placement, reduces object copying in a distributed data store while increasing data availability. The second shows how availability prediction can improve routing in delay-tolerant networks. The third combines availability prediction with virus modeling to improve forecasts of global infection dynamics.
Automatic Methods for Predicting Machine Availability in Desktop Grid and Peer-to-peer Systems
- In Proceedings of the IEEE International Symposium on Cluster Computing and the Grid (CCGrid’04
, 2004
Cited by 20 (1 self)
In this paper, we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting the duration that a machine will run until it restarts (availability duration) is critically useful to application scheduling and resource characterization in federated systems. We describe one parametric model fitting technique and two non-parametric prediction techniques, comparing their accuracy in predicting the quantiles of empirically observed machine availability distributions. We describe each method analytically and evaluate its precision using a synthetic trace of machine availability constructed from a known distribution. To detail their practical efficacy, we apply them to machine availability traces from three separate desktop and enterprise computing environments, and evaluate each method in terms of the accuracy with which it predicts availability in a trace driven simulation. Our results indicate that availability duration can be predicted with quantifiable confidence bounds and that these bounds can be used as conservative bounds on lifetime predictions. Moreover, a non-parametric method based on a binomial approach generates the most accurate estimates.
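One standard way to obtain a non-parametric, binomial-based confidence bound on a quantile uses order statistics; the sketch below shows that construction. It is not necessarily the exact method of the paper: the function names are hypothetical, and the samples are assumed to be independent draws from a continuous distribution. The k-th smallest observation lies at or below the true q-quantile exactly when at least k samples do, which happens with probability P(Binomial(n, q) >= k).

```python
import math

def binomial_tail_at_least(n, q, k):
    """P(X >= k) for X ~ Binomial(n, q)."""
    return sum(math.comb(n, i) * q**i * (1 - q) ** (n - i) for i in range(k, n + 1))

def quantile_lower_bound(sample, q, confidence):
    """Largest order statistic that is <= the q-quantile with probability >= confidence.

    Returns None when the sample is too small to support the requested confidence.
    """
    xs = sorted(sample)
    n = len(xs)
    best = None
    for k in range(1, n + 1):
        # The tail probability shrinks as k grows, so stop at the first failure.
        if binomial_tail_at_least(n, q, k) >= confidence:
            best = xs[k - 1]
        else:
            break
    return best

# Hypothetical availability durations (arbitrary units), illustrative only.
durations = list(range(1, 21))
bound = quantile_lower_bound(durations, q=0.5, confidence=0.95)
print(bound)
```

Used this way, the returned value is a conservative estimate: with the stated confidence, at least a fraction 1 - q of future availability durations should exceed it, under the independence assumption above.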
Quantifying Machine Availability in Networked and Desktop Grid Systems
- In Proceedings of CCGrid04
, 2003
Cited by 12 (5 self)
In this paper, we examine the problem of predicting machine availability in desktop and enterprise computing environments. Predicting the duration that a machine will run until it restarts (availability duration) is critically useful to application scheduling and resource characterization in federated systems. We describe one parametric model fitting technique and two non-parametric prediction techniques, comparing their accuracy in predicting the quantiles of empirically observed machine availability distributions. We describe each method analytically and evaluate its precision using a synthetic trace of machine availability constructed from a known distribution. To detail their practical efficacy, we apply them to machine availability traces from three separate desktop and enterprise computing environments, and evaluate each method in terms of the accuracy with which it predicts availability in a trace driven simulation.
Classic Paxos vs. Fast Paxos: Caveat Emptor
- In Proceedings of the IEEE Workshop on Hot Topics in System Dependability (HotDep
, 2007
Cited by 8 (1 self)
Classic Paxos and Fast Paxos are two protocols that are the core of efficient implementations of replicated state machines. In runs with no failures and no conflicts, Fast Paxos requires fewer communication steps for learners to learn of a request compared to Classic Paxos. However, there are realistic scenarios in which Classic Paxos has a significant probability of having a lower latency. This paper discusses one such scenario with an analytical comparison of the protocols and simulation results.
Balanced multicasting: High-throughput communication for grid applications
- in Proceedings of ACM/IEEE Conference on Supercomputing
, 2005
Cited by 8 (3 self)
Many grid applications need to transfer large amounts of data between the geographically distributed sites of a grid environment. Network heterogeneity between these sites makes throughput optimization of data transfers to multiple sites (multicast) hard or even impossible. We present a technique called balanced multicasting that uses monitoring information for both bandwidth capacity and achievable bandwidth to compute balanced multicast trees at runtime that use application-level traffic shaping at the sender side to avoid self-induced congestion. Our experimental evaluation shows that our approach outperforms existing multicast strategies by large margins.
Trace-Based Evaluation of Job Runtime and Queue Wait Time Predictions in Grids
Cited by 7 (3 self)
Large-scale distributed computing systems such as grids are serving a growing number of scientists. These environments bring about not only the advantages of an economy of scale, but also the challenges of resource and workload heterogeneity. A consequence of these two forms of heterogeneity is that job runtimes and queue wait times are highly variable, which generally reduces system performance and makes grids difficult to use by the common scientist. Predicting job runtimes and queue wait times has been widely studied for parallel environments. However, there is no detailed investigation on how the proposed prediction methods perform in grids, whose resource structure and workload characteristics are very different from those in parallel systems. In this paper, we assess the performance and benefit of predicting job runtimes and queue wait times in grids based on traces gathered from various research and production grid environments. First, we evaluate the performance of simple yet widely used time series prediction methods and the effect of applying them to different types of job classes (e.g., all jobs submitted by single users or to single sites). Then, we investigate the performance of two kinds of queue wait time prediction methods for grids. Last, we investigate whether prediction-based grid-level scheduling policies can have better performance than policies that do not use predictions.
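To make "simple time series prediction methods" concrete, the sketch below evaluates two generic predictors (last observed value and running mean) on a sequence of job runtimes using mean absolute error. The abstract does not say which predictors the paper evaluates, so these two are assumptions chosen for illustration, and the runtimes are hypothetical.

```python
def predict_last(history):
    """Predict that the next runtime equals the most recent one."""
    return history[-1]

def predict_mean(history):
    """Predict the next runtime as the mean of all runtimes seen so far."""
    return sum(history) / len(history)

def mean_absolute_error(series, predictor):
    """Replay the series, predicting each value from the values before it."""
    errors = [abs(predictor(series[:i]) - series[i]) for i in range(1, len(series))]
    return sum(errors) / len(errors)

# Hypothetical runtimes (minutes) of consecutive jobs from one user.
runtimes = [10, 12, 11, 30]
mae_last = mean_absolute_error(runtimes, predict_last)
mae_mean = mean_absolute_error(runtimes, predict_mean)
print(f"last-value MAE={mae_last:.3f}, running-mean MAE={mae_mean:.3f}")
```

Restricting `runtimes` to a single job class (one user or one site) corresponds to the per-class evaluation the abstract describes.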
Resource Bundles: Using Aggregation for Statistical Wide-Area Resource Discovery and Allocation
, 2007
Cited by 7 (2 self)
Resource discovery is an important process for finding suitable nodes that satisfy application requirements in large loosely-coupled distributed systems. Besides inter-node heterogeneity, many of these systems also show a high degree of intra-node dynamism, so that selecting nodes based only on their recently observed resource capacities for scalability reasons can lead to poor deployment decisions resulting in application failures or migration overheads. In this paper, we propose the notion of a resource bundle (a representative resource usage distribution for a group of nodes with similar resource usage patterns) that employs two complementary techniques to overcome the limitations of existing techniques: resource usage histograms to provide statistical guarantees for resource capacities, and clustering-based resource aggregation to achieve scalability. Using trace-driven simulations and data analysis of a month-long PlanetLab trace, we show that resource bundles are able to provide high accuracy for statistical resource discovery (up to 56% better precision than using only recent values), while achieving high scalability (up to 55% fewer messages than a non-aggregation algorithm). We also show that resource bundles are ideally suited for identifying group-level characteristics such as finding load hot spots and estimating total group capacity (within 8% of actual values).
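The idea of a statistical guarantee drawn from observed resource usage can be sketched as follows: given past samples of a node's available capacity, report the largest capacity that was met or exceeded in at least a chosen fraction of the observations. This is an illustrative reading only, with a hypothetical function name and made-up samples; the paper's histogram representation and its clustering step are not reproduced here.

```python
import math

def guaranteed_capacity(samples, confidence):
    """Largest observed value v such that at least `confidence` of the samples are >= v."""
    xs = sorted(samples, reverse=True)
    # The m-th largest sample has at least m samples at or above it.
    m = max(1, math.ceil(confidence * len(xs)))
    return xs[m - 1]

# Hypothetical available-CPU samples (percent) for one node over time.
cpu_free = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(guaranteed_capacity(cpu_free, 0.5))
print(guaranteed_capacity(cpu_free, 1.0))
```

Selecting nodes by such a value rather than by the most recent reading is what distinguishes a distribution-based choice from one based only on recent capacities.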
MOB: Zero-configuration High-throughput Multicasting for Grid Applications
- In Proc. of the 16th International Symposium on High-Performance Distributed Computing (HPDC-16
, 2007
Cited by 6 (2 self)
Grid applications often need to distribute large amounts of data efficiently from one cluster to multiple others (multicast). Existing methods usually arrange nodes in optimized tree structures, based on external network monitoring data. This dependence on monitoring data, however, severely impacts both ease of deployment and adaptivity to dynamically changing network conditions. In this paper, we present Multicast Optimizing Bandwidth (MOB), a high-throughput multicast approach, inspired by the BitTorrent protocol [4]. With MOB, data transfers are initiated by the receivers that try to steal data from peer clusters. Instead of using potentially outdated monitoring data, MOB automatically adapts to the currently achievable bandwidth ratios. Our experimental evaluation compares MOB to both the BitTorrent protocol and to our previous approach, Balanced Multicasting [11], the latter optimizing multicast trees based on external monitoring data. We show that MOB outperforms the BitTorrent protocol. MOB is competitive with Balanced Multicasting as long as the network bandwidth remains stable. With dynamically changing bandwidth, MOB outperforms Balanced Multicasting by wide margins.
Efficient Resource Virtualization and Sharing Strategies for Heterogeneous Grid Environments
Cited by 5 (0 self)
Resource virtualization has emerged as a powerful technique for customized resource provisioning in grid and data center environments. In this paper, we describe efficient strategies for policy-based control of the virtualization of physical resources. With these strategies, virtualization is controlled taking into account workload requirements, available capacities of physical resources, and the governing policies. Realizing this control requires simultaneous handling of three problems: (i) determining the virtual resource configurations, (ii) the mapping of resulting virtual resources to physical resources, and (iii) the mapping of workloads to the virtual resources. We pose this as an optimization problem and solve it using a linear programming (LP) based approach. We evaluate this approach by implementing it in the Harmony grid environment consisting of heterogeneous resources and heterogeneous workloads. Experimental results indicate that our approach is efficient and effective. We extend this approach further by using a two-phase heuristic that allows the decision-making component to scale up to handle large-scale grid systems.

