Results 1 - 10 of 39
Grids and Grid Technologies for Wide-Area Distributed Computing
- SOFTWARE: PRACTICE AND EXPERIENCE
, 2002
Cited by 60 (15 self)
The last decade has seen a substantial increase in commodity computer and network performance, mainly as a result of faster hardware and more sophisticated software. Nevertheless, there are still problems in the fields of science, engineering, and business that cannot be effectively dealt with using the current generation of supercomputers. In fact, due to their size and complexity, these problems are often very numerically and/or data intensive and consequently require a variety of heterogeneous resources that are not available on a single machine. A number of teams have conducted experimental studies on the cooperative use of geographically distributed resources unified to act as a single powerful computer. This new approach is known by several names, such as metacomputing, scalable computing, global computing, Internet computing, and more recently peer-to-peer or Grid computing. The early efforts in Grid computing started as a project to link supercomputing sites, but have now grown far beyond their original intent. In fact, many applications can benefit from the Grid infrastructure, including collaborative engineering, data exploration, high throughput computing, and of course distributed supercomputing. Moreover, due to the rapid growth of the Internet and Web, there has been a rising interest in Web-based distributed computing, and many projects have been started that aim to exploit the Web as an infrastructure for running coarse-grained distributed and parallel applications. In this context, the Web has the capability to act as a platform for parallel and collaborative work as well as a key technology to create a pervasive and ubiquitous Grid-based infrastructure. This paper aims to present the state-of-the-art of Grid computing and attempts to survey the m...
Predicting the Performance of Wide Area Data Transfers
, 2002
Cited by 58 (9 self)
As Data Grids become more commonplace, large data sets are being replicated and distributed to multiple sites, leading to the problem of determining which replica can be accessed most efficiently. The answer to this question can depend on many factors, including physical characteristics of the resources and the load behavior on the CPUs, networks, and storage devices that are part of the end-to-end path linking possible sources and sinks.
A Metadata Catalog Service for Data Intensive Applications
, 2003
Cited by 56 (2 self)
Advances in computational, storage and network technologies as well as middleware such as the Globus Toolkit allow scientists to expand the sophistication and scope of data-intensive applications.
Modeling Machine Availability in Enterprise and Wide-area Distributed Computing Environments
- In Euro-Par’05
, 2003
Cited by 51 (7 self)
In this paper, we consider the problem of modeling machine availability in enterprise-area and wide-area distributed computing settings. Using availability data gathered from three different environments, we detail the suitability of four potential statistical distributions for each data set: exponential, Pareto, Weibull, and hyperexponential. In each case, we use software we have developed to determine the necessary parameters automatically from each data collection.
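As a rough illustration of what determining distribution parameters from availability data can look like, the sketch below fits two of the four named distributions (exponential and Pareto) by closed-form maximum likelihood and reports their log-likelihoods. It is not the authors' software: the function names and the sample durations are hypothetical, and the Weibull and hyperexponential fits are left out because they have no closed-form estimators and usually need an iterative method.

```python
import math


def fit_exponential(durations):
    """Closed-form MLE for an exponential model: rate = n / sum(x)."""
    n = len(durations)
    total = sum(durations)
    rate = n / total
    loglik = n * math.log(rate) - rate * total
    return rate, loglik


def fit_pareto(durations):
    """Closed-form MLE for a Pareto (type I) model.

    Scale x_min is the smallest observation; shape alpha is
    n / sum(log(x_i / x_min)). Assumes positive durations that are
    not all identical (otherwise the sum of log ratios is zero).
    """
    n = len(durations)
    x_min = min(durations)
    log_ratio_sum = sum(math.log(x / x_min) for x in durations)
    alpha = n / log_ratio_sum
    loglik = (
        n * math.log(alpha)
        + n * alpha * math.log(x_min)
        - (alpha + 1) * sum(math.log(x) for x in durations)
    )
    return x_min, alpha, loglik


# Hypothetical availability durations (hours); not taken from the paper.
sample = [1.0, 2.0, 4.0, 8.0]
rate, exp_loglik = fit_exponential(sample)
x_min, alpha, pareto_loglik = fit_pareto(sample)
print("exponential: rate =", rate, "log-likelihood =", exp_loglik)
print("pareto: x_min =", x_min, "alpha =", alpha, "log-likelihood =", pareto_loglik)
```

Comparing log-likelihoods is only one simple way to judge fit; the abstract does not say which suitability criteria the paper uses.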
Storage Resource Managers: Middleware Components for Grid Storage
, 2002
Cited by 49 (5 self)
The amount of scientific data generated by simulations or collected from large scale experiments has reached levels that cannot be stored on the researcher's workstation or even in his/her local computer center. Such data are vital to large scientific collaborations dispersed over wide-area networks. In the past, the concept of a Grid infrastructure [1] mainly emphasized the computational aspect of supporting large distributed computational tasks, and managing the sharing of the network bandwidth by using bandwidth reservation techniques. In this paper we discuss the concept of Storage Resource Managers (SRMs) as components that complement this with support for the storage management of large distributed datasets. Access to data is becoming the main bottleneck in such "data intensive" applications because the data cannot be replicated at all sites. SRMs are designed to dynamically optimize the use of storage resources to help unclog this bottleneck.
GriPhyN and LIGO, Building a Virtual Data Grid for Gravitational Wave Scientists
- 11th Intl Symposium on High Performance Distributed Computing
, 2002
Cited by 43 (17 self)
Many physics experiments today generate large volumes of data. That data is then processed in a variety of ways in order to achieve the understanding of fundamental physical phenomena. The goal of the NSF-funded GriPhyN project (Grid Physics Network) is to enable scientists to seamlessly access data, whether it is raw experimental data or a data product that is the result of further processing. GriPhyN provides a new degree of transparency in how data-handling and processing capabilities are integrated to deliver data products to end-users or applications, so that requests for such products are easily mapped into computation and/or data access at multiple locations. GriPhyN refers to the set of all data products available to the user as Virtual Data. Among the physics applications participating in the project is the Laser Interferometer Gravitational-wave Observatory (LIGO), which is being built to observe the gravitational waves predicted by general relativity. In this paper, we describe our initial design and prototype of a Virtual Data Grid for LIGO.
Data Replication Strategies in Grid Environments
- in Proceedings of the Fifth International Conference on Algorithms and Architectures for Parallel Processing (ICA3PP’02)
, 2002
Cited by 34 (2 self)
Data Grids provide geographically distributed resources for large-scale data-intensive applications that generate large data sets. However, ensuring efficient and fast access to such huge and widely distributed data is hindered by the high latencies of the Internet. To address these problems we introduce a set of replication management services and protocols that offer high data availability, low bandwidth consumption, increased fault tolerance, and improved scalability of the overall system. Replication decisions are made based on a cost model that evaluates data access costs and performance gains of creating each replica. The estimation of costs and gains is based on factors such as run-time accumulated read/write statistics, response time, bandwidth, and replica size. To address scalability, replicas are organized in a combination of hierarchical and flat topologies that represent propagation graphs that minimize inter-replica communication costs. To evaluate our model we use the network simulator NS. Our results prove that replication improves the performance of the data access on Data Grids, and that the gain increases with the size of the datasets used.
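The abstract names the inputs to the replication decision (read/write statistics, response time, bandwidth, replica size) but not the formulas. The sketch below shows one way a gain-versus-cost comparison over those inputs could be wired together. Every name and formula in it is an assumption made for illustration, not the paper's actual cost model.

```python
from dataclasses import dataclass


@dataclass
class AccessStats:
    """Hypothetical per-site statistics for one data set."""
    reads: int                 # reads issued at the candidate site
    writes: int                # writes that would be propagated to a new replica
    remote_response_s: float   # response time per remote access, seconds
    local_response_s: float    # response time per local access, seconds
    replica_size_mb: float     # size of the data set to copy
    bandwidth_mb_s: float      # bandwidth to the candidate site


def replication_gain(s: AccessStats) -> float:
    # Assumed gain: time saved by serving accumulated reads locally.
    return s.reads * (s.remote_response_s - s.local_response_s)


def replication_cost(s: AccessStats) -> float:
    # Assumed cost: one-off transfer time plus one remote round trip
    # per write to keep the new replica up to date.
    transfer = s.replica_size_mb / s.bandwidth_mb_s
    updates = s.writes * s.remote_response_s
    return transfer + updates


def should_replicate(s: AccessStats) -> bool:
    # Create the replica only when the estimated gain exceeds the cost.
    return replication_gain(s) > replication_cost(s)


# Illustrative values only.
read_heavy = AccessStats(100, 5, 0.5, 0.1, 200.0, 10.0)
write_heavy = AccessStats(10, 100, 0.5, 0.1, 200.0, 10.0)
print(should_replicate(read_heavy), should_replicate(write_heavy))
```

Under these assumed formulas a read-heavy site favours creating a replica while a write-heavy one does not, which matches the intuition that update propagation erodes the benefit of replication.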
The Earth System Grid: Supporting the Next Generation of Climate Modeling Research
- Proceedings of the IEEE
, 2005
Cited by 30 (14 self)
Understanding the Earth’s climate system and how it might be changing is a preeminent scientific challenge. Global climate models are used to simulate past, present, and future climates, and experiments are executed continuously on an array of distributed supercomputers. The resulting data archive, spread over several sites, currently contains upwards of one hundred terabytes of simulation data and is growing rapidly. Looking towards mid-decade and beyond, we must anticipate and prepare for distributed climate research data holdings of many petabytes. The Earth System Grid (ESG) is a collaborative interdisciplinary project aimed at addressing the challenge of enabling management, discovery, access, and analysis of these critically important datasets in a distributed and heterogeneous computational environment. The problem is fundamentally a Grid problem. Building upon
Using Regression Techniques to Predict Large Data Transfers
- International Journal of High Performance Computing Applications
, 2003
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
- ACM Comput. Surv.
, 2006
Cited by 27 (7 self)
Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases.

