A semantic framework for integrated asset management
In Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid), 2007
Cited by 4 (3 self)
Integrated Asset Management (IAM) is the vision of an IT-enabled transformation of oilfield operations in which integrating information from a variety of reservoir modeling, simulation, and performance prediction tools leads to rapid decision making for continuous production optimization. This paper describes the design of a model-based IAM system for production forecasting. Domain knowledge is captured through a formal modeling language that forms the basis for an intuitive user interface to the system. An IAM metacatalog captures domain knowledge as well as metadata about computational resources and data sets in a single ontological framework, thereby providing a unified mechanism for application, data, and workflow integration. The framework is designed to be portable across oilfield assets, to allow different classes of end users to interact with the integrated system, and to accommodate new domain knowledge, software applications, data sets, and workflows for IAM.
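The metacatalog idea (one store holding domain concepts, applications, data sets, and compute resources) can be illustrated with a toy subject-predicate-object store. This is a minimal Python sketch under stated assumptions, not the paper's modeling language or ontology: the class `MetaCatalog`, the helper `inputs_for`, the predicates (`subClassOf`, `consumes`, `instanceOf`, `runsOn`), and all example names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetaCatalog:
    """Toy metacatalog (hypothetical): every fact is a (subject, predicate, object)
    triple, so domain concepts, applications, data sets, and compute resources
    all live in one store and are queried the same way."""
    triples: set[tuple[str, str, str]] = field(default_factory=set)

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def query(self, s: str | None = None, p: str | None = None,
              o: str | None = None) -> list[tuple[str, str, str]]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


def inputs_for(cat: MetaCatalog, app: str) -> list[str]:
    """Data sets whose type is one the application consumes (hypothetical helper)."""
    wanted = [o for _, _, o in cat.query(s=app, p="consumes")]
    return sorted(s for t in wanted for s, _, _ in cat.query(p="instanceOf", o=t))


# Example: one store mixes domain knowledge, application, resource, and data metadata.
cat = MetaCatalog()
cat.add("PressureData", "subClassOf", "WellData")             # domain knowledge
cat.add("ReservoirSimulator", "consumes", "PressureData")     # application metadata
cat.add("ReservoirSimulator", "runsOn", "cluster-a")          # resource metadata
cat.add("well42_pressure.csv", "instanceOf", "PressureData")  # data set metadata
cat.add("well42_logs.las", "instanceOf", "LogData")
print(inputs_for(cat, "ReservoirSimulator"))  # ['well42_pressure.csv']
```

Because every kind of metadata shares one representation, a question such as "which data sets can feed this simulator?" becomes a join over triples rather than a lookup across separate application, data, and resource registries.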
Scheduling of Scientific Workflows on Data Grids
Cited by 2 (1 self)
Selecting resources for executing scientific workflows in data grids becomes challenging as the number of files grows exponentially with the distribution of scientific experiments around the world. As these experiments are run more often, a huge number of data files is produced and made available from numerous resources. There is little work on the optimal selection of data hosts and compute resources for scientific workflows in the presence of replicated files. Anticipating this need, this thesis work proposes novel workflow scheduling algorithms for data grids with a large number of replicated files that incorporate the practical constraints of heterogeneous environments such as Grids. In this paper, we define the workflow scheduling problem in the context of data grids, supported by motivating applications; list research issues arising from practical constraints; propose two algorithms for experimenting with the problem; and report simulation results from preliminary studies. The results are promising enough to motivate further research on the stated problem.
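The joint choice of a compute resource and a replica host described above can be sketched as a greedy list scheduler. This is an illustrative Python sketch, not either of the paper's two algorithms: it assumes a simple cost model (input transfer time plus compute time, with the transfer starting once the resource is idle and the task's parents have finished), ignores transfers of intermediate outputs, and all names (`Task`, `schedule`, `speed`, `bandwidth`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    work: float              # compute demand, in arbitrary work units
    input_size: float        # size of the task's input file, in MB
    replicas: List[str]      # data hosts holding a copy of the input file
    parents: List[str] = field(default_factory=list)  # predecessor task names


def schedule(tasks, speed, bandwidth):
    """Greedy list scheduling (hypothetical cost model). tasks must be in
    topological order; speed[c] is the work rate of compute resource c and
    bandwidth[(h, c)] the MB/s from data host h to compute resource c."""
    free_at = {c: 0.0 for c in speed}  # time each compute resource becomes idle
    finish_of = {}                     # finish time of each scheduled task
    plan = []
    for t in tasks:
        deps_done = max((finish_of[p] for p in t.parents), default=0.0)
        best = None
        # Jointly choose the compute resource and the replica to read from.
        for c in sorted(speed):
            for h in t.replicas:
                start = max(free_at[c], deps_done)
                finish = start + t.input_size / bandwidth[(h, c)] + t.work / speed[c]
                if best is None or finish < best[0]:
                    best = (finish, c, h)
        finish, c, h = best
        free_at[c] = finish
        finish_of[t.name] = finish
        plan.append((t.name, c, h, finish))
    return plan


speed = {"c1": 10.0, "c2": 5.0}
bandwidth = {("h1", "c1"): 10.0, ("h1", "c2"): 100.0,
             ("h2", "c1"): 50.0, ("h2", "c2"): 50.0}
tasks = [Task("A", 100.0, 100.0, ["h1", "h2"]),
         Task("B", 50.0, 50.0, ["h1"], parents=["A"])]
print(schedule(tasks, speed, bandwidth))
# [('A', 'c1', 'h2', 12.0), ('B', 'c1', 'h1', 22.0)]
```

Evaluating every (resource, replica) pair lets the scheduler trade a faster processor against a slower link, which a fixed nearest-replica rule cannot do.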
Making the Best of a Bad Situation: Prioritized Storage Management in GEMS
Cited by 1 (1 self)
As distributed storage systems grow, the response time between detecting and repairing an error becomes significant. Systems built on shared servers face additional complexity because of the high rate of service outages and revocations. Managing high replica counts in this environment becomes very costly in terms of the storage required and the bandwidth consumed by file copies. The storage challenge in this situation can thus be phrased as functioning inexpensively with respect to cost constraints such as disk utilization, network bandwidth consumption, and server CPU time. The GEMS (Grid Enabled Molecular Simulation) storage system provides a replicated and shared workspace for large-scale molecular dynamics simulations and exemplifies the above issues. The GEMS framework addresses this problem by accessing metadata, prioritizing observed faults, and repairing them in an intelligent manner. In this paper, we provide observations from the operation of GEMS and compare its error handling to that of similar storage systems. Key words: replicated, shared, simulation, storage
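One plausible way to prioritize observed faults under a bandwidth constraint is to repair the most endangered files first. The Python sketch below assumes a priority of "fewest live replicas first" and a per-round copy budget in MB; this policy, the function `plan_repairs`, and its parameters are hypothetical and not taken from GEMS.

```python
import heapq


def plan_repairs(files, target, budget_mb):
    """Hypothetical repair planner. files maps name -> (live_replicas, size_mb).
    Files below the target replica count are repaired most-endangered first
    (fewest live copies, ties broken by name), skipping any repair that would
    exceed this round's copy budget."""
    heap = [(live, name, size)
            for name, (live, size) in files.items()
            if 0 < live < target]  # live == 0: no source left to copy from
    heapq.heapify(heap)
    plan, spent = [], 0.0
    while heap:
        live, name, size = heapq.heappop(heap)
        copies = target - live
        cost = copies * size
        if spent + cost > budget_mb:
            continue  # too expensive for this round; a later round may retry
        plan.append((name, copies))
        spent += cost
    return plan


files = {
    "a": (1, 100.0),  # endangered and affordable
    "b": (2, 10.0),   # mildly under-replicated
    "c": (3, 50.0),   # healthy
    "d": (0, 5.0),    # lost: nothing to copy from
    "e": (1, 500.0),  # endangered but too large for this round
}
print(plan_repairs(files, target=3, budget_mb=300.0))  # [('a', 2), ('b', 1)]
```

Ordering by replica deficit spends scarce bandwidth where a further failure would lose data, while the budget keeps repair traffic bounded on shared servers.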
Fair Resource Sharing in Hierarchical Virtual Organizations for Global Grids
Cited by 1 (1 self)
In global Grid computing, users and resource providers organize various Virtual Organizations (VOs) to share resources and services. A VO may organize sub-VOs to achieve its goal, which results in hierarchical VO environments. Resource providers and VOs agree upon VO resource sharing policies, such as the amount of resources to be shared. Users in lower-layer VOs can thus access resources in higher-layer VOs to accomplish their common goals. In this paper, we address the fair resource allocation problem in hierarchical VOs by analyzing the appropriate proportion of a VO's resources for each lower-layer VO. In addition, we provide a resource allocation scheme based on these predefined proportions. Simulation results show that the proposed approach achieves better fairness as well as better performance compared with other schemes.
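A static form of proportional sharing in a VO hierarchy can be computed recursively: each VO splits what it receives among its sub-VOs according to agreed weights. This Python sketch assumes weights are normalized among siblings and ignores redistribution of unused shares; it is not the paper's allocation scheme, and `effective_shares` and the example VO names are hypothetical.

```python
def effective_shares(tree, root, capacity):
    """Hypothetical static allocation. tree maps a VO to a list of
    (sub_vo, weight) pairs. Each VO splits the capacity it receives among its
    sub-VOs in proportion to their weights; VOs without sub-VOs are leaves
    and keep what they receive."""
    children = tree.get(root, [])
    if not children:
        return {root: capacity}
    total = sum(w for _, w in children)
    shares = {}
    for child, w in children:
        shares.update(effective_shares(tree, child, capacity * w / total))
    return shares


tree = {
    "GlobalVO": [("PhysicsVO", 3), ("BioVO", 1)],
    "PhysicsVO": [("ParticleVO", 1), ("AstroVO", 1)],
}
print(effective_shares(tree, "GlobalVO", 100.0))
# {'ParticleVO': 37.5, 'AstroVO': 37.5, 'BioVO': 25.0}
```

A leaf's share is the product of the weight fractions along its path from the root, which is why a lower-layer VO's entitlement depends on the agreements made at every layer above it.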
Data Transfers in the Grid: Workload Analysis of Globus GridFTP
Cited by 1 (0 self)
One of the basic services in grids is the transfer of data between remote machines. Files may be transferred at the explicit request of the user or as part of delegated resource management services, such as data replication or job scheduling. GridFTP is an important tool for such data transfers because it builds on the common FTP protocol, has a large user base with multiple implementations, and uses the GSI security model, which allows delegated operations. This paper presents a workload analysis of the GridFTP implementation provided by the Globus Toolkit. We studied more than 1.5 years of traces reported from all over the world by installed Globus GridFTP components. Our study focuses on three dimensions: first, it quantifies the volume of data transferred and characterizes user behavior; second, it attempts to show how tuning capabilities are used in practice; finally, it quantifies the user base as recorded in the database and highlights the usage trends of this software component.
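The first two dimensions of the analysis (data volume and use of tuning options) amount to aggregations over transfer records. The Python sketch below assumes a hypothetical record holding a byte count, a duration, and the number of parallel TCP streams (parallel streams being one of GridFTP's tuning options); it does not reflect the actual format of the Globus usage traces, and `Transfer` and `summarize` are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transfer:
    """Hypothetical trace record, not the Globus usage-report format."""
    nbytes: int     # bytes moved
    seconds: float  # wall-clock duration of the transfer
    streams: int    # parallel TCP streams used (1 = no parallelism)


def summarize(transfers):
    """Total volume, share of transfers using parallel streams, and mean
    throughput (bytes/s) grouped by stream count."""
    total = sum(t.nbytes for t in transfers)
    parallel = sum(1 for t in transfers if t.streams > 1)
    by_streams = defaultdict(list)
    for t in transfers:
        if t.seconds > 0:  # skip records with no usable duration
            by_streams[t.streams].append(t.nbytes / t.seconds)
    mean_tput = {s: sum(v) / len(v) for s, v in sorted(by_streams.items())}
    return {
        "total_bytes": total,
        "parallel_fraction": parallel / len(transfers) if transfers else 0.0,
        "mean_throughput_by_streams": mean_tput,
    }


records = [Transfer(1000, 2.0, 1), Transfer(3000, 1.0, 4),
           Transfer(2000, 1.0, 4), Transfer(500, 0.0, 1)]
print(summarize(records))
```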
Data Replication in Data Intensive Scientific Applications With Performance Guarantee
Cited by 1 (1 self)
Data replication has been widely adopted in data-intensive scientific applications to reduce data file transfer time and bandwidth consumption. However, the problem of data replication in Data Grids, an enabling technology for data-intensive applications, has been proven NP-hard and even non-approximable, which makes it difficult to solve. Meanwhile, most previous research in this field is either theoretical investigation without practical consideration, or heuristics-based with little or no theoretical performance guarantee. In this paper, we propose a data replication algorithm that not only has a provable theoretical performance guarantee but can also be implemented in a distributed and practical manner. Specifically, we design a polynomial-time centralized replication algorithm that reduces the total data file access delay by at least half of the reduction achieved by the optimal replication solution. Based on this centralized algorithm, we also design a distributed caching algorithm that can easily be adopted in a distributed environment such as Data Grids. Extensive simulations validate the efficiency of the proposed algorithms. Using our own simulator, we show that our centralized replication algorithm performs comparably to the optimal algorithm and other intuitive heuristics under different network parameters. Using GridSim, a popular distributed Grid simulator, we demonstrate that the distributed caching technique significantly outperforms an existing popular file caching technique in Data Grids, and that it is more scalable and adaptive to dynamic changes in file access patterns.
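The idea of a centralized placement that reduces total access delay can be sketched with a generic greedy heuristic: repeatedly add the single replica that lowers the rate-weighted access delay the most, until storage runs out or no placement helps. This Python sketch is not claimed to be the paper's algorithm and, as written, carries no proven guarantee; the cost model (request rate times distance to the nearest copy, one storage unit per extra replica, original copies not counted against capacity) and all names are assumptions.

```python
def total_delay(access, dist, placement):
    """access[(node, file)] = request rate; placement[file] = nodes holding it.
    Each request is served from the nearest copy (assumed cost model)."""
    return sum(rate * min(dist[node][h] for h in placement[f])
               for (node, f), rate in access.items())


def greedy_replicate(access, dist, origin, capacity):
    """Generic greedy placement (hypothetical). origin[file] = node holding the
    original copy; capacity[node] = number of extra replicas it can store."""
    placement = {f: {h} for f, h in origin.items()}
    free = dict(capacity)
    current = total_delay(access, dist, placement)
    while True:
        best = None
        for f in sorted(placement):
            for n in sorted(free):
                if free[n] == 0 or n in placement[f]:
                    continue
                placement[f].add(n)      # tentatively place a replica
                gain = current - total_delay(access, dist, placement)
                placement[f].discard(n)  # undo the trial placement
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, f, n)
        if best is None:                 # no capacity left or no positive gain
            return placement, current
        _, f, n = best
        placement[f].add(n)
        free[n] -= 1
        current = total_delay(access, dist, placement)


# Three nodes on a line: A - B - C, one hop apart.
dist = {"A": {"A": 0, "B": 1, "C": 2},
        "B": {"A": 1, "B": 0, "C": 1},
        "C": {"A": 2, "B": 1, "C": 0}}
access = {("C", "f1"): 10, ("C", "f2"): 1, ("B", "f2"): 1}
placement, delay = greedy_replicate(access, dist, {"f1": "A", "f2": "A"},
                                    {"A": 0, "B": 0, "C": 1})
print(placement, delay)  # f1 is copied to C, the node that requests it most
```

Each step re-evaluates every candidate (file, node) pair, so the sketch favors clarity over speed; a practical version would update delays incrementally.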
A Taxonomy of Desktop Grids and its Mapping to State of the Art Systems
Desktop Grids have emerged as an attractive computing paradigm for high-throughput applications. However, building such systems is complicated by the resources' heterogeneity, failures, nondedication, volatility, and lack of trust, since the resources (that is, desktop computers) sit at the edge of the Internet and are owned by different individuals. It is therefore important to understand how these distinct characteristics affect architecture, execution model, resource management, and scheduling. In this article, we investigate architectural elements and then provide a new taxonomy
A Vision for Next Generation Query Processors and an Associated Research Agenda
Query processing is one of the most important mechanisms for data management, and mature techniques exist for effective query optimization and efficient query execution. The vast majority of these techniques assume workloads of rather small transactional tasks with strong requirements for ACID properties. However, several developments create a unique set of new requirements for query processing systems: the emergence of new computing paradigms, such as grid and cloud computing; the increasingly large volumes of data commonly processed; the need to support data-driven research, intensive data analysis, and new scenarios such as processing data streams on the fly or querying web services; the fact that the metadata fed to optimizers is often missing at compile time; and the growing interest in novel optimization criteria, such as monetary cost or energy consumption. Current techniques cannot meet these requirements in their entirety, although interesting solutions and efficient tools have already been developed for some of them in isolation. Next-generation query processors are expected to combine features addressing all of these issues and, consequently, lie at the confluence of several research initiatives. This paper presents a vision for such processors, explains their functionality requirements, and discusses the open issues along with their challenges.
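Novel optimization criteria such as monetary cost or energy consumption mean that a plan can no longer be ranked by a single cost. One standard way to handle several criteria is to keep only Pareto-optimal plans, those that no other plan beats on every criterion. This Python sketch assumes three hypothetical criteria per plan (seconds, dollars, joules) and made-up plan names; it illustrates multi-objective plan pruning in general, not a technique proposed in the paper.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical query plan with three cost criteria."""
    name: str
    seconds: float  # response time
    dollars: float  # monetary cost, e.g. of rented cloud resources
    joules: float   # energy consumption


def dominates(a: Plan, b: Plan) -> bool:
    """a dominates b if it is no worse on every criterion and strictly better on at least one."""
    ka = (a.seconds, a.dollars, a.joules)
    kb = (b.seconds, b.dollars, b.joules)
    return all(x <= y for x, y in zip(ka, kb)) and ka != kb


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


plans = [
    Plan("local", 10.0, 0.0, 50.0),
    Plan("cloud", 2.0, 5.0, 30.0),
    Plan("cheap_cloud", 4.0, 1.0, 40.0),
    Plan("bad", 12.0, 1.0, 60.0),
]
print([p.name for p in pareto_front(plans)])  # ['local', 'cloud', 'cheap_cloud']
```

The final choice among surviving plans is left to a user preference (a deadline, a budget, or a weighting), which is one reason such optimizers need the criteria exposed rather than folded into one number.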
Database Oriented Grid Middlewares
Efficient management of massive data sets is a key aspect of typical grid and e-science applications. To this end, the benefits of employing database technologies in such applications have been recognized since the early days of grid computing, which aims to enable coordinated resource sharing, knowledge generation, and problem solving in dynamic, multi-institutional virtual organizations through a distributed, scalable, adaptive, and autonomous infrastructure. Nowadays, databases play an increasingly important role, both when the data is (semi-)structured and when it is stored in flat files. This survey paper discusses existing database-oriented grid middleware in detail. It focuses on several complementary aspects, such as dynamicity, autonomy, resilience to failures, and performance, and presents the characteristics, capabilities, and limitations of existing solutions.

