The requirements of recording and using provenance in e-science experiments
- Journal of Grid Computing, 2005
Cited by 35 (10 self)
Abstract. In e-Science experiments, it is vital to record the experimental process for later use, such as in interpreting results, verifying that the correct process took place, or tracing where data came from. The process that led to some data is called the provenance of that data, and a provenance architecture is the software architecture for a system that will provide the necessary functionality to record, store and use process documentation to determine the provenance of data items. However, there has been little principled analysis of what is actually required of a provenance architecture, so it is impossible to determine the functionality such an architecture would ideally support. In this paper, we present use cases for a provenance architecture from current experiments in biology, chemistry, physics and computer science, and analyse the use cases to determine the technical requirements of a generic, application-independent architecture. We propose an architecture that meets these requirements and evaluate a preliminary implementation by attempting to realise two of the use cases.
Steps Toward Managing Lineage Metadata in Grid Clusters
Cited by 1 (1 self)
The lineage of a piece of data is of utility to a wide range of domains. Several application-specific extensions have been built to facilitate tracking the origin of the output that the software produces. In the quest to provide such support to extant programs, efforts have recently been made to develop operating system functionality for auditing filesystem activity to infer lineage relationships. We report on our exploration of mechanisms to manage the lineage metadata in Grid clusters.
unknown title
Several projects have recognized the need for abstract scientific workflows and processes to produce executable workflows from them (see the “Related Work” sidebar for more information). myGrid, a UK-based e-science pilot project, is implementing an environment to support in silico experimentation guided by existing bioinformatics scenarios, including investigation of the genetic basis of Graves’ disease [1]. By describing and registering services and by writing and executing workflows, we’ve reiterated the need for abstraction while uncovering significant complexity beneath an apparently simple story. This is particularly the case in dealing with the surprising diversity among apparently identical services. Using the Basic Local Alignment Search Tool, or BLAST, as an example (see the “BLAST” sidebar), we can demonstrate this diversity and the mechanisms of workflow harmonization necessary to address it.
The bioinformatics experimental life cycle
Current practice in laboratory e-science can be described as an experimental life cycle. The scientist begins with a high-level goal to test a hypothesis or integrate new discoveries with existing knowledge. Both before and during an experiment, the scientist must make decisions about the granularity of each subtask in the experimental design, thus ensuring that each task is unambiguous and realizable (that is, some service actually exists that might achieve this task). The decisions involve decomposing high-level goals into simpler tasks and choosing the most appropriate class of service to accomplish each task. We’re developing a methodology and the software environment to help scientists use the Grid’s resources in performing these in silico experiments. To illustrate this, consider a scenario involving a hypothesis about whether a novel protein found in diseased tissue could have a causative role in that disease.
The first step consists of finding out whether biologists have designed these types of in silico experiments before. A centralized service registry categorizes previously published experimental designs and services using associated metadata (see the “Metadata” sidebar on page 50). If no existing design matches the desired experiment, the biologist will instead search or browse for designs that possess some features relevant to the current goal, including those that operate on protein data and those that infer functional information. Such informal registries already

