Provenance management in curated databases
In SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, 2006
Cited by 66 (16 self)
Curated databases in bioinformatics and other disciplines are the result of a great deal of manual annotation, correction and transfer of data from other sources. Provenance information concerning the creation, attribution, or version history of such data is crucial for assessing its integrity and scientific value. General purpose database systems provide little support for tracking provenance, especially when data moves among databases. This paper investigates general-purpose techniques for recording provenance for data that is copied among databases. We describe an approach in which we track the user’s actions while browsing source databases and copying data into a curated database, in order to record the user’s actions in a convenient, queryable form. We present an implementation of this technique and use it to evaluate the feasibility of database support for provenance management. Our experiments show that although the overhead of a naïve approach is fairly high, it can be decreased to an acceptable level using simple optimizations.
Performance Evaluation of the Karma Provenance Framework for Scientific Workflows
In: International Provenance and Annotation Workshop (IPAW), 2006
Cited by 10 (4 self)
Abstract. Provenance about workflow executions and data derivations in scientific applications helps estimate data quality, track resources, and validate in silico experiments. The Karma provenance framework provides a means to collect workflow, process, and data provenance from data-driven scientific workflows and is used in the Linked Environments for Atmospheric Discovery (LEAD) project. This paper presents a performance analysis of the Karma service as compared against the contemporary PReServ provenance service. Our study finds that Karma scales exceedingly well for collecting and querying provenance records, showing linear or sub-linear scaling with an increasing number of provenance records and clients when tested against workloads on the order of 10,000 application-service invocations and over 36 concurrent clients.
Named Graphs as a Mechanism for Reasoning about Provenance
Cited by 8 (0 self)
Named Graphs is a simple, compatible extension to the RDF abstract syntax that enables statements to be made about RDF graphs. This approach is in contrast to earlier attempts such as RDF reification, or knowledge-base specific extensions including quads and contexts. In this paper we demonstrate the use of Named Graphs and our experiences developing new kinds of semantic web application that build on Named Graphs for digital signatures, provenance, and semantic reasoning. We present a working example based on the Named Graphs for Jena (NG4J) API, from which we developed a semantic version control system for Software Engineering capable of reasoning about Named Graph-based provenance. We go on to discuss the implications of Named Graphs for Description Logics and semantic inference strategies.
Effective Metadata Management in Federated Sensor Networks
Cited by 1 (0 self)
Abstract—As sensor networks become increasingly popular, heterogeneous sensor networks are being interconnected into federated sensor networks and provide huge volumes of sensor data to large user communities for a variety of applications. Effective metadata management plays a crucial role in processing and properly interpreting raw sensor measurement data, and needs to be performed in a collaborative fashion. Previous data management work has concentrated on metadata and data as two separate entities and has not provided specific support for joint real-time processing of metadata and sensor data. In this paper we propose a framework that allows effective sensor data and metadata management based on real-time metadata creation and join processing over federated sensor networks. The framework is established on three key mechanisms: (i) distributed metadata joins to allow streaming sensor data to be efficiently processed with their associated metadata, regardless of their location in the network; (ii) automated metadata generation to permit users to define monitoring conditions or operations for extracting and storing metadata from streaming sensor data; and (iii) advanced metadata search utilizing various techniques specifically designed for sensor metadata querying and visualization. This framework is currently deployed and used as the backbone of a concrete application in environmental science and engineering, the Swiss Experiment, which runs a wide variety of measurements and experiments for environmental hazard forecasting and warning.
Neuroimaging Data Provenance Using the LONI Pipeline Workflow Environment
Cited by 1 (0 self)
Abstract. Provenance, the description of the history of a set of data, has become important in the neurosciences with the proliferation of research consortia-related neuroimaging efforts. Knowledge about the origin, preprocessing, analysis and post hoc processing of neuroimaging volumes is essential for establishing data and results quality, the reproducibility of findings, and their scientific interpretation. Neuroimaging provenance also includes the specifics of the software routines, algorithmic parameters, and operating system settings that were employed in the analysis protocol. The LONI Pipeline
Policy-Based Integration of Provenance Metadata
Cited by 1 (1 self)
Abstract—Reproducibility has been a cornerstone of the scientific method for hundreds of years. The range of sources from which data now originates, the diversity of the individual manipulations performed, and the complexity of the orchestrations of these operations all limit the reproducibility that a scientist can ensure solely by manually recording their actions. We use an architecture where aggregation, fusion, and composition policies define how provenance records can be automatically merged to facilitate the analysis and reproducibility of experiments. We show that the overhead of collecting and storing provenance metadata can vary dramatically depending on the policy used to integrate it.
Keywords: lineage; aggregation; fusion; flexible
Enabling provenance on large scale e-Science applications
Abstract. Large-scale e-Science experiments present unprecedented data handling requirements with their multi-petabyte data stores. Complex software applications, such as the ATLAS High Energy Physics experiment at CERN, run throughout Grid computing sites around the world in a distributed environment, with scientists performing concurrent analysis on data and producing new data products shared among the collaboration. In this paper, we introduce a multi-phase infrastructure to achieve data provenance for an e-Science experiment. We propose an infrastructure to integrate provenance into an existing legacy application with strong emphasis on scalability, and explore the relationship between provenance and metadata, introducing a model where data provenance is made available as metadata through a separate reasoning phase.
A User-Orientated Approach to . . .
We present a novel user-orientated approach to provenance capture and representation for in silico experiments, contrasted against the more systems-orientated approaches that have been typical within the e-Science domain. In our approach we seek to capture the scientist’s reasoning in the form of annotations as an experiment evolves, whilst using the scientist’s terminology in the representation of process provenance. Our user-orientated approach is applied in a case study within the atmospheric chemistry domain, where we consider the design, development and evaluation of an Electronic Laboratory Notebook, a provenance capture and storage tool, for iterative model development.

