Results 1 - 10 of 129
Provenance management in curated databases
- In SIGMOD ’06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, 2006
Cited by 66 (16 self)
Curated databases in bioinformatics and other disciplines are the result of a great deal of manual annotation, correction and transfer of data from other sources. Provenance information concerning the creation, attribution, or version history of such data is crucial for assessing its integrity and scientific value. General purpose database systems provide little support for tracking provenance, especially when data moves among databases. This paper investigates general-purpose techniques for recording provenance for data that is copied among databases. We describe an approach in which we track the user’s actions while browsing source databases and copying data into a curated database, in order to record the user’s actions in a convenient, queryable form. We present an implementation of this technique and use it to evaluate the feasibility of database support for provenance management. Our experiments show that although the overhead of a naïve approach is fairly high, it can be decreased to an acceptable level using simple optimizations.
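To make the idea of recording copy actions "in a convenient, queryable form" concrete, here is a minimal sketch. It is not the paper's implementation: the `ProvRecord` fields, the path strings, the database name `SourceDB`, and the helpers `copy_value` and `where_from` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProvRecord:
    """One logged copy action (hypothetical schema, not from the paper)."""
    txn: int      # identifier of the curation step that performed the copy
    op: str       # kind of action; only "copy" is used in this sketch
    target: str   # path in the curated database that received the value
    source: str   # "database:path" the value was copied from

def copy_value(target_db, target_path, source_db, source_name, source_path, log, txn):
    """Copy one value into the curated database and log where it came from."""
    target_db[target_path] = source_db[source_path]
    log.append(ProvRecord(txn, "copy", target_path, f"{source_name}:{source_path}"))

def where_from(log, target_path):
    """Return the most recent recorded source of target_path, or None."""
    for rec in reversed(log):
        if rec.target == target_path:
            return rec.source
    return None

# Invented example data.
source = {"/gene/42/name": "BRCA2"}
curated = {}
log = []
copy_value(curated, "/entry/1/name", source, "SourceDB", "/gene/42/name", log, txn=1)
print(where_from(log, "/entry/1/name"))
```

Logging one record per copied value is the straightforward design; the abstract reports that such a naïve approach carries fairly high overhead, which simple optimizations reduce.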
Managing rapidly-evolving scientific workflows
- In International Provenance and Annotation Workshop (IPAW), LNCS 4145, 2006
Cited by 57 (21 self)
We give an overview of VisTrails, a system that provides an infrastructure for systematically capturing detailed provenance and streamlining the data exploration process. A key feature that sets VisTrails apart from previous visualization and scientific workflow systems is a novel action-based mechanism that uniformly captures provenance for data products and workflows used to generate these products. This mechanism not only ensures reproducibility of results, but it also simplifies data exploration by allowing scientists to easily navigate through the space of workflows and parameter settings for an exploration task.
A Survey of Trust in Computer Science and the Semantic Web
- 2007
Cited by 45 (1 self)
Trust is an integral component in many kinds of human interaction, allowing people to act under uncertainty and with the risk of negative consequences. For example, exchanging money for a service, giving access to your property, and choosing between conflicting sources of information all may utilize some form of trust. In computer science, trust is a widely used term whose definition differs among researchers and application areas. Trust is an essential component of the vision for the Semantic Web, where both new problems and new applications of trust are being studied. This paper gives an overview of existing trust research in computer science and the Semantic Web.
Provenance and scientific workflows: challenges and opportunities
- In Proceedings of ACM SIGMOD, 2008
Cited by 35 (10 self)
Provenance in the context of workflows, both for the data they derive and for their specification, is an essential component to allow for result reproducibility, sharing, and knowledge re-use in the scientific community. Several workshops have been held on the topic, and it has been the focus of many research projects and prototype systems. This tutorial provides an overview of research issues in provenance for scientific workflows, with a focus on recent literature and technology in this area. It is aimed at a general database research audience and at people who work with scientific data and workflows. We will (1) provide a general overview of scientific workflows; (2) describe research on provenance for scientific workflows and show in detail how provenance is supported in existing systems; (3) discuss emerging applications that are enabled by provenance; and (4) outline open problems and new directions for database-related research.
On the expressiveness of implicit provenance in query and update languages
- In ICDT, 2007
Cited by 33 (12 self)
Information concerning the origin of data (that is, its provenance) is important in many areas, especially scientific recordkeeping. Currently, provenance information must be maintained explicitly, by added effort of the database maintainer. Since such maintenance is tedious and error-prone, it is desirable to provide support for provenance in the database system itself. In order to provide such support, however, it is important to provide a clear explanation of the behavior and meaning of existing database operations, both queries and updates, with respect to provenance. In this paper we take the view that a query or update implicitly defines a provenance mapping linking components of the output to the originating components in the input. Our key result is that the proposed semantics are expressively complete relative to natural classes of queries that explicitly manipulate provenance.
Provenance collection support in the Kepler scientific workflow system
- In Proceedings of the International Provenance and Annotation Workshop (IPAW), 2006
Cited by 29 (2 self)
In many data-driven applications, analysis needs to be performed on scientific information obtained from several sources and generated by computations on distributed resources. Systematic analysis of this scientific information creates a growing need for automated data-driven applications that can also keep track of the provenance of the data and processes with little user interaction and overhead. Such data analysis can be facilitated by recent advances in scientific workflow systems. A major benefit of using scientific workflow systems is the ability to make provenance collection a part of the workflow. Specifically, provenance should include not only the standard data lineage information but also information about the context in which the workflow was used, the execution that processed the data, and the evolution of the workflow design. In this paper we describe a complete framework for data and process provenance in the Kepler Scientific Workflow System. We outline the requirements and issues related to data and workflow provenance in a multidisciplinary workflow system and show how generic provenance capture can be facilitated in Kepler’s actor-oriented workflow environment. We also describe the use of the stored provenance information for efficient rerun of scientific workflows.
The Open Provenance Model
- 2008
Cited by 28 (4 self)
The Open Provenance Model (OPM) is a community-driven data model for provenance that is designed to support interoperability of provenance technology. Underpinning OPM is the notion of a directed acyclic graph, used to represent data products and processes involved in past computations, and causal dependencies between these. The Open Provenance Model was derived following two “Provenance Challenges”, international, multidisciplinary activities investigating how to exchange information between multiple systems supporting provenance and how to query it. The OPM design was mostly driven by practical and pragmatic considerations, and is being tested in a third Provenance Challenge, which has just started. The purpose of this paper is to investigate the theoretical foundations of this data model. The formalisation consists of a set-theoretic definition of the data model, a definition of the inferences by transitive closure that are permitted, a formal description of how the model can be used to express dependencies in past computations, and finally, a description of the kind of time-based inferences that are supported. A novel element that OPM introduces is the concept of an account, by which multiple descriptions of the same execution are allowed to co-exist in the same graph. Our formalisation gives a precise meaning to such accounts and associated notions of alternate and refinement. Warning: It was decided that this paper should be released as early as possible since it brings useful clarifications on the Open Provenance Model, and therefore can benefit the Provenance Challenge 3 community. The reader should recognise that this paper is however an early draft, and several sections are incomplete. Additionally, figures rely on colours, which may be difficult to read when printed in black and white. It is advisable to print the paper in colour.
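As a rough illustration of inference by transitive closure over a directed acyclic provenance graph, the sketch below computes everything a node indirectly depends on. It is a generic reachability computation under stated assumptions, not the formalisation from the paper: the paper defines which inferences are permitted, whereas this sketch ignores edge kinds and simply follows every edge. The node names and the `direct_deps` dictionary are invented for the example.

```python
from collections import deque

# Hypothetical direct dependencies: each key depends directly on the listed nodes.
# Data products and processes are mixed in one graph; all names are invented.
direct_deps = {
    "plot.png": ["render"],
    "render": ["table.csv"],
    "table.csv": ["clean"],
    "clean": ["raw.dat"],
    "raw.dat": [],
}

def all_dependencies(node, edges):
    """Return every node reachable from `node` by following dependency edges."""
    seen = set()
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already expanded via another path
        seen.add(current)
        queue.extend(edges.get(current, []))
    return seen

closure = all_dependencies("plot.png", direct_deps)
print(sorted(closure))
```

Breadth-first traversal with a `seen` set visits each node once, so shared ancestors in the graph are not expanded repeatedly.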
Provenance in Databases: Past, Current, and Future
- 2007
Cited by 27 (0 self)
The need to understand and manage provenance arises in almost every scientific application. In many cases, information about provenance constitutes the proof of correctness of results that are generated by scientific applications. It also determines the quality and amount of trust one places in the results. For these reasons, knowledge of the provenance of a scientific result is typically regarded as being as important as the result itself. In this paper, we provide an overview of research in provenance in databases and discuss some future research directions. The content of this paper is largely based on the tutorial presented at SIGMOD 2007 [11].
A Framework for Collecting Provenance in Data-Centric Scientific Workflows
- In ICWS, 2006
Cited by 26 (3 self)
The increasing ability of the earth sciences to sense the world around us is resulting in a growing need for data-driven applications that are under the control of data-centric workflows composed of grid and web services. The focus of our work is on provenance collection for these workflows, necessary to validate the workflow and to determine the quality of generated data products. The challenge we address is to record uniform and usable provenance metadata that meets the domain needs while minimizing the modification burden on the service authors and the performance overhead on the workflow engine and the services. The framework, based on a loosely-coupled publish-subscribe architecture for propagating provenance activities, satisfies the needs of detailed provenance collection, while a performance evaluation of a prototype finds a minimal performance overhead (in the range of 1% for an eight-service workflow using 271 data products).
Provenance as dependency analysis
- Proceedings of the 11th International Symposium on Database Programming Languages (DBPL 2007), number 4797 in LNCS, 2007
Cited by 25 (9 self)
Provenance is information recording the source, derivation, or history of some information. Provenance tracking has been studied in a variety of settings; however, although many design points have been explored, the mathematical or semantic foundations of data provenance have received comparatively little attention. In this paper, we argue that dependency analysis techniques familiar from program analysis and program slicing provide a formal foundation for forms of provenance that are intended to show how (part of) the output of a query depends on (parts of) its input. We introduce a semantic characterization of such dependency provenance, show that this form of provenance is not computable, and provide dynamic and static approximation techniques.

