Results 1 - 10 of 164
Scientific workflow management and the Kepler system. Special issue: workflow in grid systems
- Concurr. Comput.: Pract. Exp., 2006
- Cited by 111 (9 self)
Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery “pipelines”. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. “the Grid”). However, this infrastructure is only a means to an end and scientists ideally should be bothered little with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access
Taverna: a tool for building and running workflows of services
- Nucleic Acids Res., 2006
- Cited by 71 (4 self)
Taverna is an application that eases the use and integration of the growing number of molecular biology tools and databases available on the web, especially web services. It allows bioinformaticians to construct workflows or pipelines of services to perform a range of different analyses, such as sequence analysis and genome annotation. These high-level workflows can integrate many different resources into a single analysis. Taverna is available freely under the terms of the GNU Lesser General Public License (LGPL) from
Actor-Oriented Design of Scientific Workflows
- In 24th Intl. Conference on Conceptual Modeling, 2005
- Cited by 47 (18 self)
Scientific workflows are becoming increasingly important as a unifying mechanism for interlinking scientific data management, analysis, simulation, and visualization tasks. Scientific workflow systems are problem-solving environments, supporting scientists in the creation and execution of scientific workflows.
Curated databases
- PODS'08, 2008
- Cited by 43 (6 self)
Curated databases are databases that are populated and updated with a great deal of human effort. Most reference works that one traditionally found on the reference shelves of libraries – dictionaries, encyclopedias, gazetteers, etc. – are now curated databases. Since it is now easy to publish databases on the web, there has been an explosion in the number of new curated databases used in scientific research. The value of curated databases lies in the organization and the quality of the data they contain. Like the paper reference works they have replaced, they usually represent the efforts of a dedicated group of people to produce a definitive description of some subject area. Curated databases present a number of challenges for database research. The topics of annotation, provenance, and citation are central, because curated databases are heavily cross-referenced with, and include data from, other databases, and much of the work of a curator is annotating existing data. Evolution of structure is important because these databases often evolve from semistructured representations, and because they have to accommodate new scientific discoveries. Much of the work in these areas is in its infancy, but it is beginning to suggest new research for both theory and practice. We discuss some of this research and emphasize the need to find appropriate models of the processes associated with curated databases.
A Notation and System for Expressing and Executing Cleanly Typed Workflows on Messy Scientific Data
- SIGMOD Record, 2005
- Cited by 35 (9 self)
The description, composition, and execution of even logically simple scientific workflows are often complicated by the need to deal with “messy” issues like heterogeneous storage formats and ad-hoc file system structures. We show how these difficulties can be overcome via a typed, compositional workflow notation within which issues of physical representation are cleanly separated from logical typing, and by the implementation of this notation within the context of a powerful runtime system that supports distributed execution. The resulting notation and system are capable both of expressing complex workflows in a simple, compact form, and of enacting those workflows in distributed environments. We apply our technique to cognitive neuroscience workflows that analyze functional MRI image data, and demonstrate significant reductions in code size relative to other approaches.
Applying Semantic Web Services to Bioinformatics: Experiences Gained, Lessons Learnt
- 2004
- Cited by 33 (10 self)
We have seen an increasing amount of interest in the application of Semantic Web technologies to Web services. The aim is to support automated discovery and composition of the services, allowing seamless and transparent interoperability. In this paper we discuss three projects that are applying such technologies to bioinformatics: myGrid, MOBY-Services and Semantic-MOBY. Through an examination of the differences and similarities between the solutions produced, we highlight some of the practical difficulties in developing Semantic Web services and suggest that the experiences with these projects have implications for the development of Semantic Web services as a whole.
Grid Service Orchestration using the Business Process Execution Language (BPEL)
- Journal of Grid Computing, 2005
- Cited by 26 (5 self)
Modern scientific applications often need to be distributed across grids. Increasingly, applications rely on services, such as job submission, data transfer or data portal services. We refer to such services as grid services. While the invocation of grid services could be hard coded in theory, scientific users want to orchestrate service invocations more flexibly. In enterprise applications, the orchestration of web services is achieved using emerging orchestration standards, most notably the Business Process Execution Language (BPEL). We describe our experience in orchestrating scientific workflows using BPEL. We have gained this experience during an extensive case study that orchestrates grid services for the automation of a polymorph prediction application. Using this example, we explain the extent to which the BPEL language supports the definition of scientific workflows. We then describe the reliability, performance and scalability that can be achieved by executing a complex scientific workflow with ActiveBPEL, an industrial-strength but freely available BPEL engine.
Exploring Williams-Beuren syndrome using myGrid
- In Proceedings of the 12th International Conference on Intelligent Systems in Molecular Biology, 2003
- Cited by 25 (10 self)
Motivation: In silico experiments necessitate the virtual organization of people, data, tools and machines. The scientific process also necessitates an awareness of the experience base, both of personal data as well as the wider context of work. The management of all these data and the co-ordination of resources to manage such virtual organizations and the data surrounding them need significant computational infrastructure support. Results: In this
A model for user-oriented data provenance in pipelined scientific workflows
- In IPAW, 2006
- Cited by 20 (7 self)
Integrated provenance support promises to be a chief advantage of scientific workflow systems over script-based alternatives. While it is often recognized that information gathered during scientific workflow execution can be used automatically to increase fault tolerance (via checkpointing) and to optimize performance (by reusing intermediate data products in future runs), it is perhaps more significant that provenance information also may be used by scientists to reproduce results from earlier runs, to explain unexpected results, and to prepare results for publication. Current workflow systems offer little or no direct support for these “scientist-oriented” queries of provenance information. Indeed, the use of advanced execution models in scientific workflows (e.g., process networks, which exhibit pipeline parallelism over streaming data) and failure to record certain fundamental events, such as state resets of processes, can render existing provenance schemas useless for scientific applications of provenance. We develop a simple provenance model that is capable of supporting a wide range of scientific use cases even for complex models of computation such as process networks. Our approach reduces these use cases to database queries over event logs, and is capable of reconstructing complete data and invocation dependency graphs for a workflow run.

