Results 1 - 10 of 96
Bio-ontologies: current trends and future directions
- Briefings in Bioinformatics
, 2006
Abstract - Cited by 121 (7 self)
In recent years, as a knowledge-based discipline, bioinformatics has been made more computationally amenable. After its beginnings as a technology advocated by computer scientists to overcome problems of heterogeneity, ontology has been taken up by biologists themselves as a means to consistently annotate features from genotype to phenotype. In medical informatics, artifacts called ontologies have been used for a longer period of time to produce controlled lexicons for coding schemes. In this article, we review the current position in ontologies and how they have become institutionalized within biomedicine. As the field has matured, the much older philosophical aspects of ontology have come into play. With this and the institutionalization of ontology has come greater formality. We review this trend and what benefits it might bring to ontologies and their use within biomedicine.
Mapping data in peer-to-peer systems: Semantics and algorithmic issues
- In Proceedings of the ACM SIGMOD 2003
Abstract - Cited by 118 (8 self)
We consider the problem of mapping data in peer-to-peer data-sharing systems. Such systems often rely on the use of mapping tables listing pairs of corresponding values to search for data residing in different peers. In this paper, we address semantic and algorithmic issues related to the use of mapping tables. We begin by arguing why mapping tables are appropriate for data mapping in a peer-to-peer environment. We discuss alternative semantics for these tables and present a language that allows the user to specify mapping tables under different semantics. Then, we show that by treating mapping tables as constraints (called mapping constraints) on the exchange of information between peers, it is possible to reason about them. We motivate why reasoning capabilities are needed to manage mapping tables and show the importance of inferring new mapping tables from existing ones. We study the complexity of this problem and propose an efficient algorithm for its solution. Finally, we present an implementation along with experimental results that show that mapping tables may be managed efficiently in practice.
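The inference step the abstract motivates can be illustrated with a short sketch. The toy code below is ours, not the paper's algorithm, and all names are hypothetical: it represents a mapping table as a set of value pairs and infers a new table by composing two existing ones, one simple instance of deriving new mappings from existing ones.

```python
# Toy sketch (not the paper's algorithm): a mapping table is a set of
# (source-value, target-value) pairs between two peers.

def compose(ab, bc):
    """Infer an A->C mapping table from an A->B and a B->C table."""
    return {(a, c) for (a, b) in ab for (b2, c) in bc if b == b2}

# Hypothetical example: peer 1 maps its gene IDs to a shared accession
# scheme; peer 2 maps those accessions to its own local identifiers.
peer1_to_acc = {("g1", "ACC7"), ("g2", "ACC9")}
acc_to_peer2 = {("ACC7", "p2-0042"), ("ACC9", "p2-0077")}

# Composing the two tables yields a direct peer1 -> peer2 mapping.
print(compose(peer1_to_acc, acc_to_peer2))
```

The real system treats such tables as constraints and reasons about their semantics; this sketch only shows why composing tables is useful when no direct mapping between two peers exists.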
TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources
, 1998
Abstract - Cited by 82 (14 self)
The TAMBIS project aims to provide transparent access to disparate biological databases and analysis tools, enabling users to utilize a wide range of resources with the minimum of effort. A prototype system has been developed that includes a knowledge base of biological terminology (the biological Concept Model), a model of the underlying data sources (the Source Model) and a 'knowledge-driven' user interface. Biological concepts are captured in the knowledge base using a description logic called GRAIL. The Concept Model provides the user with the concepts necessary to construct a wide range of multiple-source queries, and the user interface provides a flexible means of constructing and manipulating those queries. The Source Model provides a description of the underlying sources and mappings between terms used in the sources and terms in the biological Concept Model. The Concept Model and Source Model provide a level of indirection that shields the user from source details, providing a high level of source transparency. Source-independent, declarative queries formed from terms in the Concept Model are transformed into a set of source-dependent, executable procedures. Query formulation, translation and execution are demonstrated using a working example.
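The level of indirection described above can be sketched in a few lines. Everything in this example is hypothetical (the `SOURCE_MODEL` dictionary, the source names and the wrapper call strings are ours, not GRAIL or the actual TAMBIS interface); it only shows how a source model can translate concept-level query terms into source-specific procedures without the user ever naming a concrete database.

```python
# Hypothetical source model: each concept-level term maps to a source and
# a wrapper that builds a source-specific call for a given value.
SOURCE_MODEL = {
    "Protein": ("SWISS-PROT", lambda name: f"fetch_sp('{name}')"),
    "Motif":   ("PROSITE",    lambda name: f"scan_prosite('{name}')"),
}

def translate(concept_query):
    """Turn a list of (concept, value) terms into executable source calls."""
    plan = []
    for concept, value in concept_query:
        source, wrapper = SOURCE_MODEL[concept]
        plan.append((source, wrapper(value)))
    return plan

# A declarative, source-independent query becomes a concrete execution plan.
print(translate([("Protein", "GTPase"), ("Motif", "P-loop")]))
```

Swapping a source only requires editing the source model, not the user's query, which is the transparency the abstract describes.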
K2/Kleisli and GUS: experiments in integrated access to genomic data sources
- IBM Systems Journal
, 2001
Abstract - Cited by 74 (7 self)
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.
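The view-versus-warehouse trade-off the abstract compares can be made concrete with a toy contrast (ours, not K2 or GUS code; the source dictionaries and field names are invented): the view recomputes over the live sources at query time, while the warehouse queries a local copy integrated once in advance.

```python
# Two hypothetical external sources, stood in for by dictionaries.
SOURCE_A = {"P1": "kinase"}   # e.g. a functional-annotation source
SOURCE_B = {"P1": "chr7"}     # e.g. a genomic-location source

def view_query(pid):
    """On-the-fly integration: touch the live sources for every query."""
    return {"id": pid, "function": SOURCE_A[pid], "locus": SOURCE_B[pid]}

# Warehouse style: download and integrate once, then answer queries locally.
WAREHOUSE = {pid: view_query(pid) for pid in SOURCE_A}

def warehouse_query(pid):
    return WAREHOUSE[pid]

# Both styles return the same integrated record; they differ in freshness
# (views see source updates immediately) and query cost (the warehouse
# avoids contacting the sources at query time).
print(view_query("P1") == warehouse_query("P1"))
```

This is exactly the trade-off in the abstract's closing sentence: the right choice depends on usage patterns, performance needs, and available programmer effort.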
Mondrian: Annotating and querying databases through colors and blocks
- in ICDE '06: Proceedings of the 22nd International Conference on Data Engineering (ICDE '06)
, 2006
Abstract - Cited by 61 (2 self)
Annotations play a central role in the curation of scientific databases. Despite their importance, data formats and schemas are not designed to manage the increasing variety of annotations. Moreover, DBMSs often lack support for storing and querying annotations. Furthermore, annotations and data are only loosely coupled. This paper introduces an annotation-oriented data model for the manipulation and querying of both data and annotations. In particular, the model allows for the specification of annotations on sets of values and for effectively querying the information on their association. We use the concept of a block to represent an annotated set of values. Different colors applied to the blocks represent different annotations. We introduce a color query language for our model and prove it to be both complete (it can express all possible queries over the class of annotated databases) and minimal (all the algebra operators are primitive). We present MONDRIAN, a prototype implementation of our annotation mechanism, and we conduct experiments that investigate the set of parameters which influence the evaluation cost of color queries.
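A minimal sketch may help picture the block-and-color idea (class and function names are ours, not the MONDRIAN implementation): a block attaches one annotation, rendered as a color, to a whole set of values, and a color query retrieves every value covered by blocks of a given color.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Block:
    values: frozenset   # the annotated set of values
    color: str          # the annotation attached to the whole set

def values_with_color(blocks, color):
    """Color query: all values covered by at least one block of `color`."""
    return set().union(*(b.values for b in blocks if b.color == color))

# Hypothetical curation example: one block marks two genes as curated,
# another flags an overlapping set as suspect.
blocks = [
    Block(frozenset({"TP53", "BRCA1"}), "curated"),
    Block(frozenset({"BRCA1", "EGFR"}), "suspect"),
]
print(values_with_color(blocks, "curated"))
```

Note that a value can carry several colors at once (BRCA1 here is both curated and suspect), which is why annotating sets, rather than single cells, matters.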
Data provenance: some basic issues
- In Foundations of Software Technology and Theoretical Computer Science
, 2000
Abstract - Cited by 58 (0 self)
The ease with which one can copy and transform data on the Web has made it increasingly difficult to determine the origins of a piece of data. We use the term data provenance to refer to the process of tracing and recording the origins of data and its movement between databases. Provenance is now an acute issue in scientific databases, where it is central to the validation of data. In this paper we discuss some of the technical issues that have emerged in an initial exploration of the topic.
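The recording half of that definition can be sketched in a few lines (a toy model of ours, not from the paper; all field and database names are invented): each copy or transformation appends a step to an item's provenance chain, so its origins remain traceable as it moves between databases.

```python
def derive(item, operation, target_db):
    """Return a new item whose provenance extends the source item's chain."""
    return {
        "value": item["value"],
        "provenance": item["provenance"] + [(operation, target_db)],
    }

# A sequence fragment deposited in one database, then copied and cleaned.
raw = {"value": "ATGGCC", "provenance": [("deposited", "GenBank")]}
copied = derive(raw, "copied", "local-mirror")
cleaned = derive(copied, "trimmed", "curated-db")

# The chain records every movement and transformation, oldest first.
print(cleaned["provenance"])
```

Because `derive` builds a new record rather than mutating its input, earlier stages keep their own shorter chains, mirroring the idea that each database holds the history known at its point in the pipeline.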
A Classification of Tasks in Bioinformatics
, 2001
Abstract - Cited by 52 (8 self)
Motivation: This paper reports on a survey of bioinformatics tasks currently undertaken by working biologists. The aim was to find the range of tasks that need to be supported and the components needed to do this in a general query system. This enabled a set of evaluation criteria to be used to assess both the biology and mechanical nature of general query systems. Results: A classification of the biological content of the tasks gathered offers a check-list for those tasks (and their specialisations) that should be offered in a general bioinformatics query system. This semantic analysis was contrasted with a syntactic analysis that revealed the small number of components required to describe all bioinformatics questions. Both the range of biological tasks and syntactic task components can be seen to provide a set of bioinformatics requirements for general query systems. These requirements were used to evaluate two bioinformatics query systems. Contact: robert.stevens@cs.man.ac.uk. Sup...
Integration of Biological Sources: Current Systems and Challenges Ahead
- SIGMOD Record
, 2004
Abstract - Cited by 50 (0 self)
This paper surveys the area of biological and genomic sources integration, which has recently become a major focus of the data integration research field. The challenges that an integration system for biological sources must face are due to several factors such as the variety and amount of data available, the representational heterogeneity of the data in the different sources, and the autonomy and differing capabilities of the sources.
iProClass: an integrated database of protein family, function and structure information
- Nucleic Acids Res
, 2003
Abstract - Cited by 43 (17 self)
The iProClass database provides comprehensive value-added descriptions of proteins and serves as a framework for data integration in a distributed networking environment. The protein information in iProClass includes family relationships as well as structural and functional classifications and features. The current version consists of about 830 000 non-redundant PIR-PSD, SWISS-PROT and TrEMBL proteins organized with more than 36 000 PIR superfamilies, 145 000 families, 4000 domains, 1300 motifs and 550 000 FASTA similarity clusters. It provides rich links to over 50 databases of protein sequences, families, functions and pathways, protein-protein interactions, post-translational modifications, protein expression, structures and structural classifications, genes and genomes, ontologies, literature and taxonomy. Protein and superfamily summary reports present extensive annotation information, including membership statistics and graphical display of domains and motifs. iProClass employs an open and modular architecture for interoperability and scalability. It is implemented in the Oracle object-relational database system and is updated biweekly. The database is freely accessible from the web site at http://pir.georgetown.edu/iproclass/ and searchable by sequence or text string. The data integration in iProClass supports exploration of protein relationships. Such knowledge is fundamental to the understanding of protein evolution, structure and function and crucial to functional genomic and proteomic research.
An Examination of DSLs for Concisely Representing Model Traversals and Transformations
- 36th Annual Hawaii International Conference on System Sciences (HICSS'03) - Track 9, p. 325a, January 06 - 09
, 2003
Abstract - Cited by 15 (3 self)
A key advantage for the use of a Domain-Specific Language (DSL) is the leverage that can be captured from a concise representation of a programmer's intention. This paper reports on three different DSLs that were developed for two different projects. Two of the DSLs assisted in the specification of various modeling tool ontologies, and the integration of models across these tools. On another project, a different DSL has been applied as a language to assist in aspect-oriented modeling. Each of these three languages was converted to C++ using different code generators. These DSLs were concerned with issues of traversing a model and performing transformations. The paper also provides quantitative data on the relative sizes of the intention (as expressed in the DSL) and the generated C++ code. Observations are made regarding the nature of the benefits and the manner in which the conciseness of the DSL is best leveraged.