Results 1 - 8 of 8
BioKleisli: A Digital Library for Biomedical Researchers
, 1996
Cited by 70 (15 self)
Data of interest to biomedical researchers associated with the Human Genome Project (HGP) is stored all over the world in a number of different electronic data formats and accessible through a variety of interfaces and retrieval languages. These data sources include conventional relational databases with SQL interfaces, formatted text files on top of which indexing is provided for efficient retrieval (ASN.1-Entrez), and binary files that can be interpreted textually or graphically via special purpose interfaces (ACeDB). Researchers within the HGP want to combine data from these different data sources, add value through sophisticated data analysis techniques (such as the biosequence comparison software BLAST and FASTA), and view it using special purpose scientific visualization tools. However, currently there are no commercial tools for enabling such an integrated digital library, and a fundamental barrier to developing such tools appears to be one of language design and optimization: The data f...
A Data Transformation System for Biological Data Sources
- In Proceedings of 21st International Conference on Very Large Data Bases
, 1995
Cited by 69 (19 self)
Scientific data of importance to biologists in the Human Genome Project resides not only in conventional databases, but in structured files maintained in a number of different formats (e.g. ASN.1 and ACE) as well as sequence analysis packages (e.g. BLAST and FASTA). These formats and packages contain a number of data types not found in conventional databases, such as lists and variants, and may be deeply nested. We present in this paper techniques for querying and transforming such data, and illustrate their use in a prototype system developed in conjunction with the Human Genome Center for Chromosome 22. We also describe optimizations performed by the system, a crucial issue for bulk data. 1 Introduction The goal of the Human Genome Project (HGP) is to sequence the 24 distinct chromosomes comprising the human genome. Much of the information associated with the HGP resides not in conventional databases, but in files that have been formatted according to a variety of conventions. These...
K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources
, 2000
Cited by 52 (4 self)
The integration of heterogeneous data sources and software systems is a major issue in the biomedical community and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.
Kleisli, a Functional Query System
- J. Funct. Prog
, 1998
Cited by 13 (1 self)
Kleisli is a modern data integration system that has made a significant impact on bioinformatics data integration. This paper contains a brief introduction to the Kleisli system and an example to illustrate its uses in the bioinformatics arena. The primary query language provided by Kleisli is called CPL, which is a functional query language whose surface syntax is based on the comprehension syntax. Kleisli is itself implemented using the functional language SML. So this paper also describes the influence of functional programming research that benefits the Kleisli system, especially the less obvious ones at the implementation level. Availability. Kleisli has been commercialized under the name "KRIS". It is available from Kris Technology Inc., 713 Santa Cruz Ave, #2, Menlo Park, CA 94025. Direct email to info@kris-inc.com and web browser to http://www.kris-inc.com. 1 Introduction The Kleisli system (Davidson et al., 1997) is an advanced broad-scale integration technology that has pro...
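As a rough illustration of what a comprehension-style query over nested data looks like, here is a minimal sketch in Python. It is not CPL and does not reproduce Kleisli's syntax or API; the records, field names, and the `gene_names` function are all hypothetical and exist only to show the general idea of flattening and filtering a nested collection with a comprehension.

```python
# Hypothetical nested records, loosely shaped like sequence-database entries.
# None of these names or values come from Kleisli or from any real data source.
entries = [
    {
        "accession": "X0001",
        "organism": "Homo sapiens",
        "features": [
            {"kind": "gene", "name": "geneA"},
            {"kind": "exon", "name": "exon1"},
        ],
    },
    {
        "accession": "X0002",
        "organism": "Mus musculus",
        "features": [{"kind": "gene", "name": "geneB"}],
    },
    {
        "accession": "X0003",
        "organism": "Homo sapiens",
        "features": [{"kind": "gene", "name": "geneC"}],
    },
]


def gene_names(records, organism):
    """Return (accession, gene name) pairs for one organism.

    The comprehension iterates over the outer collection, filters it,
    then iterates over the nested 'features' list and filters again.
    """
    return [
        (record["accession"], feature["name"])
        for record in records
        if record["organism"] == organism
        for feature in record["features"]
        if feature["kind"] == "gene"
    ]


if __name__ == "__main__":
    print(gene_names(entries, "Homo sapiens"))
```

The nested `for` clauses play the role that joins and unnesting play in a relational query, which is one reason comprehension syntax is a natural fit for deeply nested data.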
Database Techniques for Biological Materials and Methods
- In 1st Inter'l Conf. on Intelligent Systems for Molecular Biology
, 1993
Cited by 5 (3 self)
The Biological sciences produce an enormous research literature every year. Research papers are highly structured documents whose content is not captured using the traditional techniques of information retrieval: keywords and flat text. This is especially true of the Materials & Methods section of experimental papers. A great deal of highly structured information is packed into this section. It involves logical and temporal sequences of operations that combine and operate on materials using various instruments and depending on many parameters. We are designing and implementing databases that will allow this complex knowledge to be represented, stored in object-oriented databases and retrieved. We are developing an application of this technology called the Laboratory Notebook. This application is a software system that will contain personal laboratory information as well as have access to databases of Materials & Methods sections drawn from the literature. 1 Introduction. Biology is a very large and diverse field. The primary output of the enterprise is its published research literature, which consists of about 600,000 papers every
Object-oriented Knowledge Bases for the Analysis of Prokaryotic and Eukaryotic Genomes
, 1993
example, the expressivity of E. coli genes can be inferred from complex computations applied to sequences. A query to retrieve highly expressed genes requires that the system "knows" the method to estimate expressivity. Object models are particularly well suited to managing and integrating these two kinds of knowledge. The number of biological sequences introduced in the general collections, and the growing complexity of biological knowledge, require the construction of models to formalize this knowledge and particularly the relationships between several data types. Two examples of such situations are presented here; they result from the biological research led in our team in the field of molecular evolution. ColiGene is a model of E. coli genetics devoted to the analysis of relationships between genomic sequences and gene expressivity. MultiMap implements a new formalization of genome maps allowing manipulation of "maps of maps" in two species. Application of ColiGene and MultiMap a...
Object-Oriented Modelling in Molecular Biology
- Proceedings of the Artificial Intelligence and Genome Workshop, IJCAI
, 1993
d of modelling in molecular biology and we have developed various tools to handle and study genomic sequences. We were among the first to propose genomic databases (Gautier et al., 1981), and then to develop a Data Base Management System (DBMS) dedicated to biological sequences: ACNUC (Gouy et al., 1985). In association with this database, we have built the Analseq package (Jacobzone and Gautier, 1989) for sequence analysis. These two software packages are examples of systems in which there is a strong separation between interrogation and analysis of the biological sequences. More recently, we have developed tools that integrate both biological and methodological knowledge. ColiGene is a model of E. coli genetics devoted to the analysis of relationships between genomic sequences and gene expressivity, and MultiMap implements a formalization of genome maps allowing manipulation of localization information with homology modelling in man and mouse. Modelling of biological knowledge

