Integrated Access to Metabolic and Genomic Data
Journal of Computational Biology, 1996
Cited by 15 (6 self)
The EcoCyc system consists of a knowledge base (KB) that describes the genes and intermediary metabolism of E. coli, and a graphical user interface (GUI) for accessing that knowledge. This paper addresses two problems: How can we create a GUI that provides integrated access to metabolic and genomic data? We describe the design and implementation of visual presentations that closely mimic those found in the biology literature, and that offer hypertext navigation among related entities, and multiple views of the same entity. We employ a frame knowledge representation system (FRS) called HyperTHEO to manage the EcoCyc knowledge base. Among the advantages of FRSs are an expressive data model for capturing the complexities of biological information, and schema-evolution capabilities that facilitate the constant schema changes that biological databases tend to undergo. HyperTHEO also includes rule-based inference facilities that are the foundation of expert systems, a constraint language for maintaining data integrity, and a declarative query language. A graphical KB editor and browser allows the EcoCyc developers to interactively inspect and modify this evolving KB.
View-Concepts: Knowledge-Based Access to Databases
In First International Conference on Information and Knowledge Management, 1992
Cited by 15 (2 self)
Semantic data models for database systems provide powerful tools to assist database administrators in designing and maintaining schemas, but provide little or no direct support for users of the database. Some research has been done on mapping user models of a domain to the underlying database using semantic schemas. Little has been done, however, on mapping conceptually meaningful data structures to a database lacking a semantic schema, or to a multi-database system that lacks a consistent semantic schema. We argue for the appropriateness of a knowledge representation language as a language for describing the database schema, user data structures, and the mapping between them; present a problem domain in which an existing relational database without a semantic schema must be accessed by a knowledge-based application; and describe our implementation of a system that provides access to a relational database from a KL-ONE-style knowledge representation language.
QGB: A System for Querying Sequence Database Fields and Features
1994
Cited by 9 (3 self)
We have developed a general system, QGB, for performing complex queries on the information in the DDBJ/EMBL/GenBank databases, including queries over the structural features of sequences implied in the FEATURE TABLE. Queries are formed in an SQL-like syntax with language extensions to support complex types (e.g., sets, ordered sets and records) appropriate for representing and querying sequence data. A novel aspect of QGB is its ability to deduce missing features and infer relationships among features as a consequence of constructing a parse tree of sequence structure from information described in the FEATURE TABLE. The grammar for the parse tree is implemented in a customized form of the Definite Clause Grammar syntax of the logic programming language Prolog. The logic grammar formalism was chosen because it provides a perspicuous representation for features and constraints, and Prolog provides an execution model for the grammar rules. Construction of the parse tree also identifies in...
Building and Sharing Large Knowledge Bases in Molecular Genetics
1993
Cited by 5 (0 self)
Large volumes of genomic sequences are being produced at increasing rates for many living organisms. The computer analysis of these data, using numerical and symbolic methods, generates numerous and varied objects (such as genes, signals, repetitive sequences, etc.), the status of which is often hypothetical. Object-based knowledge models make it possible to describe these objects, which are inter-related and sometimes organised in higher level structures, such as operons in the case of bacterial genomes. Annotations in the form of hypertext can be associated with these descriptions. Methodological knowledge on the analysis methods themselves can also be described. These capabilities are illustrated with the example of two operational object-oriented knowledge bases which have been designed in the context of a tight collaboration with the "Laboratoire de Biométrie, Génétique et Biologie des Populations" of Claude Bernard University in Lyon. It is argued that such knowledge bases can become a very...
Knowledge Discovery in GenBank
In Proceedings of the First International Conference on Intelligent Systems for Molecular Biology, 1993
Cited by 2 (1 self)
We describe various methods designed to discover knowledge in the GenBank nucleic acid sequence database. Using a grammatical model of gene structure, we create a parse tree of a gene using features listed in the FEATURE TABLE. The parse tree infers features that are not explicitly listed, but which follow from the listed features. This method discovers 30% more introns and 40% more exons when applied to a globin gene subset of GenBank. Parse tree construction also entails resolving ambiguity and inconsistency within a FEATURE TABLE. We transform the parse tree into an augmented FEATURE TABLE that represents inferred gene structure explicitly and unambiguously, thereby greatly improving the utility of the FEATURE TABLE to researchers. We then describe various analogical reasoning techniques designed to exploit the homologous nature of genes. We build a classification hierarchy that reflects the evolutionary relationship between genes. Descriptive grammars of gene classes are then induc...
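The idea of inferring unlisted features from listed ones can be illustrated with a minimal sketch. This is not the authors' grammar-based parser: it assumes, purely for illustration, that a FEATURE TABLE has been reduced to a list of exon intervals, and it treats every gap between consecutive exons as an inferred intron. The function name infer_introns and the coordinates are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: 1-based, inclusive (start, end) coordinates.
Interval = Tuple[int, int]


def infer_introns(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons as inferred introns."""
    ordered = sorted(exons)
    introns: List[Interval] = []
    # Walk over adjacent exon pairs; a gap of at least one base is an intron.
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


# Made-up exon coordinates, not taken from any GenBank entry.
listed_exons = [(1, 100), (201, 350), (500, 620)]
print(infer_introns(listed_exons))
```

A grammar-driven parse, as described in the abstract, would additionally resolve ambiguity and inconsistency among the listed features; the interval arithmetic here only shows why some features follow from others.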
Analogical Reasoning for Knowledge Discovery in a Molecular Biology Database
Association for Computing Machinery, 1993
Cited by 1 (0 self)
Genes are highly structured objects. Identifying genetic structure is experimentally costly and in general, is only partially determined in the lab. As such, genomic databases such as GenBank are vastly underspecified. We describe various Case Based Reasoning (CBR) techniques, designed to take advantage of the homologous nature of genes, in order to discover regulatory features not present in the GenBank nucleic acid sequence database. We build a classification hierarchy reflecting the evolutionary relationship between genes. A sequence based CBR technique uses neighboring genes within the hierarchy as cases in order to predict the presence and location of regulatory features not listed in GenBank. Using a grammatical model of gene structure, we describe each gene with an instance grammar, and induce descriptive grammars of gene classes from these instance grammars. Descriptive grammars then serve as cases that enable us to identify regulatory features not present within the instance g...
Object-oriented Knowledge Bases for the Analysis of Prokaryotic and Eukaryotic Genomes
1993
example, the expressivity of E. coli genes can be inferred from complex computations applied to sequences. A query to retrieve highly expressed genes requires that the system "knows" the method to estimate expressivity. Object models are particularly well suited to managing and integrating these two kinds of knowledge. The number of biological sequences entered in the general collections, and the growing complexity of biological knowledge, require the construction of models to formalize this knowledge, and particularly the relationships between several data types. Two examples of such situations are presented here; they result from the biological research led by our team in the field of molecular evolution. ColiGene is a model of E. coli genetics devoted to the analysis of relationships between genomic sequences and gene expressivity. MultiMap implements a new formalization of genome maps allowing manipulation of "maps of maps" in two species. Application of ColiGene and MultiMap a...
Object-Oriented Modelling in Molecular Biology
Proceedings of the Artificial Intelligence and Genome Workshop, IJCAI, 1993
d of modelling in molecular biology and we have developed various tools to handle and study genomic sequences. We were among the first to propose genomic data bases (Gautier et al., 1981), then to develop a Data Base Management System (DBMS) dedicated to biological sequences: ACNUC (Gouy et al., 1985). In association with this data base, we have built the Analseq package (Jacobzone and Gautier, 1989) for sequence analysis. These two software packages are examples of systems in which there is a strong separation between interrogation and analysis of the biological sequences. More recently, we have developed tools that integrate both biological and methodological knowledge. ColiGene is a model of E. coli genetics devoted to the analysis of relationships between genomic sequences and gene expressivity, and MultiMap implements a formalization of genome maps allowing manipulation of localization information with homology modelling in man and mouse. Modelling of biological knowledge

