Multi-Class Protein Fold Classification Using a New Ensemble Machine Learning Approach
2003
Cited by 13 (0 self)
Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships.
HyperThesis: the gRNA spell on the curse of bioinformatics applications integration
In Proc. 2003 ACM CIKM International Conference on Information and Knowledge Management (CIKM 2003), 2003
Cited by 3 (0 self)
In this paper, we describe a graphical workflow management system called HyperThesis that addresses the challenges of integrating bioinformatics applications. HyperThesis is an integral component of the Genomics Research Network Architecture (gRNA), which was designed and developed to address the challenges of developing new bioinformatics applications. Specifically, HyperThesis makes constructing workflows (pipelines of application executions) in the gRNA fast and intuitive for biologists and bio-programmers alike. It provides a large repository of interconnectable, parameterized workflow components for processing and relating diverse biological data and software programs. It also enables new workflow components to be added as new algorithms are developed in one's area of interest. HyperThesis has been fully implemented in Java.
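To make the idea of chaining interconnectable, parameterized components concrete, here is a minimal Python sketch. It is not the HyperThesis API (HyperThesis is written in Java and its interfaces are not described here); the `WorkflowComponent` and `Pipeline` classes and the two sequence-processing steps are hypothetical, assuming a simple linear pipeline in which each component's output becomes the next component's input.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class WorkflowComponent:
    """Hypothetical parameterized workflow step (not the HyperThesis API)."""
    name: str
    func: Callable[..., Any]
    params: Dict[str, Any] = field(default_factory=dict)

    def run(self, data: Any) -> Any:
        # Call the step with the upstream output plus its fixed parameters.
        return self.func(data, **self.params)


@dataclass
class Pipeline:
    """Linear chain of components: each output feeds the next step's input."""
    steps: List[WorkflowComponent] = field(default_factory=list)

    def add(self, component: WorkflowComponent) -> "Pipeline":
        self.steps.append(component)
        return self  # return self so add() calls can be chained

    def run(self, data: Any) -> Any:
        for step in self.steps:
            data = step.run(data)
        return data


def clean_sequence(seq: str, alphabet: str) -> str:
    # Uppercase the sequence and drop characters outside the alphabet.
    return "".join(c for c in seq.upper() if c in alphabet)


def gc_content(seq: str, digits: int) -> float:
    # Fraction of G and C bases, rounded to `digits` decimal places.
    if not seq:
        return 0.0
    return round(sum(c in "GC" for c in seq) / len(seq), digits)


pipeline = (
    Pipeline()
    .add(WorkflowComponent("clean", clean_sequence, {"alphabet": "ACGT"}))
    .add(WorkflowComponent("gc", gc_content, {"digits": 2}))
)
print(pipeline.run("atgc-ncgg"))  # 0.71
```

A real workflow system would also need branching, type checks between connected components and execution of external programs; this sketch keeps only the linear chaining so the role of per-component parameters stays visible.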
Summary
Over the last decades, biological databases have become the major knowledge resource for researchers in molecular biology. The distribution of information among these databases is one of the major problems. An overview of data access and of the representation of protein and protein-protein interaction data within public biological databases is given. To enable comprehensive and consistent searching and analysis of integrated protein and protein-protein interaction data, the InSilico Proteomics (ISP) project has been initiated. Its three main objectives are (1) to provide an integrated knowledge pool for data investigation, together with global network analysis functions, for a better understanding of a cell’s interactome, (2) to employ public data for plausibility analysis and validation of in-house experimental data and (3) to test the applicability of Microsoft’s .NET architecture for bioinformatics applications. Data integrated into the ISP database can be queried through the Web portal PRIMOS (PRotein Interaction and MOlecule Search) which is freely available at
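One way to picture an integrated interaction knowledge pool is to merge interaction lists from several databases onto undirected protein pairs, keep the set of sources that report each pair, and derive simple network statistics from the result. The sketch below assumes all sources already share the same protein identifiers; the function names and the `db_a`/`db_b` sources are hypothetical and say nothing about ISP's actual .NET implementation. Treating pairs reported by more than one source as more plausible is likewise an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Pair = FrozenSet[str]  # an undirected protein-protein interaction


def integrate(sources: Dict[str, Iterable[Tuple[str, str]]]) -> Dict[Pair, Set[str]]:
    """Merge interaction lists, recording which sources report each pair."""
    merged: Dict[Pair, Set[str]] = defaultdict(set)
    for source, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue  # self-interactions are skipped in this sketch
            merged[frozenset((a, b))].add(source)
    return dict(merged)


def degree(network: Dict[Pair, Set[str]]) -> Dict[str, int]:
    """Number of distinct interaction partners per protein."""
    counts: Dict[str, int] = defaultdict(int)
    for pair in network:
        for protein in pair:
            counts[protein] += 1
    return dict(counts)


def multi_source_pairs(network: Dict[Pair, Set[str]]) -> List[Tuple[str, ...]]:
    """Pairs reported by more than one source (a crude plausibility signal)."""
    return sorted(tuple(sorted(p)) for p, srcs in network.items() if len(srcs) > 1)


sources = {
    "db_a": [("P1", "P2"), ("P2", "P3")],
    "db_b": [("P2", "P1"), ("P3", "P4")],
}
network = integrate(sources)
print(sorted(degree(network).items()))  # [('P1', 1), ('P2', 2), ('P3', 2), ('P4', 1)]
print(multi_source_pairs(network))      # [('P1', 'P2')]
```

Using `frozenset` keys makes `("P1", "P2")` and `("P2", "P1")` the same interaction, which is what lets the two sources' records collapse into one entry.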
An example of an Integrated Bioinformatic System to support detection/quantification of GMOs
2003
Introduction
Few domains have experienced an explosion of data production comparable to that of biology. Over the past two decades, thousands of databases of biological knowledge have been produced, ranging from very large initiatives, such as the Genome Project, to specialized databases produced by small research groups around the world from in vivo, in vitro and in silico analyses. The need for tools able to manage a large biological knowledge base is growing [1]. Integrated bioinformatics platforms that make more effective use of repositories of structured molecular data, integrated with bioinformatics analysis capabilities, will be widely employed in the near future [2]. The need to build this kind of bioinformatics instrument came from the European Community in 2000, when the European Network of Genetically Modified Organism (GMO) Laboratories (ENGL) was created. The network needed a Molecular Register containing data on the molecular characterization of GMOs approved for placing on the market i
Multi-Class Protein Fold Classification Using a New Ensemble Machine Learning Approach
Protein structure classification represents an important process in understanding the associations between sequence and structure as well as possible functional and evolutionary relationships. Recent structural genomics initiatives and other high-throughput experiments have populated the biological databases at a rapid pace. The volume of structural data has made traditional methods, such as manual inspection of protein structures, infeasible. Machine learning has been widely applied in bioinformatics and has achieved considerable success in this area. This work proposes a novel ensemble machine learning method that improves the coverage of classifiers on multi-class imbalanced sample sets by integrating knowledge induced from different base classifiers, and we illustrate this idea by classifying multi-class SCOP protein fold data. We have compared our approach with PART and show that our method improves the sensitivity of the classifier in protein fold classification. Furthermore, we have extended this method to learning over multiple data types while preserving the independence of their corresponding data sources, and show that our new approach performs at least as well as the traditional technique applied to a single joined data source. These experimental results are encouraging, and the approach can be applied to other bioinformatics problems similarly characterised by multi-class imbalanced data sets held in multiple data sources.
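The general pattern of learning over separate data sources without joining them can be sketched as one base classifier per source whose predictions are combined by majority vote, evaluated with per-class sensitivity, TP / (TP + FN). This is not the paper's method, whose knowledge-integration step the abstract does not detail, and it does not implement PART; a toy nearest-centroid learner stands in for the base classifiers, and the sources `seq`, `struct`, `phys` and the fold labels `a`, `b` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Vector = Sequence[float]


class NearestCentroid:
    """Toy base classifier: predicts the class whose mean feature vector is closest."""

    def fit(self, X: List[Vector], y: List[str]) -> "NearestCentroid":
        counts = Counter(y)
        sums: Dict[str, List[float]] = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_one(self, x: Vector) -> str:
        def sq_dist(c: str) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[c]))
        # Iterate labels in sorted order so distance ties resolve deterministically.
        return min(sorted(self.centroids), key=sq_dist)


class MultiSourceEnsemble:
    """One base classifier per data source, combined by majority vote.

    Each source keeps its own feature table; the tables are never joined.
    """

    def fit(self, sources: Dict[str, List[Vector]], y: List[str]) -> "MultiSourceEnsemble":
        self.models = {name: NearestCentroid().fit(X, y) for name, X in sources.items()}
        return self

    def predict(self, sources: Dict[str, List[Vector]]) -> List[str]:
        n = len(next(iter(sources.values())))
        preds = []
        for i in range(n):
            votes = Counter(self.models[name].predict_one(X[i]) for name, X in sources.items())
            # Most votes wins; ties go to the alphabetically first label.
            preds.append(min(votes, key=lambda c: (-votes[c], c)))
        return preds


def per_class_sensitivity(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """Sensitivity (recall) per class: TP / (TP + FN)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in sorted(totals)}


y_train = ["a", "a", "b", "b"]
train = {
    "seq": [[0.0], [0.2], [1.0], [1.2]],
    "struct": [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]],
    "phys": [[0.0], [0.1], [0.9], [1.0]],
}
test = {
    "seq": [[0.9], [1.1]],  # the "seq" view alone misreads the first sample
    "struct": [[0.1, 0.0], [1.0, 1.0]],
    "phys": [[0.0], [1.0]],
}
y_test = ["a", "b"]

ensemble = MultiSourceEnsemble().fit(train, y_train)
y_pred = ensemble.predict(test)
seq_only = [ensemble.models["seq"].predict_one(x) for x in test["seq"]]
print(y_pred)                                    # ['a', 'b']
print(per_class_sensitivity(y_test, y_pred))     # {'a': 1.0, 'b': 1.0}
print(per_class_sensitivity(y_test, seq_only))   # {'a': 0.0, 'b': 1.0}
```

Per-class sensitivity is reported instead of overall accuracy because, on imbalanced multi-class data, a classifier can score well overall while missing minority folds entirely; the last line shows the single-source model doing exactly that for class `a`.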
GeNS: a Biological Data Integration Platform
Abstract—The scientific achievements of molecular biology depend greatly on the capability of computational applications to analyze laboratory results. A comprehensive analysis of an experiment typically requires the simultaneous study of the obtained dataset alongside data available in several distinct public databases. Nevertheless, providing centralized access to these distributed databases raises a set of challenges, such as: what is the best integration strategy, how to resolve nomenclature clashes, how to handle data that overlaps across databases, and how to deal with huge datasets. In this paper we present GeNS, a system that uses a simple yet innovative approach to address several biological data integration issues. Compared with existing systems, the main advantages of GeNS are its ease of maintenance and its coverage and scalability, in terms of the number of supported databases and data types. To support our claims, we present the current use of GeNS in two concrete applications. GeNS currently contains more than 140 million biological relations and can be publicly downloaded or remotely accessed through SOAP web services. Keywords—Data integration, biological databases
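Two of the challenges listed above, nomenclature clashes and overlapping data, can be illustrated with a small sketch: a union-find synonym table maps database-specific identifiers onto one canonical ID, and relations reported by several databases then collapse into a single record with a provenance set. This is an illustration under stated assumptions, not the approach GeNS actually uses; all identifiers and the `SynonymResolver` and `integrate_relations` names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, predicate, object) on canonical IDs


class SynonymResolver:
    """Union-find over identifiers: linked synonyms share one canonical ID."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, name: str) -> str:
        self.parent.setdefault(name, name)
        root = name
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every visited identifier straight at the root.
        while name != root:
            nxt = self.parent[name]
            self.parent[name] = root
            name = nxt
        return root

    def link(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically smaller root as canonical, for determinism.
            lo, hi = sorted((ra, rb))
            self.parent[hi] = lo


def integrate_relations(
    resolver: SynonymResolver,
    records: Iterable[Tuple[str, str, str, str]],
) -> Dict[Relation, Set[str]]:
    """Collapse (source, subject, predicate, object) records onto canonical IDs.

    The same fact reported by several databases becomes one relation
    whose value is the set of reporting sources.
    """
    merged: Dict[Relation, Set[str]] = {}
    for source, subj, pred, obj in records:
        key = (resolver.find(subj), pred, resolver.find(obj))
        merged.setdefault(key, set()).add(source)
    return merged


resolver = SynonymResolver()
resolver.link("db1:0001", "GENE_X")  # hypothetical synonyms for one gene
resolver.link("db2:AB12", "GENE_X")

records = [
    ("db1", "db1:0001", "annotated_with", "TERM:1"),
    ("db2", "db2:AB12", "annotated_with", "TERM:1"),
    ("db2", "db2:AB12", "interacts_with", "db1:0002"),
]
merged = integrate_relations(resolver, records)
print(len(merged))                                             # 2
print(sorted(merged[("GENE_X", "annotated_with", "TERM:1")]))  # ['db1', 'db2']
```

Identifiers with no linked synonyms, such as `db1:0002` here, simply resolve to themselves, so unresolved names pass through unchanged rather than being dropped.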

