Sequenced-based protein function prediction (0)

by B Poulin

Results 1 - 3 of 3

Improving protein function prediction using the hierarchical structure of the Gene Ontology

by Roman Eisner, Brett Poulin, Duane Szafron, Paul Lu, Russ Greiner - In Proc. IEEE CIBCB , 2005
Cited by 9 (1 self)
Abstract: High-performance and accurate protein function prediction is an important problem in molecular biology. Many contemporary ontologies, such as the Gene Ontology (GO), have a hierarchical structure that can be exploited to improve the prediction accuracy, and lower the computational cost, of protein function prediction. We leverage the hierarchical structure of the ontology in two ways. First, we present a method of creating hierarchy-aware training sets for machine-learned classifiers and we show that, in the case of GO molecular function, it is the most accurate method compared to not considering the hierarchy during training. Second, we use the hierarchy to reduce the computational cost of classification. We also introduce a sound methodology for evaluating hierarchical classifiers using global cross-validation. Biologists often use sequence similarity (e.g. BLAST) to identify a "nearest neighbor" sequence and use the database annotations of this neighbor to predict protein function. In these cases, we use the hierarchy to improve accuracy by a small amount. When no similar sequences can be found (which is true for up to 40% of some common proteomes), our technique can improve accuracy by a more significant amount. Although this paper focuses on a specific important application, protein function prediction for the GO hierarchy, the techniques may be applied to any classification problem over a hierarchical ontology.

A Review of Performance Evaluation Measures for Hierarchical Classifiers

by Eduardo P. Costa (Depto. Ciências de Computação, ICMC/USP, São Carlos), Ana C. Lorena (ICMC/USP, São Carlos)
Cited by 1 (1 self)
Criteria for evaluating the performance of a classifier are an important part of its design. They make it possible to estimate the behavior of the generated classifier on unseen data and can also be used to compare its performance against that of classifiers generated by other classification algorithms. There are currently several performance measures for binary and flat classification problems. For hierarchical classification problems, where multiple classes are hierarchically related, the evaluation step is more complex. This paper reviews the main evaluation metrics proposed in the literature for evaluating hierarchical classification models.

Pathway Analyst—Automating Biochemical Pathway Prediction

by Luca Pireddu, Marcel St, Pe G, Duane Szafron, Mike Deyholos, Paul Lu, Russ Greiner
Permission is hereby granted to the University of Alberta Library to reproduce single copies of this thesis and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the thesis, and except as herein before provided, neither the thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatever without the author’s prior written permission.