BAG: a graph theoretic sequence clustering algorithm (2003)
| Venue: | Int. J. Data Mining and Bioinformatics |
| Citations: | 12 - 8 self |
BibTeX
@ARTICLE{Kim03bag:a,
author = {Sun Kim},
title = {BAG: a graph theoretic sequence clustering algorithm},
journal = {Int. J. Data Mining and Bioinformatics},
year = {2003},
volume = {1},
pages = {178--200}
}
Abstract
Recently developed sequence clustering algorithms based on graph theory have been successful in clustering large numbers of sequences into families of specific categories. In this paper, we present BAG, a new sequence clustering algorithm based on graph theory. BAG clusters sequences using two graph properties: biconnected components and articulation points. Because biconnected components and articulation points can be computed efficiently, in time linear in the number of vertices and edges, our algorithm is well suited to comparing large numbers of proteins from multiple genomes. Our experiments with protein sequences from multiple genomes show that the algorithm generates high-quality families. For example, it correctly classified 3,306 predicted proteins from E. coli and H. influenzae into 1,427 families without human intervention. We also discuss the importance of large-scale sequence comparison, drawing on our experience clustering many different genomes, including Arabidopsis thaliana.
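The linear-time claim rests on the classic depth-first-search computation of biconnected components and articulation points (Hopcroft and Tarjan). The following is a minimal Python sketch of that standard computation, not the BAG implementation. It assumes vertices are sequences numbered 0..n-1 and edges are pairwise similarity hits above some cutoff. The function name `biconnected_components` and the input format are hypothetical, and BAG's own rules for turning components into families are not reproduced here.

```python
from typing import List, Set, Tuple


def biconnected_components(
    n: int, edges: List[Tuple[int, int]]
) -> Tuple[List[Set[int]], Set[int]]:
    """Return (biconnected components as vertex sets, articulation points).

    Iterative Hopcroft-Tarjan DFS, O(V + E) time. Hypothetical helper,
    not taken from the BAG paper.
    """
    # Adjacency lists store (neighbour, edge id) so parallel edges stay distinct.
    adj: List[List[Tuple[int, int]]] = [[] for _ in range(n)]
    for eid, (u, v) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))

    disc = [-1] * n  # DFS discovery time; -1 means unvisited
    low = [0] * n    # earliest discovery time reachable from the subtree
    timer = 0
    edge_stack: List[int] = []  # edges of the component currently being built
    components: List[Set[int]] = []
    articulation: Set[int] = set()

    for root in range(n):
        if disc[root] != -1:
            continue
        disc[root] = low[root] = timer
        timer += 1
        root_children = 0
        # Each frame: (vertex, id of the tree edge into it or -1, neighbour iterator).
        stack = [(root, -1, iter(adj[root]))]
        while stack:
            u, parent_eid, neighbours = stack[-1]
            descended = False
            for v, eid in neighbours:
                if eid == parent_eid:
                    continue
                if disc[v] == -1:  # tree edge: descend into v
                    disc[v] = low[v] = timer
                    timer += 1
                    edge_stack.append(eid)
                    stack.append((v, eid, iter(adj[v])))
                    descended = True
                    break
                if disc[v] < disc[u]:  # back edge to an ancestor
                    edge_stack.append(eid)
                    low[u] = min(low[u], disc[v])
            if descended:
                continue
            # u is finished: propagate its low value to the parent p.
            stack.pop()
            if not stack:
                break
            p, p_parent_eid, _ = stack[-1]
            low[p] = min(low[p], low[u])
            if p_parent_eid == -1:
                root_children += 1
            if low[u] >= disc[p]:
                # No back edge from u's subtree climbs above p, so the edges
                # pushed since tree edge (p, u) form one biconnected component.
                comp: Set[int] = set()
                while True:
                    e = edge_stack.pop()
                    comp.update(edges[e])
                    if e == parent_eid:
                        break
                components.append(comp)
                if p_parent_eid != -1:  # a non-root p separates the graph here
                    articulation.add(p)
        if root_children > 1:  # the DFS root is a cut vertex iff it has 2+ children
            articulation.add(root)
    return components, articulation
```

For example, two triangles 0-1-2 and 2-3-4 sharing vertex 2, plus a pendant edge 4-5, give the components {0, 1, 2}, {2, 3, 4} and {4, 5}, with articulation points 2 and 4. The DFS is written iteratively so that Python's recursion limit is not hit on genome-scale similarity graphs.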