Multiple genome alignment by clustering pairwise matches
- RECOMB Comparative Genomics Satellite Workshop, Lecture Notes in Bioinformatics, 2004
Cited by 4 (3 self)
Abstract. We have developed a multiple genome alignment algorithm that uses a sequence clustering algorithm to combine local pairwise genome sequence matches produced by pairwise genome alignment tools, e.g., BLASTZ. Sequence clustering algorithms often generate clusters of sequences such that there exists a common shared region among all sequences in each cluster. To use a sequence clustering algorithm for genome alignment, it is necessary to handle numerous local alignments between a pair of genomes. We propose a multiple genome alignment method that converts the multiple genome alignment problem to the sequence clustering problem. This method does not need a guide tree to determine the order of multiple alignment, and it accurately detects multiple homologous regions. As a result, our multiple genome alignment algorithm performs competitively with existing algorithms. This is shown by an experiment that compares the performance of TBA, MultiPipMaker (MPM), and our algorithm in aligning 12 groups of 56 microbial genomes and by evaluating the number of common COGs detected.
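The idea of turning pairwise matches into clusters can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a plain union-find over match regions, and the region format `(genome, start, end)` with inclusive coordinates and the function name `cluster_matches` are hypothetical choices made only for illustration. Regions joined by a pairwise match, or overlapping on the same genome, end up in the same cluster, so no guide tree is needed.

```python
from collections import defaultdict


def cluster_matches(matches):
    """Group regions linked by a pairwise match or by overlap on one genome.

    `matches` is a list of (region_a, region_b) pairs, where each region is a
    hypothetical (genome, start, end) tuple with inclusive coordinates.
    """
    regions = sorted({region for pair in matches for region in pair})
    parent = {region: region for region in regions}

    def find(region):
        # Path-halving find.
        while parent[region] != region:
            parent[region] = parent[parent[region]]
            region = parent[region]
        return region

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    # 1. The two sides of every pairwise match belong together.
    for region_a, region_b in matches:
        union(region_a, region_b)

    # 2. Regions on the same genome that overlap belong together.
    by_genome = defaultdict(list)
    for region in regions:
        by_genome[region[0]].append(region)
    for genome_regions in by_genome.values():
        genome_regions.sort(key=lambda r: r[1])
        current, reach = genome_regions[0], genome_regions[0][2]
        for region in genome_regions[1:]:
            if region[1] <= reach:
                union(current, region)
                reach = max(reach, region[2])
            else:
                current, reach = region, region[2]

    clusters = defaultdict(set)
    for region in regions:
        clusters[find(region)].add(region)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


example_matches = [
    (("A", 100, 200), ("B", 50, 150)),
    (("B", 120, 220), ("C", 10, 110)),
    (("A", 500, 600), ("C", 400, 500)),
]
example_clusters = cluster_matches(example_matches)
```

In this toy input the two matches touching genome B overlap on B, so their regions from A, B, and C fall into one cluster, while the third match forms a cluster of its own. A real implementation would also need to trim clusters to the region shared by all members, which this sketch does not attempt.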
PLATCOM: a platform for computational comparative genomics
- Bioinformatics, 2005
Cited by 3 (1 self)
The exponential accumulation of genomic sequence data demands systematic analysis of genetic information and requires use of various computational approaches to handle such huge sets of genomic data. Comparative genomics,
Cluster utility: A new metric to guide sequence clustering
- 2004
Cited by 1 (1 self)
Automatic sequence clustering has become increasingly important in analyzing the ever-increasing number of biological sequences. Although there has been significant progress recently in developing high-performance sequence clustering algorithms, correctly clustering a large number of sequences still remains a huge challenge. More often than not, the clusters generated end up being incorrect or fragmented. We have developed a new metric called the cluster utility to guide cluster splitting. We have illustrated the effectiveness of this technique by implementing it in the BAG clustering algorithm. Experiments with the entire COG database show that the proposed technique can effectively guide correct sequence clustering while keeping the number of fragmented clusters significantly low.
Motif discovery from large number of sequences: A case study with disease resistance genes in Arabidopsis thaliana
- METMBS, 2003
Cited by 1 (1 self)
Motif discovery from a set of sequences is a very important problem in biology. Although a lot of research has been done on computational techniques for (sequence) motif discovery, discovering motifs in a large number of sequences still remains challenging. We propose a novel computational framework that combines multiple computational techniques such as pairwise sequence comparison, clustering, HMM-based sequence search, motif finding, and block comparisons. We tested the ability of this framework to extract motifs from disease resistance genes and candidates in the Arabidopsis thaliana genome, and it discovered all known motifs relating to disease resistance. When the same set of sequences was submitted to MEME and Pratt (motif discovery tools) as a whole without clustering, they failed to detect disease resistance gene motifs. The crucial component in this framework is clustering. Among the benefits of clustering is computational efficiency, since the set of sequences is divided into smaller groups using a clustering algorithm.
PLATCOM: Current Status and Plan for the Next Stages
We have been developing a system for comparing multiple genomes, PLATCOM, where users can freely choose genomes and analyze the selected genomes with a suite of computational tools. PLATCOM is built on internal databases such as GenBank, COG, KEGG, and the Pairwise Comparison Database (PCDB), which contains all pairwise comparisons (97,034 entries) of protein sequence files (.faa) and whole genome sequence files (.fna) of 312 replicons. PCDB is designed to incorporate new genomes automatically, so that PLATCOM can evolve as new genomes become available. PLATCOM is available at
Motif Discovery for Proteins Using Subsequence Clustering
We propose an algorithm for discovering motifs using clustering of subsequences. In our previous approach, we were successful in guiding motif discovery by sampling subsequences and inputting them to an existing motif discovery tool, MEME. In this paper, we show that clustering subsequences can also detect motifs without using other motif discovery tools. Generally, motif discovery algorithms do not perform well when the input set consists of non-homogeneous sequences. Clustering tools have the inherent ability to generate clusters of homogeneous sequences when the input sequences are non-homogeneous. For this reason, we use our clustering algorithm to generate aligned subsequence clusters and then rank them according to their information content to produce final motifs. The algorithm was tested with the PROSITE database, and the results suggest that the algorithm is very effective in finding motifs even when input sequences are from different protein families.
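The ranking step mentioned in this abstract can be sketched as follows. The abstract does not give the exact scoring formula, so this example assumes the common column-wise Shannon information content, log2(alphabet size) minus the column entropy, summed over columns, with no small-sample correction. The function names `information_content` and `rank_clusters` and the default alphabet size of 20 (amino acids) are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned, alphabet_size=20):
    """Sum of per-column information (in bits) for equal-length aligned strings.

    Assumed formula: log2(alphabet_size) - H(column), where H is the Shannon
    entropy of the observed residue frequencies in that column.
    """
    if not aligned:
        return 0.0
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all subsequences in a cluster must have equal length")

    total = 0.0
    for column in zip(*aligned):
        counts = Counter(column)
        n = len(column)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += math.log2(alphabet_size) - entropy
    return total


def rank_clusters(clusters, alphabet_size=20):
    """Return clusters ordered from highest to lowest information content."""
    return sorted(
        clusters,
        key=lambda cluster: information_content(cluster, alphabet_size),
        reverse=True,
    )


conserved = ["AC", "AC"]   # both columns fully conserved
variable = ["AC", "AD"]    # second column split evenly between two residues
ranked = rank_clusters([variable, conserved])
```

Here the fully conserved cluster scores 2·log2(20) bits and the cluster with one evenly split column scores exactly one bit less, so the conserved cluster ranks first. Summing over columns favors longer clusters; dividing by the cluster length would be one alternative if clusters of different lengths are compared.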
De novo identification of LTR retrotransposons in eukaryotic genomes
- BMC Genomics, 2007

