Results 1 - 4 of 4
PatternHunter: faster and more sensitive homology search
Bioinformatics, 2002
Cited by 267 (24 self)
Motivation: Genomics and proteomics studies routinely depend on homology searches based on the strategy of finding short seed matches which are then extended. The explosive growth of genomic data presents a dilemma for DNA homology search techniques: increasing seed size decreases sensitivity, whereas decreasing seed size slows down computation. Results: We present a new homology search algorithm, "PatternHunter", that uses a novel seed model for increased sensitivity and new hit-processing techniques for significantly increased speed. At Blast levels of sensitivity, PatternHunter is able to find homologies between sequences as large as human chromosomes, in mere hours on a desktop. Availability: PatternHunter is available at
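The abstract above does not spell out the seed model. As a minimal sketch, assuming the model is a spaced seed (a 0/1 pattern in which only the positions marked 1 must match), seed hits between two DNA sequences could be located as shown below. The function names `spaced_key` and `find_seed_hits`, and the short seed `1101`, are illustrative assumptions and are not taken from PatternHunter itself.

```python
from collections import defaultdict


def spaced_key(seq, start, seed):
    """Return the letters of seq under the '1' positions of seed, starting at start."""
    return "".join(seq[start + j] for j, c in enumerate(seed) if c == "1")


def find_seed_hits(query, target, seed):
    """Return (query_offset, target_offset) pairs where the seed matches.

    The target is indexed once by its spaced keys; each query window is then
    looked up in that index. Positions marked '0' in the seed are ignored,
    so a mismatch there does not prevent a hit.
    """
    span = len(seed)
    index = defaultdict(list)
    for t in range(len(target) - span + 1):
        index[spaced_key(target, t, seed)].append(t)

    hits = []
    for q in range(len(query) - span + 1):
        for t in index.get(spaced_key(query, q, seed), ()):
            hits.append((q, t))
    return hits


if __name__ == "__main__":
    # The two sequences differ only at their third base.
    print(find_seed_hits("ACGTA", "ACTTA", "1101"))
    print(find_seed_hits("ACGTA", "ACTTA", "111"))
```

In this toy input the single mismatch falls on the ignored position of `1101`, so the spaced seed still reports a hit, while the contiguous seed `111` of the same weight reports none. A real search would extend each hit into an alignment, which is omitted here.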
On Spaced Seeds for Similarity Search
Discrete Appl. Math., 2002
Cited by 62 (11 self)
Genomics studies routinely depend on similarity searches based on the strategy of finding short seed matches (contiguous k bases) which are then extended. The particular choice of the seed length, k, is determined by the tradeoff between search speed (larger k reduces chance hits) and sensitivity (smaller k finds weaker similarities). A novel idea of using a single deterministic optimized spaced seed was introduced in [10] to the above similarity search process and it was empirically demonstrated that the optimal spaced seed quadruples the search speed, without sacrificing sensitivity.
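The speed versus sensitivity tradeoff described above can be made concrete by computing the probability that a seed hits at least once in a random region. The sketch below assumes a simple model that is not stated in the abstract: a region of fixed length in which each position is independently a match with probability `p`. It enumerates all match/mismatch patterns, so it is only practical for short regions; `hit_probability` and the example seeds are hypothetical names and values chosen for illustration.

```python
from itertools import product


def hit_probability(seed, region_len, p):
    """Exact probability that seed hits somewhere in a random region.

    The region has region_len independent positions, each a match (1) with
    probability p. The seed hits at offset i when every '1' position of the
    seed lines up with a match. Runs in O(2**region_len) time.
    """
    care = [j for j, c in enumerate(seed) if c == "1"]
    span = len(seed)
    total = 0.0
    for bits in product((0, 1), repeat=region_len):
        hit = any(
            all(bits[i + j] for j in care)
            for i in range(region_len - span + 1)
        )
        if hit:
            k = sum(bits)  # number of matching positions in this pattern
            total += p ** k * (1 - p) ** (region_len - k)
    return total


if __name__ == "__main__":
    # Compare a contiguous and a spaced seed of equal weight (three 1s).
    for seed in ("111", "1101"):
        print(seed, hit_probability(seed, region_len=12, p=0.7))
```

Seeds of equal weight have the same chance of a random hit at any single offset, so comparing their hit probabilities over a region isolates the sensitivity side of the tradeoff.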
Clustering of Database Sequences for Fast Homology Search Using Upper Bounds on Alignment Score
Homology data are among the most important information used to predict the functions of unknown proteins, and thus fast and accurate methods are needed. In this paper, we propose a new approach for fast and accurate homology search using precomputed all-against-all similarity scores in a target database. We previously developed a method for deriving an upper bound of the Smith-Waterman score (SW-score) between a query and a homolog candidate sequence using the SW-score between the candidate and a sequence similar to the query. In this paper, using this upper bound, we first cluster the sequences in the target database so that the upper bounds of SW-scores for all the members of each cluster are less than a given value, and select a representative sequence for each cluster. Then, the query sequence is searched against the representative sequences, and the upper bound of the SW-scores is estimated for each cluster. SW-alignments are computed for all the sequences in a cluster only if its upper bound is higher than a given threshold. We performed computational experiments to test the efficiency of the proposed method on the KEGG/GENES database using the KEGG/SSDB. The results suggest that our method is efficient for redundant databases that include multiple closely related species.
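The search procedure described above (score the representatives, bound each cluster, align members only when the bound clears the threshold) can be sketched as follows. The abstract does not give the upper-bound formula, so both the scoring function and the bound are supplied by the caller. The toy versions used here are assumptions for illustration only: a count of matching positions stands in for the SW-score, and the bound adds an assumed cluster radius to the representative's score. Neither is the paper's method.

```python
def pruned_search(query, clusters, score, upper_bound, threshold):
    """Return (sequence, score) pairs scoring above threshold.

    clusters maps each representative to the list of its members
    (the representative included). A cluster is skipped entirely when
    upper_bound(representative_score) does not exceed threshold.
    """
    results = []
    for rep, members in clusters.items():
        rep_score = score(query, rep)
        if upper_bound(rep_score) <= threshold:
            continue  # under the assumed bound, no member can pass
        for member in members:
            s = score(query, member)
            if s > threshold:
                results.append((member, s))
    return results


def match_count(a, b):
    """Toy stand-in for an alignment score: number of equal positions."""
    return sum(x == y for x, y in zip(a, b))


# Assumption: every member differs from its representative in at most
# RADIUS positions, so its match count can exceed the representative's
# by at most RADIUS.
RADIUS = 1


def toy_bound(rep_score):
    return rep_score + RADIUS


if __name__ == "__main__":
    clusters = {
        "AAAAAA": ["AAAAAA", "AAAAAT"],
        "CCCCCC": ["CCCCCC", "CCCCCG"],
    }
    print(pruned_search("AAAAAT", clusters, match_count, toy_bound, threshold=4))
```

In this toy input the second cluster is discarded after scoring only its representative, which is where the savings come from when a database contains many closely related sequences.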