Results 1 - 2 of 2
Amino Acid Classification and Hash Seeds for Homology Search
BICOB, 2009
Cited by 2 (1 self)
Spaced seeds have been extensively studied in the homology search field. A spaced seed can be regarded as a very special type of hash function on k-mers, where two k-mers have the same hash value if and only if they are identical at the w (w < k) positions designated by the seed. Spaced seeds substantially increased the sensitivity of homology search. It is then natural to ask whether there is a better hash function (called a hash seed) that provides better sensitivity than the spaced seed. We study this question in the paper. We propose a strategy to classify amino acids, which leads to a better hash seed. Our results raise a new question about how to design the best hash seed.
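The hash-function view of a spaced seed described in this abstract can be illustrated with a minimal sketch. The function name `spaced_seed_hash` and the seed `1101` (k = 4, w = 3) are hypothetical choices for illustration and do not come from the paper; the sketch only shows that two k-mers collide exactly when they agree at the positions marked `1`.

```python
def spaced_seed_hash(kmer: str, seed: str) -> str:
    """Project a k-mer onto the seed positions marked '1'.

    Two k-mers get the same value if and only if they are identical
    at those w positions; the remaining k - w positions are ignored.
    """
    if len(kmer) != len(seed):
        raise ValueError("k-mer and seed must have the same length")
    return "".join(c for c, s in zip(kmer, seed) if s == "1")


# Illustrative seed (an assumption, not taken from the paper): k = 4, w = 3.
seed = "1101"

# The k-mers differ only at the ignored third position, so they collide.
print(spaced_seed_hash("ACGT", seed) == spaced_seed_hash("ACAT", seed))  # True

# The k-mers differ at a designated position, so they do not collide.
print(spaced_seed_hash("ACGT", seed) == spaced_seed_hash("AGGT", seed))  # False
```

A hash seed in the sense of the abstract would replace this projection with some other function on k-mers; the sketch does not attempt to reproduce the paper's amino acid classification.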
Efficient computation of spaced seeds
BMC RESEARCH NOTES, 2012
Background: The most frequently used tools in bioinformatics are those searching for similarities, or local alignments, between biological sequences. Since the exact dynamic programming algorithm is quadratic, linear-time heuristics such as BLAST are used. Spaced seeds are much more sensitive than the consecutive seed of BLAST, and using several seeds represents the current state of the art in approximate search for biological sequences. The most important aspect is computing highly sensitive seeds. Since the problem seems hard, heuristic algorithms are used. The leading software in the common Bernoulli model is the SpEED program. Findings: SpEED uses a hill climbing method based on the overlap complexity heuristic. We propose a new algorithm for this heuristic that improves its speed by over one order of magnitude. We use the new implementation to compute improved seeds for several software programs. We also compute multiple seeds of the same weight as MegaBLAST's, which greatly improve its sensitivity. Conclusion: Multiple spaced seeds are being successfully used in bioinformatics software programs. Enabling researchers to compute high-quality seeds very quickly will help expand the range of their applications.
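To make the notion of seed sensitivity in the Bernoulli model concrete, the following is a minimal brute-force sketch. It assumes the usual reading of that model, in which each position of an aligned region matches independently with probability `p`, and takes sensitivity to be the probability that the seed fits at some offset of the region. This is not SpEED's overlap complexity heuristic or the algorithm proposed in the paper; the function names, seeds, region length, and `p` are illustrative assumptions, and the enumeration is exponential in the region length, so it is only usable for short regions.

```python
from itertools import product


def hits(region, seed):
    """Return True if the seed matches the 0/1 region at some offset.

    region: tuple of 0/1 values (1 = matching position in the alignment).
    seed:   string of '1' (must match) and '0' (don't care) characters.
    """
    care = [i for i, s in enumerate(seed) if s == "1"]
    last_start = len(region) - len(seed)
    return any(
        all(region[start + i] for i in care)
        for start in range(last_start + 1)
    )


def sensitivity(seed, length, p):
    """Exact hit probability of the seed on a random region of given length.

    Enumerates all 2**length match/mismatch patterns and sums the
    probability of those that the seed hits.
    """
    total = 0.0
    for region in product((0, 1), repeat=length):
        if hits(region, seed):
            matches = sum(region)
            total += p ** matches * (1 - p) ** (length - matches)
    return total


# Compare a consecutive seed and a spaced seed of the same weight (3)
# on a short region; the values chosen here are illustrative only.
print(sensitivity("111", 12, 0.7))
print(sensitivity("1101", 12, 0.7))
```

Practical seed design works with much longer regions and heavier seeds, which is why heuristics such as the overlap complexity mentioned in the abstract are used in place of exhaustive evaluation.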