Results 1 - 8 of 8
Ultrafast and memory-efficient alignment of short DNA sequences to the human genome
 GENOME BIOLOGY, 2009
Hardness of optimal spaced seed design
 PARK (EDS.), PROCEEDINGS OF THE 16TH ANNUAL SYMPOSIUM ON COMBINATORIAL PATTERN MATCHING (CPM’05), 2005
Cited by 17 (0 self)
Speeding up approximate pattern matching has been a line of research in stringology since the 1980s. The approaches that are fast in practice belong to the class of filtration algorithms, in which text regions dissimilar to the pattern are first excluded, and the remaining regions are then compared to the pattern by dynamic programming. Among the conditions used to test similarity between the regions and the pattern, many require a minimum number of common substrings between them. When only substitutions are taken into account for measuring dissimilarity, counting spaced subwords instead of substrings improves the filtration efficiency. However, a preprocessing step is required to design one or more patterns, called spaced seeds (or gapped seeds), for the subwords, depending on the search parameters. Two distinct lines of research appear in the literature: one with probabilistic formulations of seed design problems, in which one wishes, for instance, to compute a seed with the highest probability of detecting the desired similarities (lossy filtration); and a second with combinatorial formulations, where the goal is to find a seed that detects all or a maximum number
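As a rough illustration of the spaced-seed idea in this abstract, the Python sketch below reports the offsets at which two equal-length sequences agree on every required position of a seed. The function name `diagonal_hits`, the '1'/'0' seed notation, and the example sequences are assumptions made for this illustration and do not come from the paper. A real filter would index seed keys over all text positions; this sketch only scans a single ungapped alignment.

```python
def diagonal_hits(seed: str, a: str, b: str) -> list[int]:
    """Offsets i where a and b agree on every '1' position of seed.

    seed is a string over {'1', '0'}: '1' marks a position that must
    match, '0' a "don't care" position (notation assumed here).
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    required = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    hits = []
    for i in range(len(a) - span + 1):
        # A hit needs a match at each required offset; other offsets are ignored.
        if all(a[i + k] == b[i + k] for k in required):
            hits.append(i)
    return hits


# Toy input: the two sequences differ at positions 2, 5 and 8 only.
a = "ACGTACGTAC"
b = "ACCTAGGTTC"
consecutive = diagonal_hits("111", a, b)   # three contiguous matches required
spaced = diagonal_hits("1101", a, b)       # same weight, one skipped position
```

In this toy input the substitutions fall at every third position, so the consecutive seed `111` finds no hit while the spaced seed `1101` of the same weight does.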
Subset Seed Automaton
 in "12th International Conference on Implementation and Application of Automata (CIAA 07)", Lecture Notes in Computer Science
Cited by 9 (7 self)
We study the pattern matching automaton introduced in [1] for the purpose of seed-based similarity search. We show that our definition provides a compact automaton, much smaller than the one obtained by applying the Aho-Corasick construction. We study properties of this automaton and present an efficient implementation of the automaton construction. We also present some experimental results and show that this automaton can be successfully applied to more general situations. (inria00170414, version 1, 7 Sep 2007)
Superiority of Spaced Seeds for Homology Search
 TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (TCBB), 2007
Cited by 4 (3 self)
In homology search, good spaced seeds have higher sensitivity for the same cost (weight). However, elucidating the mechanism that confers power on spaced seeds and characterizing optimal spaced seeds remain open problems. This paper investigates these two questions by formally analyzing the average number of nonoverlapping hits and the hit probability of a spaced seed in the Bernoulli sequence model. We prove that when the length of a nonuniformly spaced seed is bounded above by an exponential function of the seed weight, the seed strictly outperforms the traditional consecutive seed of the same weight in both (i) the average number of nonoverlapping hits and (ii) the asymptotic hit probability. This answers the first question in the Bernoulli sequence model. The theoretical study in this paper also gives a new solution to finding long optimal seeds.
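To make the notion of hit probability in the Bernoulli sequence model concrete, the Python sketch below computes it by brute force for short regions: each position of a region of length n matches independently with probability p, and a seed hits if all of its required positions match at some offset. The function name `hit_probability` and the '1'/'0' seed notation are assumptions for this illustration. This is plain enumeration, not the analysis used in the paper.

```python
from itertools import product


def hit_probability(seed: str, n: int, p: float) -> float:
    """Probability that seed hits a random region of length n.

    Each of the n positions is a match (1) with probability p,
    independently of the others (Bernoulli model). Enumerates all
    2**n match/mismatch strings, so it is only usable for small n.
    """
    required = [k for k, c in enumerate(seed) if c == "1"]
    span = len(seed)
    total = 0.0
    for bits in product((0, 1), repeat=n):
        # The seed hits if some offset has matches at all required positions.
        hit = any(
            all(bits[i + k] for k in required)
            for i in range(n - span + 1)
        )
        if hit:
            ones = sum(bits)
            total += p ** ones * (1 - p) ** (n - ones)
    return total
```

Comparing `hit_probability("111", n, p)` with `hit_probability("1101", n, p)` for growing n is one way to explore the comparison numerically. For very short regions the consecutive seed can come out ahead simply because it fits at more offsets; the result quoted above concerns the asymptotic hit probability.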
Reference-Based Alignment in Large Sequence Databases
Cited by 4 (2 self)
This paper introduces a novel method, called Reference-Based String Alignment (RBSA), that speeds up retrieval of optimal subsequence matches in large databases of sequences under the edit distance and the Smith-Waterman similarity measure. RBSA operates under the assumption that the optimal match deviates from the query by a relatively small amount, one that does not exceed a prespecified fraction of the query length. RBSA has an exact version that guarantees no false dismissals and can handle large queries efficiently. An approximate version of RBSA is also described; it achieves significant additional improvements over the exact version, with negligible losses in retrieval accuracy. RBSA filters candidate matches using precomputed alignment scores between the database sequence and a set of fixed-length reference sequences. At query time, the query sequence is partitioned into segments of length equal to that of the reference sequences. For each of those segments, the alignment scores between the segment and the reference sequences are used to efficiently identify a relatively small number of candidate subsequence matches. An alphabet collapsing technique is employed to improve the pruning power of the filter step. In our experimental evaluation, RBSA significantly outperforms state-of-the-art biological sequence alignment methods, such as q-grams, BLAST, and BWT.
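One generic way to filter with reference sequences under the edit distance relies on the triangle inequality: for any reference r, |d(q, r) - d(x, r)| <= d(q, x), so a database segment x can be discarded without a direct comparison whenever that difference exceeds the allowed distance. The Python sketch below illustrates this general pivot-style filter only. It is not RBSA itself: query segmentation, alphabet collapsing, and Smith-Waterman scoring are not shown, and the names `build_index` and `candidates` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def build_index(segments: list[str], references: list[str]) -> dict[str, list[int]]:
    """Precompute the distance from every database segment to every reference."""
    return {s: [edit_distance(s, r) for r in references] for s in segments}


def candidates(query: str, index: dict[str, list[int]],
               references: list[str], max_dist: int) -> list[str]:
    """Keep segments that the triangle inequality cannot rule out.

    A segment x with d(query, x) <= max_dist must satisfy
    |d(query, r) - d(x, r)| <= max_dist for every reference r,
    so no true match is dismissed by this test.
    """
    q = [edit_distance(query, r) for r in references]
    return [s for s, d in index.items()
            if all(abs(qi - di) <= max_dist for qi, di in zip(q, d))]


# Toy data (assumed for illustration).
references = ["AAAAAAAA", "ACGTACGT"]
segments = ["ACGTACGT", "TTTTTTTT", "ACGTACGA"]
index = build_index(segments, references)
survivors = candidates("ACGTACGT", index, references, max_dist=1)
```

Only the surviving candidates need a full alignment against the query; the filter can let false positives through, but under a true metric it never drops a segment within the distance threshold.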
BOND: Basic OligoNucleotide Design
Background: DNA microarrays have become ubiquitous in biological and medical research. The most difficult problem that needs to be solved is the design of DNA oligonucleotides that (i) are highly specific, that is, bind only to the intended target, (ii) cover the highest possible number of genes, that is, all genes that allow such unique regions, and (iii) can be computed quickly. None of the existing programs meets all these criteria. Results: We introduce a new approach with our software program BOND (Basic OligoNucleotide Design). Following Kane’s criteria for oligo design, BOND computes highly specific DNA oligonucleotides for all the genes that admit unique probes, while running orders of magnitude faster than the existing programs. The same approach also enables us to introduce an evaluation procedure that correctly measures the quality of the oligonucleotides. Extensive comparisons are performed to support our claims. BOND is flexible, easy to use, requires no additional software, and is freely available for noncommercial use from
2012-06-01 11:45, M5A8 109: Comparisons of symbolic musical sequences (Mathieu Giraud, Mathieu Giraud, Corentin Bertiaux, Anthony Lerouge)
Thank you for the many PJIs supervised this year!! Don’t hesitate to propose even more next year :) Laurent. Year 2011-2012. I ended last year with: "Thank you for the many PJIs supervised this year!! Don’t hesitate to propose even more next year :)" So I can start over, now adding: Thank you to this year’s permanent and occasional chairs!! Don’t hesitate to take on even more (module) next year :) Laurent. Year 2011-2012. Friday, 1 June
Efficient computation of spaced seeds
 BMC RESEARCH NOTES, 2012
Background: The most frequently used tools in bioinformatics are those searching for similarities, or local alignments, between biological sequences. Since the exact dynamic programming algorithm is quadratic, linear-time heuristics such as BLAST are used. Spaced seeds are much more sensitive than the consecutive seed of BLAST, and using several seeds represents the current state of the art in approximate search for biological sequences. The most important aspect is computing highly sensitive seeds. Since the problem seems hard, heuristic algorithms are used. The leading software in the common Bernoulli model is the SpEED program. Findings: SpEED uses a hill climbing method based on the overlap complexity heuristic. We propose a new algorithm for this heuristic that improves its speed by over one order of magnitude. We use the new implementation to compute improved seeds for several software programs. We also compute multiple seeds of the same weight as those of MegaBLAST, which greatly improve its sensitivity. Conclusion: Multiple spaced seeds are being successfully used in bioinformatics software programs. Enabling researchers to compute high-quality seeds very quickly will help expand the range of their applications.