Combinatorial algorithms for DNA sequence assembly - Algorithmica, 1993
Abstract - Cited by 33 (3 self)
The trend towards very large DNA sequencing projects, such as those being undertaken as part of the human genome initiative, necessitates the development of efficient and precise algorithms for assembling a long DNA sequence from the fragments obtained by shotgun sequencing or other methods. The sequence reconstruction problem that we take as our formulation of DNA sequence assembly is a variation of the shortest common superstring problem, complicated by the presence of sequencing errors and reverse complements of fragments. Since the simpler superstring problem is NP-hard, any efficient reconstruction procedure must resort to heuristics. In this paper, however, a four-phase approach based on rigorous design criteria is presented, and has been found to be very accurate in practice. Our method is robust in the sense that it can accommodate high sequencing error rates and can list a series of alternate solutions in the event that several appear equally good. Moreover, it uses a limited form ...
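To make the underlying problem concrete, here is a minimal sketch of the classic greedy heuristic for the plain shortest common superstring problem. It is not the four-phase method of the paper: it assumes error-free fragments that all come from the same strand, and it only shows a `reverse_complement` helper rather than using it during merging. All function names are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Repeatedly merge the pair of fragments with the largest exact overlap.

    Assumption: no sequencing errors and a single strand, unlike the
    setting described in the abstract.
    """
    # Drop duplicates and fragments fully contained in another fragment.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (-1, 0, 1)  # (overlap length, index of left, index of right)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = frags[i] + frags[j][k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (i, j)]
        frags.append(merged)
    return frags[0] if frags else ""


fragments = ["ACGTAC", "TACGGA", "GGATTT"]
print(greedy_superstring(fragments))
```

The greedy merge gives no optimality guarantee, which is consistent with the abstract's remark that the superstring problem is NP-hard and efficient procedures must rely on heuristics.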
On Eulerian Circuits and Words with Prescribed Adjacency Patterns - Journal of Combinatorial Theory (A) 18, 80-87 (1975), 1974
Abstract
In the study of the biochemistry of the DNA molecule [1-3], of the statistical mechanics of large molecules in general [4], and elsewhere, one is led to postulate models of behavior in which the molecule is treated like a “word,” and the individual bases arranged on the molecule are the “letters.” A useful simplifying assumption is then that the information-carrying properties of these molecules depend only on (a) the number of letters of each type and (b) a nearest-neighbor interaction in which the frequency of each letter pair is relevant, but triples, ..., can be ignored. In such a model, one is soon led to consideration of the following purely combinatorial question: Let v_i (i = 1, ..., n) be given positive integers, and let v_ij (i, j = 1, ..., n) be given nonnegative integers. How many words can be made from an alphabet of n letters, in such a way that the letter i appears exactly v_i times in the word (i = 1, ..., n), and exactly v_ij times the letter i is followed by the letter j (i, j = 1, ..., n)? We deal with this question both in the form given above, in which case we are able to give an exact, closed solution, and in the symmetric form, in which the matrix elements v_ij represent the number of occurrences of the unordered adjacent pair ij in the word, so that v_ij = v_ji (i, j = 1, ..., n), where we cannot give a complete solution, but can only relate the solution to a well-known unsolved problem of considerable difficulty. With reference to the problem stated above, our solution is that the number of words satisfying the conditions is given exactly by a determinant expression (1) ...
*Research carried out under John Simon Guggenheim Memorial Fellowship.
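The ordered form of the counting question can be checked on small inputs by exhaustive search. The sketch below is a brute-force enumerator, not the closed-form determinant solution the abstract refers to; the function name `count_words` and the dictionary-based input format are assumptions made for illustration.

```python
from collections import Counter


def count_words(letter_counts, pair_counts):
    """Count words with prescribed letter and ordered adjacency counts.

    letter_counts: dict mapping each letter to how often it must appear.
    pair_counts: dict mapping (a, b) to how often a must be immediately
        followed by b; pairs not listed are required to occur zero times.
    Brute force by backtracking, so only suitable for short words.
    """
    remaining = Counter(letter_counts)
    pairs_left = Counter(pair_counts)
    total = sum(letter_counts.values())

    def extend(last, length):
        if length == total:
            # Every prescribed adjacency must have been used exactly.
            return 1 if all(v == 0 for v in pairs_left.values()) else 0
        count = 0
        for letter in letter_counts:
            if remaining[letter] == 0:
                continue
            if last is not None and pairs_left[(last, letter)] == 0:
                continue
            remaining[letter] -= 1
            if last is not None:
                pairs_left[(last, letter)] -= 1
            count += extend(letter, length + 1)
            remaining[letter] += 1
            if last is not None:
                pairs_left[(last, letter)] += 1
        return count

    return extend(None, 0)


# Three a's and one b, with each of aa, ab, ba occurring once:
# the matching words are "aaba" and "abaa".
print(count_words({"a": 3, "b": 1},
                  {("a", "a"): 1, ("a", "b"): 1, ("b", "a"): 1}))
```

A word of length N has N - 1 adjacent pairs, so the prescribed pair counts must sum to one less than the sum of the letter counts for any word to exist.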

