## Reconstructing Strings from Substrings (1993)

Venue: Journal of Computational Biology

Citations: 30 - 2 self

### BibTeX

@ARTICLE{Skiena93reconstructingstrings,

author = {Steven S. Skiena and Gopalakrishnan Sundaram},

title = {Reconstructing Strings from Substrings},

journal = {Journal of Computational Biology},

year = {1993},

volume = {2},

pages = {333--353}

}

### Abstract

In this paper, we consider a variety of problems with application to sequencing by hybridization. First, we develop a theory of interactive sequencing by hybridization, based on

### Citations

10964 |
Computers and Intractability: A Guide to the Theory of NP-Completeness
- Garey, Johnson
- 1979
Citation Context ...in step 3 is NP-complete, all our weights are bounded by $l'$, which is at most the length $n$ of the walk. Hence, we can use the standard dynamic programming algorithm to solve this knapsack in $O(l'^2)$ [10]. With at most $l'$ instances of knapsack, step 3 can be performed in $O(l'^3)$ time. Testing whether there is a unique way to express $l'$ as an integer linear combination of elements of $U''$ is the bot... |

185 |
Constructing optimal binary decision trees is NP-complete
- Hyafil, Rivest
- 1976
Citation Context ...ast one test $T_k$ for each pair of candidates $C_i$ and $C_j$ such that $C_i \in T_k$ and $C_j \in C - T_k$. Such a decision strategy is given by a decision tree of height at least $\lceil \lg n \rceil$. Hyafil and Rivest [13] proved that the problem of constructing a minimum height or minimum path-length decision tree is NP-complete. Despite this result, there is some hope for being able to construct optimal decision tree... |

102 |
A combinatorial problem
- de Bruijn
- 1946
Citation Context ...results of a classical sequencing chip experiment. In particular, Pevzner's algorithm for sequencing chip reconstruction [25] is based on finding Eulerian paths in a subgraph of the de Bruijn digraph [5]. For a given alphabet $\Sigma$ and length $k$, the de Bruijn digraph $G_k(\Sigma)$ will contain $|\Sigma|^{k-1}$ vertices, each corresponding to a $(k-1)$-length string on $\Sigma$. As shown in Figure... |

99 |
String overlaps, pattern matching, and nontransitive games
- Guibas, Odlyzko
- 1981
Citation Context ...gs, $\{11, 12, 21, 22\}$, or all binary strings of length six. For each node in the tree, we seek a substring query which partitions the set of candidate strings as evenly as possible. Guibas and Odlyzko [12] and Wilf [30] consider the problem of counting the number of strings with a given set of substrings and forbidden substrings. However, the resulting formulae are far too cumbersome to apply to construct... |

73 |
Linear approximation of shortest superstrings
- Blum, Jiang, et al.
- 1994
Citation Context ...t superstring containing each of a given set of strings is known to be NP-complete [9]. Thus finding an optimal solution is computationally intractable, even though approximation algorithms are known [3, 14, 18]. However, efficient algorithms do exist for finding the shortest string consistent with the results of a classical sequencing chip experiment. In particular, Pevzner's algorithm for sequencing chip r... |

66 |
l-Tuple DNA sequencing: computer analysis
- Pevzner
- 1989 |

63 |
Decision Trees and Diagrams
- Moret
- 1982
Citation Context ...ubstring of length n and none of length n + 1. However, we seek to determine S using as few questions as possible. Any interactive strategy for determining strings can be specified by a decision tree [21]. A decision tree is a rooted binary tree, where each internal node is labeled by a substring query and each leaf by a candidate string. For each node, all leaf nodes of the left subtree contain the g... |

47 |
A novel method for nucleic acid sequence determination
- Bains, Smith
- 1988
Citation Context ...termine the exact contents of S using as few queries as possible. Although this tale is perhaps over-dramatic, it is not completely inaccurate. The problem arises in sequencing by hybridization (SBH) [2, 6, 8, 19, 24], a new and promising approach to DNA sequencing which offers the potential of reduced cost and higher throughput over traditional gel-based approaches. The basic sequencing by hybridization procedure... |

43 |
On finding minimal length superstrings
- Gallant, Maier, et al.
- 1980
Citation Context ...ill all be 0 or 1, so each m-nucleotide fragment of S is unambiguously identified. The problem of finding the shortest superstring containing each of a given set of strings is known to be NP-complete [9]. Thus finding an optimal solution is computationally intractable, even though approximation algorithms are known [3, 14, 18]. However, efficient algorithms do exist for finding the shortest string co... |

40 |
Normal recurring decimals
- Good
- 1946
Citation Context ...lphabet $\{0, 1\}$. Let $D(\alpha, m)$ denote the set of distinct de Bruijn sequences on $\alpha$. It is well known [5, 26] that $|D(\alpha, m)| = ((\alpha - 1)!)^{\alpha^{m-1}} \alpha^{\alpha^{m-1} - m} = \frac{1}{n} (\alpha!)^{n/\alpha}$. Good [11] demonstrated how to construct the sequences of $D(\alpha, m)$, by building a directed graph $G(\alpha, m)$ where the vertex set of $G(\alpha, m)$ represents each string of length $m - 1$ on $\alpha$. Create a directed ed... |

23 |
Improved chips for sequencing by hybridization
- Pevzner, Lysov, et al.
- 1991
Citation Context ...at large sequencing chips are needed to reconstruct relatively short strands of DNA. For example, the classical chip C(8) suffices to reconstruct 200 nucleotide long sequences in only 94 of 100 cases [23], even in error-free experiments. However, additional information about the sequence is often available, in particular its length. We show that length can be used to help disambiguate the sequence. Fo... |

21 |
On the complexity of edge traversing
- Papadimitriou
- 1976
Citation Context ...alk is known as the Chinese postman problem [17]. Polynomial algorithms based on bipartite matching exist for directed and undirected graphs [7], although the problem is NP-complete for mixed graphs [22]. For a digraph G, there may exist positive integers l such that a postman walk of length l either (1) does not exist, (2) exists and is unique, or (3) exists and is non-unique. For example, the graph in... |

20 |
Light-directed, spatially addressable parallel chemical synthesis
- Fodor, Read, et al.
- 1991
Citation Context ...termine the exact contents of S using as few queries as possible. Although this tale is perhaps over-dramatic, it is not completely inaccurate. The problem arises in sequencing by hybridization (SBH) [2, 6, 8, 19, 24], a new and promising approach to DNA sequencing which offers the potential of reduced cost and higher throughput over traditional gel-based approaches. The basic sequencing by hybridization procedure... |

18 |
Likelihood DNA sequencing by hybridization
- Lipshutz
- 1993
Citation Context ...termine the exact contents of S using as few queries as possible. Although this tale is perhaps over-dramatic, it is not completely inaccurate. The problem arises in sequencing by hybridization (SBH) [2, 6, 8, 19, 24], a new and promising approach to DNA sequencing which offers the potential of reduced cost and higher throughput over traditional gel-based approaches. The basic sequencing by hybridization procedure... |

17 |
Algorithms for finding k-best perfect matchings
- Chegireddy, Hamacher
- 1987
Citation Context ...test path between them. The minimum postman walk is unique if and only if there does not exist more than one minimum weight matching in $G'$. Finding the K best matchings can be done in $O(Kn^3)$ time [4]. Hence, in $O(|V|^3)$ time we can find a unique minimum postman walk if there exists one. If the minimum postman walk is not unique, then by Lemma 19 there does not exist a unique postman walk W in G... |

16 |
Graphic programming using odd or even points
- Kwan
- 1962
Citation Context ...f length l, containing all k-strings in A if and only if there exists a unique postman walk of length $l = n - k + 1$ in G. Finding the minimum postman walk is known as the Chinese postman problem [17]. Polynomial algorithms based on bipartite matching exist for directed and undirected graphs [7], although the problem is NP-complete for mixed graphs [22]. For a digraph G, there may exist positive ... |

16 |
Ulam’s searching game with a fixed number of lies
- Spencer
- 1992
Citation Context ...a given substring query may be reported incorrectly, as is the case with real-life sequencing by hybridization. The related problem of searching a sorted list with "lies" has been extensively studied [28]. • Sequencing chips perform all queries in parallel. How many rounds does it take to determine an unknown string when we can make $f(n, \alpha)$ queries per round? Parallelizing the strategy of Theorem 2... |

14 |
United Kingdom patent application GB8810400
- Southern
- 1988
Citation Context ...ybridization techniques, although the approach was proposed independently by several groups, including Bains and Smith [2], Drmanac and Crkvenjakov [6], Lysov et al. [19], Macevicz [20], and Southern [27]. More recently, Crkvenjakov's and Drmanac's laboratories report sequencing a 340 base-pair fragment in a blind experiment [24]. In the classical sequencing chip C(m), all $4^m$ single-stranded oligonuc... |

10 |
Towards a DNA sequencing theory
- Li
- 1990
Citation Context ...t superstring containing each of a given set of strings is known to be NP-complete [9]. Thus finding an optimal solution is computationally intractable, even though approximation algorithms are known [3, 14, 18]. However, efficient algorithms do exist for finding the shortest string consistent with the results of a classical sequencing chip experiment. In particular, Pevzner's algorithm for sequencing chip r... |

9 |
Approximating shortest superstrings with constraints
- Jiang, Li
- 1994
Citation Context ...t superstring containing each of a given set of strings is known to be NP-complete [9]. Thus finding an optimal solution is computationally intractable, even though approximation algorithms are known [3, 14, 18]. However, efficient algorithms do exist for finding the shortest string consistent with the results of a classical sequencing chip experiment. In particular, Pevzner's algorithm for sequencing chip r... |

8 |
DNA sequencing by primer walking with strings of contiguous hexamers
- Kieleczawa, Dunn, Studier
- 1992
Citation Context ...rial explosion prohibits storing all $4^k$ primers for even modest-sized k, and synthesizing primers is expensive and difficult. However, Kieleczawa, Dunn, and Studier's recent primer walking technique [15] suggests that strings of three to four hexamers can be used to construct large probe strings cheaply. 3 Reconstructing Unknown Strings In this section, we consider the problem of reconstructing strin... |

7 |
DNA sequencing by hybridization: 100 bases read by a non-gel-based method. Proc Natl Acad Sci USA
- Strezoska, Radosavljevic, et al.
- 1991
Citation Context ...ybridization (SBH) is a new and promising approach to DNA sequencing which offers the potential of reduced cost and higher throughput over traditional gel-based approaches. In 1991, Strezoska et al. [29] accurately sequenced 100 base pairs of a known sequence using hybridization techniques, although the approach was proposed independently by several groups, including Bains and Smith [2], Drmanac and ... |

6 |
On the Number of Queries Necessary to Identify a Permutation
- Ko, Teng
- 1986
Citation Context ...ting can be considered as a two person query game in which we seek to find an unknown permutation $f$ over $\{1, \ldots, n\}$ by asking queries of the form 'Is $f^{-1}(i) < f^{-1}(j)$?'. Ko and Teng [16] consider a generalization of the game Mastermind, specifically the problem of identifying an unknown permutation f by asking permutation queries, in which the adversary replies in how many places f a... |

5 |
DNA sequencing by hybridization. Yugoslav Patent Application 570
- Drmanac, Crkvenjakov
- 1987 |

5 |
Solution to question nr. 48. L’Intermédiaire des Mathématiciens
- Flye Sainte-Marie
- 1894
Citation Context ...0000100110101110 and 0000101001101110 are two distinct de Bruijn sequences of span 4 on the binary alphabet $\{0, 1\}$. Let $D(\alpha, m)$ denote the set of distinct de Bruijn sequences on $\alpha$. It is well known [5, 26] that $|D(\alpha, m)| = ((\alpha - 1)!)^{\alpha^{m-1}} \alpha^{\alpha^{m-1} - m} = \frac{1}{n} (\alpha!)^{n/\alpha}$. Good [11] demonstrated how to construct the sequences of $D(\alpha, m)$, by building a directed graph $G(\alpha, m)$ where t... |

4 |
Decision trees for geometric objects
- Arkin, Meijer, et al.
- 1998
Citation Context ...truct optimal decision trees for special types of models and queries. For example, optimal decision trees for non-degenerate polygonal models all sharing a common point can be efficiently constructed [1], although the problem becomes hard if either the common point or degeneracy assumption is removed. In this section, we show that the minimum height decision tree problem remains NP-complete for string... |

4 |
Strings, substrings, and the nearest integer function
- Wilf
- 1987
Citation Context ...1, 22}, or all binary strings of length six. For each node in the tree, we seek a substring query which partitions the set of candidate strings as evenly as possible. Guibas and Odlyzko [12] and Wilf [30] consider the problem of counting the number of strings with a given set of substrings and forbidden substrings. However, the resulting formulae are far too cumbersome to apply to constructing large deci... |

1 |
Matching, Euler tours and the Chinese postman
- Edmonds, Johnson
- 1973
Citation Context ...gth $l = n - k + 1$ in G. Finding the minimum postman walk is known as the Chinese postman problem [17]. Polynomial algorithms based on bipartite matching exist for directed and undirected graphs [7], although the problem is NP-complete for mixed graphs [22]. For a digraph G, there may exist positive integers l such that a postman walk of length l either (1) does not exist, (2) exists and is uniq... |