Four Strikes against Physical Mapping of DNA
Journal of Computational Biology, 1993
Cited by 46 (8 self)
Physical Mapping is a central problem in molecular biology ... and the human genome project. The problem is to reconstruct the relative position of fragments of DNA along the genome from information on their pairwise overlaps. We show that four simplified models of the problem lead to NP-complete decision problems: Colored unit interval graph completion, the maximum interval (or unit interval) subgraph, the pathwidth of a bipartite graph, and the k-consecutive ones problem for k >= 2. These models have been chosen to reflect various features typical in biological data, including false negative and positive errors, small width of the map and chimericism.
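As a rough illustration of the consecutive ones idea named in this abstract, the sketch below tests whether the columns of a small 0/1 matrix can be ordered so that the 1s in every row form a single contiguous block. This is the error-free, single-block case, which is known to be solvable in polynomial time (for example with PQ-trees). The reading of rows as clones and columns as probes, the function names, and the toy matrix are assumptions for illustration, and the brute-force search is only workable for tiny inputs. It is not the construction used in the paper.

```python
from itertools import permutations


def rows_are_consecutive(matrix, order):
    """Return True if, with columns arranged in `order`, the 1s of every row are contiguous."""
    for row in matrix:
        positions = [i for i, col in enumerate(order) if row[col] == 1]
        if positions and positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True


def find_consecutive_ones_order(matrix):
    """Brute-force search over all column orders (exponential; toy inputs only)."""
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if rows_are_consecutive(matrix, order):
            return list(order)
    return None


# Hypothetical data: rows are clones, columns are probes.
clones = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]
order = find_consecutive_ones_order(clones)

# No valid order exists here: column 0 would need three distinct neighbours.
impossible = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
no_order = find_consecutive_ones_order(impossible)
```

Allowing each row to split into up to k blocks, as in the k-consecutive ones problem for k >= 2, is the variant the abstract reports as NP-complete.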
Physical Mapping of Chromosomes: A Combinatorial Problem in Molecular Biology
Algorithmica, 1993
Cited by 46 (5 self)
This paper is concerned with algorithms for the reassembly process.
On the Complexity of DNA Physical Mapping
1994
Cited by 36 (7 self)
The Physical Mapping Problem is to reconstruct the relative position of fragments (clones) of DNA along the genome from information on their pairwise overlaps. We show that two simplified versions of the problem belong to the class of NP-complete problems, which are conjectured to be computationally intractable. In one version all clones have equal length, and in another, clone lengths may be arbitrary. The proof uses tools from graph theory and complexity.
A new algorithm for DNA sequence assembly
Journal of Computational Biology, 1995
Cited by 32 (2 self)
Since the advent of rapid DNA sequencing methods in 1976, scientists have had the problem of inferring DNA sequences from sequenced fragments. Shotgun sequencing is a well-established biological and computational method used in practice. Many conventional algorithms for shotgun sequencing are based on the notion of pairwise fragment overlap. While shotgun sequencing infers a DNA sequence given the sequences of overlapping fragments, a recent and complementary method, called sequencing by hybridization (SBH), infers a DNA sequence given the set of oligomers that represents all subwords of some fixed length, k. In this paper, we propose a new computer algorithm for DNA sequence assembly that combines in a novel way the techniques of both shotgun and SBH methods. Based on our preliminary investigations, the algorithm promises to be very fast and practical for DNA sequence assembly.
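The SBH input described in this abstract, the set of all subwords of a fixed length k, can be sketched in a few lines. The function name `kmer_spectrum` and the example sequence are assumptions for illustration. The sketch only builds the spectrum and does not attempt the reconstruction step or the combined algorithm the paper proposes.

```python
def kmer_spectrum(sequence, k):
    """Return the set of all length-k subwords (k-mers) of `sequence`."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


# Hypothetical example sequence.
spectrum = kmer_spectrum("ACGTAC", 3)
print(sorted(spectrum))  # ['ACG', 'CGT', 'GTA', 'TAC']
```

Because the spectrum is a set, repeated k-mers are recorded only once, which is one reason reconstruction from a spectrum can be ambiguous for repetitive sequences.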
Toward Simplifying and Accurately Formulating Fragment Assembly
Journal of Computational Biology, 1995
Cited by 30 (1 self)
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the 2-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a non-cyclic subgraph with certain properties and the objectives of being shortest or maximally-likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is ...
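The traditional objective mentioned in this abstract, a short string containing every fragment as a substring, is often approximated with a greedy merge of the best-overlapping pair. The sketch below shows that common heuristic, not the maximum-likelihood formulation or the graph reductions the paper develops. The function names and the example fragments are assumptions for illustration.

```python
def overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_superstring(fragments):
    """Greedy heuristic: repeatedly merge the pair with the largest overlap."""
    unique = set(fragments)
    # Drop fragments already contained in another fragment; sort for determinism.
    frags = sorted(f for f in unique if not any(f != g and f in g for g in unique))
    while len(frags) > 1:
        best_n, best_i, best_j = -1, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags[0] if frags else ""


# Hypothetical fragments sampled from a short target.
reads = ["ACGTA", "GTACC", "ACCGG"]
assembly = greedy_superstring(reads)
```

With repetitive targets, a shortest-string objective like this one can collapse distinct copies of a repeat into one, which is the overcompression the abstract refers to.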
Genomics via Optical Mapping II: Ordered Restriction Maps
Journal of Computational Biology, 1996
Cited by 29 (17 self)
In this paper, we describe our algorithmic approach to constructing ordered restriction maps based on the data created from the images of population of individual DNA molecules (clones) digested by restriction enzymes. The goal is to devise map-making algorithms capable of producing high-resolution, high-accuracy maps rapidly and in a scalable manner. The resulting software is a key component of our optical mapping automation tools and has been used routinely to map cosmid, lambda and BAC clones. The experimental results appear highly promising.
1 Genomics and Optical Mapping
Optical mapping [CAH+95, CJI+96, HRL+95, JRH+96, MBC+95, SCH+95, SLH+93, WHS95] is a single molecule methodology for the rapid production of ordered restriction maps from individual DNA molecules. Ordered restriction maps were constructed originally from yeast chromosomes by using fluorescence microscopy to visualize restriction endonuclease cutting events on individual fluorochrome-stained DNA molecules [SCH+95,...
Human whole-genome shotgun sequencing
Genome Research, 1997
Cited by 25 (2 self)
Large-scale sequencing of the human genome is now under way (Boguski et al. 1996; Marshall and Pennisi 1996). Although at the beginning of the Genome Project, many doubted the scientific value of sequencing the entire human genome, these doubts have evaporated almost entirely (Gibbs 1995; Olson 1995). Primary reasons for generating the human genomic sequence are listed in Table 1. The approach being taken for human genomic sequencing is the same as that used for the Saccharomyces cerevisiae and Caenorhabditis elegans genomes, namely construction of overlapping arrays of large insert Escherichia coli clones, followed by complete sequencing of these clones one at a time. In this article, we outline an alternative approach to sequencing the human and other large genomes, which we argue is less costly and more informative than the clone-by-clone approach.
A Plan for Human Whole-Genome Shotgun Sequencing
Although there are many conceivable variations, the crux of our plan involves high-quality, semiautomated sequencing from both ends of very large numbers of randomly selected human genomic DNA fragments. DNA of high molecular weight purified from at least a few different human donors would be sheared, size-selected, and cloned into E. coli. Insert sizes would fall into two classes. Long inserts would be 5–20 kb in size and would be cloned into plasmid, phage, or possibly cosmid vectors. Short inserts would be 0.4–1.2 kb in size and would be cloned into plasmid vectors. Read lengths would be of sufficient magnitude so that the two sequence reads from the ends of the short inserts overlap. The ratio of long to short inserts would be ~1. Standard, gel-based methods would be utilized to generate at least 30 billion nucleotides of raw sequence (10-fold coverage of the genome). Many laboratories throughout the world could participate in raw sequence generation, but all sequences
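The coverage figure quoted in this excerpt can be checked with simple arithmetic: 30 billion nucleotides at 10-fold coverage implies a genome of about 3 billion bases. The sketch below makes that explicit. The function names are assumptions for illustration. The uncovered-fraction estimate uses the standard idealized model in which reads land uniformly at random, so a base is missed with probability about e^(-c) at coverage c. That model is not stated in the excerpt.

```python
import math


def fold_coverage(total_bases, genome_size):
    """Average number of times each genome position is sequenced."""
    return total_bases / genome_size


def expected_uncovered_fraction(coverage):
    """Idealized estimate: probability that a given base is hit by no read."""
    return math.exp(-coverage)


RAW_SEQUENCE_BP = 30_000_000_000   # "at least 30 billion nucleotides" from the excerpt
GENOME_SIZE_BP = 3_000_000_000     # assumption implied by "10-fold coverage"

coverage = fold_coverage(RAW_SEQUENCE_BP, GENOME_SIZE_BP)
uncovered = expected_uncovered_fraction(coverage)
print(coverage)  # 10.0
```

Under these assumptions the expected uncovered fraction at 10-fold coverage is on the order of 0.005 percent, although real data deviate from the uniform model.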
Genomics via Optical Mapping III: Contiging Genomic DNA and Variations (Extended Abstract)
1997
Cited by 22 (17 self)
Thomas Anantharaman, Bud Mishra and David Schwartz
In this paper, we describe our algorithmic approach to constructing an alignment of (contiging) a set of optical maps created from the images of individual genomic DNA molecules digested by restriction enzymes. Generally, these DNA segments are sized in the range of 1-4 Mb. The problem of assembling clone contig maps is a simpler special case of this contig problem and is handled by our algorithms. The goal is to devise contiging algorithms capable of producing high-quality composite maps rapidly and in a scalable manner. The resulting software is a key component of our physical mapping automation tools and has been used routinely to create composite maps of various microorganisms (E. coli, P. falciparum and D. radiodurans). The experimental results appear highly promising.
1 Introduction
Single molecule approaches provide a new direction for characterizing structural and functional properties of individual DNA molec...
Physical Mapping by STS Hybridization: Algorithmic Strategies and the Challenge of Software Evaluation
Journal of Computational Biology, 1995
Cited by 15 (1 self)
An important tool in the analysis of genomic sequences is the physical map. In this paper we examine the construction of physical maps from hybridization data between sequence-tagged site (STS) probes and clones of genomic fragments. An algorithmic theory of the mapping process, a proposed performance evaluation procedure, and several new algorithmic strategies for mapping are given. A unifying theme for these developments is the idea of a "conservative extension." An algorithm, measure of algorithm quality, or description of physical map is a conservative extension if it is a generalization for data with errors of a corresponding concept in the error-free case. In our algorithmic theory we show that the nature of hybridization experiments imposes inherent limitations on the mapping information recorded in the experimental data. We prove that only certain types of mapping information can be reliably calculated by any algorithm. A test generator is then presented along with quantitative m...
AMASS: A Structured Pattern Matching Approach to Shotgun Sequence Assembly
J. Comput. Biol., 1999
Cited by 13 (2 self)
In this paper, we propose an efficient, reliable shotgun sequence assembly algorithm based on a fingerprinting scheme that is robust to both noise and repetitive sequences in the data. Our algorithm uses exact matches of short patterns randomly selected from fragment data to identify fragment overlaps, construct an overlap map, and finally deliver a consensus sequence. We show how statistical clues made explicit in our approach can easily be exploited to correctly assemble results even in the presence of extensive repetitive sequences. Our approach is exceptionally fast in practice: e.g., we have successfully assembled a whole Mycoplasma genitalium genome (approximately 580 kbps) in roughly 8 minutes of 64MB 200MHz Pentium Pro CPU time from real shotgun data, where most existing algorithms can be expected to run for several hours to a day on the same data. Moreover, experiments with shotgun data synthetically prepared from real DNA sequences from a wide range of organisms (including h...
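The general idea of using exact matches of short patterns to flag likely fragment overlaps can be sketched as follows. This is not the AMASS fingerprinting scheme: it indexes every k-mer rather than a random sample of patterns, and it ignores noise, repeats, and reverse complements. The function name `candidate_overlaps`, the choice of k, and the example fragments are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def candidate_overlaps(fragments, k):
    """Count distinct shared exact k-mers for every pair of fragments that share at least one."""
    index = defaultdict(set)  # k-mer -> ids of fragments containing it
    for frag_id, seq in enumerate(fragments):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(frag_id)

    shared = defaultdict(int)  # (id_a, id_b) -> number of distinct shared k-mers
    for frag_ids in index.values():
        for a, b in combinations(sorted(frag_ids), 2):
            shared[(a, b)] += 1
    return dict(shared)


# Hypothetical fragments; the third shares nothing with the others.
fragments = ["ACGTACGG", "TACGGTTA", "CCCCCCCC"]
pairs = candidate_overlaps(fragments, 4)
print(pairs)  # {(0, 1): 2}
```

Pairs with many shared patterns would then be candidates for a more careful overlap check. Pairs that share patterns only because of a repeat are the case the abstract says its statistical clues are meant to handle.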

