Results 1-10 of 16
Four Strikes against Physical Mapping of DNA
 Journal of Computational Biology
, 1993
"... Physical Mapping is a central problem in molecular biology ... and the human genome project. The problem is to reconstruct the relative position of fragments of DNA along the genome from information on their pairwise overlaps. We show that four simplified models of the problem lead to NPcomplete ..."
Abstract

Cited by 56 (8 self)
Physical Mapping is a central problem in molecular biology ... and the human genome project. The problem is to reconstruct the relative positions of fragments of DNA along the genome from information on their pairwise overlaps. We show that four simplified models of the problem lead to NP-complete decision problems: colored unit interval graph completion, the maximum interval (or unit interval) subgraph, the pathwidth of a bipartite graph, and the k-consecutive ones problem for k >= 2. These models were chosen to reflect features typical of biological data, including false negative and false positive errors, small map width, and chimerism.
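The k-consecutive ones problem reduced to here asks whether the columns of a 0/1 clone-versus-probe matrix can be permuted so that the ones in each row form at most k consecutive blocks; k = 1 is the classical consecutive ones property, solvable in linear time with PQ-trees. As an illustration of the k = 1 case only (not the authors' construction), a brute-force check over column permutations might look like this; it is exponential in the number of columns and usable only on tiny matrices:

```python
from itertools import permutations

def has_consecutive_ones(matrix):
    """Brute-force test of the consecutive ones property (C1P): is there a
    column permutation making the 1s in every row contiguous?  Exponential
    in the number of columns -- illustration only; PQ-trees do this in
    linear time."""
    n_cols = len(matrix[0])
    for perm in permutations(range(n_cols)):
        if all(_contiguous([row[j] for j in perm]) for row in matrix):
            return True
    return False

def _contiguous(row):
    ones = [i for i, v in enumerate(row) if v == 1]
    return not ones or ones[-1] - ones[0] + 1 == len(ones)

# Rows model hypothetical clones, columns the probes they hybridize to:
m_yes = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
m_no = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # all three column pairs must be
                                          # adjacent -- impossible linearly
print(has_consecutive_ones(m_yes))  # True
print(has_consecutive_ones(m_no))   # False
```

The second matrix fails because each pair of its three columns would have to be adjacent in the permutation, which no linear order of three columns allows; this is the kind of obstruction that errors in overlap data introduce.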
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB '99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
"... We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clusterin ..."
Abstract

Cited by 45 (4 self)
We have developed a novel algorithm for cluster analysis based on graph-theoretic techniques. A similarity graph is defined, and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set. 1. Introduction: Cluster analysis seeks a grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
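The "highly connected subgraphs" idea can be sketched concretely. Assuming the usual HCS notion that a cluster is a subgraph whose edge connectivity exceeds half its number of vertices, the recursion below splits along a minimum cut until that condition holds. The naive exponential min-cut stands in for a real polynomial min-cut routine; this is an illustration, not the authors' implementation:

```python
from itertools import combinations

def min_edge_cut(nodes, edges):
    """Naive global minimum edge cut: try every bipartition of the vertices.
    Exponential -- a real implementation would use a polynomial min-cut
    algorithm.  Returns (cut size, one side of the cut)."""
    best = (len(edges) + 1, None)
    nodes = list(nodes)
    for r in range(1, len(nodes)):
        for side in combinations(nodes, r):
            s = set(side)
            cut = [(u, v) for u, v in edges if (u in s) != (v in s)]
            if len(cut) < best[0]:
                best = (len(cut), s)
    return best

def hcs(nodes, edges):
    """Highly-connected-subgraphs clustering sketch: a subgraph is a cluster
    ("highly connected") if its minimum edge cut exceeds n/2; otherwise
    split along a minimum cut and recurse on both sides."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return [nodes]
    cut_size, side = min_edge_cut(nodes, edges)
    if cut_size > len(nodes) / 2:     # highly connected: report as cluster
        return [nodes]
    other = nodes - side
    sub = lambda part: [(u, v) for u, v in edges if u in part and v in part]
    return hcs(side, sub(side)) + hcs(other, sub(other))

# Two triangles (tight similarity groups) joined by a single spurious edge:
clusters = hcs(range(6), [(0, 1), (1, 2), (0, 2),
                          (3, 4), (4, 5), (3, 5), (2, 3)])
print(clusters)  # [{0, 1, 2}, {3, 4, 5}]
```

The single bridging edge is cut first (cut size 1 is not greater than 6/2), while each triangle survives as a cluster (cut size 2 exceeds 3/2), which is the sense in which such clusters tolerate a few noisy edges.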
On the Complexity of DNA Physical Mapping
, 1994
"... The Physical Mapping Problem is to reconstruct the relative position of fragments (clones) of DNA along the genome from information on their pairwise overlaps. We show that two simplified versions of the problem belong to the class of NPcomplete problems, which are conjectured to be computationa ..."
Abstract

Cited by 40 (7 self)
The Physical Mapping Problem is to reconstruct the relative positions of fragments (clones) of DNA along the genome from information on their pairwise overlaps. We show that two simplified versions of the problem belong to the class of NP-complete problems, which are conjectured to be computationally intractable. In one version all clones have equal length; in the other, clone lengths may be arbitrary. The proof uses tools from graph theory and complexity theory.
Algorithms for Molecular Biology, Lecture 12
, 1999
"... this document we will briefly discuss several topics: ..."
Abstract

Cited by 13 (2 self)
In this document we will briefly discuss several topics: ...
Parameterized Complexity Analysis in Computational Biology
 Comput. Appl. Biosci.
, 1995
"... Many computational problems in biology involve parameters for which a small range of values cover important applications. We argue that for many problems in this setting, parameterized computational complexity rather than NPcompleteness is the appropriate tool for studying apparent intractability. ..."
Abstract

Cited by 9 (4 self)
Many computational problems in biology involve parameters for which a small range of values covers the important applications. We argue that for many problems in this setting, parameterized computational complexity, rather than NP-completeness, is the appropriate tool for studying apparent intractability. At issue in the theory of parameterized complexity is whether a problem can be solved in time O(n^α) for each fixed parameter value, where α is a constant independent of the parameter. In addition to surveying this complexity framework, we describe a new result for the Longest Common Subsequence problem. In particular, we show that the problem is hard for W[t] for all t when parameterized by the number of strings and the size of the alphabet. Lower bounds on the complexity of this basic combinatorial problem imply lower bounds on more general sequence alignment and consensus discovery problems. We also describe a number of open problems pertaining to the parameterized complexity of pro...
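The parameter dependence at stake can be seen in the standard dynamic program for the LCS of k strings, whose table has one dimension per string, i.e. roughly O(n^k) entries for strings of length n; the W[t]-hardness result summarized above is evidence that this growth of the exponent with k is unlikely to be removable. A small sketch:

```python
from functools import lru_cache

def lcs_k(strings):
    """Length of the longest common subsequence of k strings, via dynamic
    programming over one prefix index per string.  The memoized state space
    has prod(len(s) + 1) entries, i.e. O(n^k) for k strings of length n --
    the parameter k ends up in the exponent."""
    @lru_cache(maxsize=None)
    def go(pos):
        if any(p == 0 for p in pos):
            return 0
        heads = [s[p - 1] for s, p in zip(strings, pos)]
        if all(c == heads[0] for c in heads):
            # All last characters match: safe to take them into the LCS.
            return 1 + go(tuple(p - 1 for p in pos))
        # Otherwise drop the last character of one string and recurse.
        return max(go(tuple(p - (i == j) for j, p in enumerate(pos)))
                   for i in range(len(pos)))
    return go(tuple(len(s) for s in strings))

print(lcs_k(("AGGTAB", "GXTXAYB", "AGTB")))  # 3, e.g. "GTB"
```

Doubling k squares the table size even for fixed n, which is exactly the behaviour that distinguishes W-hard problems from fixed-parameter-tractable ones with f(k) * n^O(1) algorithms.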
Construction of Physical Maps From Oligonucleotide Fingerprints Data
 Journal of Computational Biology
, 1999
"... A new algorithm for the construction of physical maps from hybridization fingerprints of short oligonucleotide probes has been developed. Extensive simulations in highnoise scenarios show that the algorithm produces an essentially completely correct map in over 95% of trials. Tests for the infl ..."
Abstract

Cited by 9 (3 self)
A new algorithm for the construction of physical maps from hybridization fingerprints of short oligonucleotide probes has been developed. Extensive simulations in high-noise scenarios show that the algorithm produces an essentially completely correct map in over 95% of trials. Tests of the influence of specific experimental parameters demonstrate that the algorithm is robust to both false positive and false negative experimental errors. The algorithm was also tested in simulations using real DNA sequences of E. coli, B. subtilis, M. tuberculosis, S. cerevisiae, C. elegans, and H. sapiens. To overcome the non-randomness of probe frequencies in these sequences, probes were preselected based on sequence statistics, and a screening process for the hybridization data was developed. With these modifications, the algorithm produced very encouraging results. A preliminary version of the paper is to appear in Proc. RECOMB '99.
Use of high coverage reference libraries of Drosophila melanogaster for relational data analysis; a step towards mapping and sequencing of the genome
 J. Mol. Biol.
, 1991
"... Three differently made, primary Drosophila cosmid libraries of 16fold genome coverage have been generated. Also, a jumping library has been created by a new method that, takes advantage of methylation differences between genomic DNA and vector. Thirdly. two cDNA libraries have been picked. All thes ..."
Abstract

Cited by 7 (2 self)
Three differently made primary Drosophila cosmid libraries of 16-fold genome coverage have been generated. Also, a jumping library has been created by a new method that takes advantage of methylation differences between genomic DNA and vector. Thirdly, two cDNA libraries have been picked. All these libraries have been arrayed on high-density in situ filters, each containing 9216 clones. As a reference system, such filters are distributed and identified clones are provided. Single-copy probes have identified on average 1.4 cosmids per genome equivalent. Together with cytogenetically mapped yeast artificial chromosomes, the libraries are also being used for physically mapping the genome, mainly by oligonucleotide fingerprinting and pool hybridizations. cDNA clones are further examined by a partial sequencing analysis by oligomer hybridization. Keywords: Drosophila melanogaster; genome; reference libraries; hybridization; mapping.
Query Driven Simulation As A Tool For Genetic Engineers
 Proceedings of the International Conference on Simulation in Engineering Education
, 1992
"... Simulations/animations of genetic structures and functions, simulations of actual or conceived experiments, and animations of algorithms such as simulated annealing, which is used to reconstruct a chromosome from its clonable DNA fragments, will be useful to genetics researchers and students alike. ..."
Abstract

Cited by 4 (4 self)
Simulations/animations of genetic structures and functions, simulations of actual or conceived experiments, and animations of algorithms such as simulated annealing, which is used to reconstruct a chromosome from its clonable DNA fragments, will be useful to genetics researchers and students alike. In this paper, we discuss the design of an integrated simulation/object-oriented database system that can be used by genetic engineers to better understand and visualize the objects of their study as well as their experimental procedures. Such a system can provide a solid foundation for Computer-Aided Genetic Engineering (CAGE). 1. Introduction: Storing genome mapping information on organisms is currently the major unsolved problem of the Human Genome Initiative. Relational databases are beginning to be used to store the vast amount of genetic information that is being collected [Cuti91b], [Rudd90, Rudd91]. We are currently designing an object-oriented database to store such informati...
A Clustering Algorithm for Interval Graph Test on Noisy Data
"... Abstract. An interval graph is the intersection graph of a collection of intervals. One important application of interval graph is physical mapping in genome research, that is, to reassemble the clones to determine the relative position of fragments of DNA along the genome. The linear time algorithm ..."
Abstract
An interval graph is the intersection graph of a collection of intervals. One important application of interval graphs is physical mapping in genome research, that is, reassembling clones to determine the relative positions of fragments of DNA along the genome. The linear-time algorithm by Booth and Lueker (1976) for this problem has a serious drawback: the data must be error-free. However, laboratory work is never flawless. We devised a new iterative clustering algorithm for this problem, which can accommodate noisy data and produce a likely interval model realizing the original graph.
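The definition in this abstract is easy to make concrete: given an interval model, the corresponding interval graph joins two vertices whenever their intervals overlap; recognition, and physical mapping with it, runs this in reverse, recovering a model from the graph. A small sketch with hypothetical clone coordinates:

```python
def interval_graph(intervals):
    """Intersection graph of a set of named intervals: an edge joins two
    names whenever their (start, end) intervals overlap.  A graph is an
    interval graph exactly when some such model exists for it; physical
    mapping seeks to recover the model (clone order) from the graph."""
    overlap = lambda a, b: a[0] <= b[1] and b[0] <= a[1]
    names = list(intervals)
    return {(u, v) for i, u in enumerate(names) for v in names[i + 1:]
            if overlap(intervals[u], intervals[v])}

# Hypothetical clones with assumed (start, end) positions along the genome:
clones = {"c1": (0, 4), "c2": (3, 8), "c3": (7, 12), "c4": (20, 25)}
print(sorted(interval_graph(clones)))  # [('c1', 'c2'), ('c2', 'c3')]
```

Noisy hybridization data would add or delete edges in this graph, which is why the resulting graph may fail to be an interval graph at all and a noise-tolerant test, like the clustering approach above, is needed.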