Four Strikes against Physical Mapping of DNA
Journal of Computational Biology, 1993
Cited by 46 (8 self)
Physical Mapping is a central problem in molecular biology ... and the human genome project. The problem is to reconstruct the relative position of fragments of DNA along the genome from information on their pairwise overlaps. We show that four simplified models of the problem lead to NP-complete decision problems: Colored unit interval graph completion, the maximum interval (or unit interval) subgraph, the pathwidth of a bipartite graph, and the k-consecutive ones problem for k >= 2. These models have been chosen to reflect various features typical in biological data, including false negative and positive errors, small width of the map and chimericism.
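The k-consecutive ones model named in the abstract above can be illustrated with a small sketch. It assumes the usual reading of the term: rows of a 0/1 clone-probe matrix are clones, columns are probes, and a column order is acceptable when the ones in every row fall into at most k blocks. That reading, the function names, and the toy matrix are illustrative assumptions, not taken from the paper. Checking a given order is cheap; the brute-force search over all orders is only feasible for tiny inputs, in line with the NP-completeness result stated for k >= 2.

```python
from itertools import permutations


def blocks_of_ones(row):
    """Count maximal runs of consecutive 1s in a 0/1 sequence."""
    count = 0
    previous = 0
    for value in row:
        if value == 1 and previous == 0:
            count += 1
        previous = value
    return count


def satisfies_k_consecutive_ones(matrix, column_order, k):
    """True if, under column_order, every row has at most k blocks of 1s."""
    for row in matrix:
        reordered = [row[c] for c in column_order]
        if blocks_of_ones(reordered) > k:
            return False
    return True


def find_order_brute_force(matrix, k):
    """Try every column permutation (exponential; toy inputs only)."""
    n_cols = len(matrix[0]) if matrix else 0
    for order in permutations(range(n_cols)):
        if satisfies_k_consecutive_ones(matrix, order, k):
            return list(order)
    return None


# Hypothetical clone-probe matrix: 3 clones (rows) x 4 probes (columns).
example = [
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 1],
]

print(satisfies_k_consecutive_ones(example, [0, 1, 2, 3], 1))
print(find_order_brute_force(example, 1))
```

With k = 1 this reduces to the classical consecutive ones property; larger k allows each clone to be assembled from up to k separate pieces, which is how the sketch models chimeric clones.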
On the Complexity of DNA Physical Mapping
1994
Cited by 36 (7 self)
The Physical Mapping Problem is to reconstruct the relative position of fragments (clones) of DNA along the genome from information on their pairwise overlaps. We show that two simplified versions of the problem belong to the class of NP-complete problems, which are conjectured to be computationally intractable. In one version all clones have equal length, and in another, clone lengths may be arbitrary. The proof uses tools from graph theory and complexity.
Human whole-genome shotgun sequencing
Genome Research, 1997
Cited by 25 (2 self)
Large-scale sequencing of the human genome is now under way (Boguski et al. 1996; Marshall and Pennisi 1996). Although at the beginning of the Genome Project, many doubted the scientific value of sequencing the entire human genome, these doubts have evaporated almost entirely (Gibbs 1995; Olson 1995). Primary reasons for generating the human genomic sequence are listed in Table 1. The approach being taken for human genomic sequencing is the same as that used for the Saccharomyces cerevisiae and Caenorhabditis elegans genomes, namely construction of overlapping arrays of large insert Escherichia coli clones, followed by complete sequencing of these clones one at a time. In this article, we outline an alternative approach to sequencing the human and other large genomes, which we argue is less costly and more informative than the clone-by-clone approach.
A Plan for Human Whole-Genome Shotgun Sequencing
Although there are many conceivable variations, the crux of our plan involves high-quality, semiautomated sequencing from both ends of very large numbers of randomly selected human genomic DNA fragments. DNA of high molecular weight purified from at least a few different human donors would be sheared, size-selected, and cloned into E. coli. Insert sizes would fall into two classes. Long inserts would be 5–20 kb in size and would be cloned into plasmid, phage, or possibly cosmid vectors. Short inserts would be 0.4–1.2 kb in size and would be cloned into plasmid vectors. Read lengths would be of sufficient magnitude so that the two sequence reads from the ends of the short inserts overlap. The ratio of long to short inserts would be ~1. Standard, gel-based methods would be utilized to generate at least 30 billion nucleotides of raw sequence (10-fold coverage of the genome). Many laboratories throughout the world could participate in raw sequence generation, but all sequences
Beyond Islands: Runs in Clone-Probe Matrices (extended abstract)
Proceedings of the 1st ACM Conference on Computational Molecular Biology, 320--329, 1997
Cited by 3 (0 self)
Physical mapping is a fundamental component of the human genome project. A physical map consists of a set of probes which mark unique positions on a long fragment of DNA, together with the relative order of the probes on the DNA. This order is inferred from clone-probe hybridization experiments, which determine the probes contained within various fragments of the genome. In practice, the order of the probes is not completely determined by the hybridization experiments. To better design these experiments, researchers have analyzed the expected distribution of "islands" --- groups of probes which are known to be near one another --- that would result from hybridization experiments with different numbers of clones and probes. In this paper we analyze the distribution of "runs" --- groups of probes whose relative order is completely determined by the hybridization experiment. We include analytic, numerical, Monte Carlo, and simulation results on runs, which can further assist in the design...
GD: A web server for performing electronic PCR
Nucleic Acids Res
Cited by 2 (0 self)
‘Electronic PCR’ (e-PCR) refers to a computational procedure that is used to search DNA sequences for sequence tagged sites (STSs), each of which is defined by a pair of primer sequences and an expected PCR product size. To gain speed, our implementation extracts short ‘words’ from the 3′ end of each primer and stores them in a sorted hash table that can be accessed efficiently during the search. One recent improvement is the use of overlapping discontinuous words to allow matches to be found despite the presence of a mismatch. Moreover, it is possible to allow gaps in the alignment between the primer and the sequence. The effect of these changes is to improve sensitivity without significantly affecting specificity. The new software provides a search mode using a query STS against a sequence database to augment the previously available mode using a query sequence against an STS database. Finally, e-PCR may now be used through a web service, with search results linked to other web resources such as the UniSTS database and the MapViewer genome browser. The e-PCR web server may be found at www.ncbi.nlm.nih.gov/sutils/e-pcr.
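The basic search idea in this abstract can be sketched as follows. This is a deliberately simplified illustration, not the NCBI implementation: it indexes one exact word from the 3′ end of each forward primer in a plain dictionary, searches a single strand, and allows no mismatches or gaps. The function names, the STS record layout (`name`, `forward`, `reverse`, `size`), and the `word_size` and `margin` parameters are all assumptions made for the example. Primers are assumed to be at least `word_size` bases long.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_word_index(sts_list, word_size):
    """Map the last word_size bases (3' end) of each forward primer to its STSs."""
    index = defaultdict(list)
    for sts in sts_list:
        index[sts["forward"][-word_size:]].append(sts)
    return index


def find_sts_hits(sequence, sts_list, word_size=7, margin=50):
    """Return (name, start, product_size) for exact hits on the given strand."""
    index = build_word_index(sts_list, word_size)
    hits = []
    for pos in range(len(sequence) - word_size + 1):
        for sts in index.get(sequence[pos:pos + word_size], ()):
            forward = sts["forward"]
            # The indexed word is the primer's 3' end, so back up to its start.
            start = pos + word_size - len(forward)
            if start < 0 or sequence[start:start + len(forward)] != forward:
                continue
            # The reverse primer anneals to the other strand, so on this
            # strand we look for its reverse complement.
            reverse_site = reverse_complement(sts["reverse"])
            for size in range(sts["size"] - margin, sts["size"] + margin + 1):
                end = start + size
                site_start = end - len(reverse_site)
                if site_start < start + len(forward) or end > len(sequence):
                    continue
                if sequence[site_start:end] == reverse_site:
                    hits.append((sts["name"], start, size))
    return hits


# Hypothetical STS and a toy sequence that contains it.
sts1 = {"name": "sts1", "forward": "ACGTACGTAC", "reverse": "TTGGCCAATT", "size": 40}
toy = "G" * 5 + sts1["forward"] + "T" * 20 + reverse_complement(sts1["reverse"]) + "C" * 5
print(find_sts_hits(toy, [sts1], margin=5))
```

The dictionary lookup means only positions whose word matches some primer's 3′ end trigger the full primer comparison, which is the speed-up the abstract attributes to hashing. Tolerating mismatches or gaps would need the discontinuous words the abstract mentions, which this sketch does not attempt.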
Cloning and characterization of HARP/SMARCAL1: a prokaryotic HepA-related SNF2 helicase protein from human and mouse
Genomics, 2000
Cited by 2 (0 self)
The SNF2 gene family consists of a large group of proteins involved in transcriptional regulation, maintenance of chromosome integrity, and various aspects of DNA repair. We cloned a novel SNF2 family human cDNA, with sequence identity to the Escherichia coli RNA polymerase-binding protein HepA and named the human hepA-related protein (HHARP/SMARCAL1). In addition, the mouse ortholog (Mharp/Smarcal1) was cloned, and the Caenorhabditis elegans ortholog (CE-HARP) was identified in the GenBank database. Phylogenetic analysis indicates that the HARP proteins share a high level of sequence similarity to the seven motif helicase core region (SNF2 domain) with identifiable orthologs in other eukaryotic species, except for yeast. Purified His-tagged HARP/SMARCAL1 protein exhibits single-stranded DNA-dependent ATPase activity, consistent with it being a member of the SNF2 family of proteins. Both the human and the mouse genes consist of 17 exons and 16 introns. The human gene maps to chromosome 2q34–q36, and the mouse gene is localized to the syntenic region of chromosome 1 (between markers Gls and Acrg). HARP/SMARCAL1 transcripts are ubiquitously expressed in human and mouse tissues, with testis presenting the highest levels of mRNA expression in humans. © 2000 Academic Press
L.S.: Genome identification and classification by short oligo arrays
In: Proceedings of the Fourth Annual Workshop on Algorithms in Bioinformatics (2004)
Cited by 1 (1 self)
We explore the problem of designing oligonucleotides that help locate organisms along a known phylogenetic tree. We develop a suffix-tree based algorithm to find such short sequences efficiently. Our algorithm requires O(Nm) time and O(N) space in the worst case where m is the number of the genomes classified by the phylogeny and N is their total length. We implemented our algorithm and used it to find these discriminating sequences in both small and large phylogenies. We believe our algorithm will have wide applications including: high-throughput classification and identification, oligo array design optimally differentiating genes in gene families, and markers for closely related strains and populations. It will also have scientific significance as a new way to assess the confidence in a given classification.
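To make the idea of a discriminating oligo concrete, here is a naive sketch. It does not use the suffix-tree algorithm of the paper. It swaps in plain Python sets of fixed-length substrings, and it assumes one possible reading of "discriminating": an oligo of length k that occurs in every genome of a chosen clade and in no genome outside it. The function names and toy genomes are hypothetical.

```python
def kmers(sequence, k):
    """Set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def discriminating_kmers(genomes, clade, k):
    """k-mers found in every genome of clade and in no genome outside it.

    genomes maps a genome name to its sequence; clade is a set of names.
    """
    inside = [kmers(genomes[name], k) for name in clade]
    shared = set.intersection(*inside) if inside else set()
    for name, sequence in genomes.items():
        if name not in clade:
            shared -= kmers(sequence, k)
    return shared


# Hypothetical toy genomes.
genomes = {"a": "ACGTAC", "b": "TTACGT", "c": "GGGTAC"}
print(discriminating_kmers(genomes, {"a", "b"}, 4))
```

This set-based version stores every k-mer of every genome, so it will not match the O(N) space bound quoted in the abstract; it is only meant to show what the output of such a search looks like for one internal node of a tree.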
PERSPECTIVE
Cited by 1 (0 self)
The human genome project is entering its decisive final phase, in which the genome sequence will be determined in large-scale efforts in multiple laboratories worldwide. A number of sequencing groups are in the process of scaling up their throughput; over the next few years they will need to attain a collective capacity approaching half a gigabase per year to complete the 3-Gb genome sequence by the target date of 2005. At present, all contributing groups are using a clone-by-clone approach, in which mapped bacterial clones (typically 40–400 kb in size) from known chromosomal locations are sequenced to completion. Among other advantages, this permits a variety of alternative sequencing strategies and methods to be explored independently without redundancy of effort. Although it is not too late to consider implementing a different approach, any such approach must have as high a probability of success as the current one and offer significant advantages (such as decreased cost). I argue here that the whole-genome shotgun proposed by Weber and Myers satisfies neither condition.
Clone-by-Clone Sequencing
For purposes of comparison it is helpful to first outline a specific implementation of clone-by-clone sequencing. Although by no means the only one possible, this implementation is being used by several of the larger groups and seems likely to be the method of choice for the major part of the genome. One starts with a set of mapped sequence-tagged sites (STSs) (Olson et al. 1989) from a particular chromosomal region. These are screened against a bacterial artificial chromosome (BAC) (or other large bacterial clone) library (Kim et al. 1996) to obtain overlapping clusters of clones from that region. Since whole-genome mapping efforts are nearing the target density of 1 STS per 100 kb [Hudson et al.
Liver X Receptor Activation Enhances Cholesterol Loss from the Brain, Decreases Neuroinflammation, and Increases Survival of the NPC1 Mouse
2008
Theoretical Population Biology 61, 349--363 (2002)
2002
In this paper, we present a general maximum-likelihood-based algorithm for simultaneously estimating linkage and linkage phases for a mixed set of different marker types containing fully informative markers (segregating 1:1:1:1) and partially informative markers (or missing markers, segregating 1:2:1, 3:1, and 1:1) in a full-sib family derived from two outbred parent plants. The characterization of linkage phases is based on the posterior probability distribution of the assignment of alternative alleles at given markers to two homologous chromosomes of each parent, conditional on the observed phenotypes of the markers. Two- and multi-point analyses are performed to estimate the recombination fraction and determine the most likely linkage phase between different types of markers. A numerical example is presented to demonstrate the statistical properties of the model for characterizing the linkage phase between markers. © 2002 Elsevier Science (USA)
Key Words: EM algorithm; linkage phase; outcrossing species; partially informative marker; posterior probability; recombination fraction

