Toward Simplifying and Accurately Formulating Fragment Assembly
- Journal of Computational Biology, 1995
Cited by 30 (1 self)
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the 2-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next the fragment assembly problem is recast in graph-theoretic terms as one of finding a non-cyclic subgraph with certain properties and the objectives of being shortest or maximally-likely are also recast in this framework. Finally, a series of graph reduction transformations are given that dramatically reduce the size of the graph to be explored in practical instances of the problem. This reduction is ...
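To make the overcompression argument concrete, the following is a minimal sketch, not the paper's actual formulation. It assumes that fragment start positions in a correct layout should look roughly uniform over the reconstruction, and scores a layout by the two-sided Kolmogorov-Smirnov statistic against that uniform distribution. The function name `ks_uniform`, the `span` parameter, and both toy layouts are hypothetical.

```python
def ks_uniform(starts, span):
    """Two-sided KS statistic of start positions against Uniform(0, span)."""
    # Rescale positions to [0, 1] so the reference CDF is F(x) = x.
    xs = sorted(s / span for s in starts)
    n = len(xs)
    # At each sample, compare the empirical CDF just after and just before x.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


# Hypothetical layouts: ten fragments spread evenly over 100 bases, and the
# same ten fragments after a repeat has been collapsed, so that four of them
# pile up at position 20 and the reconstruction shrinks to 70 bases.
even_layout = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
piled_layout = [0, 10, 20, 20, 20, 20, 30, 40, 50, 60]

d_even = ks_uniform(even_layout, span=100)
d_piled = ks_uniform(piled_layout, span=70)
```

Under these assumptions the collapsed layout is shorter but deviates more from uniform sampling, which is the kind of signal a likelihood-based objective can use and a pure shortest-string objective cannot.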
Whole-Genome DNA-Sequencing
- IEEE Computational Engineering and Science, 1999
Cited by 13 (0 self)
This article describes three current approaches for completing the sequencing.
Clonal variation in cell surface display of an H-2 protein lacking a cytoplasmic tail
- J. Cell Biol. 102:1-10, 1986
Cited by 7 (2 self)
Truncated variants of the gene encoding H-2L^d, an integral membrane protein encoded by the major histocompatibility complex, were constructed by in vitro mutagenesis to elucidate the function of charged amino acids found on the cytoplasmic side of the transmembrane (TM) region. Analysis of cloned L cells transfected with these genes shows that the seven amino acids following the TM segment, four of which are basic, enhance the cell surface expression of H-2L^d protein but are not required for it. However, some clones do not express a tailless H-2L^d protein on the cell surface but express it intracellularly where it has a long half-life. Turnover measurements on cell surface H-2L^d proteins suggest that the basic residues following the TM segment are not a "stop transfer"
Improving the Quality of Automatic DNA Sequence Assembly using Fluorescent Trace-Data Classifications
- Proceedings, Fourth International Conference on Intelligent Systems for Molecular Biology, 1996
Cited by 3 (1 self)
Virtually all large-scale sequencing projects use automatic sequence-assembly programs to aid in the determination of DNA sequences. The computer-generated assemblies require substantial hand-editing to transform them into submissions for GenBank. As the size of sequencing projects increases, it becomes essential to improve the quality of the automated assemblies so that this time-consuming hand-editing may be reduced. Current ABI sequencing technology uses base calls made from fluorescently-labeled DNA fragments run on gels. We present a new representation for the fluorescent trace data associated with individual base calls. This representation can be used before, during, and after fragment assembly to improve the quality of assemblies. We demonstrate one such use -- end-trimming of sub-optimal data -- that results in a significant improvement in the quality of subsequent assemblies. Introduction A fundamental goal of the Human Genome Project is to determine the seque...
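As a rough illustration of end-trimming, the sketch below strips low-confidence bases from both ends of a read before assembly. It does not reproduce the trace-data classification described in the abstract: it assumes each base call already carries a numeric quality score, and the function `trim_ends`, the `threshold` value, and the sample read are all hypothetical.

```python
def trim_ends(seq, quals, threshold=20):
    """Drop bases from each end of a read while their quality is below threshold."""
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    start, end = 0, len(seq)
    # Advance past the low-quality prefix.
    while start < end and quals[start] < threshold:
        start += 1
    # Retreat past the low-quality suffix.
    while end > start and quals[end - 1] < threshold:
        end -= 1
    return seq[start:end]


# Hypothetical read whose first two and last two calls are unreliable.
read = "ACGTACGTAC"
scores = [5, 8, 30, 32, 35, 33, 31, 30, 9, 4]
trimmed = trim_ends(read, scores)
```

Only the ends are touched; low-quality calls in the interior of a read are left for the assembler or a later editing step to resolve.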
An experimental study of SB-trees
- In ACM-SIAM Symposium on Discrete Algorithms, 1996
Cited by 2 (0 self)
In a previous work of ours [13], we proposed a text indexing data structure for external memory, which we called SB-tree, that combines the best B-tree and suffix array qualities to overcome the limitations of inverted files, suffix arrays, suffix trees, and prefix B-trees. In this paper, we study the performance of SB-trees in a practical setting by running a large number of searching and updating experiments. We obtain fast practical performance by means of a new space-efficient and alphabet-independent organization of SB-tree nodes and a new batch insertion procedure that avoids thrashing. 1 Introduction Textual data in electronic form are more available than before and range from published documents (e.g., electronic dictionaries, libraries and archives, etc.) to private databases (e.g., marketing information, legal records, medical histories, etc.). Online providers of legal and newswire texts (such as Westlaw and Lexis-Nexis) already have hundreds of text gigabytes and will have...
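For readers unfamiliar with the suffix array side of this design, here is a minimal in-memory sketch of suffix-array pattern search. It is not an SB-tree and ignores external memory entirely; it only shows the sorted-suffix lookup that such an index builds on. The function names and the quadratic construction are simplifications chosen for brevity.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text in lexicographic order."""
    # Naive construction; adequate for a small illustration only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """Return the sorted start positions at which pattern occurs in text."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Binary search for the first suffix whose m-character prefix is >= pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    hits = []
    # All matching suffixes are contiguous in the array from that point on.
    while lo < len(sa) and text[sa[lo]:sa[lo] + m] == pattern:
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)


text = "banana"
sa = build_suffix_array(text)
positions = find_occurrences(text, sa, "ana")
```

The binary search touches a logarithmic number of suffixes, which is the property an external-memory structure tries to preserve while keeping the number of disk accesses small.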
A Heuristic Managing Errors for DNA Sequencing
- Bioinformatics, 2002
Cited by 2 (0 self)
In this paper the new method for rebuilding sequences from a set of oligonucleotides with the aim of managing both positive and negative errors has been proposed. This method is simple and fast, and behaves surprisingly well when the length of the oligonucleotides is large enough to ensure that only a few of them accept more than one immediate successor. Indeed, the main drawback of the current method remains the choice of the successor. Nevertheless, the method seems to be particularly well suited for detecting both kinds of errors and its improvement by incorporating a tabu search procedure for the choice of the successor when several "good candidates" are available, is planned.
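The successor-choice idea can be sketched as a greedy walk over the oligonucleotide set. This is an illustrative simplification, not the heuristic from the paper: it assumes the first oligonucleotide and the target length are known, always extends with the unused oligonucleotide that overlaps the current one most, and breaks ties alphabetically. The names `overlap` and `reconstruct` and the toy spectrum are hypothetical.

```python
def overlap(a, b):
    """Length of the longest proper suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def reconstruct(spectrum, start, target_len):
    """Greedily rebuild a sequence from oligonucleotides, beginning at start."""
    seq, current = start, start
    unused = set(spectrum) - {start}
    while unused and len(seq) < target_len:
        # Pick the successor with the largest overlap; sorting first makes
        # ties resolve to the alphabetically smallest candidate.
        best = max(sorted(unused), key=lambda o: overlap(current, o))
        k = overlap(current, best)
        if k == 0:
            break  # no successor overlaps at all, so stop extending
        seq += best[k:]
        unused.remove(best)
        current = best
    return seq


# Error-free spectrum of all 3-mers of the hypothetical target "ACGTTGCA".
spectrum = ["ACG", "CGT", "GTT", "TTG", "TGC", "GCA"]
rebuilt = reconstruct(spectrum, start="ACG", target_len=8)
```

When every oligonucleotide has a single best successor the walk is unambiguous; the tie-breaking rule is exactly the weak point the abstract identifies, and it is where a smarter search would take over.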
Sequencing by Hybridization: An Enhanced Crossover Operator for a Hybrid Genetic Algorithm
Cited by 2 (0 self)
This paper presents a genetic algorithm for an important computational biology problem. The problem appears in the computational part of a new proposal for DNA sequencing denominated sequencing by hybridization. The general usage of this method for real sequencing purposes depends mainly on the development of good algorithmic procedures solving its computational phase. The proposed genetic algorithm is a modified version of a previously proposed hybrid genetic algorithm for the same problem. It is compared with two well suited meta-heuristic approaches reported in the literature: the hybrid genetic algorithm, which is the origin of our proposed variant, and a tabu-scatter search algorithm. Experimental results carried out on real DNA data show the advantages of using the proposed algorithm. Furthermore, statistical tests confirm the superiority of the proposed variant over the state-of-the-art heuristics.
New methods for detection of low levels of DNA damage in human populations
- Environ. Health Perspect. 48, 1983
Cited by 2 (0 self)
The use of a postlabeling method to characterize and to detect infrequent base modifications in DNA is outlined. This method has the advantage that low levels of DNA modifications, approximately 1 modified base per 10^5 nucleotides, can be detected. Moreover, a broad spectrum of modification can be identified by using this methodology. The basis for the method involves transfer of a radioactive phosphate from the γ position of ATP to the 5'-hydroxyl terminus of 3'-phosphoryl nucleotides that are derived from modified DNA by appropriate nuclease digestion. The second method involves use of a defined DNA sequence within human cells. The α sequence is used as a probe for DNA damage to specific nucleotides. The α DNA sequence is reiterated approximately 300,000 times in the human genome and exists in tandem arrays. It comprises approximately 1% of the entire genome. The reiterated sequence is sufficiently homogeneous to permit its use as a probe for site-specific DNA damage. Examples of the application of both of these methodologies to DNA damage inflicted in human cells by chemicals and ultraviolet light are provided.
A Learning Algorithm for String Assembly
- Workshop on Data Mining in Bioinformatics BIOKDD, 7th International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD, 2001
Cited by 1 (1 self)
We present a supervised learning approach to DNA shotgun sequencing. The oracle (supervisor) is a set of already-sequenced DNA strands; the output of the learning process is a domain-specific algorithm for sequence assembly. Our goal is to learn a fast algorithm for a given problem domain. Our approach is to begin with a parameterized form of a sequencing algorithm and to then learn the optimal parameter values, numerical and combinatorial, for the given domain of interest.
Designing and Testing a New DNA Fragment Assembler VEDA-2
Cited by 1 (1 self)
We present VEDA-2, a redesigned version of the DNA fragment assembler VEDA. VEDA-2 covers all stages of the assembly process, from sequencing the input fragments into a collection of contigs, to reordering and orienting contigs, based on the mate-pair information. Like its predecessor, VEDA-2 is a generic procedure with several "open" numeric and algorithmic parameters that are "learned" by a learning meta-algorithm L-VEDA through post-processing of previously sequenced DNAs. Our experiments are performed on two types of input data: real, comprised of the system of Anthrax fragments that was made public on the TIGR site; and synthetic, comprised of systems of fragments generated by the program frag, which is applied to real DNA strings. The latter includes a frag-generated input formed from the Anthrax DNA. Testing on diverse DNA sequences of lengths of up to 5 million base pairs shows that VEDA-2 correctly assembles approximately 97% of the DNA. According to our experiments, VEDA-2 correctly restores the order of the contigs and determines the lengths of the gaps between them within 5% of the true answer.

