Large scale sequencing by hybridization
- J. of Computational Biology, 2002
Cited by 15 (7 self)
Sequencing by Hybridization is a method for reconstructing a DNA sequence based on its k-mer content. This content, called the spectrum of the sequence, can be obtained from hybridization with a universal DNA chip. However, even with a sequencing chip containing all 4^9 9-mers and assuming no hybridization errors, only sequences of about 400 bases can be reconstructed unambiguously. Drmanac et al. suggested sequencing long DNA targets by obtaining spectra of many short overlapping fragments of the target, inferring their relative positions along the target, and then computing spectra of subfragments that are short enough to be uniquely recoverable. Drmanac et al. do not treat the realistic case of errors in the hybridization process. In this paper we study the effect of such errors. We show that the probability of ambiguous reconstruction in the presence of (false negative) errors is close to the probability in the errorless case. More precisely, the ratio between these probabilities is 1 + O(p/(1 − p)^4 · 1/d), where d is the average length of subfragments and p is the probability of a false negative. We also obtain lower and upper bounds for the probability of unambiguous reconstruction based on an errorless spectrum. For realistic chip sizes, these bounds are tighter than those given by Arratia et al. Finally, we report results on simulations with real DNA sequences, showing that even in the presence of 50% false negative errors, a target of cosmid length can be recovered with less than 0.1% miscalled bases.
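The notions of spectrum and ambiguous reconstruction used in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names `spectrum` and `reconstructions` are hypothetical, the first k-mer and the target length are assumed to be known, the spectrum is assumed errorless, and the search is a plain backtracking enumeration that is only practical for toy inputs.

```python
def spectrum(seq: str, k: int) -> set[str]:
    """Return the set of all k-mers occurring in seq (its SBH spectrum)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstructions(spec: set[str], start: str, length: int) -> list[str]:
    """Enumerate every sequence of the given length that begins with `start`
    and whose spectrum equals `spec` exactly.

    Assumptions (not from the source): errorless spectrum, known first
    k-mer, known target length, and k >= 2.
    """
    k = len(start)
    results: list[str] = []

    def extend(seq: str) -> None:
        if len(seq) == length:
            if spectrum(seq, k) == spec:
                results.append(seq)
            return
        # Try each base whose resulting k-mer is present in the spectrum.
        for base in "ACGT":
            if seq[-(k - 1):] + base in spec:
                extend(seq + base)

    extend(start)
    return results


# Two different 10-base sequences share the same 3-mer spectrum,
# so reconstruction from that spectrum is ambiguous.
ambiguous = reconstructions(spectrum("ATGGCGTGCA", 3), "ATG", 10)
unique = reconstructions(spectrum("ACGTTGCA", 3), "ACG", 8)
```

Here `ambiguous` contains two candidate sequences while `unique` contains one, which is the distinction the probability bounds in the abstract quantify.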
Handling Long Targets and Errors in Sequencing by Hybridization
- In Proc. 6th Annual International Conference on Computational Molecular Biology (RECOMB '02), 2003
Cited by 9 (0 self)
Sequencing by hybridization (SBH) is a DNA sequencing technique in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained by hybridization to a universal DNA array. Standard universal arrays contain all k-mers for some fixed k, typically 8 to 10. Currently, in spite of its promise and elegance, SBH is not competitive with standard gel-based sequencing methods. This is due to two main reasons: lack of tools to handle realistic levels of hybridization errors and an inherent limitation on the length of uniquely reconstructible sequence by standard universal arrays. In this paper, we deal with both problems. We introduce a simple polynomial reconstruction algorithm which can be applied to spectra from standard arrays and has provable performance in the presence of both false negative and false positive errors. We also propose a novel design of chips containing universal bases that differs from the one proposed by Preparata et al. (1999). We give a simple algorithm that uses spectra from such chips to reconstruct with high probability random sequences of length lower only by a squared log factor compared to the information-theoretic bound. Our algorithm is very robust to errors and has a provable performance even if there are both false negative and false positive errors. Simulations indicate that its sensitivity to errors is also very small in practice.
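To make the two error types concrete, the following Python sketch simulates an observed spectrum with false negatives (a true k-mer is missed) and false positives (an absent k-mer is reported). The independent per-k-mer error model, the function name `noisy_spectrum`, and the parameters `p_fn` and `p_fp` are illustrative assumptions, not the error model or algorithm of the paper.

```python
import itertools
import random


def noisy_spectrum(seq: str, k: int, p_fn: float, p_fp: float,
                   rng: random.Random) -> set[str]:
    """Simulate hybridization of seq to an array holding all 4**k k-mers.

    Assumed model (hypothetical): each k-mer present in seq is dropped
    independently with probability p_fn, and each absent k-mer is
    reported independently with probability p_fp.
    """
    true_kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    observed = set()
    for probe in map("".join, itertools.product("ACGT", repeat=k)):
        if probe in true_kmers:
            if rng.random() >= p_fn:   # kept unless a false negative occurs
                observed.add(probe)
        elif rng.random() < p_fp:      # spurious signal: false positive
            observed.add(probe)
    return observed


rng = random.Random(0)
clean = noisy_spectrum("ACGTTGCA", 3, p_fn=0.0, p_fp=0.0, rng=rng)
all_missed = noisy_spectrum("ACGTTGCA", 3, p_fn=1.0, p_fp=0.0, rng=rng)
```

With both probabilities set to zero the function returns the exact spectrum, which gives a baseline for comparing reconstructions from noisy input.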
Bounds for resequencing by hybridization
- In Proc. 3rd Workshop on Algorithms in Bioinformatics (WABI '03), LNCS 2812, 2003
Cited by 4 (3 self)
We study the problem of finding the sequence of an unknown DNA fragment given the set of its k-long subsequences and a homologous sequence, namely a sequence that is similar to the target sequence. Such a sequence is available in some applications, e.g., when detecting single nucleotide polymorphisms. Pe’er and Shamir studied this problem and presented a heuristic algorithm for it. In this paper, we give an algorithm with provable performance: we show that under some assumptions, the algorithm can reconstruct a random sequence of length O(4^k) with high probability. We also show that no algorithm can reconstruct sequences of length Ω(log k · 4^k).
Sequencing by hybridization in few rounds
- In Proc. ESA '03, 2003
Cited by 3 (2 self)
Sequencing by Hybridization (SBH) is a method for reconstructing an unknown DNA string based on substring queries: using hybridization experiments, one can determine, for each string in a given set of strings, whether the string appears in the target string, and use this information to reconstruct the target string. We study the problem when the queries are performed in rounds, where the queries in each round depend on the answers to the queries in the previous rounds. We give an algorithm that can reconstruct almost all strings of length n using 2 rounds with O(n log_α n / log_α log_α n) queries per round, and an algorithm that uses log*_α n − Ω(1) rounds with O(n) queries per round, where α is the size of the alphabet. We also consider a variant of the problem in which, for each substring query, the answer is whether the string appears once in the target, appears at least twice in the target, or does not appear in the target. For this problem, we give an algorithm that uses 3 rounds of O(n) queries. In all our algorithms, the lengths of the query strings are Θ(log_α n). Our results improve the previous results of Margaritis and Skiena [17] and Frieze and Halldórsson [10].
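The two query models described in this abstract can be sketched as simple oracles in Python. The names `substring_query` and `count_query` are hypothetical, and the sketch only models the answers a round of queries would return; it does not implement the round-based reconstruction algorithms.

```python
def substring_query(target: str, probe: str) -> bool:
    """Classical SBH query: does the probe appear in the target?"""
    return probe in target


def count_query(target: str, probe: str) -> int:
    """Three-valued variant: 0 if the probe is absent, 1 if it appears
    exactly once, 2 if it appears at least twice (overlaps counted)."""
    count = 0
    pos = target.find(probe)
    while pos != -1 and count < 2:
        count += 1
        pos = target.find(probe, pos + 1)  # step by 1 to allow overlaps
    return count


def run_round(target: str, probes: list[str]) -> dict[str, int]:
    """Answer one round of queries; a later round may choose its probes
    based on this dictionary of answers."""
    return {probe: count_query(target, probe) for probe in probes}


answers = run_round("ACGTACGA", ["ACG", "GTA", "TTT"])
```

In this toy target, `ACG` is reported as appearing at least twice, `GTA` once, and `TTT` not at all.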
Optimal probing patterns for sequencing by hybridization
- In: Proc. 6th Workshop on Algorithms in Bioinformatics (WABI). Volume 4175 of LNCS. (2006) 366–375
Cited by 2 (1 self)
Sequencing by Hybridization (SBH) is a method for reconstructing a DNA sequence based on its k-mer content. This content, called the spectrum of the sequence, can be obtained from hybridization with a universal DNA chip. The main shortcoming of SBH is that it reliably reconstructs only sequences of length at most the square root of the size of the chip. Frieze et al. [9] showed that by using gapped probes, SBH can reconstruct sequences whose length is linear in the size of the chip. In this work we investigate the optimal placement of the gaps in the probes, and give an algorithm for finding nearly optimal gap placement. Using our algorithm, we obtain a chip design which is more efficient than the chip of Frieze et al.
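As an illustration of what a gapped probe records, the Python sketch below computes the spectrum of a sequence under a probing pattern. The encoding of a pattern as a string of `1` (sampled position) and `0` (gap), the `.` placeholder, and the function name `gapped_spectrum` are assumptions made for this example; the sketch does not search for an optimal gap placement.

```python
def gapped_spectrum(seq: str, pattern: str) -> set[str]:
    """Return the gapped spectrum of seq under a probing pattern.

    pattern is a string over {'1', '0'} (hypothetical encoding): '1'
    marks a sampled position and '0' a gap. Each window of seq is
    reported with its gap positions replaced by '.'.
    """
    width = len(pattern)
    probes = set()
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        probes.add("".join(base if bit == "1" else "."
                           for base, bit in zip(window, pattern)))
    return probes


def chip_size(pattern: str) -> int:
    """Number of distinct probes on a chip with this pattern:
    4 raised to the number of sampled positions."""
    return 4 ** pattern.count("1")


example = gapped_spectrum("ACGTA", "101")
```

The chip size depends only on the number of sampled positions, so a gapped pattern spans a longer window than an ungapped probe on a chip of the same size.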
DNA sequencing by hybridization using semi-degenerate bases
- J. of Computational Biology, 2004
Cited by 1 (0 self)
One way to enhance the performance of hybridization microarrays for DNA de novo sequencing is the use of probing patterns with gaps of unsampled positions. Ideally, such gaps could be realized by the inclusion into microarray oligos (probes) of wild-card compounds, referred to as universal bases (which bind nonspecifically to natural bases). The suggested alternative is to deploy in the gap positions degenerate bases, i.e., uniform mixtures of the four natural bases, with ensuing deterioration of the hybridization signal. In this paper, we show that such signal loss is a minor shortcoming, compared with the fact that degenerate bases cannot be treated as universal. Indeed, the substantial spread of hybridization energies at any microarray feature is such that an overwhelming number of mismatches bind more strongly than legal matches. We observed, however, that much narrower energy spreads are exhibited by pairs of bases in the same strength class (A-T and C-G). We call semi-degenerate a gap position realized with bases in the same energy class and show that well-known sequence reconstruction algorithms can be modified to achieve substantial improvements in sequencing effectiveness. For example, with a 4^9-feature microarray and an acceptable weakening of the hybridization signal, one may achieve lengths of about 4,000 bases (compared with < 250 of the standard uniform method). Our approach also incorporates the use of a spectrum expressed in terms of observed feature melting temperatures (analog spectrum), rather than binary decisions made directly at the biochemical level (digital spectrum). While universal bases represent the ultimate goal of sequencing by hybridization, semi-degenerate natural bases are the most effective known substitute.
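The idea of a semi-degenerate position can be sketched in Python as a probe symbol that accepts either base of one strength class. The sketch uses the IUPAC codes S (C or G) and W (A or T) for the two classes; it is a simplification that compares a probe with the target strand directly, ignoring complementarity, hybridization energies, and the analog spectrum. The names `probe_matches` and `semi_degenerate_spectrum` are hypothetical.

```python
# Bases accepted by each probe symbol. S and W are the IUPAC codes for
# the strong (C/G) and weak (A/T) classes; using them to encode
# semi-degenerate positions is an assumption of this sketch.
ALLOWED = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "S": {"C", "G"},
    "W": {"A", "T"},
}


def probe_matches(probe: str, window: str) -> bool:
    """True if every base of the window is accepted by the probe symbol
    at the same position."""
    return len(probe) == len(window) and all(
        base in ALLOWED[symbol] for symbol, base in zip(probe, window))


def semi_degenerate_spectrum(seq: str, probes: list[str]) -> set[str]:
    """Return the probes that match at least one window of seq
    (a binary, 'digital' spectrum)."""
    hits = set()
    for probe in probes:
        width = len(probe)
        if any(probe_matches(probe, seq[i:i + width])
               for i in range(len(seq) - width + 1)):
            hits.add(probe)
    return hits


hits = semi_degenerate_spectrum("GACATG", ["ASWT", "CWSG"])
```

In this example the probe `ASWT` matches the window `ACAT`, while `CWSG` matches no window of the target.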
Genome identification and classification by short oligo arrays
- In: Proceedings of the Fourth Annual Workshop on Algorithms in Bioinformatics. (2004)
Cited by 1 (1 self)
We explore the problem of designing oligonucleotides that help locate organisms along a known phylogenetic tree. We develop a suffix-tree based algorithm to find such short sequences efficiently. Our algorithm requires O(Nm) time and O(N) space in the worst case, where m is the number of the genomes classified by the phylogeny and N is their total length. We implemented our algorithm and used it to find these discriminating sequences in both small and large phylogenies. We believe our algorithm will have wide applications including: high-throughput classification and identification, oligo array design optimally differentiating genes in gene families, and markers for closely related strains and populations. It will also have scientific significance as a new way to assess the confidence in a given classification.
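One way to read "discriminating sequences" is as short oligos present in every genome under one node of the tree and absent from all genomes outside it. The Python sketch below illustrates that reading with plain k-mer sets rather than the suffix tree the abstract describes, so it does not achieve the stated O(Nm) time or O(N) space; the definition of a discriminating oligo, the fixed length `k`, and the function names are all assumptions of this example.

```python
def kmers(genome: str, k: int) -> set[str]:
    """All k-length substrings of a genome."""
    return {genome[i:i + k] for i in range(len(genome) - k + 1)}


def discriminating_oligos(in_group: list[str], out_group: list[str],
                          k: int) -> set[str]:
    """Return k-mers found in every in-group genome and in no out-group
    genome (assumed definition, using sets instead of a suffix tree)."""
    if not in_group:
        return set()
    shared = kmers(in_group[0], k)
    for genome in in_group[1:]:
        shared &= kmers(genome, k)      # must occur in every in-group genome
    for genome in out_group:
        shared -= kmers(genome, k)      # must occur in no out-group genome
    return shared


# Toy clade of two genomes versus one outside genome.
oligos = discriminating_oligos(["ACGTAC", "TACGTT"], ["GGGTTT"], 3)
```

A probe set built this way would light up for any genome in the clade and stay dark for the others, which is what allows an array readout to place an organism on the tree.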

