
## A greedy algorithm for aligning DNA sequences (2000)

Venue: J. Comput. Biol.

Citations: 584 (16 self-citations)

### Citations

8147 | Basic local alignment search tool
- Altschul, Gish, et al.
- 1990
Citation Context ...e region, such as a mono- or dinucleotide repeat. In addition to these difficulties, there is the question of how to determine when the end of the similar region has been reached. The X-drop approach (**Altschul et al., 1990**; Altschul et al., 1997; Zhang et al., 1998a,b) provides a rather natural solution to these problems. The width of the region being searched, i.e., the number of adjacent diagonals, expands at regions...
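
The X-drop rule mentioned in this excerpt can be illustrated in its simplest form: an ungapped extension along a single diagonal, as in the original BLAST. A running score is kept, and extension stops once it falls more than X below the best score seen so far. This is a minimal sketch under stated assumptions, not the paper's gapped greedy extension (which applies the cutoff across many diagonals). The function name `xdrop_extend_right` and the use of `mat`/`mis` as the match and mismatch scores are illustrative choices.

```python
def xdrop_extend_right(A: str, B: str, i: int, j: int, mat: int, mis: int, X: int):
    """Ungapped X-drop extension to the right of (i, j) along one diagonal.

    Returns (best_score, best_length): the highest running score reached and
    how many aligned positions produce it. Hypothetical helper for illustration.
    """
    score = best = 0
    best_len = 0
    k = 0
    while i + k < len(A) and j + k < len(B):
        score += mat if A[i + k] == B[j + k] else mis
        k += 1
        if score > best:
            best, best_len = score, k   # new maximum: remember where it ends
        elif best - score > X:
            break                       # fell more than X below the best: stop
    return best, best_len


# Three matches, then a run of mismatches drives the score down past X.
print(xdrop_extend_right("ACGTTTTT", "ACGAAAAA", 0, 0, mat=1, mis=-3, X=5))
```

Leftward extension works the same way on the reversed prefixes, and the two halves are then joined at the seed.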

1649 | Base-calling of automated sequencer traces using phred. II. Error probabilities
- Ewing, Green
- 1998
Citation Context ... to penalize an indel about the same as, or slightly more than, a replacement. This seems consistent with published figures on the rates of actual errors in both single-pass (low accuracy) sequences (**Ewing et al., 1998**; Hillier et al., 1996) and high quality data. On the other hand, it is widely appreciated that dynamic-programming alignment algorithms can guarantee a theoretically optimal alignment under a wide va...

784 | Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence.
- Cole, Brosch, et al.
- 1998
Citation Context ...eriment that we performed, using a Blast-family tool that permits optional use of the greedy algorithm. We used that program to align the genomic sequence of Mycobacterium tuberculosis, strain H37Rv (**Cole et al., 1998**), with the sequence being generated at The Institute for Genomic Research from another M. tuberculosis strain that is 99% identical. A need to compare these sequences motivated others (Delcher et al....

557 | An improved algorithm for matching biological sequences.
- Gotoh
- 1982
Citation Context ...the “gap-open penalty.” A highest-scoring alignment of sequences A and B can be computed in time proportional to the product of the sequences' lengths (**Gotoh, 1982**) by a dynamic programming algorithm that is equivalent to finding an optimal path in a graph that has three vertices per grid-point (i, j) (Myers and Miller, 1989a). Let mat, mis, ind and a be intege...
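
The dynamic program referred to in this excerpt can be sketched with Gotoh-style recurrences: three states per grid point (i, j), matching the "three vertices per grid-point" description, and O(mn) time. This is a minimal sketch assuming global alignment and a gap of length k scoring a + k·ind (a being the gap-open score, ind the per-indel score, mat/mis the match and mismatch scores). The excerpt does not spell out the exact gap model, so treat that as an assumption. Only the score is returned; no traceback is performed.

```python
NEG_INF = float("-inf")


def gotoh_score(A: str, B: str, mat: int, mis: int, ind: int, a: int):
    """Best global alignment score; a gap of length k scores a + k * ind (affine)."""
    m, n = len(A), len(B)
    # Three states per grid point (i, j):
    #   S: last column pairs A[i-1] with B[j-1]
    #   D: last column is A[i-1] against a gap
    #   I: last column is B[j-1] against a gap
    S = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    S[0][0] = 0
    for i in range(1, m + 1):
        D[i][0] = a + i * ind            # leading gap of length i in B
    for j in range(1, n + 1):
        I[0][j] = a + j * ind            # leading gap of length j in A
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            s = mat if A[i - 1] == B[j - 1] else mis
            S[i][j] = s + max(S[i - 1][j - 1], D[i - 1][j - 1], I[i - 1][j - 1])
            # opening a gap pays a + ind; extending an open gap pays ind only
            D[i][j] = max(S[i - 1][j] + a + ind, D[i - 1][j] + ind, I[i - 1][j] + a + ind)
            I[i][j] = max(S[i][j - 1] + a + ind, I[i][j - 1] + ind, D[i][j - 1] + a + ind)
    return max(S[m][n], D[m][n], I[m][n])


print(gotoh_score("ACGT", "AGT", mat=1, mis=-1, ind=-2, a=-3))
```

Keeping a separate state for "inside a gap" is what lets one long gap be charged a single opening cost instead of one per indel.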

339 | A computer program for aligning a cDNA sequence with a genomic DNA sequence.
- Florea, Hartzell, et al.
- 1998
Citation Context ...computer scientists (Miller and Myers, 1985; Myers, 1986; Ukkonen, 1985; Wu et al., 1990). It has also been adapted to build several practical programs for comparing DNA sequences (Chao et al., 1997; **Florea et al., 1998**). The main contribution of this paper is a theoretically sound method for pruning the search region. Returning to the above scenario, where we explore three adjacent diagonals, we see that two types...

225 | Algorithms for approximate string matching.
- Ukkonen
- 1985
Citation Context ...hmetic involving scoring parameters. The faster approach, which we call the greedy algorithm, has been generalized and studied extensively by computer scientists (Miller and Myers, 1985; Myers, 1986; **Ukkonen, 1985**; Wu et al., 1990). It has also been adapted to build several practical programs for comparing DNA sequences (Chao et al., 1997; Florea et al., 1998). The main contribution of this paper is a theoreti...
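
The greedy idea these papers share is to avoid filling the whole dynamic-programming matrix. Instead, for each diagonal k = j − i the method tracks the furthest point reachable with d differences, sliding along runs of matches for free, and increases d until the end-point (m, n) is reached. A minimal sketch in the style of Ukkonen (1985), assuming unit-cost edit distance (each mismatch, insertion, or deletion costs 1). It is not the paper's own score-based, pruned algorithm, and the function name is hypothetical.

```python
def greedy_edit_distance(A: str, B: str) -> int:
    """Unit-cost edit distance via furthest-reaching points on diagonals k = j - i."""
    m, n = len(A), len(B)

    def slide(i: int, k: int) -> int:
        # follow free matches along diagonal k, where the column index is j = i + k
        while i < m and i + k < n and A[i] == B[i + k]:
            i += 1
        return i

    target = n - m                 # diagonal containing the end-point (m, n)
    R = {0: slide(0, 0)}           # R[k] = furthest row i reached on diagonal k
    d = 0
    while R.get(target, -1) < m:
        d += 1
        new_R = {}
        for k in range(-d, d + 1):
            best = R.get(k, -1)                      # reachable with fewer differences
            if k in R and R[k] < m and R[k] + k < n:
                best = max(best, R[k] + 1)           # substitution: advance i and j
            if k + 1 in R and R[k + 1] < m:
                best = max(best, R[k + 1] + 1)       # deletion: consume A[i] only
            if k - 1 in R and R[k - 1] + k - 1 < n:
                best = max(best, R[k - 1])           # insertion: consume B[j] only
            if best >= 0:
                new_R[k] = slide(best, k)
        R = new_R
    return d


print(greedy_edit_distance("kitten", "sitting"))
```

Work grows with the number of differences d rather than with the product of the lengths, which is why this approach pays off on very similar sequences.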

216 | Alignment of whole genomes.
- Delcher, Kasif, et al.
- 1999
Citation Context ...e et al., 1998), with the sequence being generated at The Institute for Genomic Research from another M. tuberculosis strain that is 99% identical. A need to compare these sequences motivated others (**Delcher et al., 1999**) to develop an alignment program capable of handling complete bacterial genomes. The H37Rv sequence is 4,441,529 nucleotides, while the other sequence is roughly the same length, though at the time w...

212 | An O(ND) difference algorithm and its variations.
- Myers
- 1986
Citation Context ...on using arithmetic involving scoring parameters. The faster approach, which we call the greedy algorithm, has been generalized and studied extensively by computer scientists (Miller and Myers, 1985; **Myers, 1986**; Ukkonen, 1985; Wu et al., 1990). It has also been adapted to build several practical programs for comparing DNA sequences (Chao et al., 1997; Florea et al., 1998). The main contribution of this pape...
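
Myers' O(ND) variant of the greedy method counts only insertions and deletions (no substitutions). It finds the length D of a shortest edit script by extending, for d = 0, 1, 2, …, the furthest-reaching path on every diagonal k of matching parity. A minimal sketch of the forward pass, computing the length only, with no script recovery. The function name is illustrative, and the dictionary `v` stands in for the usual offset array.

```python
def ses_length(a: str, b: str) -> int:
    """Length of a shortest insert/delete edit script (Myers' greedy forward pass)."""
    n, m = len(a), len(b)
    v = {1: 0}                 # v[k] = furthest x on diagonal k = x - y
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            if k == -d or (k != d and v[k - 1] < v[k + 1]):
                x = v[k + 1]           # step down: insert b[y]
            else:
                x = v[k - 1] + 1       # step right: delete a[x]
            y = x - k
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1    # free diagonal "snake" of matches
            v[k] = x
            if x >= n and y >= m:
                return d
    return n + m               # not reached: the loop always returns by d = n + m


print(ses_length("ABCABBA", "CBABAC"))
```

Because substitutions are not allowed, D equals len(a) + len(b) − 2·LCS(a, b), which gives a simple cross-check.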

79 | A File Comparison Program
- Miller, Myers
- 1985
Citation Context ...ith a three-way comparison using arithmetic involving scoring parameters. The faster approach, which we call the greedy algorithm, has been generalized and studied extensively by computer scientists (**Miller and Myers, 1985**; Myers, 1986; Ukkonen, 1985; Wu et al., 1990). It has also been adapted to build several practical programs for comparing DNA sequences (Chao et al., 1997; Florea et al., 1998). The main contribution...

73 | Approximate matching of regular expressions.
- Myers, Miller
- 1989
Citation Context ...lengths (Gotoh, 1982) by a dynamic programming algorithm that is equivalent to finding an optimal path in a graph that has three vertices per grid-point (i, j) (**Myers and Miller, 1989a**). Let mat, mis, ind and a be integers, with mat even. Define $g = \gcd(\mathit{mat} - \mathit{mis},\ \mathit{mat}/2 - \mathit{ind},\ -a)$, $\mathit{mis}' = (\mathit{mat} - \mathit{mis})/g$, $\mathit{ind}' = (\mathit{mat}/2 - \mathit{ind})/g$ and $a' = -a/g$. The cost assigned to an alignment w...
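
The definitions in this excerpt transcribe directly into code. The evenness requirement on mat is what makes mat/2 an integer, and dividing by the common divisor g leaves the smallest integer costs with the same proportions. The excerpt is cut off before it says how these costs are used, so this sketch only computes them. The function name and the choice to raise an error for odd mat are assumptions.

```python
from math import gcd


def reduced_costs(mat: int, mis: int, ind: int, a: int):
    """Return (g, mis', ind', a') as defined in the excerpt.

    mat, mis, ind: match, mismatch and indel scores; a: gap-open score.
    Hypothetical helper; assumes not all three gcd arguments are zero.
    """
    if mat % 2 != 0:
        raise ValueError("mat must be even so that mat/2 is an integer")
    half = mat // 2
    # math.gcd returns a non-negative result even for negative arguments
    g = gcd(gcd(mat - mis, half - ind), -a)
    return g, (mat - mis) // g, (half - ind) // g, (-a) // g


print(reduced_costs(mat=10, mis=-20, ind=-25, a=-40))
```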

72 | Sequence comparison with concave weighting functions.
- Miller, Myers
- 1988
Citation Context ...are frequently used in bioinformatics. All of these generalizations can be handled by appropriate modifications to Figure 2. Further generalizations are possible (e.g., to more general gap penalties (**Miller and Myers, 1988**)), but have yet to prove useful in practice. Our discussion of the greedy algorithm has considered three operations, viz., substitution, insertion and deletion of single nucleotides. Each operation w...

70 | Generation and analysis of 280,000 human expressed sequence tags
- Hillier, Lennon, et al.
- 1996
Citation Context ...l about the same as, or slightly more than, a replacement. This seems consistent with published figures on the rates of actual errors in both single-pass (low accuracy) sequences (Ewing et al., 1998; **Hillier et al., 1996**) and high quality data. On the other hand, it is widely appreciated that dynamic-programming alignment algorithms can guarantee a theoretically optimal alignment under a wide variety of scoring schem...

62 | Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl. Acids Res.
- Altschul, Madden, et al.
- 1997
Citation Context ...o- or dinucleotide repeat. In addition to these difficulties, there is the question of how to determine when the end of the similar region has been reached. The X-drop approach (Altschul et al., 1990; **Altschul et al., 1997**; Zhang et al., 1998a,b) provides a rather natural solution to these problems. The width of the region being searched, i.e., the number of adjacent diagonals, expands at regions of low-sequence comple...

40 | Comparative biosequence metrics
- Smith
- 1981
Citation Context ...ence the score, unchanged, from which it follows that $2 \times \mathit{mis} = 2 \times \mathit{ind} + \mathit{mat}$. In summary, the equivalence of score and distance implies that $\mathit{ind} = \mathit{mis} - \mathit{mat}/2$. As the following lemma shows (see also **Smith et al., 1981**), that equality is sufficient to guarantee the desired equivalence and, furthermore, the formula to translate distance into score depends only on the antidiagonal containing the alignment end-point. ...
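
The claim in this excerpt can be checked numerically. Assume (the excerpt does not state this) that "distance" d counts each mismatch and each indel as 1. For an alignment with M matches, X mismatches and I indels ending at (i, j), the antidiagonal index is i + j = 2M + 2X + I and d = X + I. With ind = mis − mat/2, expanding (i + j)·mat/2 − d·(mat − mis) gives M·mat + X·mis + I·ind, which is exactly the score. The sketch below confirms this on random column sequences. Encoding an alignment as a string of column types 'M', 'X', 'I' is an illustrative choice.

```python
import random


def score_and_translated_distance(cols: str, mat: int, mis: int):
    """Return (direct score, score recovered from antidiagonal and distance)."""
    assert mat % 2 == 0, "mat must be even"
    ind = mis - mat // 2                        # the condition derived in the excerpt
    score = sum(mat if c == "M" else mis if c == "X" else ind for c in cols)
    s = sum(2 if c in "MX" else 1 for c in cols)    # antidiagonal i + j of the end-point
    d = sum(1 for c in cols if c != "M")            # unit-cost distance
    return score, s * (mat // 2) - d * (mat - mis)


random.seed(0)
for _ in range(1000):
    cols = "".join(random.choices("MXI", k=random.randint(0, 30)))
    direct, translated = score_and_translated_distance(cols, mat=10, mis=-20)
    assert direct == translated
```

Since the translation uses only i + j and d, two alignment end-points on the same antidiagonal can be compared by distance alone. This matches the excerpt's remark that the formula depends only on the antidiagonal.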

35 | An O(NP) sequence comparison algorithm.
- Wu, Manber, et al.
- 1990
Citation Context ...g scoring parameters. The faster approach, which we call the greedy algorithm, has been generalized and studied extensively by computer scientists (Miller and Myers, 1985; Myers, 1986; Ukkonen, 1985; **Wu et al., 1990**). It has also been adapted to build several practical programs for comparing DNA sequences (Chao et al., 1997; Florea et al., 1998). The main contribution of this paper is a theoretically sound metho...

23 | Alignments without low-scoring regions
- Zhang, Berman, et al.
- 1998
Citation Context ...t. In addition to these difficulties, there is the question of how to determine when the end of the similar region has been reached. The X-drop approach (Altschul et al., 1990; Altschul et al., 1997; **Zhang et al., 1998a,b**) provides a rather natural solution to these problems. The width of the region being searched, i.e., the number of adjacent diagonals, expands at regions of low-sequence complexity or concentrated...

9 | A tool for aligning very similar DNA sequences
- Chao, Zhang, et al.
- 1997
Citation Context ...ied extensively by computer scientists (Miller and Myers, 1985; Myers, 1986; Ukkonen, 1985; Wu et al., 1990). It has also been adapted to build several practical programs for comparing DNA sequences (**Chao et al., 1997**; Florea et al., 1998). The main contribution of this paper is a theoretically sound method for pruning the search region. Returning to the above scenario, where we explore three adjacent diagonals, w...

9 | Row replacement algorithms for screen editors
- Myers, Miller
- 1989
Citation Context ...lengths (Gotoh, 1982) by a dynamic programming algorithm that is equivalent to finding an optimal path in a graph that has three vertices per grid-point (i, j) (**Myers and Miller, 1989a**). Let mat, mis, ind and a be integers, with mat even. Define $g = \gcd(\mathit{mat} - \mathit{mis},\ \mathit{mat}/2 - \mathit{ind},\ -a)$, $\mathit{mis}' = (\mathit{mat} - \mathit{mis})/g$, $\mathit{ind}' = (\mathit{mat}/2 - \mathit{ind})/g$ and $a' = -a/g$. The cost assigned to an alignment w...