## Finding motifs using random projections (2001)

Citations: 226 (5 self-citations)

### BibTeX

@INPROCEEDINGS{Buhler01findingmotifs,

author = {Jeremy Buhler and Martin Tompa},

title = {Finding motifs using random projections},

booktitle = {},

year = {2001},

pages = {69--76}

}

### Abstract

Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif M of length 15, where each planted instance differs from M in 4 positions. Whereas previous algorithms all failed to solve this (15,4)-motif problem, Pevzner and Sze introduced algorithms that succeeded. However, their algorithms failed to solve the considerably more difficult (14,4)-, (16,5)-, and (18,6)-motif problems. We introduce a novel motif discovery algorithm based on the use of random projections of the input's substrings. Experiments on simulated data demonstrate that this algorithm performs better than existing algorithms and, in particular, typically solves the difficult (14,4)-, (16,5)-, and (18,6)-motif problems quite efficiently. A probabilistic estimate shows that the small values of l for which the algorithm fails to recover the planted (l, d)-motif are in all likelihood inherently impossible to solve. We also present experimental results on realistic biological data by identifying ribosome binding sites in prokaryotes as well as a number of known transcriptional regulatory motifs in eukaryotes.

#### 1. Challenging Motif Problems

Pevzner and Sze [23] considered a very precise version of the motif discovery problem of computational biology, which had also been considered by Sagot [26]. Based on this formulation, they issued an algorithmic challenge:

Planted (l, d)-Motif Problem: Suppose there is a fixed but unknown nucleotide sequence M (the motif) of length l. The problem is to determine M, given t nucleotide sequences each of length n, and each containing a planted variant of M. More precisely, each such planted variant is a substring that is M with exactly d point substitutions.

One instantiation that they labeled "The Challenge Problem" was parameterized as finding a planted (15,4)-motif in t = 20 sequences each of length n = 600. These values of n, t, and l are ...
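The planted (l, d)-motif problem defined above can be made concrete with a small instance generator. The following is a minimal Python sketch, not the authors' test harness: it assumes an i.i.d. uniform background over {A, C, G, T} and chooses the motif, the plant positions, and the substituted positions uniformly at random. The names `mutate_exactly` and `planted_instance` are hypothetical.

```python
import random

ALPHABET = "ACGT"


def mutate_exactly(motif: str, d: int, rng: random.Random) -> str:
    """Return a copy of `motif` with exactly d point substitutions.

    The d positions are distinct, and each substituted base is replaced by one
    of the three other bases, so the Hamming distance to `motif` is exactly d.
    """
    chars = list(motif)
    for pos in rng.sample(range(len(motif)), d):
        chars[pos] = rng.choice([b for b in ALPHABET if b != chars[pos]])
    return "".join(chars)


def planted_instance(l: int, d: int, t: int, n: int, seed=None):
    """Build one planted (l, d)-motif instance (hypothetical helper).

    Assumes a uniform i.i.d. background. Returns the hidden motif M, the t
    sequences of length n, and the start position of each planted variant.
    """
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, starts = [], []
    for _ in range(t):
        background = [rng.choice(ALPHABET) for _ in range(n)]
        start = rng.randrange(n - l + 1)
        # Overwrite l background bases with a variant of M.
        background[start:start + l] = mutate_exactly(motif, d, rng)
        seqs.append("".join(background))
        starts.append(start)
    return motif, seqs, starts


if __name__ == "__main__":
    # "The Challenge Problem": a (15,4)-motif in t = 20 sequences of length n = 600.
    motif, seqs, starts = planted_instance(15, 4, 20, 600, seed=1)
    print("hidden motif:", motif)
    print("first plant starts:", starts[:5])
```

Each planted variant sits at exactly Hamming distance d from M, while other windows of the background may by chance lie equally close, which is what makes small l with large d hard.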

### Citations

9054 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977

Citation Context: ...quence is not fixed a priori, making computation of $W$ difficult because $\Pr(S \mid W, P)$ must be summed over all possible locations of the motif instances. To address this difficulty, the core EM algorithm [8] specifies an iterative calculation that, given an initial guess $W_0$ at the motif model, converges linearly to a locally maximum-likelihood model in the neighborhood of $W_0$. Projection performs EM re...

759 | Approximate nearest neighbors: Towards removing the curse of dimensionality
- Indyk, Motwani
- 1998

Citation Context: ...n use the k selected positions of each l-mer x as a hash function h(x). (This idea is derived from "locality-sensitive hashing," employed in the context of computational geometry by Indyk and Motwani [14], in databases by Gionis et al. [11], and in computational biology by Buhler [7]. A different randomized projection algorithm was used by Linial et al. [19] to cluster proteins.) When a sufficient number ...
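The hashing step described in this snippet can be sketched as follows. This is a hedged illustration, not the PROJECTION implementation: the names `projection_buckets` and `enriched_buckets` and the `threshold` parameter are hypothetical, only a single projection trial is shown, and the refinement of enriched buckets is omitted.

```python
import random
from collections import defaultdict


def projection_buckets(seqs, l, k, rng):
    """One random-projection trial (hypothetical helper).

    Picks k of the l positions uniformly at random and uses the letters of each
    l-mer x at those positions as the hash h(x). Returns the chosen positions
    and a map from h(x) to the (sequence index, start) of every l-mer.
    """
    positions = sorted(rng.sample(range(l), k))
    buckets = defaultdict(list)
    for seq_index, seq in enumerate(seqs):
        for start in range(len(seq) - l + 1):
            x = seq[start:start + l]
            key = "".join(x[p] for p in positions)  # h(x)
            buckets[key].append((seq_index, start))
    return positions, buckets


def enriched_buckets(buckets, threshold):
    """Keep only buckets holding at least `threshold` l-mers; in this sketch
    these are the candidates that a refinement step would examine."""
    return {key: hits for key, hits in buckets.items() if len(hits) >= threshold}


if __name__ == "__main__":
    rng = random.Random(0)
    seqs = ["".join(rng.choice("ACGT") for _ in range(600)) for _ in range(20)]
    positions, buckets = projection_buckets(seqs, l=15, k=7, rng=rng)
    candidates = enriched_buckets(buckets, threshold=4)
    print(positions, len(buckets), len(candidates))
```

A planted variant of M hashes to the same bucket as M whenever none of its d substituted positions is among the k chosen ones, so repeating the trial with fresh positions raises the chance that many variants collide in one bucket.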

573 | Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment
- Lawrence, Altschul, et al.
- 1993

Citation Context: ... collection of coregulated gene promoter regions in yeast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximization, or the greedy method, each may end in a ...

461 | Similarity search in high dimensions via hashing
- Gionis, Indyk, et al.
- 1999

Citation Context: ...ch l-mer x as a hash function h(x). (This idea is derived from "locality-sensitive hashing," employed in the context of computational geometry by Indyk and Motwani [14], in databases by Gionis et al. [11], and in computational biology by Buhler [7]. A different randomized projection algorithm was used by Linial et al. [19] to cluster proteins.) When a sufficient number of l-mers hash to the same bucket, t...

455 | The Geometry of Graphs and Some of Its Algorithmic Applications
- Linial, London, et al.
- 1995

425 | Extensions of Lipschitz mappings into a Hilbert space. Conference in modern analysis and probability
- Johnson, Lindenstrauss
- 1982

321 | Identifying DNA and protein patterns with statistically significant alignments of multiple sequences
- Hertz, Stormo
- 1999

Citation Context: ...tor binding sites in a collection of coregulated gene promoter regions in yeast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximization, or the greedy meth...

271 | Pattern Classification and Scene Analysis
- Duda, Hart
- 1973

Citation Context: ...on sparse sampling of positions from feature vectors have long been used in machine vision to match a perceived object against a database of known objects (Duda and Hart, 1973, Chapter 6); in the vision and computational geometry communities, this technique has a distinguished history under the name "geometric hashing" (Wolfson and Rigoutsos, 1997). Analytical proof of the...

224 | Unsupervised learning of multiple motifs in biopolymers using expectation maximization
- Bailey, Elkan
- 1995

Citation Context: ...ding transcription factor binding sites in a collection of coregulated gene promoter regions in yeast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximizat...

216 | Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies
- Helden, Andre, et al.
- 1998

Citation Context: ...e guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challenge problem. Other motif discovery algorithms have bee...

207 | Combinatorial approaches to finding subtle signals in DNA sequences
- Pevzner, Sze
- 2000

Citation Context: ...listic biological data by identifying ribosome binding sites in prokaryotes as well as a number of known transcriptional regulatory motifs in eukaryotes. 1. CHALLENGING MOTIF PROBLEMS Pevzner and Sze [23] considered a very precise version of the motif discovery problem of computational biology, which had also been considered by Sagot [26]. Based on this formulation, they issued an algorithmic challeng...

152 | TRANSFAC: a database on transcription factors and their DNA binding sites
- Wingender, Dietze, et al.
- 1996

Citation Context: ...re multiple shifted versions of the motif were found, only the most 5′ result is shown. Italicized portions of the motifs indicate matches to known sequence features. References: (A) TRANSFAC signal [31]; (B) non-TATA transcription start signal [21]; (C) MREa promoter [1]; (D) c-fos serum response element [22]; (E) yeast early cell cycle box [20]. In all experiments we set l = 20 and d = 2, which wor...

140 | An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function, Genetics 7
- Lawrence, Reilly
- 1990

Citation Context: ...ene promoter regions in yeast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximization, or the greedy method, each may end in a local optimum rather than ...

132 | Predicting gene regulatory elements in silico on a genomic scale
- Brazma
- 1998

Citation Context: ...o a number of motif-finding algorithms, based on exhaustive enumeration of the possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and bec...

98 | A statistical method for finding transcription factor binding sites
- Sinha, Tompa
- 2000

Citation Context: ...ustive enumeration of the possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brāzma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challen...

97 | FLASH: A fast look-up algorithm for string homology
- Rigoutsos, Califano
- 1993

Citation Context: ... feature vector whose features are individual residues (nucleotides or amino acids). This observation led to their FLASH algorithm for detecting strong pairwise local alignments between biosequences (Rigoutsos and Califano, 1993). Buhler (2001) took a somewhat different approach, applying Indyk and Motwani's locality-sensitive hashing technique to obtain randomized sensitivity guarantees for genomic sequence similarity searc...

85 | Comparison of initiation of protein synthesis in procaryotes, eucaryotes, and organelles
- Kozak
- 1983

Citation Context: ...6S rRNA of the ribosome binds to the transcribed mRNAs of the organism's genes. It is known that this binding site is complementary to a short subsequence very near the 3′ end of the 16S rRNA (Kozak [15]), which provides a check for the plausibility of the planted motif that Projection reports. These instances are quite different from the ones discussed in Sections 3.1 and 3.3: they contain thousands ...

82 | Effective large-scale sequence comparison by locality-sensitive hashing. Bioinformatics
- Buhler
- 2001

Citation Context: ...a is derived from "locality-sensitive hashing," employed in the context of computational geometry by Indyk and Motwani [14], in databases by Gionis et al. [11], and in computational biology by Buhler [7]. A different randomized projection algorithm was used by Linial et al. [19] to cluster proteins.) When a sufficient number of l-mers hash to the same bucket, they are likely to be enriched for the plante...

76 | Spelling approximate repeated or common motifs using a suffix tree
- Sagot
- 1998

Citation Context: ...fs in eukaryotes. 1. Challenging Motif Problems Pevzner and Sze [23] considered a very precise version of the motif discovery problem of computational biology, which had also been considered by Sagot [26]. Based on this formulation, they issued an algorithmic challenge: Planted (l, d)-Motif Problem: Suppose there is a fixed but unknown nucleotide sequence M (the motif) of length l. The problem is to de...
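The problem statement in this snippet also supports a back-of-envelope estimate of how many l-mers could qualify as (l, d)-motifs purely by chance, the kind of reasoning behind the abstract's claim that some small values of l are inherently unsolvable. The sketch below is an assumption, not necessarily the paper's exact derivation: it assumes uniform i.i.d. sequences, counts occurrences with at most d mismatches, and treats the n - l + 1 windows of a sequence as independent, giving $p_d = \sum_{i=0}^{d} \binom{l}{i} (3/4)^i (1/4)^{l-i}$ and $E \approx 4^l \bigl(1 - (1 - p_d)^{n-l+1}\bigr)^t$.

```python
from math import comb


def prob_within_d(l: int, d: int) -> float:
    """p_d: probability that a uniformly random l-mer is within Hamming
    distance d of a fixed l-mer over {A, C, G, T}."""
    return sum(comb(l, i) * (3 / 4) ** i * (1 / 4) ** (l - i) for i in range(d + 1))


def expected_spurious_motifs(l: int, d: int, t: int, n: int) -> float:
    """Approximate expected number of l-mers (out of 4**l) that occur with at
    most d mismatches in each of t random sequences of length n.

    Treats the n - l + 1 windows of a sequence as independent trials, which
    ignores correlations between overlapping windows (an assumption).
    """
    p = prob_within_d(l, d)
    # Chance that a single sequence contains at least one such occurrence.
    q = 1 - (1 - p) ** (n - l + 1)
    return 4 ** l * q ** t


if __name__ == "__main__":
    for l, d in [(15, 4), (14, 4), (16, 5), (18, 6), (10, 3)]:
        print((l, d), f"{expected_spurious_motifs(l, d, t=20, n=600):.3g}")
```

Under these assumptions, a value well above 1 suggests that chance l-mers alone could match the data as well as the planted motif, so no algorithm could be expected to single it out; a value far below 1 suggests the motif is identifiable in principle.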

71 | An exact method for finding short motifs in sequences, with application to the Ribosome binding site problem
- Tompa
- 1999

Citation Context: ... possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brāzma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challenge problem. Other motif d...

70 | Rigorous pattern recognition methods for DNA sequences
- Galas, Eggert, et al.
- 1985

Citation Context: ...f-finding algorithms, based on exhaustive enumeration of the possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for...

59 | Algorithms for phylogenetic footprinting
- Blanchette
- 2001

Citation Context: ...amined orthologous sequences from a variety of organisms taken from regions upstream of four types of gene: preproinsulin, dihydrofolate reductase (DHFR), metallothioneins, and c-fos. (See Blanchette [3] for an alternative approach to finding motifs in these sequences.) These sequences are known to contain binding sites for specific transcription factors. We also tested a collection of promoter regio...

59 | Expectation maximization algorithm for identifying protein-binding sites with variable lengths from unaligned DNA fragments
- Cardon, Stormo
- 1992

Citation Context: ...t only permits substitutions. The latter problem is characteristic not only of our method but also of many other popular motif finders. A simpler extension that has proven more tractable in practice (Cardon and Stormo, 1992; Sinha and Tompa, 2000; Marsan and Sagot, 2000) is to handle motifs with one or a few variable-length spacers. The dimeric structure of many transcription factors suggests that motifs with one centra...

41 | Methods for discovering novel motifs in nucleic acid sequences
- Staden
- 1989

Citation Context: ...ation of the possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challenge problem. O...

40 | Global self-organization of all known protein sequences reveals inherent biological signatures
- Linial, Linial, et al.
- 1997

Citation Context: ... computational geometry by Indyk and Motwani [14], in databases by Gionis et al. [11], and in computational biology by Buhler [7]. A different randomized projection algorithm was used by Linial et al. [19] to cluster proteins.) When a sufficient number of l-mers hash to the same bucket, they are likely to be enriched for the planted motif M. Experiments demonstrate that Projection performs better than al...

37 | Genes VI
- Lewin
- 1997

Citation Context: ...follows from the well-known fact that in many bacteria, the binding site for the 16S rRNA during translation initiation is the Shine–Dalgarno sequence AAGGAGG or a large substring of it (Kozak, 1983; Lewin, 1997). The reported motifs for the four bacteria in Table 4 agree quite well with this sequence. In archaea such as M. jannaschii, the 3′ end of the 16S rRNA is missing a few terminal nucleotides compare...

35 | The RFX-type transcription factor DAF-19 regulates sensory neuron cilium formation in C. elegans
- Swoboda, Adler, et al.
- 2000

Citation Context: ...sequences listed in Table 3, we ran PROJECTION on a set of 20 1,000-base C. elegans promoter regions containing the "X box" motif RYYNYYATRRNRAC, the target site for the DAF-19 transcription factor (Swoboda et al., 2000). The genes from which these sequences are taken were chosen by P. Swoboda (personal communication) because their expression is likely regulated by DAF-19. Some genes exhibit empirical evidence of su...

27 | An exact algorithm to identify motifs in orthologous sequences from multiple species
- Blanchette, Schwikowski, et al.

Citation Context: ...above, there are also a number of motif-finding algorithms, based on exhaustive enumeration of the possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the mo...

25 | Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon. Bioinformatics
- Fraenkel, Mandel, et al.
- 1995

Citation Context: ...algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challenge problem. Other motif discovery algorithms have been proposed by Fraenkel et al. [9] and Rigoutsos and Floratos [24]. Pevzner and Sze [23] introduced two novel algorithms, WINNOWER and SP-STAR, both of which succeeded in solving the planted (15,4)-motif challenge problem. In summary,...

25 | A novel Mcm1-dependent element
- McInerny, Partridge, et al.
- 1997

Citation Context: ...contain binding sites for specific transcription factors. We also tested a collection of promoter regions from the yeast S. cerevisiae that is known to contain a shared cell-cycle-dependent promoter [20]. Unlike the synthetic examples of Section 3.1, our promoter examples contain background DNA that varies substantially from our simple random model. The embedded motifs are better conserved than in ou...

18 | Motif Discovery Without Alignment Or Enumeration
- Rigoutsos, Floratos
- 1998

18 | An algorithm for finding novel gapped motifs in DNA sequences
- Rocke, Tompa
- 1998

Citation Context: ...ast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximization, or the greedy method, each may end in a local optimum rather than finding the best motif. In...

17 | Transcription initiation from the dihydrofolate reductase promoter is positioned by HIP1 binding at the initiation site
- Means, Farnham
- 1990

Citation Context: ... found, only the most 5′ result is shown. Italicized portions of the motifs indicate matches to known sequence features. References: (A) TRANSFAC signal [31]; (B) non-TATA transcription start signal [21]; (C) MREa promoter [1]; (D) c-fos serum response element [22]; (E) yeast early cell cycle box [20]. In all experiments we set l = 20 and d = 2, which worked well despite the fact that the actual moti...

10 | Deriving ribosomal binding site (RBS) statistical models from unannotated DNA sequences and the use of the RBS model for N-terminal prediction. Pacific Symposium on Biocomputing
- Hayes, Borodovsky
- 1998

Citation Context: ... a few terminal nucleotides compared to the bacterial rRNA sequences, and the 16S rRNA binding site is instead AGGTGAT or a large substring of it (Woese, personal communication). Hayes and Borodovsky [12] discovered the motif GGTGA in M. jannaschii using a Gibbs sampler, and Tompa [29] discovered similar binding sites in four different archaeal genomes, including M. jannaschii. Tompa used a very differ...

8 | YY1 facilitates the association of serum response factor with the c-fos serum response element
- Natsan, Gilman
- 1995

Citation Context: ... of the motifs indicate matches to known sequence features. References: (A) TRANSFAC signal [31]; (B) non-TATA transcription start signal [21]; (C) MREa promoter [1]; (D) c-fos serum response element [22]; (E) yeast early cell cycle box [20]. In all experiments we set l = 20 and d = 2, which worked well despite the fact that the actual motifs varied considerably in length. We chose a projection size k...

7 | Metal-dependent binding of a factor in vivo to the metal-responsive elements of the metallothionein 1 gene promoter. Molecular and Cellular Biology 7
- Andersen, Taplitz, et al.
- 1987

Citation Context: ... result is shown. Italicized portions of the motifs indicate matches to known sequence features. References: (A) TRANSFAC signal [31]; (B) non-TATA transcription start signal [21]; (C) MREa promoter [1]; (D) c-fos serum response element [22]; (E) yeast early cell cycle box [20]. In all experiments we set l = 20 and d = 2, which worked well despite the fact that the actual motifs varied considerably ...

7 | Positive and negative regulation of the human insulin gene by multiple trans-acting factors
- Boam, Clark, et al.
- 1990

Citation Context: ...sequences sometimes contained more than one known promoter site. For example, analysis of the preproinsulin locus yielded a site known from TRANSFAC but missed the better known CT-II promoter element [5]. Enhancements to EM refinement, such as probabilistic erasing [2], can ameliorate this method's tendency to prefer only one out of several possible high-scoring motifs. Simply using less stringent sel...

7 | An exact method for finding short motifs in sequences, with application to the ribosome binding site problem
- Tompa
- 1999

Citation Context: ... possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challenge problem. Other motif d...

6 | Combinatorial approaches to finding subtle signals in DNA sequences
- Pevzner, Sze
- 2000

Citation Context: ...ber of motif occurrences recovered in the synthetic challenge problems of Section 3.1, we perform a further combinatorial refinement of each Sb. This further refinement process is similar to SP-STAR (Pevzner and Sze, 2000) but uses a different score function. Compute the consensus Mb of the sequences in Sb, and define the score of Sb to be the number of sequences in Sb whose Hamming distance to Sb is at most d. Let S...
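The scoring step in this snippet can be illustrated with a short sketch. It assumes, as the garbled snippet suggests but does not state cleanly, that a sequence counts toward the score when its Hamming distance to the consensus is at most d; the names `hamming`, `consensus`, and `consensus_score` are hypothetical, and this is not the authors' refinement code.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus(lmers):
    """Column-wise majority letter of a non-empty list of equal-length l-mers.
    Ties go to the letter seen first in that column (Counter.most_common order)."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*lmers))


def consensus_score(lmers, d):
    """Return the consensus and how many l-mers lie within Hamming distance d of it."""
    m = consensus(lmers)
    return m, sum(hamming(x, m) <= d for x in lmers)


if __name__ == "__main__":
    bucket = ["ACGTAC", "ACGAAC", "TCGTAC", "ACGTTT"]
    print(consensus_score(bucket, d=1))  # ('ACGTAC', 3)
```

In the example, the last l-mer differs from the consensus at two positions and so is not counted when d = 1.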

5 | Combinatorial approaches to finding subtle signals in DNA sequences
- Pevzner, Sze
- 2000

Citation Context: ... Martin Tompa Department of Computer Science and Engineering Box 352350 University of Washington Seattle, WA 98195-2350 USA {jbuhler,tompa}@cs.washington.edu February 6, 2001 Abstract Pevzner and Sze [23] considered a precise version of the motif discovery problem and simultaneously issued an algorithmic challenge: find a motif M of length 15, where each planted instance differs from M in 4 positions. Wh...

5 | A statistical method for finding transcription factor binding sites
- Sinha, Tompa
- 2000

Citation Context: ...ustive enumeration of the possible motifs M, that are guaranteed to find the optimal motif. (See, for example, Blanchette et al. [4], Brazma et al. [6], Galas et al. [10], Sagot [26], Sinha and Tompa [27], Staden [28], Tompa [29], and van Helden et al. [30].) However, these enumerative algorithms run in time exponential in the motif length l and become impractical for the sizes involved in the challen...

5 | An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences
- Lawrence, Reilly
- 1990

5 | A statistical method for finding transcription factor binding sites
- Sinha, Tompa
- 2000

Citation Context: ...ons. The latter problem is characteristic not only of our method but also of many other popular motif finders. A simpler extension that has proven more tractable in practice (Cardon and Stormo, 1992; Sinha and Tompa, 2000; Marsan and Sagot, 2000) is to handle motifs with one or a few variable-length spacers. The dimeric structure of many transcription factors suggests that motifs with one central spacer, as occur in, ...

4 | An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins
- Lawrence, Reilly
- 1990

Citation Context: ... gene promoter regions in yeast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximization, or the greedy method, each may end in a local optimum rather than fi...

4 | An exact method for finding short motifs in sequences, with application to the ribosome binding site problem
- Tompa
- 1999

Citation Context: ...rather than picking them at random. Indeed, because the embedded motifs are so short, this particular problem has been addressed enumeratively without resorting to iterative search techniques at all (Tompa, 1999). The significance of our ribosome binding site results is rather to show that PROJECTION is capable of solving motif-finding problems that are quite different both from the typical applications of S...

3 | On Lipschitz embedding of finite metric spaces in Hilbert space
- Bourgain
- 1985

2 | An algorithm for finding novel gapped motifs in DNA sequences
- Rocke, Tompa
- 1998

Citation Context: ...yeast. A number of algorithms to find motifs have been proposed previously, for example, Bailey and Elkan [2], Hertz and Stormo [13], Lawrence et al. [16], Lawrence and Reilly [17], and Rocke and Tompa [25]. Since these algorithms employ some form of local search such as Gibbs sampling, expectation maximization, or the greedy method, each may end in a local optimum rather than finding the best motif. Inde...

1 | Algorithms for phylogenetic footprinting
- Blanchette, Schwikowski, et al.
- 2002

1 | Identification of common motifs in unaligned DNA sequences: Application to Escherichia coli Lrp regulon
- Fraenkel, Mandel, et al.
- 1995

1 | Rigorous pattern-recognition methods for DNA sequences: Analysis of promoter sequences from Escherichia coli
- Galas, Eggert, et al.
- 1985