An efficient algorithm for the extended (l,d)-motif problem with unknown number of binding sites
 Proc. BIBE
, 2005
Abstract

Cited by 8 (3 self)
Finding common patterns, or motifs, from a set of DNA sequences is an important problem in molecular biology. Most motif-discovering algorithms/software require the length of the motif as input. Motivated by the fact that the motif’s length is usually unknown in practice, Styczynski et al. introduced the Extended (l,d)-Motif Problem (EMP), where the motif’s length is not an input parameter. Unfortunately, the algorithm given by Styczynski et al. to solve EMP can take an unacceptably long time to run, e.g. over 3 months to discover a length-14 motif. This paper makes two main contributions. First, we eliminate another input parameter from EMP: the minimum number of binding sites in the DNA sequences. Fewer input parameters not only reduce the burden on the user, but may also give more realistic/robust results, since restrictions on length or on the number of binding sites make little sense when the best motif may be neither the longest nor the one with the most binding sites. Second, we develop an efficient algorithm to solve our redefined problem. The algorithm is also a fast solution for EMP (without any sacrifice of accuracy), making EMP practical.
Finding motifs from all sequences with and without binding sites
 Bioinformatics
, 2005
doi:10.1093/bioinformatics/btl371
Generalized Planted (l,d)-Motif Problem with Negative Set
 WABI
, 2005
Abstract

Cited by 5 (4 self)
Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze [18] defined the planted (l,d)-motif problem as trying to find a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, this problem is difficult to solve because the input sequences do not contain enough information on the motif. In this paper, we propose a generalized planted (l,d)-motif problem which takes as additional input a set of sequences without any substring similar to the motif (the negative set). We analyze the effects of this negative set on the finding of motifs, and define a set of unsolvable problems and another set of most difficult problems, known as “challenging generalized problems”. We develop an algorithm called VANS, based on voting and other novel techniques, which can solve the (9,3), (11,4), (15,6) and (20,8)-motif problems, which were unsolvable before, as well as challenging problems of the planted (l,d)-motif problem such as the (9,2), (11,3), (15,5) and (20,7)-motif problems.
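The problem statement above can be made concrete with a small check. The following is only a sketch of the definition, not the VANS algorithm: it assumes that "similar to the motif" means "within d substitutions", and the function names (`hamming`, `occurs`, `is_generalized_motif`) are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(pattern, seq, d):
    """True if some substring of seq is within d substitutions of pattern."""
    l = len(pattern)
    return any(
        hamming(pattern, seq[i:i + l]) <= d
        for i in range(len(seq) - l + 1)
    )


def is_generalized_motif(pattern, positives, negatives, d):
    """Assumed reading of the generalized problem: the pattern must occur
    (with at most d substitutions) in every positive sequence and in no
    negative sequence."""
    in_all_positives = all(occurs(pattern, s, d) for s in positives)
    in_any_negative = any(occurs(pattern, s, d) for s in negatives)
    return in_all_positives and not in_any_negative


positives = ["ACGTACGT", "TTACGAAC"]
print(is_generalized_motif("ACGT", positives, ["GGGGGGGG"], d=1))
print(is_generalized_motif("ACGT", positives, ["AAACGGAA"], d=1))
```

The second call is rejected because the negative sequence contains ACGG, which is one substitution away from the candidate; this is the extra information a negative set contributes.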
OPTIMAL ALGORITHM FOR FINDING DNA MOTIFS WITH NUCLEOTIDE ADJACENT DEPENDENCY
Abstract
Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness: they assume that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMMs or regular expressions, can capture the nucleotide dependency. Unfortunately, these models are not practical (they have too many parameters and require many known binding sites). Recently, Chin and Leung introduced the SPSP representation, which overcomes the limitations of these complicated models. However, discovering novel motifs in the SPSP representation is still an NP-hard problem. In this paper, based on our observations in real binding sites, we propose the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPSFinder) for finding optimal DPS motifs. Experimental results show that DPSFinder can discover a length-10 motif from 22 length-500 DNA sequences within a few minutes and that the DPS representation performs similarly to the SPSP representation.
REDUNDANCY ELIMINATION IN MOTIF DISCOVERY ALGORITHMS
Abstract
The problem of finding motifs in binding sites is very important to the understanding of gene regulatory networks. However, when predicting a set of motifs, existing algorithms suffer from either predicting many redundant motifs (motifs with similar binding sites) or, at the other extreme, missing the hidden motif. In this paper, we formulate the Motif Redundancy Problem (MRP) to model this kind of problem and introduce an algorithm called RME (Redundancy Motif Elimination) for solving MRP. Experimental results on real biological data show that a standard EM-based motif discovery algorithm enhanced with RME performs better than the popular motif discovery algorithm MEME.
An efficient motif discovery algorithm with unknown motif length and number of binding sites
ALGORITHMS FOR CHALLENGING MOTIF PROBLEMS
Abstract
Pevzner and Sze [19] introduced the Planted (l,d)-Motif Problem to find similar patterns (motifs) in sequences which represent the promoter regions of co-regulated genes, where l is the length of the motif and d is the maximum Hamming distance between the motif and its occurrences. Many algorithms have been developed to solve this motif problem. However, these algorithms either have long running times or do not guarantee that the motif can be found. In this paper, we introduce new algorithms to solve this motif problem. Our algorithms can find motifs in reasonable time not only for the challenging (9,2), (11,3) and (15,5)-motif problems but also for even longer motifs, say (20,7), (30,11) and (40,15), which have never been seriously attempted by other researchers because of large time and space requirements. In addition, our algorithms can be extended to find more complicated motif structures called cis-regulatory modules (CRM).
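To illustrate why running time is the central difficulty, here is a naive exhaustive search for the planted (l,d)-motif problem. It is a baseline sketch under stated assumptions, not any of the algorithms from this paper: it relies only on the fact that a valid motif must lie within d substitutions of some length-l substring of the first sequence. The names `neighbors` and `planted_motifs` and the DNA alphabet constant are illustrative.

```python
from itertools import combinations, product

ALPHABET = "ACGT"  # assumed DNA alphabet


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(pattern, seq, d):
    """True if some substring of seq is within d substitutions of pattern."""
    l = len(pattern)
    return any(hamming(pattern, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def neighbors(kmer, d):
    """All strings within Hamming distance d of kmer."""
    result = set()
    for k in range(d + 1):
        for positions in combinations(range(len(kmer)), k):
            for letters in product(ALPHABET, repeat=k):
                cand = list(kmer)
                for p, c in zip(positions, letters):
                    cand[p] = c
                result.add("".join(cand))
    return result


def planted_motifs(seqs, l, d):
    """Every length-l pattern that occurs in each sequence with at most
    d substitutions. Candidates are drawn from the d-neighborhoods of the
    l-mers of the first sequence."""
    first = seqs[0]
    candidates = set()
    for i in range(len(first) - l + 1):
        candidates |= neighbors(first[i:i + l], d)
    return sorted(c for c in candidates
                  if all(occurs(c, s, d) for s in seqs))


print(planted_motifs(["ATTGC", "GATTC", "CATTA"], l=3, d=0))
```

The candidate set grows roughly as C(l,d)·3^d per starting position, which is why instances such as (15,5) or (20,7) are out of reach for this approach.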
Finding Motifs from All Sequences With and Without Binding Sites
 Bioinformatics
Abstract
Motivation: Finding common patterns, or motifs, from a set of promoter regions of co-regulated genes is an important problem in molecular biology. Most existing motif-finding algorithms consider a set of sequences bound by the transcription factor as the only input. However, we can get better results by considering sequences that are not bound by the transcription factor as an additional input. Results: Firstly, instead of using the simple hypergeometric analysis, we propose to calculate the likelihood based on a more precise probabilistic analysis which considers motif length, sequence length and number of binding sites as input parameters for testing whether a motif is found. Secondly, we adopt a heuristic algorithm based on our analysis to find motifs. For the simulated and real data sets, our algorithm ALSE compares favorably against common motif-finding programs such as SeedSearch and MEME in all cases and performs very well, especially when each input sequence contains more than one binding site. Availability: ALSE is available for download at the homepage
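For reference, the "simple hypergeometric analysis" that the abstract contrasts itself with can be sketched as an enrichment test. This is the baseline, not ALSE's own probabilistic analysis, and the parameter names are assumptions: N is the total number of sequences, K the number bound by the transcription factor, n the number of sequences containing the candidate motif, and k the number of bound sequences containing it.

```python
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) when n sequences are drawn without replacement from N,
    of which K are bound; X counts bound sequences among the n drawn.
    A small value suggests the motif is enriched in the bound set."""
    total = comb(N, n)
    favorable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(k, min(K, n) + 1)
    )
    return favorable / total


# Motif found in 5 sequences, all of them among the 5 bound ones out of 10.
print(hypergeom_tail(N=10, K=5, n=5, k=5))
```

This test only counts which sequences contain the motif; it ignores motif length, sequence length and the number of binding sites per sequence, which are the quantities the abstract says its more precise analysis takes into account.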