Results 1 
6 of
6
Finding motifs from all sequences with and without binding sites
 Bioinformatics
, 2005
"... doi:10.1093/bioinformatics/btl371 ..."
An efficient algorithm for the extended (l,d)motif problem with unknown number of binding sites
 Proc. BIBE
, 2005
"... Finding common patterns, or motifs, from a set of DNA sequences is an important problem in molecular biology. Most motifdiscovering algorithms/software require the length of the motif as input. Motivated by the fact that the motif’s length is usually unknown in practice, Styczynski et al. introduce ..."
Abstract

Cited by 7 (2 self)
 Add to MetaCart
Finding common patterns, or motifs, from a set of DNA sequences is an important problem in molecular biology. Most motifdiscovering algorithms/software require the length of the motif as input. Motivated by the fact that the motif’s length is usually unknown in practice, Styczynski et al. introduced the Extended (l,d)Motif Problem (EMP), where the motif’s length is not an input parameter. Unfortunately, the algorithm given by Styczynski et al. to solve EMP can take an unacceptably long time to run, e.g. over 3 months to discover a length14 motif. This paper makes two main contributions. First, we eliminate another input parameter from EMP: the minimum number of binding sites in the DNA sequences. Fewer input parameters not only reduces the burden of the user, but also may give more realistic/robust results since restrictions on length or on the number of binding sites make little sense when the best motif may not be the longest nor have the largest number of binding sites. Second, we develop an efficient algorithm to solve our redefined problem. The algorithm is also a fast solution for EMP (without any sacrifice to accuracy) making EMP practical. 1.
Generalized Planted (l,d)Motif Problem with Negative Set
 WABI
, 2005
"... Abstract. Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze [18] defined the planted (l,d)motif problem as trying to find a lengthl pattern that occurs in each input sequence with at most d substitutions. When d is la ..."
Abstract

Cited by 5 (4 self)
 Add to MetaCart
Abstract. Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze [18] defined the planted (l,d)motif problem as trying to find a lengthl pattern that occurs in each input sequence with at most d substitutions. When d is large, this problem is difficult to solve because the input sequences do not contain enough information on the motif. In this paper, we propose a generalized planted (l,d)motif problem which considers as input an additional set of sequences without any substring similar to the motif (negative set) as extra information. We analyze the effects of this negative set on the finding of motifs, and define a set of unsolvable problems and another set of most difficult problems, known as “challenging generalized problems”. We develop an algorithm called VANS based on voting and other novel techniques, which can solve the (9,3), (11,4),(15,6) and (20,8)motif problems which were unsolvable before as well as challenging problems of the planted (l,d)motif problem such as (9,2), (11,3), (15,5) and (20,7)motif problems. 1
OPTIMAL ALGORITHM FOR FINDING DNA MOTIFS WITH NUCLEOTIDE ADJACENT DEPENDENCY
"... Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurren ..."
Abstract
 Add to MetaCart
Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMM or regular expression, exist that can capture the nucleotide dependency. Unfortunately, these models are not practical (with too many parameters and require many known binding sites). Recently, Chin and Leung introduced the SPSP representation which overcomes the limitations of these complicated models. However, discovering novel motifs in SPSP representation is still a NPhard problem. In this paper, based on our observations in real binding sites, we propose a simpler model, the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPSFinder) for finding optimal DPS motifs. Experimental results show that DPSFinder can discover a length10 motif from 22 length500 DNA sequences within a few minutes and the DPS representation has a similar performance as SPSP representation.
OPTIMAL ALGORITHM FOR FINDING DNA MOTIFS WITH NUCLEOTIDE ADJACENT DEPENDENCY ∗
"... Abstract: Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the ..."
Abstract
 Add to MetaCart
Abstract: Finding motifs and the corresponding binding sites is a critical and challenging problem in studying the process of gene expression. String and matrix representations are two popular models to represent a motif. However, both representations share an important weakness by assuming that the occurrence of a nucleotide in a binding site is independent of other nucleotides. More complicated representations, such as HMM or regular expression, exist that can capture the nucleotide dependency. Unfortunately, these models are not practical (with too many parameters and require many known binding sites). Recently, Chin and Leung introduced the SPSP representation which overcomes the limitations of these complicated models. However, discovering novel motifs in SPSP representation is still a NPhard problem. In this paper, based on our observations in real binding sites, we propose a simpler model, the Dependency Pattern Sets (DPS) representation, which is simpler than the SPSP model but can still capture the nucleotide dependency. We develop a branch and bound algorithm (DPSFinder) for finding optimal DPS motifs. Experimental results show that DPSFinder can discover a length10 motif from 22 length500 DNA sequences within a few minutes and the DPS representation has a similar performance as SPSP representation. 1