The gapped-factor tree
, 2006
Cited by 4 (0 self)
We present a data structure to index a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in O(n × |Σ|) time and space, where n is the length of the text and |Σ| the size of the alphabet. Such a data structure may play an important role in some pattern matching and motif inference problems, for instance in text filtration.
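To make the notion of a gapped-factor concrete, here is a minimal sketch that enumerates gapped-factors naively and stores them in a hash map. It is not the gapped-factor tree and does not reach its online construction or complexity bounds. The shape assumed here (a left part of `left_len` characters, then `gap` ignored characters, then a right part of `right_len` characters) and the helper names `gapped_factors` and `build_index` are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def gapped_factors(text, left_len, gap, right_len):
    """Yield (start, (left, right)) for each window of the assumed shape:
    left_len kept characters, `gap` ignored characters, right_len kept characters."""
    span = left_len + gap + right_len
    for i in range(len(text) - span + 1):
        left = text[i:i + left_len]
        right = text[i + left_len + gap:i + span]
        yield i, (left, right)


def build_index(text, left_len, gap, right_len):
    """Naive index: map each gapped-factor to the increasing list of its start positions.
    The gap characters are not part of the key, so they are ignored when matching."""
    index = defaultdict(list)
    for i, key in gapped_factors(text, left_len, gap, right_len):
        index[key].append(i)
    return dict(index)
```

For example, `build_index("ACGGTTACAATT", 2, 2, 2)[("AC", "TT")]` returns `[0, 6]`: the two windows differ only inside the gap ("GG" versus "AA"), so they index to the same gapped-factor.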
An efficient multicore implementation of planted motif problem
In Proceedings of the International Conference on High Performance Computing and Simulation
, 2010
Cited by 4 (1 self)
In this paper we propose a parallel algorithm for the planted motif problem that arises in computational biology. A variety of algorithms have been proposed in the literature to solve this problem. The drawback of all these algorithms is that they were designed to work on serial computers and are not suitable for parallelization on current multicore architectures. We have implemented the proposed algorithm on four quad-core Intel Xeon X5550 2.67 GHz processors, for a total of 16 cores. We compare our performance results with the best results reported in the literature and show that the performance of our algorithm scales linearly with the number of cores. We also solved the challenging (21, 8) instance on 16 cores in 6.9 hours.
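The abstract does not describe the parallel algorithm itself. As a hedged illustration of why the planted (l, d) motif problem suits multicore execution, the sketch below runs an exact brute-force search over all 4^l candidate l-mers and splits the candidate space by prefix, so each prefix is an independent unit of work. The prefix partition and the names `search_prefix` and `planted_motifs` are assumptions for illustration, not the authors' method.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq of length len(motif) is within Hamming distance d."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def search_prefix(prefix, seqs, l, d):
    """One unit of work: test every l-mer that starts with `prefix`."""
    found = []
    for tail in product(ALPHABET, repeat=l - len(prefix)):
        candidate = prefix + "".join(tail)
        if all(occurs_within(candidate, s, d) for s in seqs):
            found.append(candidate)
    return found


def planted_motifs(seqs, l, d, prefix_len=1, workers=4):
    """Exact (l, d) search: all l-mers with a d-approximate occurrence in every sequence.
    The 4**prefix_len prefixes (prefix_len <= l) are searched independently."""
    prefixes = ["".join(p) for p in product(ALPHABET, repeat=prefix_len)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda p: search_prefix(p, seqs, l, d), prefixes)
        return sorted(m for part in parts for m in part)
```

Threads keep the sketch short, but in CPython pure-Python work does not run in parallel across threads, so real multicore speedup would need `ProcessPoolExecutor` (with a top-level worker function) or native code. Brute force is also exponential in l, which is why instances such as (21, 8) call for far smarter algorithms than this.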
Indexing gapped-factors using a tree
International Journal of Foundations of Computer Science
, 2008
Cited by 2 (0 self)
We present a data structure to index a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
Suffix Tree Characterization of Maximal Motifs in Biological Sequences
Cited by 2 (0 self)
Finding motifs in biological sequences is one of the most intriguing problems for string algorithm designers, owing on the one hand to its numerous applications in molecular biology and on the other hand to the challenging aspects of the computational problem. Indeed, biological sequences require working with approximations (that is, identifying fragments that are not necessarily identical but merely similar according to a given similarity notion), and this complicates the problem. Existing algorithms run in time linear in the input size. Nevertheless, the output can be very large because of the approximation (namely, exponential in the approximation degree). This often makes the output unreadable, in addition to slowing down the inference itself. A high degree of redundancy has been detected in the set of motifs that satisfy traditional requirements, even for exact motifs. Moreover, it has been observed many times that a subset of these motifs, namely the maximal motifs, can be enough to provide the information of all of them. In this paper, we aim at removing such redundancy. We extend some notions of maximality already defined for exact motifs to the case of approximate motifs under the Hamming distance, and we give a characterization of maximal motifs on the suffix tree. Since this data structure is used by a whole class of motif extraction tools, we show how these tools can be modified to include the maximality requirement without changing their asymptotic complexity.
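The paper's maximality notions and its suffix-tree characterization are not reproduced here. The sketch below only illustrates the underlying idea under a simplified, assumed definition: a motif counts as maximal if no one-letter extension to the left or right keeps exactly the same set of occurrences (shift-adjusted for left extensions) under Hamming distance at most d. It scans the text directly instead of walking a suffix tree, and the names `occurrences` and `is_maximal` are hypothetical.

```python
ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurrences(motif, text, d):
    """Start positions of windows of text within Hamming distance d of motif."""
    l = len(motif)
    return {i for i in range(len(text) - l + 1) if hamming(motif, text[i:i + l]) <= d}


def is_maximal(motif, text, d):
    """Simplified maximality test (an assumption, not the paper's definition):
    False if some one-letter extension keeps exactly the same occurrences."""
    occ = occurrences(motif, text, d)
    for c in ALPHABET:
        # An occurrence of motif + c at i implies one of motif at i,
        # so equality means the right extension loses nothing.
        if occurrences(motif + c, text, d) == occ:
            return False
        # An occurrence of c + motif at j implies one of motif at j + 1.
        if {j + 1 for j in occurrences(c + motif, text, d)} == occ:
            return False
    return True
```

In the text `"ACGTTACGTA"` with d = 0, the motif `"ACG"` is not maximal (every occurrence extends to `"ACGT"`), whereas `"ACGT"` is: each of its extensions keeps only one of its two occurrences. Reporting only such motifs is what removes the redundancy the abstract describes.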
An Affinity Propagation-Based DNA Motif Discovery Algorithm
Cited by 1 (0 self)
The planted (l, d) motif search (PMS) is one of the fundamental problems in bioinformatics and plays an important role in locating transcription factor binding sites (TFBSs) in DNA sequences. Identifying weak motifs and reducing the effect of local optima remain important but challenging tasks for motif discovery. To address them, we propose a new algorithm, APMotif, which first applies Affinity Propagation (AP) clustering to DNA sequences to produce informative, high-quality candidate motifs and then employs Expectation Maximization (EM) refinement to obtain the optimal motifs from the candidates. Experimental results on both simulated and real biological data sets show that APMotif usually outperforms four other widely used algorithms in prediction accuracy.
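The AP clustering stage is omitted here. To illustrate what an EM refinement of one candidate motif can look like, the sketch below performs a generic EM iteration on a position weight matrix (PWM), assuming one motif occurrence per sequence and a uniform background. The seeding scheme (`match_prob`), the pseudocount, and the function names are assumptions for illustration, not APMotif's exact procedure.

```python
ALPHABET = "ACGT"


def seed_pwm(seed, match_prob=0.7):
    """Hypothetical seeding: put match_prob on the seed letter, spread the rest evenly."""
    other = (1.0 - match_prob) / (len(ALPHABET) - 1)
    return [{c: (match_prob if c == s else other) for c in ALPHABET} for s in seed]


def em_step(pwm, seqs, background=0.25, pseudocount=0.1):
    """One EM iteration, assuming one occurrence per sequence and every len(seq) >= len(pwm)."""
    l = len(pwm)
    counts = [{c: pseudocount for c in ALPHABET} for _ in range(l)]
    for seq in seqs:
        windows = [seq[i:i + l] for i in range(len(seq) - l + 1)]
        # E-step: likelihood ratio of each window under the PWM versus the background.
        scores = []
        for w in windows:
            ratio = 1.0
            for col, ch in zip(pwm, w):
                ratio *= col[ch] / background
            scores.append(ratio)
        total = sum(scores)
        # Each window contributes its posterior probability of being the motif site.
        for w, s in zip(windows, scores):
            for j, ch in enumerate(w):
                counts[j][ch] += s / total
    # M-step: renormalize the weighted counts into a new PWM.
    return [{c: col[c] / sum(col.values()) for c in ALPHABET} for col in counts]


def consensus(pwm):
    """Most probable letter in each column."""
    return "".join(max(col, key=col.get) for col in pwm)
```

Starting from the imperfect seed `"ACGA"` on `["TTACGTTT", "GGACGTGG", "CCACGTCC"]`, a single `em_step` already moves the consensus to the shared site `"ACGT"`, because in every sequence that window carries almost all of the posterior weight.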
Combinatorial and Probabilistic Approaches to Motif Recognition
, 2010
Short substrings of genomic data that are responsible for biological processes, such as gene expression, are referred to as motifs. Motifs with the same function may not match exactly, due to mutation events at a few of the motif positions. Allowing for non-exact occurrences significantly complicates their discovery. Given a number of DNA strings, the motif recognition problem is the task of detecting motif instances in every given sequence without knowledge of the positions of the instances or of the pattern shared by these substrings. We describe a novel approach to motif recognition and provide theoretical and experimental results that demonstrate its efficiency and accuracy. Our algorithm, MCL-WMR, builds an edge-weighted graph model of the given motif recognition problem and uses a graph clustering algorithm to quickly determine important subgraphs that need to be
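A minimal sketch of the graph-building step, assuming vertices are the l-mers of all sequences and edges join l-mers from different sequences. The edge threshold follows from the triangle inequality: two instances that are each within Hamming distance d of the same motif are within 2d of each other, so pairs farther apart cannot both be instances. The weight `l - distance` and the function name are assumptions for illustration, and the clustering step is omitted.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_lmer_graph(seqs, l, d):
    """Vertices are (sequence index, offset) pairs. Edges join l-mers from different
    sequences at Hamming distance <= 2d, weighted by l - distance (assumed weighting)."""
    vertices = [(s, i) for s, seq in enumerate(seqs) for i in range(len(seq) - l + 1)]
    edges = {}
    for (s1, i1), (s2, i2) in combinations(vertices, 2):
        if s1 == s2:
            continue  # instances of one motif lie in different sequences
        dist = hamming(seqs[s1][i1:i1 + l], seqs[s2][i2:i2 + l])
        if dist <= 2 * d:
            edges[((s1, i1), (s2, i2))] = l - dist
    return vertices, edges
```

Motif instances then tend to form dense, heavily weighted subgraphs with one vertex per sequence, which is the structure a clustering algorithm can search for.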
Sublinear Selection Algorithms for Motif Finding
We consider the problem of identifying motifs, recurring or conserved patterns, in sets of biological sequences. To solve this task, we present new deterministic and exact algorithms for finding patterns that are embedded as exact or inexact instances in all or most of the input strings. The proposed algorithms (1) improve search efficiency compared to existing exact algorithms by focusing the search on a selected set of potential motif instances, and (2) scale well with the input length and the alphabet size. While a variety of exact and probabilistic methods exist, our algorithms enhance the pattern detection ability of these methods by (1) acting as a wrapper speedup mechanism for a variety of common exact enumeration-based pattern finders, allowing searches for longer, less conserved motifs, and (2) serving as candidate selectors for probabilistic pattern finders, accelerating the search for pattern models. Our algorithms are orders of magnitude faster than existing exact algorithms for common pattern identification. We evaluate our algorithms on benchmark motif finding problems and real applications in biological sequence analysis and show that they achieve significant running-time improvements over state-of-the-art approaches.
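The abstract does not spell out the selection procedure. As a hedged sketch of the general idea of searching only a selected candidate set instead of all 4^l patterns, the code below takes candidates from the d-neighborhoods of l-mers in a few reference sequences and then verifies each one against a quorum. Restricting the references to the first n - q + 1 sequences is safe by the pigeonhole principle: a motif present in at least q of n sequences is missing from at most n - q of them, so it occurs in at least one reference. The names `neighborhood` and `find_motifs` are hypothetical, and this is not the authors' selection algorithm.

```python
from itertools import combinations, product

ALPHABET = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(motif, seq, d):
    """True if some window of seq of length len(motif) is within Hamming distance d."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d for i in range(len(seq) - l + 1))


def neighborhood(word, d):
    """All strings over ALPHABET within Hamming distance d of word."""
    result = {word}
    for k in range(1, d + 1):
        for positions in combinations(range(len(word)), k):
            choices = [[c for c in ALPHABET if c != word[p]] for p in positions]
            for letters in product(*choices):
                chars = list(word)
                for p, c in zip(positions, letters):
                    chars[p] = c
                result.add("".join(chars))
    return result


def find_motifs(seqs, l, d, quorum=None):
    """l-mers with a d-approximate occurrence in at least `quorum` sequences.
    Candidates come only from the first len(seqs) - quorum + 1 sequences."""
    quorum = len(seqs) if quorum is None else quorum
    candidates = set()
    for ref in seqs[:len(seqs) - quorum + 1]:
        for i in range(len(ref) - l + 1):
            candidates |= neighborhood(ref[i:i + l], d)
    return sorted(c for c in candidates
                  if sum(occurs_within(c, s, d) for s in seqs) >= quorum)
```

The candidate set grows with the input length rather than with 4^l, which is the kind of saving a selection step aims for. Each neighborhood still grows combinatorially with d, though, so this simple sketch only pays off for small d.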
Indexing gapped-factors using a tree
International Journal of Foundations of Computer Science, © World Scientific Publishing Company
We present a data structure to index a specific kind of factor (that is, substring) called a gapped-factor. A gapped-factor is a factor containing a gap that is ignored during indexing. The data structure is based on the suffix tree and indexes all the gapped-factors of a text with a fixed gap size, and only those. It is constructed online in linear time and space. Such a data structure may play an important role in various pattern matching and motif inference problems, for instance in text filtration.
Additional File 1 “GRISOTTO: A greedy approach to improve combinatorial algorithms for motif discovery with prior knowledge”
1.2 GRISOTTO subroutine calling RISOTTO
2 Inter-motif distance