Computation of repetitions and regularities on biological weighted sequences
 Journal of Molecular Biology
Abstract. Biological Weighted Sequences are used extensively in Molecular Biology as profiles for protein families, in the representation of binding sites and often for the representation of sequences produced by a shotgun sequencing strategy. In this paper we address three fundamental problems in the area of Biological Weighted Sequences: i) Computation of Repetitions, ii) Pattern Matching and iii) Computation of Regularities. To the best of our knowledge, this is the first time these problems are tackled in the relevant literature. Our algorithms can be used as basic building blocks for more sophisticated algorithms applied to weighted sequences. A preliminary form of the results in this paper was presented in the conferences Fun with Algorithms [Iliopoulos et al. 2004b], CompBionets [Christodoulakis et al. 2004a] and ICCMSE [Christodoulakis et al. 2004b].
STRING DATA STRUCTURES FOR COMPUTATIONAL MOLECULAR BIOLOGY
The topic of the chapter is string data structures with applications in the field of computational molecular biology. Let Σ be a finite alphabet consisting of a set of characters (or symbols). The cardinality of the alphabet, denoted by |Σ|, expresses the number of distinct characters in the alphabet. A string or word is an ordered list
A DYNAMIC APPROACH TO WEIGHTED SUFFIX TREE CONSTRUCTION ALGORITHM
At present, the weighted suffix tree is considered one of the most important data structures for analyzing molecular weighted sequences. Although a parallel algorithm based on static partitioning exists for constructing the weighted suffix tree, it takes a significant amount of time on very long weighted DNA sequences. Our implementation of a parallel weighted suffix tree construction algorithm based on dynamic partitioning, run on a computing cluster, significantly accelerates the construction of the weighted suffix tree.
An Algorithmic Framework for Motif Discovery Problems in Weighted Sequences
Abstract. A weighted sequence is a string in which a set of characters may appear at each position with respective probabilities of occurrence. A common task is to locate a given motif in a weighted sequence in exact, approximate or bounded gap form, with presence probability not less than a given threshold. The motif could be a normal nonweighted string or even a string with don't care symbols. We give an algorithmic framework that is capable of tackling the above motif discovery problems. Utilizing the notion of maximal factors, the framework provides an approach for reducing each problem to an equivalent problem in nonweighted strings without any time degradation.
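To make the definition above concrete, the following is a minimal sketch of exact motif location in a weighted sequence. It is a naive position-by-position scan, not the maximal-factor framework the abstract describes. It assumes that positions are independent, so the presence probability of a motif is the product of the per-position probabilities. The names `WeightedSequence`, `occurrence_probability` and `find_motif` are hypothetical and chosen only for illustration.

```python
from math import prod

# Assumed representation: one dict per position, mapping each character
# that may appear there to its probability of occurrence.
WeightedSequence = list[dict[str, float]]


def occurrence_probability(ws: WeightedSequence, motif: str, start: int) -> float:
    """Presence probability of `motif` at `start`.

    Assumes positions are independent, so the result is the product of
    the per-position probabilities; a character absent from a position
    contributes probability 0.
    """
    return prod(ws[start + i].get(ch, 0.0) for i, ch in enumerate(motif))


def find_motif(ws: WeightedSequence, motif: str, threshold: float) -> list[int]:
    """Start positions where `motif` occurs with probability >= threshold."""
    last_start = len(ws) - len(motif)
    return [
        start
        for start in range(last_start + 1)
        if occurrence_probability(ws, motif, start) >= threshold
    ]
```

This scan takes time proportional to the sequence length times the motif length. It illustrates only the exact-matching variant with a nonweighted motif; approximate matching, bounded gaps and don't care symbols are not covered.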
A Multiobjective Approach to the Weighted Longest Common Subsequence Problem
Abstract. Finding the Longest Common Subsequence in Weighted Sequences (WLCS) is an important problem in computational biology and bioinformatics. In this paper, we model this problem as a multiobjective optimization problem. As a result, we propose a novel and efficient algorithm that finds not only a WLCS but also the set of all possible solutions. The time complexity of the algorithm depends primarily on the number of length-1 common subsequences between the two input weighted sequences.