Results 1  10
of
100
Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm
 BIOINFORMATICS
, 1998
"... Motivation: The discovery of motifs in biological sequences is an important problem. Results: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (us ..."
Abstract

Cited by 165 (11 self)
 Add to MetaCart
Motivation: The discovery of motifs in biological sequences is an important problem. Results: This paper presents a new algorithm for the discovery of rigid patterns (motifs) in biological sequences. Our method is combinatorial in nature and able to produce all patterns that appear in at least a (userdefined) minimum number of sequences, yet it manages to be very efficient by avoiding the enumeration of the entire pattern space. Furthermore, the reported patterns are maximal: any reported pattern cannot be made more specific and still keep on appearing at the exact same positions within the input sequences. The effectiveness of the proposed approach is showcased on a number of test cases which aim to: (i) validate the approach through the discovery of previously reported patterns; (ii) demonstrate the capability to identify automatically highly selective patterns particular to the sequences under consideration. Finally, experimental analysis indicates that the algorithm is output sensitive, i.e. its running time is quasilinear to the size of the generated output.
Dynamic programming algorithms for RNA secondary structure prediction with pseudoknots
 Discrete Applied Mathematics
, 2000
"... structure prediction with pseudoknots ..."
Improving the Practical Space and Time Efficiency of the ShortestPaths Approach to SumofPairs Multiple Sequence Alignment
, 1996
"... The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branchandbound technique together with a variant of Dijkstra's shortest paths algorithm to prune the bas ..."
Abstract

Cited by 61 (4 self)
 Add to MetaCart
The MSA program, written and distributed in 1989, is one of the few existing programs that attempts to find optimal alignments of multiple protein or DNA sequences. The MSA program implements a branchandbound technique together with a variant of Dijkstra's shortest paths algorithm to prune the basic dynamic programming graph. We have made substantial improvements in the time and space usage of MSA. The improvements make feasible a variety of problem instances that were not feasible previously. On some runs we achieve an order of magnitude reduction in space usage and a significant multiplicative factor speedup in running time. To explain that these improvements work, we give a much more detailed description of MSA than has been previously available. In practice, MSA rarely produces a provably optimal alignment and we explain why.
On the Complexity of Loop Fusion
 Parallel Computing
, 1999
"... Loop fusion is a program transformation that combines several loops into one. It is used in parallelizing compilers mainly for increasing the granularity of loops and for improving data reuse. The goal of this paper is to study, from a theoretical point of view, several variants of the loop fusion p ..."
Abstract

Cited by 48 (2 self)
 Add to MetaCart
Loop fusion is a program transformation that combines several loops into one. It is used in parallelizing compilers mainly for increasing the granularity of loops and for improving data reuse. The goal of this paper is to study, from a theoretical point of view, several variants of the loop fusion problem identifying polynomially solvable cases and NPcomplete cases and to make the link between these problems and some scheduling problems that arise from completely dierent areas. We study, among others, the fusion of loops of dierent types, and the fusion of loops when combined with loop shifting. Key words: Parallelization, loop fusion, loop distribution, complexity. 1 Introduction Loop fusion is a program transformation that collapses several loops into one. The resulting program compaction and the corresponding increase in the size of the loop body has several wellknown impacts on the performances of a program [18]. It was rst used to reduce the cost of loop bound testing....
Algorithms and Complexity for Annotated Sequence Analysis
, 1999
"... Molecular biologists use algorithms that compare and otherwise analyze sequences that represent genetic and protein molecules. Most of these algorithms, however, operate on the basic sequence and do not incorporate the additional information that is often known about the molecule and its pieces. Thi ..."
Abstract

Cited by 42 (1 self)
 Add to MetaCart
Molecular biologists use algorithms that compare and otherwise analyze sequences that represent genetic and protein molecules. Most of these algorithms, however, operate on the basic sequence and do not incorporate the additional information that is often known about the molecule and its pieces. This research describes schemes to combinatorially annotate this information onto sequences so that it can be analyzed in tandem with the sequence; the overall result would thus reflect both types of information about the sequence. These annotation schemes include adding colours and arcs to the sequence. Colouring a sequence would produce a samelength sequence of colours or other symbols that highlight or label parts of the sequence. Arcs can be used to link sequence symbols (or coloured substrings) to indicate molecular bonds or other relationships. Adding these annotations to sequence analysis problems such as sequence alignment or finding the longest common subsequence can make the problem more complex, often depending on the complexity of the annotation scheme. This research examines the different annotation schemes and the corresponding problems of verifying annotations, creating annotations, and finding the longest common subsequence of pairs of sequences with annotations. This work involves both the conventional complexity framework and parameterized complexity, and includes algorithms and hardness results for both frameworks. Automata and transducers are created for some annotation verification and creation problems. Different restrictions on layered substring and arc annotation are considered to de iii termine what properties an annotation scheme must have to make its incorporation feasible. Extensions to the algorithms that use weighting schemes are explored. Examin...
The Parameterized Complexity of Sequence Alignment and Consensus
, 1994
"... The Longest common subsequence problem is examined from the point of view of parameterized computational complexity. There are several different ways in which parameters enter the problem, such as the number of sequences to be analyzed, the length of the common subsequence, and the size of the alpha ..."
Abstract

Cited by 37 (13 self)
 Add to MetaCart
The Longest common subsequence problem is examined from the point of view of parameterized computational complexity. There are several different ways in which parameters enter the problem, such as the number of sequences to be analyzed, the length of the common subsequence, and the size of the alphabet. Lower bounds on the complexity of this basic problem imply lower bounds on a number of other sequence alignment and consensus problems. At issue in the theory of parameterized complexity is whether a problem which takes input (x; k) can be solved in time f(k) \Delta n ff where ff is independent of k (termed fixedparameter tractability). It can be argued that this is the appropriate asymptotic model of feasible computability for problems for which a small range of parameter values covers important applications  a situation which certainly holds for many problems in biological sequence analysis. Our main results show that: (1) The Longest Common Subsequence (LCS) parameterized by t...
Longest Common Subsequences
 In Proc. of 19th MFCS, number 841 in LNCS
, 1994
"... . The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations /errors/editing steps required in passing from one string to the other. In this talk, we explore ..."
Abstract

Cited by 29 (1 self)
 Add to MetaCart
. The length of a longest common subsequence (LLCS) of two or more strings is a useful measure of their similarity. The LLCS of a pair of strings is related to the `edit distance', or number of mutations /errors/editing steps required in passing from one string to the other. In this talk, we explore some of the combinatorial properties of the suband supersequence relations, survey various algorithms for computing the LLCS, and introduce some results on the expected LLCS for pairs of random strings. 1 Introduction The set \Sigma of finite strings over an unordered finite alphabet \Sigma admits of several natural partial orders. Some, such as the substring, prefix, and suffix relations, depend on contiguity and lead to many interesting combinatorial questions with practical applications to stringmatching. An excellent survey is given by Aho in [1]. In this talk however we will focus on the `subsequence' partial order. We say that u = u 1 \Delta \Delta \Delta um is a subsequence of ...
The emergence of pattern discovery techniques in computational biology
 Metabolic Engineering
, 2000
"... In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, and after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and descri ..."
Abstract

Cited by 28 (4 self)
 Add to MetaCart
In the past few years, pattern discovery has been emerging as a generic tool of choice for tackling problems from the computational biology domain. In this presentation, and after defining the problem in its generality, we review some of the algorithms that have appeared in the literature and describe several applications of pattern discovery to problems from computational biology. 2000 Academic Press 1.
Merging Separately Generated Plans with Restricted Interactions
, 1992
"... Generating action sequences to achieve a set of goals is a computationally difficult task. When multiple goals are present, the problem is even worse. Although many solutions to this problem have been discussed in the literature, practical solutions focus on the use of restricted mechanisms for plan ..."
Abstract

Cited by 27 (8 self)
 Add to MetaCart
Generating action sequences to achieve a set of goals is a computationally difficult task. When multiple goals are present, the problem is even worse. Although many solutions to this problem have been discussed in the literature, practical solutions focus on the use of restricted mechanisms for planning or the application of domain dependent heuristics for providing rapid solutions (i.e. domaindependent planning). One previously proposed technique for handling multiple goals efficiently is to design a planner or even a set of planners (usually domaindependent) that can be used to generate separate plans for each goal. The outputs are typically either restricted to be independent and then concatenated into a single global plan, or else they are merged together using complex heuristic techniques. In this paper we explore a set of limitations, less restrictive than the assumption of independence, that still allow for the efficient merging of separate plans using straightforward algorith...
On the Approximation of Protein Threading
 Theoretical Computer Science
, 1997
"... In this paper, we study the protein threading problem, which was proposed for finding a folded 3D protein structure from an amino acid sequence. Since this problem was already proved to be NPhard by Lathrop, we study polynomial time approximation algorithms. First we show that the protein threading ..."
Abstract

Cited by 27 (2 self)
 Add to MetaCart
In this paper, we study the protein threading problem, which was proposed for finding a folded 3D protein structure from an amino acid sequence. Since this problem was already proved to be NPhard by Lathrop, we study polynomial time approximation algorithms. First we show that the protein threading problem is MAX SNPhard. Next we show that the protein threading problem can be approximated within a factor 4 for a special case in which a graph representing interaction between residues (amino acids) is planar. This case corresponds to a fisheet substructure, which appears in most protein structures. 1 Introduction The protein folding problem is, given an amino acid sequence (a string), to find its correctly folded 3D protein structure. It is one of the most important computational problems in molecular biology. Although this problem can be defined as a minimization problem, it is too hard to be solved directly. Recently, an indirect approach called inverse folding was proposed [2, 4,...