A Double Combinatorial Approach to Discovering Patterns in Biological Sequences
- Combinatorial Pattern Matching, volume 1075 of Lecture Notes in Computer Science
Cited by 21 (8 self)
We present in this paper an algorithm for finding degenerated common features by multiple comparison of a set of biological sequences (nucleic acids or proteins). The features that are of interest to us are words in the sequences. The algorithm uses the concept of a model we introduced earlier for locating these features. A model can be seen as a generalization of a consensus pattern as defined by Waterman [42]. It is an object against which the words in the sequences are compared and which serves as an identifier for the groups of similar ones. The algorithm given here innovates in relation to our previous work in that the models are defined over what we call a weighted combinatorial cover. This is a collection of sets among all possible subsets of the alphabet Σ of nucleotides or amino acids, including the wild card {Σ}, with a weight attached to each of these sets indicating the number of times it may appear in a model. In this way, we explore both the space of models and ...
Finding maximal pairs with bounded gap
- Proceedings of the 10th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 1645 of Lecture Notes in Computer Science
, 1999
Cited by 19 (6 self)
A pair in a string is the occurrence of the same substring twice. A pair is maximal if the two occurrences of the substring cannot be extended to the left and right without making them different. The gap of a pair is the number of characters between the two occurrences of the substring. In this paper we present methods for finding all maximal pairs under various constraints on the gap. In a string of length n we can find all maximal pairs with gap in an upper and lower bounded interval in time O(n log n + z) where z is the number of reported pairs. If the upper bound is removed the time reduces to O(n+z). Since a tandem repeat is a pair where the gap is zero, our methods can be seen as a generalization of finding tandem repeats. The running time of our methods equals the running time of well known methods for finding tandem repeats.
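The definitions in this abstract (pair, maximality, gap) can be illustrated with a brute-force sketch. This is not the O(n log n + z) method of the paper; it is a quadratic-or-worse enumeration written only to make the definitions concrete. The function name `maximal_pairs` and the parameters `min_gap` and `max_gap` are hypothetical, and the sketch assumes that the two occurrences must not overlap (gap of at least zero by default).

```python
def maximal_pairs(s, min_gap=0, max_gap=None):
    """Brute-force list of maximal pairs (i, j, length) in s, with i < j.

    gap = j - (i + length) is the number of characters between the two
    occurrences. Only pairs with min_gap <= gap <= max_gap are reported
    (max_gap=None means no upper bound). Illustrative only.
    """
    n = len(s)
    pairs = []
    for i in range(n):
        for j in range(i + 1, n):
            # Left-maximal: the characters just before the two
            # occurrences must differ (or i is at the string start).
            if i > 0 and s[i - 1] == s[j - 1]:
                continue
            # Right-maximal: extend to the right while characters match.
            length = 0
            while j + length < n and s[i + length] == s[j + length]:
                length += 1
            if length == 0:
                continue
            gap = j - (i + length)
            if gap < min_gap:
                continue
            if max_gap is not None and gap > max_gap:
                continue
            pairs.append((i, j, length))
    return pairs
```

With the default bounds, `maximal_pairs("abcabc")` reports the single pair `(0, 3, 3)`, whose gap is zero, which is the tandem-repeat case mentioned in the abstract.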
Approximate string matching in musical sequences
- In Proceedings of the Prague Stringology Conference
, 2001
Cited by 12 (3 self)
Here we consider computational problems on δ-approximate and (δ, γ)-approximate string matching. These are two new notions of approximate matching that arise naturally in applications of computer-assisted music analysis. We present fast, efficient and practical algorithms for these two notions of approximate string matching.
Identifying Satellites and Periodic Repetitions in Biological Sequences
, 1998
Cited by 10 (2 self)
We present in this paper an algorithm for identifying satellites in DNA sequences. Satellites (simple, micro, or mini) are repeats in number between 30 and as many as 1,000,000 whose lengths vary between 2 and hundreds of base pairs and that appear, with some mutations, in tandem along the sequence. We concentrate here on short to moderately long (up to 30-40 base pairs) approximate tandem repeats where copies may differ up to ε = 15-20% from a consensus model of the repeating unit (implying individual units may vary by 2ε from each other). The algorithm is composed of two parts. The first one consists of a filter that basically eliminates all regions whose probability of containing a satellite is less than one in 10^4 when ε = 10%. The second part realizes an exhaustive exploration of the space of all possible models for the repeating units present in the sequence. It therefore has the advantage over previous work of being able to report a consensus model, say m, of the repe...
Approximate String Matching with Gaps
, 2002
Cited by 9 (1 self)
In this paper we consider several new versions of approximate string matching with gaps. The main characteristic of these new versions is the existence of gaps in the matching of a given pattern in a text. Algorithms are sketched for each version and their time and space complexity is stated. The specific versions of approximate string matching have various applications in computerized music analysis.
Computing Approximate Repetitions in Musical Sequences
- In Proceedings of Prague Stringology Club Workshop PSCW’00
, 2000
Cited by 5 (0 self)
Here we present new algorithms for computing all δ-approximate and ( ...
Multiple Sequence Comparison and Consistency on Multipartite Graphs
- Adv. Appl. Math
, 1995
Cited by 3 (1 self)
Calculation of dot-matrices is a widespread tool in biological sequence comparison. As a visual aid they are used in pairwise sequence comparison but so far have been of little help in the simultaneous comparison of several sequences. Viewing dot-matrices as projections of unknown n-dimensional points we consider the multiple alignment problem (for n sequences) as an n-dimensional image reconstruction problem with noise. We model this situation using a multipartite graph and introduce a notion of "consistency" on such a graph. From this perspective we introduce and develop the filtering method due to Vingron and Argos (J. Mol. Biol. (1991), 218, pp. 33-43). We discuss a conjecture of theirs regarding the number of iterations their algorithm requires and demonstrate that this number may be large. An improved version of the original algorithm is introduced that avoids costly dot-matrix multiplications and runs in O(n^3 · L^3) time (L is the length of the longest sequence and n i...
MIN-Graph: A Tool for Monitoring and Visualizing MIN-based Multiprocessor Performance
- J. Parallel Distrib. Comput
, 1993
Cited by 2 (2 self)
A Multistage Interconnection Network (MIN) makes it possible to build large-scale shared-memory multiprocessor systems. To provide insight into dynamic system performance, we have developed an integrated data collection, analysis, and data visualization environment for a MIN-based multiprocessor system, called MIN-Graph. MIN-Graph is a graphical instrumentation monitor to aid users in investigating performance problems and in determining an effective way of exploiting the high performance capabilities of interconnection network multiprocessor systems. Our monitor measures, analyzes, evaluates and displays the events, performance and overhead of interprocessor communication, process scheduling, remote-memory access, network contention and others on a MIN-based multiprocessor. The graphical monitor is X-window based and implemented on the BBN GP1000 and the BBN TC2000. This work has been supported in part by the National Science Foundation under research grants CCR-9008991, CCR-9102854,...
Computational Biology
, 2000
Cited by 1 (0 self)
During four years of arduous service, a Ph.D. student is expected to familiarise himself with his field of research, and, hopefully, contribute to this field. This is reflected by the division of this dissertation into two parts. Part I is a (partial) overview of the field of computational biology as I conceive it, an overview that is aimed at presenting the context for my contributions to the field of computational biology. These contributions are presented in part II as five independent articles.
Three Heuristics for δ-Matching: δ-BM Algorithms
, 2002
Cited by 1 (1 self)
We consider a version of pattern matching useful in processing large musical data: delta-matching, which consists in finding matches which are delta-approximate in the sense of the distance measured as maximum difference between symbols. The alphabet is an interval of integers, and the distance between two symbols a, b is measured as |a - b|. We present delta-matching algorithms that are fast on average provided that the pattern is "non-flat" and the alphabet interval is large. The pattern is "flat" if its structure does not vary substantially. We also consider (delta, gamma)-matching, where gamma is a bound on the total number of errors. The algorithms, named delta-BM1, delta-BM2 and delta-BM3, can be thought of as members of the generalized Boyer-Moore family of algorithms. This is the first paper on the subject; previously only "occurrence heuristics" have been considered. Our heuristics are much stronger and refer to larger parts of texts (not only to single positions). We use delta-versions of...
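The two matching notions defined in this abstract can be made concrete with a naive scan. This is not one of the delta-BM heuristics, which the truncated abstract does not describe in enough detail to reproduce; it is a plain O(nm) check at every text position. The function name `delta_gamma_matches` is hypothetical, and the sketch assumes that "total number of errors" means the sum of the absolute differences |a - b| over the aligned symbols.

```python
def delta_gamma_matches(text, pattern, delta, gamma=None):
    """Naive scan for delta-matches of pattern in text.

    text and pattern are sequences of integers (for example pitches).
    A position matches when every aligned pair differs by at most
    delta. If gamma is given, the sum of the differences must also be
    at most gamma (assumed reading of the (delta, gamma) condition).
    Returns the list of matching start positions.
    """
    m = len(pattern)
    hits = []
    for pos in range(len(text) - m + 1):
        diffs = [abs(text[pos + k] - pattern[k]) for k in range(m)]
        # delta condition: the maximum symbol difference is bounded.
        if any(d > delta for d in diffs):
            continue
        # gamma condition: the accumulated difference is bounded.
        if gamma is not None and sum(diffs) > gamma:
            continue
        hits.append(pos)
    return hits
```

For the text `[60, 62, 64, 65, 67]` and the pattern `[61, 62, 65]`, position 0 is a match for delta = 1, since the differences there are 1, 0 and 1. Adding gamma = 1 rejects it because the differences sum to 2.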

