An Algorithm for Approximate Tandem Repeats
In Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching (CPM), volume 684 of Lecture Notes in Computer Science, 1993
Cited by 75 (2 self)
A perfect single tandem repeat is defined as a nonempty string that can be divided into two identical substrings, e.g. abcabc. An approximate single tandem repeat is one in which the substrings are similar, but not identical, e.g. abcdaacd.
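The two definitions above can be illustrated with a short sketch. It assumes edit distance as the similarity measure between the two parts, which the abstract does not specify; the function names are hypothetical and the brute-force search over split points is for illustration only, not the algorithm of the paper.

```python
def is_perfect_tandem_repeat(s: str) -> bool:
    """True if s is nonempty and splits into two identical halves."""
    half, odd = divmod(len(s), 2)
    return half > 0 and odd == 0 and s[:half] == s[half:]


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def min_split_distance(s: str) -> int:
    """Smallest edit distance between the two parts over all split points.

    Assumption: a small value means s is close to a tandem repeat.
    """
    if len(s) < 2:
        raise ValueError("need at least two characters to split")
    return min(edit_distance(s[:i], s[i:]) for i in range(1, len(s)))
```

For the two examples in the abstract, `is_perfect_tandem_repeat("abcabc")` is true, while `abcdaacd` splits into `abcd` and `aacd`, which differ by a single substitution, so `min_split_distance("abcdaacd")` is 1.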
An Optimal O(log log n) Time Parallel Algorithm for Detecting all Squares in a String
1995
Cited by 11 (6 self)
An optimal O(log log n) time concurrent-read concurrent-write parallel algorithm for detecting all squares in a string is presented. A tight lower bound shows that over general alphabets this is the fastest possible optimal algorithm. When p processors are available the bounds become $\Theta(\lceil n \log n / p \rceil + \log\log_{\lceil 1+p/n \rceil} 2p)$. The algorithm uses an optimal parallel string-matching algorithm together with periodicity properties to locate the squares within the input string.
Efficient String Algorithmics
1992
Cited by 8 (6 self)
Problems involving strings arise in many areas of computer science and have numerous practical applications. We consider several problems from a theoretical perspective and provide efficient algorithms and lower bounds for these problems in sequential and parallel models of computation. In the sequential setting, we present new algorithms for the string matching problem improving the previous bounds on the number of comparisons performed by such algorithms. In parallel computation, we present tight algorithms and lower bounds for the string matching problem, for finding the periods of a string, for detecting squares and for finding initial palindromes.
String Pattern Matching For A Deluge Survival Kit
2000
Cited by 5 (1 self)
String Pattern Matching concerns itself with algorithmic and combinatorial issues related to matching and searching on linearly arranged sequences of symbols, arguably the simplest possible discrete structures. As unprecedented volumes of sequence data are amassed, disseminated and shared at an increasing pace, effective access to, and manipulation of such data depend crucially on the efficiency with which strings are structured, compressed, transmitted, stored, searched and retrieved. This paper samples from this perspective, and with the authors' own bias, a rich arsenal of ideas and techniques developed in more than three decades of history.
Efficient String Matching on Coded Texts
In Proceedings of Combinatorial Pattern Matching, 6th Annual Symposium (CPM'95), 1994
Cited by 2 (1 self)
The so-called "four Russians technique" is often used to speed up algorithms by encoding several data items in a single memory cell. Given a sequence of n symbols over a constant-size alphabet, one can encode the sequence into O(n/λ) memory cells in O(log λ) time using n/log λ processors. This paper presents an efficient CRCW-PRAM string-matching algorithm for coded texts that takes O(log log(m/λ)) time making only O(n/λ) operations, an improvement by a factor of λ = O(log n) on the number of operations used in previous algorithms. Using this string-matching algorithm one can test if a string is square-free and find all palindromes in a string in O(log log n) time using n/log log n processors.

1 Introduction. In the string-matching problem one is searching for occurrences of a pattern string P[1..m] in a text string T[1..n]. There exist several O(n + m) time sequential string-matching algorithms that are used in a large variety of applications. Galil [23] published the first efficient...
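The idea of encoding several symbols in a single memory cell can be sketched sequentially as follows. This is only an illustration of the packing step, not the parallel algorithm of the paper; the function name `pack` and the parameters `alphabet` and `block` are hypothetical.

```python
def pack(text: str, alphabet: str, block: int) -> list[int]:
    """Pack `block` consecutive symbols of `text` into one integer cell.

    Each symbol gets a fixed-width binary code; a cell is the concatenation
    of the codes of its symbols. The last cell may hold fewer symbols.
    """
    bits = max(1, (len(alphabet) - 1).bit_length())   # bits per symbol
    code = {ch: i for i, ch in enumerate(alphabet)}   # symbol -> small integer
    cells = []
    for start in range(0, len(text), block):
        cell = 0
        for ch in text[start:start + block]:
            cell = (cell << bits) | code[ch]
        cells.append(cell)
    return cells
```

With a four-letter alphabet each symbol takes two bits, so `pack("acgtacgt", "acgt", 4)` yields two cells, and two aligned blocks of four symbols can be compared with a single integer comparison instead of four symbol comparisons.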
Parallel String Matching Algorithms
1990
Cited by 1 (0 self)
The string matching problem is one of the most studied problems in computer science. While it is very easily stated and many of the simple algorithms perform very well in practice, numerous works have been published on the subject and research is still very active. In this paper we survey recent results on parallel algorithms for the string matching problem.
On the Complexity of Computing the Order of Repetition of a String
1998
We show a simple O(n log n) time algorithm computing the order of repetition in a string. A parallel version of the algorithm works in O(log
Fault-Tolerant Repeat Pattern Mining On Biological Data
2001
With the development of biotechnology, more and more biological data is collected and made available for analysis. One example is the GenBank and Proteins data from NCBI (National Center for Biotechnology Information). A huge amount of data is available, including DNA sequences, RNA sequences and protein sequences of all different species, and not much is known about this data. Extracting the most interesting and informative patterns from that data, patterns which may guide us to more discoveries, is an interesting task. In this thesis, we study an important problem in molecular biology, the tandem repeat finding problem. We refine the problem to finding the complete set of tandem repeat patterns, analyze it, investigate the properties of tandem repeat patterns, and propose two algorithms, TCD and LSD, to solve it. Interesting patterns found by these algorithms are presented. Our performance studies show that the LSD algorithm is efficient and scalable.

Acknowledgments. First, I would like to thank my senior supervisor, Dr. Jiawei Han, for inspiring me about data mining on biology data, an extremely interesting and exciting topic. He has given me tremendous encouragement and support during my studies and research. I would also like to thank my supervisor, Dr. Veronica Dahl, and external examiner, Dr. Wo-Shun Luk, for taking the time to read my thesis and give me valuable suggestions. I would also like to thank all the people in the intelligent database lab of the School of Computing Science at Simon Fraser University for their suggestions and friendship. Special thanks go to Jian Pei and Ying Lu for their valuable discussion. I also want to thank Jian Pei for his informative review of my thesis. Finally, my thanks go to my...
Detecting all Squares in a String
is permitted for educational or research use on condition that this copyright notice is included in any copy. See back inner page for a list of recent publications in the BRICS Report Series. Copies may be obtained by contacting: BRICS
Testing Square-Freeness of Strings Compressed by Balanced Straight Line Program
In this paper we study the problem of deciding whether a given compressed string contains a square. A string x is called a square if x = zz and z = u^k implies k = 1 and u = z. A string w is said to be square-free if no substrings of w are squares. Many efficient algorithms to test if a given string is square-free have been developed so far. However, very little is known for testing square-freeness of a given compressed string. In this paper, we give an O(max(n^2, n log^2 N))-time O(n^2)-space solution to test square-freeness of a given compressed string, where n and N are the size of a given compressed string and the corresponding decompressed string, respectively. Our input strings are compressed by balanced straight line program (BSLP). We remark that BSLP has exponential compression, that is, N = O(2^n). Hence no decompress-then-test approaches can be better than our method in the worst case.
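As a point of comparison, the definition of square-freeness can be checked directly on an uncompressed string with a brute-force scan. The sketch below is such a naive test, not the compressed-string method of the paper, and the function names are hypothetical. Because it tries shorter halves first, the half z of the first square it reports cannot itself be a proper power, which matches the primitivity condition in the definition.

```python
from typing import Optional, Tuple


def find_square(w: str) -> Optional[Tuple[int, int]]:
    """Return (start, half_length) of a square zz occurring in w, or None.

    Brute force: try every half length, shortest first, at every position.
    """
    n = len(w)
    for half in range(1, n // 2 + 1):
        for start in range(0, n - 2 * half + 1):
            if w[start:start + half] == w[start + half:start + 2 * half]:
                return start, half
    return None


def is_square_free(w: str) -> bool:
    """True if no substring of w has the form zz with z nonempty."""
    return find_square(w) is None
```

For example, `find_square("abaab")` returns `(2, 1)` for the square `aa`, and `is_square_free("abcacb")` is true. The scan takes time polynomial in the decompressed length N, which is what makes the decompress-then-test route expensive when N is exponential in n.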