Compressed suffix arrays and suffix trees with applications to text indexing and string matching
2005
Cited by 189 (17 self)
Abstract
The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in $n \lg |\Sigma|$ bits by encoding each symbol with $\lg |\Sigma|$ bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require $\Omega(n \lg n)$ additional bits of space in the worst case. For example, in the standard unit-cost RAM, suffix trees and suffix arrays need $\Omega(n)$ memory words, each of $\Omega(\lg n)$ bits. These indexes are larger than the text itself by a multiplicative factor of $\Omega(\lg_{|\Sigma|} n)$, which is significant when Σ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in $O(m \lg |\Sigma|)$ time or in $O(m + \lg n)$ time, plus an output-sensitive cost $O(occ)$ for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast $O(m / \lg_{|\Sigma|} n + \lg^{\epsilon}_{|\Sigma|} n)$ search time in the worst case, for any constant ...
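For orientation, a minimal sketch of the uncompressed baseline the abstract compares against: a plain suffix array stored as n integers (hence $\Theta(n \lg n)$ bits), searched by binary search. This is not the compressed index of the paper. Construction here is naive sorting, and the search makes O(lg n) comparisons of length-m strings, i.e. O(m lg n) time rather than the O(m + lg n) bound, which needs extra longest-common-prefix information. The function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    # Naive construction: sort suffix start positions by the suffix text.
    # Quadratic-or-worse time; for illustration only, not a linear-time builder.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    # The suffixes starting with `pattern` form one contiguous block of `sa`.
    # Two binary searches on the length-m prefixes locate that block.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


sa = build_suffix_array("banana")
print(find_occurrences("banana", sa, "ana"))  # [1, 3]
```

The array `sa` holds one word-sized integer per text position, which is exactly the $\Omega(n)$-word overhead the abstract sets out to avoid.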
Efficient 2-dimensional Approximate Matching of Half-rectangular Figures
1993
Cited by 31 (11 self)
Abstract
Efficient algorithms exist for the approximate two-dimensional matching problem for rectangles. This is the problem of finding all occurrences of an m × m pattern in an n × n text with no more than k mismatch, insertion, and deletion errors. In computer vision it is important to generalize this problem to non-rectangular figures. We make progress towards this goal by defining half-rectangular figures of height m and area a. The approximate two-dimensional matching problem for half-rectangular patterns can be solved using a dynamic programming approach in time $O(an^2)$. We show an $O(kn^2\sqrt{m \log m}\sqrt{k \log k} + k^2 n^2)$ algorithm which combines convolutions with dynamic programming. Note that our algorithm is superior to previously known solutions for $k \le m^{1/3}$. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. These are interesting problems in their own right. Efficient algorithms to solve both ...
Speeding Up Pattern Matching By Text Compression
2000
Cited by 26 (8 self)
Abstract
Pattern matching is one of the most fundamental operations in string processing. Recently, a new trend for accelerating pattern matching has emerged: speeding up pattern matching by text compression. By the traditional criteria for data compression, i.e., compression ratio and compression/decompression time, adaptive dictionary methods such as the Lempel-Ziv family are often preferred. However, such methods cannot speed up pattern matching, since extra work is needed to keep track of the compression mechanism. We have to re-examine existing compression methods or develop a new method in light of the new criterion: efficiency of pattern matching in compressed text. Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is fast and requires little working space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been very popular, since compression is slow and the compression ratio is not as good as...
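A minimal sketch of byte pair encoding as it is commonly described: repeatedly replace the most frequent pair of adjacent symbols with a byte value that does not occur in the input, until no unused byte remains or no pair occurs twice. The rule table layout and function names are illustrative assumptions; the paper's own BPE variant and its pattern-matching algorithm on compressed text are not shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

Rules = Dict[int, Tuple[int, int]]


def bpe_compress(data: bytes) -> Tuple[List[int], Rules]:
    seq = list(data)
    present = set(seq)
    unused = [b for b in range(256) if b not in present]  # candidate codes
    rules: Rules = {}
    while unused:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # replacing a pair that occurs once saves nothing
            break
        code = unused.pop()
        rules[code] = pair
        out: List[int] = []
        i = 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(code)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def bpe_decompress(seq: List[int], rules: Rules) -> bytes:
    def expand(sym: int) -> bytes:
        # A code expands recursively into its pair; any other value is a literal byte.
        if sym in rules:
            left, right = rules[sym]
            return expand(left) + expand(right)
        return bytes([sym])

    return b"".join(expand(s) for s in seq)
```

Because each compressed symbol expands independently through the rule table, any slice of the compressed sequence can be decompressed on its own, which is the property the abstract highlights. The repeated full rescans in `bpe_compress` also show why compression is the slow side.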
A Lower Bound for Parallel String Matching
SIAM J. Comput., 1993
Cited by 25 (13 self)
Abstract
This talk presents the derivation of an $\Omega(\log \log m)$ lower bound on the number of rounds necessary for finding occurrences of a pattern string P[1..m] in a text string T[1..2m] in parallel using m comparisons in each round. The parallel complexity of the string matching problem using p processors for general alphabets follows.
1. Introduction
Better and better parallel algorithms have been designed for string matching. All run on the CRCW-PRAM with the weakest form of simultaneous-write conflict resolution: all processors that write into the same memory location must write the same value, 1. The best CREW-PRAM algorithms are those obtained from the CRCW algorithms at a logarithmic loss of efficiency. Optimal algorithms have been designed: O(log m) time in [8, 17] and O(log log m) time in [4]. (An optimal algorithm is one with pt = O(n), where t is the time and p is the number of processors used.) Recently, Vishkin [18] developed an optimal O(log m) time algorithm. Unlike...
An Alphabet Independent Approach to Two Dimensional Matching
1994
Cited by 24 (8 self)
Abstract
There are many solutions to the string matching problem which are strictly linear in the input size and independent of alphabet size. Furthermore, the model of computation for these algorithms is very weak: they allow only simple arithmetic and comparisons of equality between characters of the input. In contrast, algorithms for two-dimensional matching have needed stronger models of computation, most notably assuming a totally ordered alphabet. The fastest algorithms for two-dimensional matching have therefore had a logarithmic dependence on the alphabet size. In the worst case, this gives an algorithm that runs in O(n log m) time with O(m log m) preprocessing.
Alphabet Dependence in Parameterized Matching
1993
Cited by 20 (4 self)
Abstract
The classical pattern matching paradigm is that of seeking occurrences of one string in another, where both strings are drawn from an alphabet Σ. A recently introduced model is that of parameterized pattern matching; the main motivation for this scheme lies in software maintenance, where programs are considered "identical" even if variables are different. Strings, under this model, additionally have symbols from a variable set Π, and occurrences of one string in the other up to a renaming of the variables are sought. In this paper we show that finding the occurrences of an m-length string in an n-length string under the parameterized pattern matching paradigm can be done in time $O(n \log \pi)$, where $\pi = \min(m, |\Pi|)$; that is, independent of |Σ|. Additionally, we show that in general this dependence on |Π| is inherent to any algorithm for this problem in the comparison model; that is, our algorithm is optimal.
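To make the matching criterion concrete, a brute-force sketch (not the $O(n \log \pi)$ algorithm of the paper) using prev encoding in the style of Baker: each variable symbol is replaced by the distance to its previous occurrence (0 on its first occurrence), and constant symbols are kept. Two equal-length strings match up to a one-to-one renaming of the variables exactly when their prev encodings are equal. Re-encoding every window this way takes O(nm) time. The caller supplies which characters count as variables; the function names are hypothetical.

```python
from typing import Dict, List, Set, Union

Symbol = Union[str, int]


def prev_encode(s: str, params: Set[str]) -> List[Symbol]:
    # Variable symbols become the distance back to their previous occurrence
    # (0 on first occurrence); constant symbols are kept as they are.
    last: Dict[str, int] = {}
    out: List[Symbol] = []
    for i, ch in enumerate(s):
        if ch in params:
            out.append(i - last[ch] if ch in last else 0)
            last[ch] = i
        else:
            out.append(ch)
    return out


def p_match_naive(text: str, pattern: str, params: Set[str]) -> List[int]:
    # Re-encode every length-m window of the text: O(nm) time overall.
    m = len(pattern)
    target = prev_encode(pattern, params)
    return [i for i in range(len(text) - m + 1)
            if prev_encode(text[i:i + m], params) == target]


# Variables a, b, u, v, x, y may be renamed; '=', '+', ';' are constants.
print(p_match_naive("a=b+a;u=v+u;u=u+v", "x=y+x", set("abuvxy")))  # [0, 6]
```

The window starting at 12, `u=u+v`, is rejected because it would need both x and y renamed to u, which the one-to-one renaming forbids.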
Efficient 2-dimensional approximate matching of non-rectangular figures
Proc. of 2nd Symposium on Discrete Algorithms, 1991
Cited by 18 (7 self)
Abstract
Finding all occurrences of a non-rectangular pattern of height m and area a in an n × n text with no more than k mismatch, insertion, and deletion errors is an important problem in computer vision. It can be solved using a dynamic programming approach in time $O(an^2)$. We show an $O(kn^2\sqrt{m \log m}\sqrt{k \log k} + k^2 n^2)$ algorithm which combines convolutions with dynamic programming. At the heart of the algorithm are the Smaller Matching Problem and the k-Aligned Ones with Location Problem. Efficient algorithms to solve both these problems are presented.
Two Dimensional Dictionary Matching
Information Processing Letters, 1992
Cited by 12 (3 self)
Abstract
Most traditional pattern matching algorithms solve the problem of finding all occurrences of a given pattern string P in a given text T. Another important paradigm is the dictionary matching problem. Let $D = \{P_1, \ldots, P_k\}$ be the dictionary. We seek all locations of dictionary patterns that appear in a given text T.
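The abstract states the problem in its one-dimensional form. The classical solution for that form is the Aho-Corasick automaton, sketched compactly below as background only, since the paper itself addresses the two-dimensional version. It reports (start position, pattern index) pairs and assumes nonempty patterns; the function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Tuple


def build_automaton(patterns: List[str]):
    goto: List[Dict[str, int]] = [{}]  # trie edges; state 0 is the root
    fail: List[int] = [0]
    out: List[List[int]] = [[]]        # indices of patterns recognized at a state
    for idx, p in enumerate(patterns):
        state = 0
        for ch in p:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first order guarantees a state's failure target is finished first.
    queue = deque(goto[0].values())    # depth-1 states fail to the root
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            out[s] = out[s] + out[fail[s]]
    return goto, fail, out


def aho_corasick_search(text: str, patterns: List[str]) -> List[Tuple[int, int]]:
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            hits.append((i - len(patterns[idx]) + 1, idx))
    return hits
```

A single left-to-right pass over T reports every dictionary occurrence, including overlapping ones such as "she" and "he" inside "ushers".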
An Optimal O(log log n) Time Parallel Algorithm for Detecting all Squares in a String
1995
Cited by 11 (6 self)
Abstract
An optimal O(log log n) time concurrent-read concurrent-write parallel algorithm for detecting all squares in a string is presented. A tight lower bound shows that over general alphabets this is the fastest possible optimal algorithm. When p processors are available, the bounds become $\Theta(\lceil n \log n / p \rceil + \log \log_{\lceil 1 + p/n \rceil} 2p)$. The algorithm uses an optimal parallel string-matching algorithm together with periodicity properties to locate the squares within the input string.
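As a reference point for what is being computed, a brute-force sequential sketch that lists every occurrence of a square uu (u nonempty) as a (start, half-length) pair. It takes O(n^3) time in the worst case and uses none of the periodicity arguments or the parallel string-matching step the abstract describes. The abstract does not say in what form the paper reports squares, so this output format is an assumption.

```python
from typing import List, Tuple


def all_squares(s: str) -> List[Tuple[int, int]]:
    # Brute-force reference: for every start i and half-length h, test whether
    # s[i:i+h] equals the block immediately following it.
    n = len(s)
    found = []
    for i in range(n):
        for half in range(1, (n - i) // 2 + 1):
            if s[i:i + half] == s[i + half:i + 2 * half]:
                found.append((i, half))
    return found


print(all_squares("abab"))  # [(0, 2)]
```

A string with no such pair, e.g. "abc", is square-free, so the same routine also answers the detection question.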
Efficient Comparison Based String Matching
1992
Cited by 10 (3 self)
Abstract
We study the exact number of symbol comparisons that are required to solve the string matching problem and present a family of efficient algorithms. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler. In particular, we give a linear-time algorithm that finds all occurrences of a pattern of length m in a text of length n in $n + \lceil \frac{4 \log m + 2}{m} (n - m) \rceil$ comparisons. The pattern preprocessing takes linear time and makes at most 2m comparisons. This algorithm establishes that, in general, searching for a long pattern is easier than searching for a short one. We also show that any algorithm in the family of the algorithms presented must make at least $n + \lfloor \log m \rfloor \lfloor \frac{n - m}{m} \rfloor$ symbol comparisons, for $m = 2^k - 1$ and any integer $k \ge 1$.
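For comparison with the bounds above, a sketch of the classical Knuth-Morris-Pratt search instrumented to count text-versus-pattern symbol comparisons. KMP is not a member of the paper's family. Its count is at most 2n, because each comparison either finishes processing a text symbol (at most n of those) or shortens the matched prefix, which grows by at most one per text symbol. Preprocessing comparisons are not counted, the pattern is assumed nonempty, and the function name is hypothetical.

```python
from typing import List, Tuple


def kmp_with_count(text: str, pattern: str) -> Tuple[List[int], int]:
    # Assumes a nonempty pattern. Returns match positions and the number of
    # text-versus-pattern symbol comparisons made during the search phase.
    m = len(pattern)
    failure = [0] * m  # failure[i]: longest proper border of pattern[:i + 1]
    k = 0
    for i in range(1, m):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    matches: List[int] = []
    comparisons = q = 0  # q: length of the currently matched pattern prefix
    for i, ch in enumerate(text):
        while True:
            comparisons += 1
            if ch == pattern[q]:
                q += 1
                break
            if q == 0:
                break
            q = failure[q - 1]
        if q == m:
            matches.append(i - m + 1)
            q = failure[q - 1]
    return matches, comparisons


print(kmp_with_count("aaaaa", "aa"))  # ([0, 1, 2, 3], 5)
```

Running such a counter on long texts gives an empirical feel for how far the $n + \lceil \frac{4 \log m + 2}{m}(n - m) \rceil$ bound improves on the 2n worst case of KMP.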