Results 1–10 of 42
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)
In Proceedings of the 32nd Annual ACM Symposium on the Theory of Computing, 2000
Cited by 189 (17 self)
Abstract. The proliferation of online text, such as found on the World Wide Web and in online databases, motivates the need for space-efficient text indexing methods that support fast string searching. We model this scenario as follows: Consider a text T consisting of n symbols drawn from a fixed alphabet Σ. The text T can be represented in n lg |Σ| bits by encoding each symbol with lg |Σ| bits. The goal is to support fast online queries for searching any string pattern P of m symbols, with T being fully scanned only once, namely, when the index is created at preprocessing time. The text indexing schemes published in the literature are greedy in terms of space usage: they require Ω(n lg n) additional bits of space in the worst case. For example, in the standard unit-cost RAM, suffix trees and suffix arrays need Ω(n) memory words, each of Ω(lg n) bits. These indexes are larger than the text itself by a multiplicative factor of Ω(log_Σ n), which is significant when Σ is of constant size, such as in ASCII or Unicode. On the other hand, these indexes support fast searching, either in O(m lg |Σ|) time or in O(m + lg n) time, plus an output-sensitive cost O(occ) for listing the occ pattern occurrences. We present a new text index that is based upon compressed representations of suffix arrays and suffix trees. It achieves a fast O(m / log_Σ n + log_Σ^ε n) search time in the worst case, for any constant …
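The classical baseline the abstract contrasts against can be made concrete with a minimal, uncompressed suffix-array sketch. This is the Ω(n lg n)-bit structure with a simple O(m lg n) binary search, not the paper's compressed index; the helper names are illustrative only.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort all suffix start positions lexicographically.
    Real indexes use linear-time builders; this is for illustration only."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def sa_search(text: str, sa: list[int], pattern: str) -> list[int]:
    """Binary search for all suffixes that start with `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # leftmost suffix with prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:                       # leftmost suffix with prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])          # occurrence positions, in text order
```

Each of the n stored positions costs lg n bits, which is exactly the Ω(n lg n)-bit overhead that a compressed representation aims to remove.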
Text Retrieval: Theory and Practice
In 12th IFIP World Computer Congress, volume I, 1992
Cited by 46 (14 self)
We present the state of the art of the main component of text retrieval systems: the searching engine. We outline the main lines of research and the issues involved. We survey recently published results for text searching and explore the gap between theoretical and practical algorithms. The main observation is that simpler ideas are better in practice.

1597 Shaks. Lover's Compl. 2: "From off a hill whose concaue wombe reworded / A plaintfull story from a sistring vale." (OED2, reword, sistering)

1 Introduction. Full text retrieval systems are becoming a popular way of providing support for online text. Their main advantage is that they avoid the complicated and expensive process of semantic indexing. From the end-user point of view, full text searching of online documents is appealing because a valid query is just any word or sentence of the document. However, when the desired answer cannot be obtained with a simple query, the user must perform his/her own semantic processing to guess w…
Rotation of Periodic Strings and Short Superstrings
, 1996
Cited by 26 (0 self)
This paper presents two simple approximation algorithms for the shortest superstring problem, with approximation ratios 2 2/3 (≈ 2.67) and 2 25/42 (≈ 2.596), improving the best previously published 2 3/4 approximation. The framework of our improved algorithms is similar to that of previous algorithms in the sense that they construct a superstring by computing some optimal cycle covers on the distance graph of the given strings, and then break and merge the cycles to finally obtain a Hamiltonian path, but we make use of new bounds on the overlap between two strings. We prove that for each periodic semi-infinite string α = a_1 a_2 ··· of period q, there exists an integer k such that for any (finite) string s of period p which is inequivalent to α, the overlap between s and the rotation α[k] = a_k a_{k+1} ··· is at most p + q/2. Moreover, if p ≤ q, then the overlap between s and α[k] is not larger than (2/3)(p + q). In the previous shortes…
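The quantity being bounded above is the suffix-prefix overlap used when merging strings. A minimal sketch of that primitive, here plugged into the classic greedy merging heuristic rather than the paper's cycle-cover construction (function names are illustrative):

```python
def overlap(s: str, t: str) -> int:
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s[-k:] == t[:k]:
            return k
    return 0

def greedy_superstring(strings: list[str]) -> str:
    """Classic greedy heuristic: repeatedly merge the pair with maximum
    overlap. This is NOT the paper's algorithm, just the simplest consumer
    of the overlap primitive the abstract bounds."""
    ss = list(strings)
    while len(ss) > 1:
        best = (-1, 0, 1)                    # (overlap length, i, j)
        for i in range(len(ss)):
            for j in range(len(ss)):
                if i != j:
                    k = overlap(ss[i], ss[j])
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        merged = ss[i] + ss[j][k:]           # overlap counted only once
        ss = [x for idx, x in enumerate(ss) if idx not in (i, j)] + [merged]
    return ss[0]
```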
Speeding Up Pattern Matching By Text Compression
, 2000
Cited by 26 (8 self)
Pattern matching is one of the most fundamental operations in string processing. Recently, a new trend for accelerating pattern matching has emerged: speeding up pattern matching by text compression. By the traditional criteria for data compression, i.e., compression ratio and compression/decompression time, adaptive dictionary methods such as the Lempel-Ziv family are often preferred. However, such methods cannot speed up pattern matching, since extra work is needed to keep track of the compression mechanism. We have to re-examine existing compression methods, or develop a new method, in the light of the new criterion: efficiency of pattern matching in compressed text. Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is fast and requires small work space. Moreover, it is easy to decompress an arbitrary part of the original text. However, it has not been so popular, since the compression is slow and the compression ratio is not as good as…
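A minimal sketch of the BPE scheme described above, assuming fresh placeholder tokens for the merged pairs (a real byte-oriented implementation reuses unused byte values instead):

```python
from collections import Counter

def bpe_compress(text: str, num_merges: int):
    """Byte pair encoding sketch: repeatedly replace the most frequent
    adjacent symbol pair with a fresh symbol. Symbols stay Python strings
    here for clarity instead of being packed into single bytes."""
    seq = list(text)
    rules = []                                   # (new_symbol, (a, b)), in order
    for n in range(num_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:                            # no pair worth merging
            break
        new_sym = f"<{n}>"
        rules.append((new_sym, (a, b)))
        out, i = [], 0
        while i < len(seq):                      # left-to-right, non-overlapping
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                out.append(new_sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def bpe_decompress(seq, rules):
    """Undo the merges in reverse order; fast and needs little work space."""
    for new_sym, (a, b) in reversed(rules):
        out = []
        for s in seq:
            out.extend([a, b] if s == new_sym else [s])
        seq = out
    return "".join(seq)
```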
Experimental Results on String Matching Algorithms
, 1995
Cited by 25 (4 self)
In this paper we report experiments on eight different algorithms: the brute force algorithm, the Boyer-Moore algorithm, the Apostolico-Giancarlo algorithm, the Turbo-BM algorithm, the Boyer-Moore-Horspool algorithm, the Quick Search algorithm, the Reverse Factor algorithm and the Turbo Reverse Factor algorithm (BF, BM, AG, TBM, BMH, QS, RF and TRF for short, respectively).
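As a taste of the algorithms compared, here is a sketch of Boyer-Moore-Horspool (BMH), which simplifies Boyer-Moore to a single bad-character shift table; this is an illustration, not the authors' benchmark code:

```python
def horspool(text: str, pattern: str) -> list[int]:
    """Boyer-Moore-Horspool: compare the window, then shift by the distance
    determined by the LAST character of the current text window."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift table: distance from each pattern symbol's last occurrence
    # (excluding the final position) to the end of the pattern.
    shift = {c: m for c in set(text)}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i
    occurrences, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            occurrences.append(i)
        i += shift.get(text[i + m - 1], m)
    return occurrences
```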
European Committee for Electrotechnical Standardization (CENELEC), http://www.cenelec.org
In ICNP, 2006
Cited by 14 (0 self)
Abstract — The phenomenal growth of the Internet in the last decade, and society's increasing dependence on it, has brought along a flood of security attacks on the networking and computing infrastructure. Intrusion detection/prevention systems provide defenses against these attacks by monitoring the headers and payload of packets flowing through the network. Multiple string matching, which can compare hundreds of string patterns simultaneously, is a critical component of these systems, and is a well-studied problem. Most of the string matching solutions today are based on the classic Aho-Corasick algorithm, which has an inherent limitation: it can process only one input character per cycle. As memory speed is not growing at the same pace as network speed, this limitation has become a bottleneck in current networks, which run at speeds of tens of gigabits per second. In this paper, we propose a novel multiple string matching algorithm that can process multiple characters at a time, thus achieving multi-gigabit search speeds. We also propose an architecture for an efficient implementation on TCAM-based hardware. We additionally propose novel optimizations that make use of the properties of TCAMs to significantly reduce the memory requirements of the proposed algorithm. We finally present extensive simulation results of network-based virus/worm detection using real signature databases to illustrate the effectiveness of the proposed scheme.
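The Aho-Corasick baseline discussed above can be sketched as a goto/fail/output automaton. Note how the scan loop consumes exactly one input character per step, which is the limitation the paper attacks; this sketch is the classic algorithm, not the proposed multi-character TCAM scheme.

```python
from collections import deque

def build_aho_corasick(patterns):
    """Build goto/fail/output tables: a trie of the patterns, plus failure
    links computed by BFS (each link points to the longest proper suffix of
    the current state that is also a trie prefix)."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    queue = deque(goto[0].values())          # root's children: fail = 0
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]           # inherit matches ending here
    return goto, fail, out

def ac_search(text, automaton):
    """Scan the text ONE character at a time, reporting (position, pattern)."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)
```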
Efficient Comparison Based String Matching
, 1992
Cited by 11 (3 self)
We study the exact number of symbol comparisons that are required to solve the string matching problem and present a family of efficient algorithms. Unlike previous string matching algorithms, the algorithms in this family do not "forget" results of comparisons, which makes their analysis much simpler. In particular, we give a linear-time algorithm that finds all occurrences of a pattern of length m in a text of length n in n + ⌈((4 log m + 2)/m)(n − m)⌉ comparisons. The pattern preprocessing takes linear time and makes at most 2m comparisons. This algorithm establishes that, in general, searching for a long pattern is easier than searching for a short one. We also show that any algorithm in the family of algorithms presented must make at least n + ⌊log m⌋⌊(n − m)/m⌋ symbol comparisons, for m = 2^k − 1 and any integer k ≥ 1.
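For contrast with the bounds above, a naive matcher instrumented to count symbol comparisons (the cost measure studied in this line of work) performs up to m(n − m + 1) of them; this is an illustration of the cost measure only, not the paper's algorithm.

```python
def naive_match_count(text: str, pattern: str):
    """Naive left-to-right matcher that records every symbol comparison.
    Worst case is about m(n - m + 1) comparisons, far above the roughly
    n + O((log m / m)(n - m)) achievable by comparison-efficient algorithms."""
    n, m = len(text), len(pattern)
    comparisons, occurrences = 0, []
    for i in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1                 # one symbol comparison
            if text[i + j] != pattern[j]:
                break
            j += 1
        if j == m:
            occurrences.append(i)
    return occurrences, comparisons
```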
Watson–Crick conjugate and commutative words
In Proc. of DNA 13, LNCS, 2008
Cited by 11 (10 self)
Abstract. This paper is a theoretical study of notions in combinatorics of words motivated by information being encoded as DNA strands in DNA computing. We generalize the classical notions of conjugacy and commutativity of words to incorporate the notion of an involution function, a formalization of the Watson–Crick complementarity of DNA single strands. We define and study properties of Watson–Crick conjugate and commutative words, as well as Watson–Crick palindromes. We obtain, for example, a complete characterization of the set of all words that are not Watson–Crick palindromes. Our results hold for more general functions, such as arbitrary morphic and antimorphic involutions. They generalize classical results in combinatorics of words, while formalizing concepts meaningful for DNA computing experiments.
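The involution in question can be sketched directly: over the DNA alphabet, the Watson–Crick map θ complements each base and reverses the word (an antimorphic involution), and w is a Watson–Crick palindrome iff w = θ(w). A minimal sketch:

```python
# Watson-Crick (theta-) palindromes under the antimorphic involution theta:
# complement every base, then reverse the word.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w: str) -> str:
    """Antimorphic involution: reverse the word and complement each base.
    Being an involution means theta(theta(w)) == w."""
    return "".join(COMPLEMENT[c] for c in reversed(w))

def is_wk_palindrome(w: str) -> bool:
    """w is a Watson-Crick palindrome iff it equals its own theta-image."""
    return w == theta(w)
```

For example, ACGT is a Watson–Crick palindrome (its reverse complement is itself), while ordinary palindromes like AA are generally not.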