Results 1 - 10
of
58
A Guided Tour to Approximate String Matching
- ACM Computing Surveys
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract
-
Cited by 306 (38 self)
- Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems. 1
A New Approach to Text Searching
"... We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of ..."
Abstract
-
Cited by 200 (14 self)
- Add to MetaCart
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they don't need to buffer the input, they are real time algorithms (for constant size patterns), and they are suitable to be implemented in hardware. 1 Introduction String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [17, 4]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing "don't care" symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. Fo...
A fast algorithm for multi-pattern searching
, 1994
"... A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number — tens of thousands — of patterns. Several applications of the multi-pattern matching problem are discussed. We argue that, in addition ..."
Abstract
-
Cited by 92 (2 self)
- Add to MetaCart
A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number — tens of thousands — of patterns. Several applications of the multi-pattern matching problem are discussed. We argue that, in addition to previous applications that required such search, multi-pattern matching can be used in lieu of indexed or sorted data in some applications involving small to medium size datasets. Its advantage, of course, is that no additional search structure is needed.
Deterministic Memory-Efficient String Matching Algorithms for Intrusion Detection
- In IEEE Infocom, Hong Kong
"... Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion detection system is the ability to search through packets and identify content that matches known attac ..."
Abstract
-
Cited by 77 (4 self)
- Add to MetaCart
Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion detection system is the ability to search through packets and identify content that matches known attacks. Space and time efficient string matching algorithms are therefore important for identifying these packets at line rate.
Fast and Practical Approximate String Matching
- In Combinatorial Pattern Matching, Third Annual Symposium
, 1992
"... We present new algorithms for approximate string matching based in simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based in arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searchin ..."
Abstract
-
Cited by 47 (0 self)
- Add to MetaCart
We present new algorithms for approximate string matching based in simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based in arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs. 1 Introduction Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of them are only of theoretical interest. Here, we continue this trend, by presenting two new simple and efficient algorithms for approximate string matching. First, we present an algorithm for string matching with k mismatches. This problem consists of finding all instances o...
Dynamic Dictionary Matching
, 1993
"... We consider the dynamic dictionary matching problem. We are given a set of pattern strings (the dictionary) that can change over time; that is, we can insert a new pattern into the dictionary or delete a pattern from it. Moreover, given a text string, we must be able to find all occurrences of any p ..."
Abstract
-
Cited by 44 (8 self)
- Add to MetaCart
We consider the dynamic dictionary matching problem. We are given a set of pattern strings (the dictionary) that can change over time; that is, we can insert a new pattern into the dictionary or delete a pattern from it. Moreover, given a text string, we must be able to find all occurrences of any pattern of the dictionary in the text. Let D 0 be the empty dictionary. We present an algorithm that performs any sequence of the following operations in the given time bounds: (1) insert(p; D i01 ): Insert pattern p[1; m] into the dictionary D i01 . D i is the dictionary after the operation. The time complexity is O(m log jD i j). (2) delete(p; D i01 ): Delete pattern p[1; m] from the dictionary D i01 . D i is the dictionary after the operation. The time complexity is O(m log jD i01 j). (3) search(t; D i ): Search text t[1; n] for all occurrences of the patterns of dictionary D i . The time complexity is O((n + tocc) log jD i j), where tocc is the total number of occurrences of patterns i...
Algorithms to accelerate multiple regular expressions matching for deep packet inspection
- In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM’06
, 2006
"... There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packet ..."
Abstract
-
Cited by 41 (1 self)
- Add to MetaCart
There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide application-specific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. However, there is growing interest in the use of regular expression-based pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application. In this paper, we introduce a new representation for regular
NR-grep: A Fast and Flexible Pattern Matching Tool
- Software Practice and Experience (SPE
, 2000
"... We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a nondeterminis ..."
Abstract
-
Cited by 36 (7 self)
- Add to MetaCart
We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bit-parallel simulation of a nondeterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity of the searched pattern increases. Another concept fully integrated into nrgrep and that contributes to this smoothness is the selection of adequate subpatterns for fast scanning, which is also absent in many current tools. We show that the efficiency of nrgrep is similar to that of the fastest existing string matching tools for the simplest patterns, and by far unpaired for more complex patterns.
Fast Content-Based Packet Handling for Intrusion Detection
, 2001
"... It is becoming increasingly common for network devices to handle packets based on the contents of packet payloads. Example applications include intrusion detection, firewalls, web proxies, and layer seven switches. This paper analyzes the problem of intrusion detection and its reliance on fast strin ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
It is becoming increasingly common for network devices to handle packets based on the contents of packet payloads. Example applications include intrusion detection, firewalls, web proxies, and layer seven switches. This paper analyzes the problem of intrusion detection and its reliance on fast string matching in packets. We show that the problem can be restructured to allow the use of more efficient string matching algorithms that operate on sets of patterns in parallel. We then introduce and analyze a new string matching algorithm that has average-case performance that is better than AhoCorasick, a popular linear-time algorithm and much better than the iterative use of Boyer-Moore currently used in the popular intrusion detection platform Snort. We then measure the actual performance of several search algorithms on actual packet traces and rulesets. Our results provide lessons on the structuring of content-based handlers, string matching algorithms in general, and the importance of performance to security.
Approximate multiple string search
- In Proc. CPM'96, LNCS 1075
, 1996
"... Abstract. This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algo-rithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interest-ing techniq ..."
Abstract
-
Cited by 28 (0 self)
- Add to MetaCart
Abstract. This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algo-rithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interest-ing techniques, overall the algorithm is not deep theoretically. The emphasis of this paper is on the experimental side of algorithm design. We show the importance of careful design, experimentation, and utiliza-tion of current architectures. In particular, we discuss the issues of locality and cache performance, fast hash functions, and incremental hashing techniques. We introduce the notion of two-level hashing, which utilizes cache behavior to speed up hashing, especially in cases where unsuccessful searches are not uncommon. Two-level hashing may be useful for many other applications. The end result is also interesting by itself. We show that multiple search with one error is fast enough for most text applications. 1.

