Results 1  10
of
101
A Guided Tour to Approximate String Matching
 ACM Computing Surveys
, 1999
"... We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining t ..."
Abstract

Cited by 447 (38 self)
 Add to MetaCart
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems. 1
A New Approach to Text Searching
"... We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the feature ..."
Abstract

Cited by 237 (15 self)
 Add to MetaCart
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they don't need to buffer the input, they are real time algorithms (for constant size patterns), and they are suitable to be implemented in hardware. 1 Introduction String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [17, 4]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing "don't care" symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. Fo...
A fast algorithm for multipattern searching
, 1994
"... A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number — tens of thousands — of patterns. Several applications of the multipattern matching problem are discussed. We argue that, in addition ..."
Abstract

Cited by 126 (2 self)
 Add to MetaCart
(Show Context)
A new algorithm to search for multiple patterns at the same time is presented. The algorithm is faster than previous algorithms and can support a very large number — tens of thousands — of patterns. Several applications of the multipattern matching problem are discussed. We argue that, in addition to previous applications that required such search, multipattern matching can be used in lieu of indexed or sorted data in some applications involving small to medium size datasets. Its advantage, of course, is that no additional search structure is needed.
Deterministic MemoryEfficient String Matching Algorithms for Intrusion Detection
 In IEEE Infocom, Hong Kong
"... Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion detection system is the ability to search through packets and identify content that matches known attac ..."
Abstract

Cited by 110 (5 self)
 Add to MetaCart
(Show Context)
Intrusion Detection Systems (IDSs) have become widely recognized as powerful tools for identifying, deterring and deflecting malicious attacks over the network. Essential to almost every intrusion detection system is the ability to search through packets and identify content that matches known attacks. Space and time efficient string matching algorithms are therefore important for identifying these packets at line rate.
Algorithms to accelerate multiple regular expressions matching for deep packet inspection
 In Proceedings of the Annual Conference of the ACM Special Interest Group on Data Communication (SIGCOMM’06
, 2006
"... There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide applicationspecific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packet ..."
Abstract

Cited by 64 (1 self)
 Add to MetaCart
(Show Context)
There is a growing demand for network devices capable of examining the content of data packets in order to improve network security and provide applicationspecific services. Most high performance systems that perform deep packet inspection implement simple string matching algorithms to match packets against a large, but finite set of strings. However, there is growing interest in the use of regular expressionbased pattern matching, since regular expressions offer superior expressive power and flexibility. Deterministic finite automata (DFA) representations are typically used to implement regular expressions. However, DFA representations of regular expression sets arising in network applications require large amounts of memory, limiting their practical application. In this paper, we introduce a new representation for regular
Fast and Practical Approximate String Matching
 In Combinatorial Pattern Matching, Third Annual Symposium
, 1992
"... We present new algorithms for approximate string matching based in simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based in arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searchin ..."
Abstract

Cited by 55 (0 self)
 Add to MetaCart
(Show Context)
We present new algorithms for approximate string matching based in simple, but efficient, ideas. First, we present an algorithm for string matching with mismatches based in arithmetical operations that runs in linear worst case time for most practical cases. This is a new approach to string searching. Second, we present an algorithm for string matching with errors based on partitioning the pattern that requires linear expected time for typical inputs. 1 Introduction Approximate string matching is one of the main problems in combinatorial pattern matching. Recently, several new approaches emphasizing the expected search time and practicality have appeared [3, 4, 27, 32, 31, 17], in contrast to older results, most of them are only of theoretical interest. Here, we continue this trend, by presenting two new simple and efficient algorithms for approximate string matching. First, we present an algorithm for string matching with k mismatches. This problem consists of finding all instances o...
Dynamic Dictionary Matching
, 1993
"... We consider the dynamic dictionary matching problem. We are given a set of pattern strings (the dictionary) that can change over time; that is, we can insert a new pattern into the dictionary or delete a pattern from it. Moreover, given a text string, we must be able to find all occurrences of any p ..."
Abstract

Cited by 47 (8 self)
 Add to MetaCart
We consider the dynamic dictionary matching problem. We are given a set of pattern strings (the dictionary) that can change over time; that is, we can insert a new pattern into the dictionary or delete a pattern from it. Moreover, given a text string, we must be able to find all occurrences of any pattern of the dictionary in the text. Let D 0 be the empty dictionary. We present an algorithm that performs any sequence of the following operations in the given time bounds: (1) insert(p; D i01 ): Insert pattern p[1; m] into the dictionary D i01 . D i is the dictionary after the operation. The time complexity is O(m log jD i j). (2) delete(p; D i01 ): Delete pattern p[1; m] from the dictionary D i01 . D i is the dictionary after the operation. The time complexity is O(m log jD i01 j). (3) search(t; D i ): Search text t[1; n] for all occurrences of the patterns of dictionary D i . The time complexity is O((n + tocc) log jD i j), where tocc is the total number of occurrences of patterns i...
NRgrep: A Fast and Flexible Pattern Matching Tool
 Software Practice and Experience (SPE
, 2000
"... We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bitparallel simulation of a non ..."
Abstract

Cited by 37 (7 self)
 Add to MetaCart
(Show Context)
We present nrgrep ("nondeterministic reverse grep"), a new pattern matching tool designed for efficient search of complex patterns. Unlike previous tools of the grep family, such as agrep and Gnu grep, nrgrep is based on a single and uniform concept: the bitparallel simulation of a nondeterministic suffix automaton. As a result, nrgrep can find from simple patterns to regular expressions, exactly or allowing errors in the matches, with an efficiency that degrades smoothly as the complexity of the searched pattern increases. Another concept fully integrated into nrgrep and that contributes to this smoothness is the selection of adequate subpatterns for fast scanning, which is also absent in many current tools. We show that the efficiency of nrgrep is similar to that of the fastest existing string matching tools for the simplest patterns, and by far unpaired for more complex patterns.
Approximate multiple string search
 In Proc. CPM'96, LNCS 1075
, 1996
"... Abstract. This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algorithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interesting techniq ..."
Abstract

Cited by 35 (0 self)
 Add to MetaCart
(Show Context)
Abstract. This paper presents a fast algorithm for searching a large text for multiple strings allowing one error. On a fast workstation, the algorithm can process a megabyte of text searching for 1000 patterns (with one error) in less than a second. Although we combine several interesting techniques, overall the algorithm is not deep theoretically. The emphasis of this paper is on the experimental side of algorithm design. We show the importance of careful design, experimentation, and utilization of current architectures. In particular, we discuss the issues of locality and cache performance, fast hash functions, and incremental hashing techniques. We introduce the notion of twolevel hashing, which utilizes cache behavior to speed up hashing, especially in cases where unsuccessful searches are not uncommon. Twolevel hashing may be useful for many other applications. The end result is also interesting by itself. We show that multiple search with one error is fast enough for most text applications. 1.
Fast ContentBased Packet Handling for Intrusion Detection
, 2001
"... It is becoming increasingly common for network devices to handle packets based on the contents of packet payloads. Example applications include intrusion detection, firewalls, web proxies, and layer seven switches. This paper analyzes the problem of intrusion detection and its reliance on fast strin ..."
Abstract

Cited by 34 (0 self)
 Add to MetaCart
(Show Context)
It is becoming increasingly common for network devices to handle packets based on the contents of packet payloads. Example applications include intrusion detection, firewalls, web proxies, and layer seven switches. This paper analyzes the problem of intrusion detection and its reliance on fast string matching in packets. We show that the problem can be restructured to allow the use of more efficient string matching algorithms that operate on sets of patterns in parallel. We then introduce and analyze a new string matching algorithm that has averagecase performance that is better than AhoCorasick, a popular lineartime algorithm and much better than the iterative use of BoyerMoore currently used in the popular intrusion detection platform Snort. We then measure the actual performance of several search algorithms on actual packet traces and rulesets. Our results provide lessons on the structuring of contentbased handlers, string matching algorithms in general, and the importance of performance to security.