XFA: Faster signature matching with extended automata
In IEEE Symposium on Security and Privacy, 2008
Cited by 15 (6 self)
Automata-based representations and related algorithms have been applied to address several problems in information security, and often the automata had to be augmented with additional information. For example, extended finite-state automata (EFSA) augment finite-state automata (FSA) with variables to track dependencies between arguments of system calls. In this paper, we introduce extended finite automata (XFAs), which augment FSAs with finite scratch memory and instructions to manipulate this memory. Our primary motivation for introducing XFAs is signature matching in Network Intrusion Detection Systems (NIDS). Representing NIDS signatures as deterministic finite-state automata (DFAs) results in very fast signature matching, but for several classes of signatures DFAs can blow up in space. Using nondeterministic finite-state automata (NFAs) to represent NIDS signatures results in a succinct representation, but at the expense of higher time complexity for signature matching. In other words, DFAs are time-efficient but space-inefficient, and NFAs are space-efficient but time-inefficient. In our experiments we have noticed that for a large class of NIDS signatures XFAs have time complexity similar to DFAs and space complexity similar to NFAs. For our test set, XFAs use 10 times less memory than a DFA-based solution, yet achieve 20 times higher matching speeds.
Efficient Submatch Addressing for Regular Expressions
2001
Cited by 6 (0 self)
String pattern matching in its different forms is an important topic in theoretical computer science. This thesis concentrates on the problem of regular expression matching with submatch addressing, where the position and extent of the substrings matched by given subexpressions must be provided. The algorithms in widespread use at the time either take exponential worst-case time to find a match, can handle only a subset of all regular expressions, or use space proportional to the length of the input string where constant space would suffice. This thesis proposes a new method for solving the submatch addressing problem using nondeterministic finite automata with transitions augmented by copy-on-write update operations. The resulting algorithm makes a single pass over the input string, always using time linearly proportional to the length of the input. Space consumption depends only on the regular expression used, and not on the input string. To the author's knowledge, this is a new result. A prototype of a POSIX.2-compatible regular expression matcher using the algorithm was implemented. Benchmarking results indicate that the prototype compares favorably against some popular implementations. Furthermore, the absence of exponential or polynomial time worst cases makes it possible to use any regular expression without performance problems, which is not the case with previous implementations or algorithms.
Multi-User File System Search
2007
Cited by 4 (1 self)
Information retrieval research usually deals with globally visible, static document collections. Practical applications, in contrast, like file system search and enterprise search, have to cope with highly dynamic text collections and have to take into account user-specific access permissions when generating the results to a search query. The goal of this thesis is to close the gap between information retrieval research and the requirements exacted by these real-life applications. The algorithms and data structures presented in this thesis can be used to implement a file system search engine that is able to react to changes in the file system by updating its index data in real time. File changes (insertions, deletions, or modifications) are reflected by the search results within a few seconds,
Regular Expression Matching on Graphics Hardware for Intrusion Detection
In RAID 2009
Cited by 4 (2 self)
The expressive power of regular expressions has often been exploited in network intrusion detection systems, virus scanners, and spam filtering applications. However, the flexible pattern matching functionality of regular expressions in these systems comes with significant overheads in terms of both memory and CPU cycles, since every byte of the inspected input needs to be processed and compared against a large set of regular expressions. In this paper we present the design, implementation and evaluation of a regular expression matching engine running on graphics processing units (GPUs). The significant spare computational power and data parallelism capabilities of modern GPUs permit the efficient matching of multiple inputs at the same time against a large set of regular expressions. Our evaluation shows that regular expression matching on graphics hardware can result in a 48 times speedup over traditional CPU implementations and up to 16 Gbit/s in processing throughput. We demonstrate the feasibility of GPU regular expression matching by implementing it in the popular Snort intrusion detection system, which results in a 60% increase in the packet processing throughput.
Complex Event Detection at Wire Speed with FPGAs
Cited by 2 (1 self)
Complex event detection is an advanced form of data stream processing where the stream(s) are scrutinized to identify given event patterns. The challenge for many complex event processing (CEP) systems is to be able to evaluate event patterns on high-volume data streams while adhering to real-time constraints. To solve this problem, in this paper we present a hardware-based complex event detection system implemented on field-programmable gate arrays (FPGAs). By inserting the FPGA directly into the data path between the network interface and the CPU, our solution can detect complex events at gigabit wire speed with constant and fully predictable latency, independently of network load, packet size or data distribution. This is a significant improvement over CPU-based systems and an architectural approach that opens up interesting opportunities for hybrid stream engines that combine the flexibility of the CPU with the parallelism and processing power of FPGAs.
Exploitation of Similarity and Pattern Matching in XML Technologies (Technical Report)
Cited by 1 (0 self)
As XML technologies have undoubtedly become a standard for data representation, it is essential to provide efficient implementations of W3C recommendations. A possible optimization of particular types of techniques can be found in the exploitation of similarity of XML data and/or matching of XML patterns. In this paper we provide an overview and classification of such techniques from various points of view. We also briefly describe the best-known representatives of particular ideas and discuss their key advantages and disadvantages. The text should serve as a good starting point for proposing an appropriate similarity-based optimization.
A Text Pattern-Matching Tool based on Parsing Expression Grammars
Software: Practice and Experience
Cited by 1 (1 self)
This is a preprint of an article accepted for publication in Software: Practice and Experience;
Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts
2007
Cited by 1 (0 self)
We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.
Efficient Earley parsing with regular right-hand sides
In Workshop on Language Descriptions, Tools and Applications, 2009
Cited by 1 (1 self)
We present a new variant of the Earley parsing algorithm capable of efficiently supporting context-free grammars with regular right-hand sides. We present the core state-machine-driven algorithm, the translation of grammars into state machines, and the reconstruction algorithm. We also include a theoretical framework for presenting the algorithm and for evaluating optimizations. Finally, we evaluate the algorithm by testing its implementation.
Key words: Context-free grammars, Earley parsing, regular right sides, scannerless parsing, transducers, augmented transition networks

