Fast and simple character classes and bounded gaps pattern matching, with application to protein searching
 Journal of Computational Biology
, 2001
Cited by 23 (4 self)
The problem of fast exact and approximate searching for a pattern that contains classes of characters and bounded size gaps (CBG) in a text has a wide range of applications, among which a very important one is protein pattern matching (for instance, one PROSITE protein site is associated with the CBG [RK]-x(2,3)-[DE]-x(2,3)-Y, where the brackets match any of the letters inside, and x(2,3) matches a gap of length between 2 and 3). Currently, the only way to search for a CBG in a text is to convert it into a full regular expression (RE). However, an RE is more sophisticated than a CBG, and searching for it with an RE pattern matching algorithm complicates the search and makes it slow. For this reason, we design in this article two new practical CBG matching algorithms that are much simpler and faster than all the RE search techniques. The first one looks exactly once at each text character. The second one does not need to consider all the text characters, and hence it is usually faster than the first one, but in bad cases it may have to read the same text character more than once. We then propose a criterion based on the form of the CBG to choose a priori the faster of the two. We also show how to search permitting a few mistakes in the occurrences. We performed many practical experiments using the PROSITE database, and all of them show that our algorithms are the fastest in virtually all cases.
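To make the CBG notation concrete, the following is a minimal sketch of the baseline the abstract describes as slow: translating a CBG into a full regular expression and handing it to a general RE engine. It is not either of the paper's two algorithms. The function names `cbg_to_regex` and `find_cbg` are hypothetical, and the sketch assumes the PROSITE-style notation in which elements are separated by hyphens and supports only single letters, bracketed classes, and `x` / `x(n)` / `x(n,m)` gaps.

```python
import re

def cbg_to_regex(cbg: str) -> str:
    """Translate a hyphen-separated CBG (assumed PROSITE-like syntax) into a Python regex."""
    parts = []
    for elem in cbg.split("-"):
        elem = elem.strip()
        gap = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", elem)
        if gap:
            lo = gap.group(1) or "1"      # bare 'x' is a gap of exactly one character
            hi = gap.group(2) or lo       # 'x(n)' is a gap of exactly n characters
            parts.append(".{%s,%s}" % (lo, hi))
        elif re.fullmatch(r"\[[A-Z]+\]|[A-Z]", elem):
            parts.append(elem)            # a class or a single letter is already valid regex
        else:
            raise ValueError(f"unsupported CBG element: {elem!r}")
    return "".join(parts)

def find_cbg(cbg: str, text: str) -> list[int]:
    """Return the start positions of all (possibly overlapping) occurrences."""
    # A zero-width lookahead lets the engine report overlapping matches.
    pattern = re.compile("(?=" + cbg_to_regex(cbg) + ")")
    return [m.start() for m in pattern.finditer(text)]

if __name__ == "__main__":
    print(cbg_to_regex("[RK]-x(2,3)-[DE]-x(2,3)-Y"))
    print(find_cbg("[RK]-x(2,3)-[DE]-x(2,3)-Y", "AAKAADAAYAA"))
```

A backtracking engine may re-read text characters many times on such patterns, which is the kind of overhead the specialised CBG algorithms are designed to avoid.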
Fast and compact regular expression matching, 2005. Submitted to a journal. Preprint available at arxiv.org/cs/0509069
Cited by 19 (4 self)
We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem using either an improved tabulation technique of an existing algorithm or by combining known algorithms in a new way.
Matching a Set of Strings with Variable Length Don’t Cares
 Theoretical Computer Science 178
, 1997
Cited by 17 (4 self)
Given an alphabet A, a pattern p is a sequence (v1,...,vm) of words from A* called keywords. We represent p as a single word
Generating optimal monitors for extended regular expressions
 In Proc. of the 3rd Workshop on Runtime Verification (RV’03), volume 89 of ENTCS
, 2003
Cited by 17 (7 self)
Ordinary software engineers and programmers can easily understand regular patterns, as shown by the immense interest in and the success of scripting languages like Perl, based essentially on regular expression pattern matching. We believe that regular expressions provide an elegant and powerful specification language also for monitoring requirements, because an execution trace of a program is in fact a string of states. Extended regular expressions (EREs) add complementation to regular expressions, which brings additional benefits by allowing one to specify patterns that must not occur during an execution. Complementation gives one the power to express patterns on strings more compactly. In this paper we present a technique to generate optimal monitors from EREs. Our monitors are deterministic finite automata (DFA) and our novel contribution is to generate them using a modern coalgebraic technique called coinduction. Based on experiments with our implementation, which can be publicly tested and used over the web, we believe that our technique is more efficient than the simplistic method based on complementation of automata, which can quickly lead to a highly exponential state explosion.
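As an illustration of why complementation fits naturally into trace monitoring, here is a minimal sketch that checks a trace against an ERE using Brzozowski derivatives, a standard technique in which complement is handled by a single extra case. This is not the paper's coinductive DFA generation: it performs no simplification or state minimisation, and the tuple encoding and the names `nullable`, `deriv`, and `accepts` are assumptions made for the example.

```python
# Expressions are tuples: ("empty",), ("eps",), ("chr", c),
# ("cat", a, b), ("alt", a, b), ("star", a), ("not", a).
EMPTY, EPS = ("empty",), ("eps",)
ANY = ("not", EMPTY)  # complement of the empty language: matches every trace

def nullable(r):
    """True if the expression accepts the empty trace."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "chr"):
        return False
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "not":
        return not nullable(r[1])
    raise ValueError(tag)

def deriv(r, c):
    """Expression matching the remainders of traces in r that start with event c."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return ("alt", deriv(r[1], c), deriv(r[2], c))
    if tag == "star":
        return ("cat", deriv(r[1], c), r)
    if tag == "not":
        return ("not", deriv(r[1], c))  # complement commutes with the derivative
    if tag == "cat":
        left = ("cat", deriv(r[1], c), r[2])
        return ("alt", left, deriv(r[2], c)) if nullable(r[1]) else left
    raise ValueError(tag)

def accepts(r, trace):
    """Consume the trace one event at a time, then test for acceptance."""
    for event in trace:
        r = deriv(r, event)
    return nullable(r)

# Requirement: the event sequence "ab" must never occur in the trace.
bad = ("cat", ANY, ("cat", ("chr", "a"), ("cat", ("chr", "b"), ANY)))
safe = ("not", bad)

if __name__ == "__main__":
    print(accepts(safe, "aac"), accepts(safe, "aab"))
```

Because nothing is simplified, the expression grows with every event; a practical monitor would identify equivalent derivatives so that only finitely many states arise.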
Fast Regular Expression Search
, 1999
Cited by 14 (10 self)
We present a new algorithm to search regular expressions, which is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes of length up to ℓ of the strings accepted, and use it to skip text characters as done for exact string matching in previous work. As we show experimentally, the resulting algorithm is fast, the fastest one in many cases of interest.
1 Introduction
The need to search for regular expressions arises in many text-based applications, such as text retrieval, text editing and computational biology, to name a few. A regular expression is a generalized pattern composed of (i) basic strings, (ii) union, concatenation and Kleene closure of other regular expressions. Readers unfamiliar with the concept and terminology related to regular expressions are referred to a classical book such as [1]. The traditional technique [16] to sea...
Regular Expression Searching on Compressed Text
 Journal of Discrete Algorithms
, 2003
Cited by 13 (1 self)
We present a solution to the problem of regular expression searching on compressed text.
Compact DFA Representation for Fast Regular Expression Search
, 2001
Cited by 10 (6 self)
We present a new technique to encode a deterministic finite automaton (DFA). Based on the specific properties of Glushkov's nondeterministic finite automaton (NFA) construction algorithm, we are able to encode the DFA using $(m+1)(2^{m+1} + |\Sigma|)$ bits, where $m$ is the number of characters (excluding operator symbols) in the regular expression and $\Sigma$ is the alphabet. This compares favorably against the worst case of $(m+1)\,2^{m+1}\,|\Sigma|$ bits needed by a classical DFA representation and $m(2^{2m+1} + |\Sigma|)$ bits needed by the Wu and Manber approach implemented in Agrep. Our approach is practical and simple to implement, and it permits searching regular expressions of moderate size (which include most cases of interest) faster than with any previously existing algorithm, as we show experimentally.
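To give a feel for the three space bounds, the sketch below evaluates them for sample values. It assumes the bounds read as (m+1)(2^(m+1) + |Σ|) for the compact encoding, (m+1) · 2^(m+1) · |Σ| for the classical DFA worst case, and m(2^(2m+1) + |Σ|) for the Wu and Manber approach; the function names and the sample values m = 10, |Σ| = 256 are chosen only for illustration.

```python
def compact_dfa_bits(m: int, sigma: int) -> int:
    """Bits for the compact Glushkov-based encoding: (m+1)(2^(m+1) + |Sigma|)."""
    return (m + 1) * (2 ** (m + 1) + sigma)

def classical_dfa_bits(m: int, sigma: int) -> int:
    """Worst-case bits for a classical DFA table: (m+1) * 2^(m+1) * |Sigma|."""
    return (m + 1) * 2 ** (m + 1) * sigma

def wu_manber_bits(m: int, sigma: int) -> int:
    """Bits for the Wu and Manber approach: m(2^(2m+1) + |Sigma|)."""
    return m * (2 ** (2 * m + 1) + sigma)

if __name__ == "__main__":
    m, sigma = 10, 256  # illustrative values: 10 pattern characters, byte alphabet
    for name, fn in [("compact", compact_dfa_bits),
                     ("classical", classical_dfa_bits),
                     ("Wu-Manber", wu_manber_bits)]:
        print(f"{name:>10}: {fn(m, sigma):>10} bits")
```

For these sample values the compact bound is 25,344 bits, against 5,767,168 for the classical worst case and 20,974,080 for the Wu and Manber bound; the gap comes from adding the alphabet term rather than multiplying by it.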
Runtime Resolution of Feature Interactions in Evolving Telecommunications Systems
, 2002
Cited by 10 (7 self)
Feature interaction in telecommunications is an active research area. Many approaches to solve the so-called feature interaction problem have been proposed. However, all these approaches consider feature interaction as a somewhat isolated problem; in particular, it is not seen in the context of evolving legacy systems and third party features in a deregulated market environment. An exception is the approach by Marples and Magill [MM98, Mar00], which presents an interaction detection mechanism and an essentially manual resolution approach. We develop an automatic resolution approach that can be integrated with Marples and Magill’s detection mechanism. We distinguish two key concepts, namely solutions and resolutions. The former are essentially possible behaviours of the system and are not qualified as desirable or undesirable; the latter are the desirable solutions. Our approach allows for automatic removal of undesired behaviour and selection of the “best” desired behaviour. The correctness, complexity and suitability of our approach are analysed. Two case studies support these more theoretical considerations. Our approach is transferable to other areas, such as quality of service management, and is not restricted to network architectures with a single point of control.
Flexible pattern matching
 Journal of Applied Statistics
, 2002
Cited by 9 (0 self)
An important subtask of the pattern discovery process is pattern matching, where the pattern sought is already known and we want to determine how often and where it occurs in a sequence. In this paper we review the most practical techniques to find patterns of different kinds. We show how regular expressions can be searched for with general techniques, and how simpler patterns can be dealt with more simply and efficiently. We consider exact as well as approximate pattern matching. Also, we cover both sequential searching, where the sequence cannot be preprocessed, and indexed searching, where we have a data structure built over the sequence to speed up the search.
A Unified View to String Matching Algorithms
 In Proc. Theory and Practice of Informatics (SOFSEM'96), LNCS 1175
, 1996
Cited by 8 (3 self)
We present a unified view to sequential algorithms for many pattern matching problems, using a finite automaton built from the pattern which uses the text as input. We show the limitations of deterministic finite automata (DFA) and the advantages of using a bitwise simulation of nondeterministic finite automata (NFA). This approach gives very fast practical algorithms which have good complexity for small patterns on a RAM machine with word length O(log n), where n is the size of the text. For generalized string matching the time complexity is O(mn / log n), which for small patterns is linear. For approximate string matching we show that the two main known approaches to the problem are variations of the NFA simulation. For this case we present a different simulation technique which gives a running time of O(n) independently of the maximum number of errors allowed, k, for small patterns. This algorithm improves the best bitwise or comparison based algorithms of running ti...
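The bitwise NFA simulation mentioned above can be illustrated with the classic Shift-And algorithm, shown here as a minimal sketch for patterns whose positions are character classes. This is a well-known instance of the idea and not necessarily the exact formulation used in the paper; the function name `shift_and_search` and the list-of-classes input format are assumptions. Bit i of `state` is set when the first i+1 pattern positions match the text ending at the current character.

```python
def shift_and_search(pattern, text):
    """Return start positions where `pattern` occurs in `text`.

    `pattern` is a list of strings; entry i holds the characters allowed
    at pattern position i (a one-character string for a plain letter).
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")

    # masks[c] has bit i set when character c is allowed at position i.
    masks = {}
    for i, allowed in enumerate(pattern):
        for ch in allowed:
            masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)   # bit of the final NFA state
    state = 0               # set of active NFA states, one bit each
    hits = []
    for j, ch in enumerate(text):
        # Advance every active state by one position, start a new attempt
        # at position 0, and keep only the states consistent with ch.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits

if __name__ == "__main__":
    print(shift_and_search(["a", "b"], "abab"))
    print(shift_and_search(["RK", "A", "DE"], "KAEXRAD"))
```

Each text character costs one shift, one OR, and one AND on the state word. Python integers have arbitrary precision, so the sketch does not model the word-length limit directly; on a machine with w-bit words a pattern of length m needs about m/w words per step.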