Partial Derivatives of Regular Expressions and Finite Automata Constructions
Theoretical Computer Science, 1995
We introduce a notion of a partial derivative of a regular expression. It generalizes the known notion of a derivative, invented by Brzozowski, to the nondeterministic case. We give a constructive definition of partial derivatives, study their properties, and employ them to develop a new algorithm for turning regular expressions into relatively small NFAs and to provide certain improvements to Brzozowski's algorithm for constructing DFAs. We report on a prototype implementation of our NFA construction algorithm and present some examples. Introduction. In 1964 Janusz Brzozowski introduced word derivatives of regular expressions and suggested an elegant algorithm turning a regular expression r into a deterministic finite automaton (DFA); the main point of the algorithm is that the word derivatives of r serve as states of the resulting DFA [5]. In the following years derivatives were recognized as a useful and productive tool. Conway [8] uses derivatives to present various comp...
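The idea in the abstract above, that partial derivatives of r serve as the states of an NFA, can be illustrated with a minimal sketch. It is not the paper's prototype: the tuple encoding of expressions and the helper names (`sym`, `alt`, `cat`, `star`, `pd`, `build_nfa`, `accepts`) are assumptions made for illustration, and the derivative rules are the standard textbook ones for partial derivatives.

```python
# Regular expressions as nested tuples (an encoding assumed for this sketch).
EMPTY = ("empty",)   # the empty language
EPS = ("eps",)       # the language containing only the empty word

def sym(c): return ("sym", c)
def alt(r, s): return ("alt", r, s)
def star(r): return ("star", r)

def cat(r, s):
    # Simplify eps . s to s so that equal states are recognised as equal.
    return s if r == EPS else ("cat", r, s)

def nullable(r):
    """True if the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # cat

def pd(r, a):
    """Set of partial derivatives of r with respect to the letter a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return frozenset()
    if tag == "sym":
        return frozenset({EPS}) if r[1] == a else frozenset()
    if tag == "alt":
        return pd(r[1], a) | pd(r[2], a)
    if tag == "star":
        return frozenset(cat(p, r) for p in pd(r[1], a))
    # cat: derive the left factor; also derive the right one if left is nullable
    left = frozenset(cat(p, r[2]) for p in pd(r[1], a))
    return left | pd(r[2], a) if nullable(r[1]) else left

def build_nfa(r, alphabet):
    """Explore all partial derivatives reachable from r; they are the NFA states."""
    states, delta, todo = {r}, {}, [r]
    while todo:
        q = todo.pop()
        for a in alphabet:
            targets = pd(q, a)
            delta[(q, a)] = targets
            for t in targets:
                if t not in states:
                    states.add(t)
                    todo.append(t)
    finals = {q for q in states if nullable(q)}
    return states, delta, finals

def accepts(r, word):
    """Simulate the NFA on the fly, without building it first."""
    current = {r}
    for a in word:
        current = set().union(*(pd(q, a) for q in current))
    return any(nullable(q) for q in current)
```

For example, with `r = cat(star(alt(sym("a"), sym("b"))), sym("a"))`, i.e. (a|b)*a, the call `build_nfa(r, "ab")` reaches only two states, r itself and `EPS`, and `accepts(r, "aba")` holds while `accepts(r, "ab")` does not.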
Rewriting Extended Regular Expressions
1993
We consider an extended algebra of regular events (languages) with intersection besides the usual operations. This algebra has the structure of a distributive lattice with monotonic operations; the latter property is crucial for some applications. We give a new complete Horn equational axiomatization of the algebra and develop some term-rewriting techniques for constructing logical inferences of valid equations. A shorter version of this paper is to appear in the proceedings of Developments in Language Theory, Univ. of Turku, July 1993, published by World Scientific. The present version has been submitted for publication elsewhere. 1 Introduction. In this paper we consider an extended algebra of regular events (languages) on a given alphabet with intersection besides the usual operations (union, concatenation, Kleene star, empty, and the regular unit). This algebra has the structure of a distributive lattice (join is union, meet is intersection) with only monotonic operations. The latte...
Efficient Submatch Addressing for Regular Expressions
2001
String pattern matching in its different forms is an important topic in theoretical computer science. This thesis concentrates on the problem of regular expression matching with submatch addressing, where the position and extent of the substrings matched by given subexpressions must be provided. The algorithms in widespread use at the time either take exponential worst-case time to find a match, can handle only a subset of all regular expressions, or use space proportional to the length of the input string where constant space would suffice. This thesis proposes a new method for solving the submatch addressing problem using nondeterministic finite automata with transitions augmented by copy-on-write update operations. The resulting algorithm makes a single pass over the input string, always using time linearly proportional to the input. Space consumption depends only on the regular expression used, and not on the input string. To the author's knowledge, this is a new result. A prototype of a POSIX.2 compatible regular expression matcher using the algorithm was implemented. Benchmarking results indicate that the prototype compares favorably against some popular implementations. Furthermore, the absence of exponential or polynomial time worst cases makes it possible to use any regular expression without performance problems, which is not the case with previous implementations or algorithms.
Student Number: u4226371
2008
This research project examined automated methods for locating intended cross-references and subsections in Australian legislation sourced as HTML/SGML from the World Wide Web. The methods that achieved the best results in this project are manually encoded regular expressions and Conditional Random Fields relying on word token and regular expression feature sets. The two problems of locating subsections and locating cross-references intended to link to those subsections were considered separately. Automated solutions for both these problems achieved 92% or greater accuracy on test data from a labeled corpus. This report discusses previous work in this field, potential future work, and other potential automated methods such as Hidden Markov Models and genetic algorithms. Automated methods are of value in the task of improving cross-referencing and subsection identification in complex legal documents. This in turn is
HELSINKI UNIVERSITY OF TECHNOLOGY ABSTRACT OF MASTER’S THESIS
"... Efficient submatch addressing for regular expressions ..."
Multi-Tilde-Bar Derivatives
Abstract. Multi-tilde-bar operators allow us to extend regular expressions. The associated extended expressions are compatible with the structure of Glushkov automata and they provide a more succinct representation than standard expressions. The aim of this paper is to examine the derivation of multi-tilde-bar expressions. Two types of computation are investigated: Brzozowski derivation and Antimirov derivation, as well as the construction of the associated automata.
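As a point of reference for the Brzozowski derivation mentioned above, the following is a minimal sketch for standard regular expressions only; it does not cover multi-tilde-bar operators. The tuple encoding and the names `sym`, `alt`, `cat`, `star`, `deriv`, and `matches` are assumptions made for illustration. Unlike Antimirov derivation, which yields a set of expressions, each step here yields a single expression.

```python
# Standard regular expressions as nested tuples (an encoding assumed here).
EMPTY, EPS = ("empty",), ("eps",)

def sym(c): return ("sym", c)
def star(r): return ("star", r)

def alt(r, s):
    # Smart constructor: drop the empty language and identical alternatives.
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("alt", r, s)

def cat(r, s):
    # Smart constructor: EMPTY absorbs a concatenation, EPS is its unit.
    if EMPTY in (r, s):
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def nullable(r):
    """True if the language of r contains the empty word."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])  # cat

def deriv(r, a):
    """Brzozowski derivative of r with respect to the letter a."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] == a else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], a), deriv(r[2], a))
    if tag == "star":
        return cat(deriv(r[1], a), r)
    head = cat(deriv(r[1], a), r[2])  # cat case
    return alt(head, deriv(r[2], a)) if nullable(r[1]) else head

def matches(r, word):
    """Derive by each letter in turn; accept if the result is nullable."""
    for a in word:
        r = deriv(r, a)
    return nullable(r)
```

With `r = star(cat(sym("a"), sym("b")))`, i.e. (ab)*, deriving by "a" and then by "b" returns to r itself, so the derivatives reachable from r form a small deterministic automaton.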
Word Problems Requiring Exponential Time: Preliminary Report
The equivalence problem for Kleene's regular expressions has several effective solutions, all of which are computationally inefficient. In [1], we showed that this inefficiency is an inherent pro