Results 1-10 of 58
One-Unambiguous Regular Languages
 Information and Computation
, 1997
Cited by 101 (9 self)
The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic metalanguage for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book, Even, Greibach, and Ott, each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regu...
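To make the distinction concrete, here is a small worked example of our own (not taken from the paper). Both expressions below denote the set of words over {a, b} that end in a. Subscripts number the symbol occurrences; first(E) is our notation for the occurrences that can match the first symbol of a word, and follow(E, i) for the occurrences that can match the symbol after occurrence i.

```latex
E_1 = (a_1 + b_2)^{*}\, a_3, \qquad \mathrm{first}(E_1) = \{a_1, b_2, a_3\}

E_2 = b_1^{*}\, a_2\, (b_3^{*}\, a_4)^{*}, \qquad
\mathrm{first}(E_2) = \{b_1, a_2\}, \quad
\mathrm{follow}(E_2, 1) = \{b_1, a_2\}, \quad
\mathrm{follow}(E_2, i) = \{b_3, a_4\} \ (i = 2, 3, 4)
```

Every word has exactly one witness in E_1 (its last symbol must be matched by a_3), so E_1 is unambiguous in the sense of Book, Even, Greibach, and Ott. On reading an initial a, however, a one-symbol lookahead cannot decide between a_1 and a_3, so E_1 is not 1-unambiguous. In E_2 no first or follow set contains two occurrences of the same symbol, so E_2 is 1-unambiguous, and the language L(E_1) = L(E_2) is therefore a 1-unambiguous language.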
Regular Expressions into Finite Automata
 Theoretical Computer Science
, 1996
Cited by 64 (5 self)
It is a well-established fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ε-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of an ε-free NFA due to Glushkov [Glu61] is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives [Brz64] of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard [ISO86], now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the...
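The construction can be sketched in Python as follows. This is a minimal illustration under our own assumptions: the expression is given as a hand-built syntax tree (the classes Sym, Cat, Alt, Star and the functions glushkov, transitions and is_deterministic are hypothetical names, not from the paper), positions are numbered left to right, and the sketch makes no claim about running time. The determinism test follows the characterization quoted above: no state of the Glushkov automaton may have two outgoing transitions on the same symbol.

```python
from dataclasses import dataclass
from itertools import count


# Hypothetical syntax-tree classes; a real tool would parse expressions from text.
@dataclass(frozen=True)
class Sym:
    ch: str


@dataclass(frozen=True)
class Cat:
    left: object
    right: object


@dataclass(frozen=True)
class Alt:
    left: object
    right: object


@dataclass(frozen=True)
class Star:
    inner: object


def glushkov(expr):
    """Return (labels, first, last, follow, nullable); positions are 1..n, left to right."""
    labels = {}  # position -> symbol at that occurrence
    follow = {}  # position -> positions that may come next
    counter = count(1)

    def walk(node):
        # Returns (nullable, first, last) for the subexpression rooted at node.
        if isinstance(node, Sym):
            p = next(counter)
            labels[p] = node.ch
            follow[p] = set()
            return False, {p}, {p}
        if isinstance(node, Cat):
            n1, f1, l1 = walk(node.left)
            n2, f2, l2 = walk(node.right)
            for p in l1:  # a last position on the left may be followed by a first on the right
                follow[p] |= f2
            return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2
        if isinstance(node, Alt):
            n1, f1, l1 = walk(node.left)
            n2, f2, l2 = walk(node.right)
            return n1 or n2, f1 | f2, l1 | l2
        if isinstance(node, Star):
            _, f, l = walk(node.inner)
            for p in l:  # looping back: last positions may be followed by first positions
                follow[p] |= f
            return True, f, l
        raise TypeError(f"unknown node: {node!r}")

    nullable, first, last = walk(expr)
    return labels, first, last, follow, nullable


def transitions(labels, first, follow):
    """Glushkov NFA: state 0 is initial; entering state q means position q was just matched.

    Final states are the positions in `last`, plus state 0 if the expression is nullable.
    """
    delta = {(0, labels[q], q) for q in first}
    delta |= {(p, labels[q], q) for p, qs in follow.items() for q in qs}
    return delta


def is_deterministic(labels, first, follow):
    """True iff no state has two outgoing transitions on the same symbol."""
    for successors in [first, *follow.values()]:
        symbols = [labels[q] for q in successors]
        if len(symbols) != len(set(symbols)):
            return False
    return True


e1 = Cat(Star(Alt(Sym("a"), Sym("b"))), Sym("a"))                              # (a|b)*a
e2 = Cat(Cat(Star(Sym("b")), Sym("a")), Star(Cat(Star(Sym("b")), Sym("a"))))  # b*a(b*a)*

for name, e in [("e1", e1), ("e2", e2)]:
    labels, first, last, follow, nullable = glushkov(e)
    print(name, is_deterministic(labels, first, follow))
# e1 False
# e2 True
```

For e1 = (a|b)*a the first set {1, 2, 3} contains two a-positions, so its Glushkov automaton is not deterministic; e2 = b*a(b*a)* denotes the same language and passes the test.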
State Complexity of Basic Operations on Finite Languages
Cited by 29 (10 self)
The state complexity of basic operations on regular languages has been studied in [9-11]. Here we focus on finite languages. We show that the catenation of two finite languages accepted by an m-state and an n-state DFA, respectively, with $m > n$ is accepted by a DFA of $(m - n + 3)2^{n-2} - 1$ states in the two-letter alphabet case, and this bound is shown to be reachable. We also show that the tight upper bound for the number of states of a DFA that accepts the star of an n-state finite language is $2^{n-3} + 2^{n-4}$ in the two-letter alphabet case. The same bound for reversal is $3 \cdot 2^{p-1} - 1$ when $n$ is even and $2^{p} - 1$ when $n$ is odd. Results for alphabets of an arbitrary size are also obtained. These upper bounds for finite languages are strictly lower than the corresponding ones for general regular languages.
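The catenation and star bounds quoted above can be evaluated directly. The Python helpers below are our own illustration with hypothetical names. They add the assumptions $n \ge 2$ (catenation) and $n \ge 4$ (star) so that all exponents are non-negative, and compare against the well-known upper bounds $m 2^n - 2^{n-1}$ (catenation) and $2^{n-1} + 2^{n-2}$ (star) for DFAs accepting arbitrary regular languages, which come from the general state-complexity literature rather than from this abstract. The reversal bound is omitted because p is not defined in the abstract.

```python
def finite_catenation_bound(m: int, n: int) -> int:
    """(m - n + 3) * 2**(n - 2) - 1: quoted bound for finite languages, two-letter alphabet."""
    # Assumption (ours): m > n >= 2, so the exponent is non-negative.
    if not (m > n >= 2):
        raise ValueError("sketch assumes m > n >= 2")
    return (m - n + 3) * 2 ** (n - 2) - 1


def finite_star_bound(n: int) -> int:
    """2**(n - 3) + 2**(n - 4): quoted bound for star of finite languages, two-letter alphabet."""
    # Assumption (ours): n >= 4, so both exponents are non-negative.
    if n < 4:
        raise ValueError("sketch assumes n >= 4")
    return 2 ** (n - 3) + 2 ** (n - 4)


# Well-known upper bounds for arbitrary regular languages, for comparison only.
def regular_catenation_bound(m: int, n: int) -> int:
    return m * 2 ** n - 2 ** (n - 1)


def regular_star_bound(n: int) -> int:
    return 2 ** (n - 1) + 2 ** (n - 2)


for m, n in [(5, 3), (8, 4)]:
    print(m, n, finite_catenation_bound(m, n), regular_catenation_bound(m, n))
# 5 3 9 36
# 8 4 27 120
```

For example, with m = 5 and n = 3 the finite-language bound gives (5 - 3 + 3)·2 - 1 = 9 states, against 36 for arbitrary regular languages.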
Standard Generalized Markup Language: Mathematical and Philosophical Issues
 Computer Science Today. Recent Trends and Developments
, 1995
Cited by 22 (2 self)
The Standard Generalized Markup Language (SGML), an ISO standard, has become the accepted method of defining markup conventions for text files. SGML is a metalanguage for defining grammars for textual markup in much the same way that Backus-Naur Form is a metalanguage for defining programming-language grammars. Indeed, HTML, the method of marking up hypertext documents for the World Wide Web, is an SGML grammar. The underlying assumptions of the SGML initiative are that a logical structure of a document can be identified and that it can be indicated by the insertion of labeled matching brackets (start and end tags). Moreover, it is assumed that the nesting relationships of these tags can be described with an extended context-free grammar (the right-hand sides of productions are regular expressions). In this survey of some of the issues raised by the SGML initiative, I re-examine the underlying assumptions and address some of the theoretical questions that SGML raises....
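A minimal Python sketch of this view of markup, under our own assumptions: the element names and the GRAMMAR table are hypothetical, character data is ignored, and each production's regular right-hand side is written as a Python regular expression over the space-separated names of an element's children. A document tree is valid when every element's sequence of children matches the right-hand side for that element.

```python
import re

# Hypothetical document grammar: element name -> regular right-hand side over
# the names of its children. Empty patterns mean "no child elements".
GRAMMAR = {
    "article": r"title(?: (?:para|list))*",  # article -> title (para | list)*
    "list": r"item(?: item)*",               # list -> item+
    "title": r"",
    "para": r"",
    "item": r"",
}


def valid(node) -> bool:
    """node is (name, [child nodes]); check it and all descendants against GRAMMAR."""
    name, children = node
    model = GRAMMAR.get(name)
    if model is None:  # undeclared element
        return False
    sequence = " ".join(child[0] for child in children)
    if re.fullmatch(model, sequence) is None:
        return False
    return all(valid(child) for child in children)


doc = ("article", [("title", []), ("para", []), ("list", [("item", []), ("item", [])])])
bad = ("article", [("para", []), ("title", [])])
print(valid(doc), valid(bad))  # True False
```

Python's backtracking `re` engine accepts any regular expression here; an SGML parser, by contrast, relies on the standard's restriction that content models be unambiguous.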
Rita: An Editor and User Interface for Manipulating Structured Documents
, 1991
Cited by 18 (4 self)
This paper describes Rita, its user interface and some of its internal structure and algorithms, and relates anecdotal user experiences. Comparisons are also made with other commercial and experimental systems.
Normal Form Algorithms for Extended Context-Free Grammars
 Theoretical Computer Science
, 2000
Cited by 18 (2 self)
We investigate the complexity of a variety of normal-form transformations for extended context-free grammars, where by extended we mean that the set of right-hand sides for each nonterminal in such a grammar is a regular set. The study is motivated by the implementation project GraMa, which will provide a C++ toolkit for the symbolic manipulation of context-free objects just as Grail does for regular objects. Our results generalize known complexity bounds for context-free grammars but do so in nontrivial ways. Specifically, we introduce a new representation scheme for extended context-free grammars (the symbol-threaded expression forest), a new normal form for these grammars (dot normal form), and new regular expression algorithms.
1 Introduction
In the 1960s, extended context-free grammars were introduced, based on Backus-Naur form, as a useful abbreviatory notation that made context-free grammars easier to write. More recently, the Standard Generalized Markup Language (SGML...
The Validation of SGML Content Models
 Mathematical and Computer Modelling
, 1997
Cited by 14 (8 self)
The Standard Generalized Markup Language (SGML) is an ISO standard that provides a syntactic metalanguage for the definition of textual markup systems, which are used to indicate the structure of documents so that they can be electronically typeset, searched, and communicated. We address only one problem raised by the standard, namely: In SGML, the right-hand sides of context-free productions are regular expressions, called content models, that are restricted to be what the standard calls "unambiguous," but what is more appropriately called deterministic. We solve the problem of how to define determinism precisely, how to recognize deterministic regular expressions efficiently, and how to recognize deterministic regular languages. Any SGML parser must check that a given document grammar conforms to the standard; that is, it must validate it. Hence, our results are an important step in the clarification of the standard and in the efficient implementation of an SGML parser fo...
The generalization of generalized automata: Expression automata
 International Journal of Foundations of Computer Science
, 2005
Cited by 14 (11 self)
We explore expression automata with respect to determinism, minimization and primeness. We define determinism of expression automata using prefix-freeness. This approach is, to some extent, similar to Giammarresi and Montalbano's definition of deterministic generalized automata. We prove that deterministic expression automata languages are a proper subfamily of the regular languages. We define the minimization of deterministic expression automata. Lastly, we discuss prime prefix-free regular languages. Note that we have omitted almost all proofs in this preliminary version.
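A language is prefix-free if no word in it is a proper prefix of another word in it. For a finite language given as a set of strings, a simple check (our own sketch; the function name is hypothetical) sorts the words: if u is a proper prefix of some word w, then every word lying between u and w in lexicographic order also starts with u, so it suffices to compare neighbours. The paper's definition concerns expression automata, whose transitions carry regular expressions; this sketch only illustrates prefix-freeness itself for finite sets.

```python
def is_prefix_free(words: set[str]) -> bool:
    """True iff no word in `words` is a proper prefix of another word in it."""
    ordered = sorted(set(words))
    # After sorting, a word that is a prefix of some other word is also a
    # prefix of its immediate successor, so adjacent pairs suffice.
    return not any(nxt.startswith(cur) for cur, nxt in zip(ordered, ordered[1:]))


print(is_prefix_free({"ab", "ac", "b"}))  # True
print(is_prefix_free({"a", "ab", "b"}))   # False: "a" is a prefix of "ab"
```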
Obtaining shorter regular expressions from finite-state automata
 Theoretical Computer Science
, 2007
Cited by 14 (0 self)
We consider the use of state elimination to construct shorter regular expressions from finite-state automata. Although state elimination is an intuitive method for computing regular expressions from finite-state automata, the resulting regular expressions are often very long and complicated. We first examine the minimization of finite-state automata as a way to obtain shorter expressions. Then, we introduce vertical chopping based on bridge states and horizontal chopping based on the structural properties of given finite-state automata. We prove that, to obtain shorter regular expressions, bridge states should not be eliminated until all non-bridge states have been eliminated. In addition, we suggest heuristics for state elimination that lead to shorter regular expressions based on vertical chopping and horizontal chopping. Note that we have omitted almost all proofs in this preliminary version.
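The basic state-elimination step can be sketched in Python as follows. This is our own minimal version, not the authors' algorithm: the names eliminate, union, concat, star and top_level_union are hypothetical, edge labels are regular expressions kept as strings (with + for union and ε for the empty word), and the elimination order is supplied by the caller. It does not identify bridge states or implement the chopping heuristics; it only shows the operation whose order those heuristics choose. Eliminating a state k replaces every path p → k → q by the label r(p,k) r(k,k)* r(k,q), united with any existing p → q label.

```python
def top_level_union(r: str) -> bool:
    """True if r contains a '+' outside all parentheses."""
    depth = 0
    for ch in r:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return True
    return False


def union(r, s):
    if r is None:
        return s
    if s is None or r == s:
        return r
    return f"{r}+{s}"


def concat(*parts: str) -> str:
    pieces = [p for p in parts if p != "ε"]  # ε is the identity for concatenation
    if not pieces:
        return "ε"
    if len(pieces) == 1:
        return pieces[0]
    # Parenthesize operands whose top-level operator is union.
    return "".join(f"({p})" if top_level_union(p) else p for p in pieces)


def star(r: str) -> str:
    if r == "ε":
        return "ε"
    return r + "*" if len(r) == 1 else f"({r})*"


def eliminate(edges, start, final, order):
    """edges maps (p, q) to a regex string; eliminate the states in `order`."""
    edges = dict(edges)
    for k in order:
        loop = edges.pop((k, k), None)
        middle = star(loop) if loop is not None else "ε"
        incoming = [(p, r) for (p, q), r in edges.items() if q == k]
        outgoing = [(q, r) for (p, q), r in edges.items() if p == k]
        for p, _ in incoming:
            del edges[(p, k)]
        for q, _ in outgoing:
            del edges[(k, q)]
        for p, r_in in incoming:
            for q, r_out in outgoing:
                edges[(p, q)] = union(edges.get((p, q)), concat(r_in, middle, r_out))
    return edges.get((start, final), "∅")


# DFA for words over {a, b} ending in a (state 1 initial, state 2 final), with a
# new start state S and a new final state F attached by ε-edges.
dfa = {("S", 1): "ε", (1, 1): "b", (1, 2): "a", (2, 2): "a", (2, 1): "b", (2, "F"): "ε"}
print(eliminate(dfa, "S", "F", [1, 2]))  # b*a(a+bb*a)*
print(eliminate(dfa, "S", "F", [2, 1]))  # (b+aa*b)*aa*
```

The two elimination orders yield different, equivalent expressions; choosing the order well is exactly what the heuristics in the paper address.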
Transformation of Structured Documents
, 1995
Cited by 12 (4 self)
Structure definitions of documents have been used successfully for inputting and formatting in text processing systems. This report considers transformations between different representations of structured documents and studies possibilities to extend the use of structure definitions to document transformations and to discover algorithmic methods for carrying out transformations. Documents are represented as parse trees for context-free grammars, and transformations are made from parse tree to parse tree. First, the report describes differences in the manuscript styles required by various scientific journals and presents a declarative classification for structure differences between two parse trees. Second, a set of tree transformation methods is described, and their suitability for transformations between documents having a structure difference in each defined class is analyzed. For each class, several methods may or must be used, and only certain kinds of differences can be managed automatica...