Results 1  10
of
111
FiniteState Transducers in Language and Speech Processing
 Computational Linguistics
, 1997
"... Finitestate machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential stringtostring transducer ..."
Abstract

Cited by 361 (40 self)
 Add to MetaCart
Finitestate machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential stringtostring transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of stringtoweight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated. 1.
Adding structure to unstructured data
 In 6th Int. Conf. on Database Theory (ICDT ’97),LNCS 1186, 336–350
, 1997
"... We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that ..."
Abstract

Cited by 222 (23 self)
 Add to MetaCart
We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edgelabeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and e ciently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization. 1
XMill: an Efficient Compressor for XML Data
, 1999
"... We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogene ..."
Abstract

Cited by 201 (0 self)
 Add to MetaCart
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, and, possibly, user defined compressors for application specific data types. 1 Introduction We have implemented a compressor/decompressor for XML data, to be used in data exchange and archiving, that achieves about twice the compression rate of generalpurpose compressors (gzip), at about the same speed. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, like the healthcare, banking, chemical, and telecommunications industries. The attraction in XML is that it is a selfdescribi...
Optimizing regular path expressions using graph schemas
 In Proceedings of the International Conference on Data Engineering
, 1998
"... ..."
OneUnambiguous Regular Languages
 Information and computation
, 1997
"... The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic metalanguage for the definition of textual markup systems. In the standard, the righthand sides of productions are based on regular expressions, although only regular expressions that denote words unambigu ..."
Abstract

Cited by 125 (9 self)
 Add to MetaCart
The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic metalanguage for the definition of textual markup systems. In the standard, the righthand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book, Even, Greibach, and Ott, each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a onesymbol lookahead; we call such regular expressions 1unambiguous. A regular language is a 1unambiguous language if it is denoted by some 1unambiguous regular expression. We give a Kleene theorem for 1unambiguous languages and characterize 1unambiguous regu...
Partial Derivatives of Regular Expressions and Finite Automata Constructions
 Theoretical Computer Science
, 1995
"... . We introduce a notion of a partial derivative of a regular expression. It is a generalization to the nondeterministic case of the known notion of a derivative invented by Brzozowski. We give a constructive definition of partial derivatives, study their properties, and employ them to develop a new ..."
Abstract

Cited by 82 (0 self)
 Add to MetaCart
(Show Context)
. We introduce a notion of a partial derivative of a regular expression. It is a generalization to the nondeterministic case of the known notion of a derivative invented by Brzozowski. We give a constructive definition of partial derivatives, study their properties, and employ them to develop a new algorithm for turning regular expressions into relatively small NFA and to provide certain improvements to Brzozowski's algorithm constructing DFA. We report on a prototype implementation of our algorithm constructing NFA and present some examples. Introduction In 1964 Janusz Brzozowski introduced word derivatives of regular expressions and suggested an elegant algorithm turning a regular expression r into a deterministic finite automata (DFA); the main point of the algorithm is that the word derivatives of r serve as states of the resulting DFA [5]. In the following years derivatives were recognized as a quite useful and productive tool. Conway [8] uses derivatives to present various comp...
Regular Expressions into Finite Automata
 Theoretical Computer Science
, 1996
"... It is a wellestablished fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ffltransitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of ..."
Abstract

Cited by 81 (5 self)
 Add to MetaCart
(Show Context)
It is a wellestablished fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ffltransitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of an fflfree NFA due to to Glushkov [Glu61] is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives [Brz64] of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard [ISO86], now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the...
Minimization Algorithms for Sequential Transducers
, 2000
"... We present general algorithms for minimizing sequential finitestate transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in O(S+E+V+(EV+F)x(Pmax+1)) steps, where S is the sum of ..."
Abstract

Cited by 58 (12 self)
 Add to MetaCart
We present general algorithms for minimizing sequential finitestate transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in O(S+E+V+(EV+F)x(Pmax+1)) steps, where S is the sum of the lengths of all output labels of the resulting transducer, E the set of transitions of the given transducer, V the set of its states, F the set of final states, and Pmax one of the longest of the longest common prefixes of the output paths leaving each state of the transducer. The algorithms apply to a larger class of transducers which includes subsequential transducers.
Quotient complexity of regular languages
 J. Autom. Lang. Comb
, 2010
"... The past research on the state complexity of operations on regular languages is examined, and a new approach based on an old method (derivatives of regular expressions) is presented. Since state complexity is a property of a language, it is appropriate to define it in formallanguage terms as the nu ..."
Abstract

Cited by 33 (22 self)
 Add to MetaCart
(Show Context)
The past research on the state complexity of operations on regular languages is examined, and a new approach based on an old method (derivatives of regular expressions) is presented. Since state complexity is a property of a language, it is appropriate to define it in formallanguage terms as the number of distinct quotients of the language, and to call it “quotient complexity”. The problem of finding the quotient complexity of a language f(K,L) is considered, where K and L are regular languages and f is a regular operation, for example, union or concatenation. Since quotients can be represented by derivatives, one can find a formula for the typical quotient of f(K,L) in terms of the quotients of K and L. To obtain an upper bound on the number of quotients of f(K,L) all one has to do is count how many such quotients are possible, and this makes automaton constructions unnecessary. The advantages of this point of view are illustrated by many examples. Moreover, new general observations are presented to help in the estimation of the upper bounds on quotient complexity of regular operations. 1