Results 1 - 10
of
56
Finite-State Transducers in Language and Speech Processing
- Computational Linguistics
, 1997
"... Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-tostring transducer ..."
Abstract
-
Cited by 260 (39 self)
- Add to MetaCart
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-tostring transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated. 1.
Adding structure to unstructured data
- In 6th Int. Conf. on Database Theory (ICDT ’97),LNCS 1186, 336–350
, 1997
"... We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that ..."
Abstract
-
Cited by 195 (22 self)
- Add to MetaCart
We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and e ciently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization. 1
XMill: an Efficient Compressor for XML Data
, 1999
"... We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogene ..."
Abstract
-
Cited by 165 (0 self)
- Add to MetaCart
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip, a collection of datatype specific compressors for simple data types, and, possibly, user defined compressors for application specific data types. 1 Introduction We have implemented a compressor/decompressor for XML data, to be used in data exchange and archiving, that achieves about twice the compression rate of general-purpose compressors (gzip), at about the same speed. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, like the healthcare, banking, chemical, and telecommunications industries. The attraction in XML is that it is a self-describi...
Optimizing Regular Path Expressions Using Graph Schemas
, 1998
"... Several languages, such as LOREL and UnQL, support querying of semi-structured data. Others, such as WebSQL and WebLog, query Web sites. All these languages model data as labeled graphs and use regular path expressions to express queries that traverse arbitrary paths in graphs. Naive execution of pa ..."
Abstract
-
Cited by 136 (5 self)
- Add to MetaCart
Several languages, such as LOREL and UnQL, support querying of semi-structured data. Others, such as WebSQL and WebLog, query Web sites. All these languages model data as labeled graphs and use regular path expressions to express queries that traverse arbitrary paths in graphs. Naive execution of path expressions is inefficient, however, because it often requires exhaustive graph search. We describe two optimization techniques for queries with regular path expressions, which we call regular queries. Both rely on graph schemas, which specify partial knowledge of a graph's structure. Query pruning restricts search to a fragment of the graph; we give an efficient algorithm for rewriting any regular query into a pruned one. Query rewriting using state extents can entirely eliminate or substantially reduce graph traversal; it is reminiscent of optimizing relational queries using indices. There may be several ways to optimize a query using state extents; we give an exponential-time algorith...
One-Unambiguous Regular Languages
- Information and computation
, 1997
"... The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambigu ..."
Abstract
-
Cited by 91 (9 self)
- Add to MetaCart
The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book, Even, Greibach, and Ott, each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regu...
Regular Expressions into Finite Automata
- Theoretical Computer Science
, 1996
"... It is a well-established fact that each regular expression can be transformed into a non-deterministic finite automaton (NFA) with or without ffl-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of ..."
Abstract
-
Cited by 56 (5 self)
- Add to MetaCart
It is a well-established fact that each regular expression can be transformed into a non-deterministic finite automaton (NFA) with or without ffl-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of an ffl-free NFA due to to Glushkov [Glu61] is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives [Brz64] of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard [ISO86], now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the...
Partial Derivatives of Regular Expressions and Finite Automata Constructions
- Theoretical Computer Science
, 1995
"... . We introduce a notion of a partial derivative of a regular expression. It is a generalization to the non-deterministic case of the known notion of a derivative invented by Brzozowski. We give a constructive definition of partial derivatives, study their properties, and employ them to develop a new ..."
Abstract
-
Cited by 49 (0 self)
- Add to MetaCart
. We introduce a notion of a partial derivative of a regular expression. It is a generalization to the non-deterministic case of the known notion of a derivative invented by Brzozowski. We give a constructive definition of partial derivatives, study their properties, and employ them to develop a new algorithm for turning regular expressions into relatively small NFA and to provide certain improvements to Brzozowski's algorithm constructing DFA. We report on a prototype implementation of our algorithm constructing NFA and present some examples. Introduction In 1964 Janusz Brzozowski introduced word derivatives of regular expressions and suggested an elegant algorithm turning a regular expression r into a deterministic finite automata (DFA); the main point of the algorithm is that the word derivatives of r serve as states of the resulting DFA [5]. In the following years derivatives were recognized as a quite useful and productive tool. Conway [8] uses derivatives to present various comp...
Minimization Algorithms for Sequential Transducers
, 2000
"... We present general algorithms for minimizing sequential finite-state transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in O(S+|E|+|V|+(|E|-|V|+|F|)x(|Pmax|+1)) steps, where S is the sum of ..."
Abstract
-
Cited by 47 (12 self)
- Add to MetaCart
We present general algorithms for minimizing sequential finite-state transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in O(S+|E|+|V|+(|E|-|V|+|F|)x(|Pmax|+1)) steps, where S is the sum of the lengths of all output labels of the resulting transducer, E the set of transitions of the given transducer, V the set of its states, F the set of final states, and Pmax one of the longest of the longest common prefixes of the output paths leaving each state of the transducer. The algorithms apply to a larger class of transducers which includes subsequential transducers.
Characterizations of 1-Way Quantum Finite Automata
- SIAM Journal on Computing
"... The 2-way quantum finite automaton introduced by Kondacs and Watrous[KW97] can accept non-regular languages with bounded error in polynomial time. If we restrict the head of the automaton to moving classically and to moving only in one direction, the acceptance power of this 1-way quantum finite aut ..."
Abstract
-
Cited by 27 (0 self)
- Add to MetaCart
The 2-way quantum finite automaton introduced by Kondacs and Watrous[KW97] can accept non-regular languages with bounded error in polynomial time. If we restrict the head of the automaton to moving classically and to moving only in one direction, the acceptance power of this 1-way quantum finite automaton is reduced to a proper subset of the regular languages. In this paper we study two different models of 1-way quantum finite automata. The first model, termed measure-once quantum finite automata, was introduced by Moore and Crutchfield[MCar], and the second model, termed measure-many quantum finite automata, was introduced by Kondacs and Watrous[KW97]. We characterize the measure-once model when it is restricted to accepting with bounded error and show that, without that restriction, it can solve the word problem over the free group. We also show that it can be simulated by a probabilistic finite automaton and describe an algorithm that determines if two measureonce automata are equiv...
Rewriting Extended Regular Expressions
, 1993
"... We concider an extened algebra of regular events (languages) with intersection besides the usual operations. This algebra has the structure of a distributive lattice with monotonic operations; the latter property is crucial for some applications. We give a new complete Horn equational axiomatiztion ..."
Abstract
-
Cited by 18 (1 self)
- Add to MetaCart
We concider an extened algebra of regular events (languages) with intersection besides the usual operations. This algebra has the structure of a distributive lattice with monotonic operations; the latter property is crucial for some applications. We give a new complete Horn equational axiomatiztion of the algebra and develop some termrewriting techniques for constructing logical inferences of valid equations. A shorter version of this paper is to appear in the proceedings of Developments in Language Theory, Univ. of Turku, July 1993, published by World Scientific. The present version has been submitted for publication elsewhere. 1 Introduction In this paper we consider an extended algebra of regular events (languages) on a given alphabet with intersection besides the usual operations (union, concatenation, Kleene star, empty, and the regular unit). This algebra has the structure of a distributive lattice (join is union, meet is intersection) with only monotonic operations. The latte...

