Finite-State Transducers in Language and Speech Processing
Computational Linguistics, 1997
Cited by 308 (41 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
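Determinizing a string-to-weight transducer can be illustrated with a weighted subset construction. The following is a minimal sketch, not the paper's code. It assumes the tropical semiring (min, +), a single initial state with weight 0, and an input that is determinizable. An acyclic input always is; on other inputs the loop need not terminate. The dictionary encoding and the names `determinize_tropical` and `weight_of` are hypothetical.

```python
from collections import deque


def determinize_tropical(transitions, initial, finals):
    """Weighted subset construction over the tropical semiring (min, +).

    transitions: dict (state, label) -> list of (next_state, weight)
    initial: the single initial state (initial weight 0)
    finals: dict final_state -> final weight
    Each deterministic state is a frozenset of (state, residual_weight) pairs.
    Assumes the input is determinizable (e.g. acyclic).
    """
    labels = {label for (_, label) in transitions}
    start = frozenset({(initial, 0)})
    det_trans, det_finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        subset = queue.popleft()
        # Final weight of a subset: best residual plus original final weight.
        final_ws = [r + finals[q] for q, r in subset if q in finals]
        if final_ws:
            det_finals[subset] = min(final_ws)
        for label in sorted(labels):
            reach = {}  # next state -> cheapest weight from the subset via label
            for q, r in subset:
                for nxt, w in transitions.get((q, label), []):
                    if nxt not in reach or r + w < reach[nxt]:
                        reach[nxt] = r + w
            if not reach:
                continue
            out_w = min(reach.values())  # weight emitted on the new transition
            target = frozenset((q, v - out_w) for q, v in reach.items())
            det_trans[(subset, label)] = (target, out_w)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return det_trans, start, det_finals


def weight_of(det, word):
    """Weight the determinized machine assigns to word, or None if rejected."""
    det_trans, state, det_finals = det
    total = 0
    for label in word:
        if (state, label) not in det_trans:
            return None
        state, w = det_trans[(state, label)]
        total += w
    return total + det_finals[state] if state in det_finals else None
```

Each deterministic state pairs original states with residual weights. The weight emitted on a new transition is the minimum reachable weight. The residuals record what each path still owes relative to that minimum.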
Adding structure to unstructured data
In 6th Int. Conf. on Database Theory (ICDT ’97), LNCS 1186, 336–350, 1997
Cited by 204 (22 self)
We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
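Checking conformance can be sketched as follows. As an assumption of this sketch, conformance is modeled as a simulation that relates the data root to the schema root, where schema edges carry predicates on edge labels. The encoding and the function name `conforms` are hypothetical. The greatest simulation is computed by naive fixpoint refinement, not by an efficient algorithm.

```python
def conforms(data_edges, data_root, schema_edges, schema_root):
    """Is the data graph simulated by the schema graph?

    data_edges: dict node -> list of (label, child)
    schema_edges: dict node -> list of (predicate, child); predicate(label) -> bool
    """
    data_nodes = set(data_edges) | {c for es in data_edges.values() for _, c in es}
    schema_nodes = set(schema_edges) | {c for es in schema_edges.values() for _, c in es}
    # Start from every pair and discard pairs that violate the simulation rule.
    sim = {(d, s) for d in data_nodes for s in schema_nodes}
    changed = True
    while changed:
        changed = False
        for d, s in list(sim):
            for label, d_child in data_edges.get(d, []):
                # Some schema edge out of s must accept the label and lead to
                # a schema node that still simulates the data child.
                if not any(pred(label) and (d_child, s_child) in sim
                           for pred, s_child in schema_edges.get(s, [])):
                    sim.discard((d, s))
                    changed = True
                    break
    return (data_root, schema_root) in sim
```

Starting from the full relation and only removing pairs yields the largest simulation. A data graph therefore conforms as soon as any simulation covers its root.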
XMill: an Efficient Compressor for XML Data
1999
Cited by 184 (0 self)
We describe a tool for compressing XML data, with applications in data exchange and archiving, which usually achieves about twice the compression ratio of gzip at roughly the same speed. The compressor, called XMill, incorporates and combines existing compressors in order to apply them to heterogeneous XML data: it uses zlib, the library function for gzip, a collection of datatype-specific compressors for simple data types, and, possibly, user-defined compressors for application-specific data types.
1 Introduction
We have implemented a compressor/decompressor for XML data, to be used in data exchange and archiving, that achieves about twice the compression rate of general-purpose compressors (gzip), at about the same speed. The tool can be downloaded from www.research.att.com/sw/tools/xmill/. XML is now being adopted by many organizations and industry groups, like the healthcare, banking, chemical, and telecommunications industries. The attraction in XML is that it is a self-describi...
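The idea of separating document structure from data values and compressing each part with zlib can be sketched with the Python standard library. This is a simplified illustration, not XMill's file format or container rules. It assumes text is grouped by its root-to-element tag path, and it ignores attributes and mixed-content tail text. The function name `xmill_like_compress` is hypothetical.

```python
import zlib
import xml.etree.ElementTree as ET
from collections import defaultdict


def xmill_like_compress(xml_text):
    """Split XML into a structure stream plus per-path data containers,
    then zlib-compress each part separately (simplified sketch)."""
    root = ET.fromstring(xml_text)
    structure = []                  # tag tokens; "/" closes an element
    containers = defaultdict(list)  # tag path -> text values found there

    def walk(elem, path):
        path = path + "/" + elem.tag
        structure.append(elem.tag)
        text = (elem.text or "").strip()
        if text:
            structure.append("#")   # placeholder: next value of this path's container
            containers[path].append(text)
        for child in elem:
            walk(child, path)
        structure.append("/")

    walk(root, "")
    compressed = {"structure": zlib.compress("\n".join(structure).encode())}
    for path, values in containers.items():
        compressed[path] = zlib.compress("\n".join(values).encode())
    return compressed
```

Values under the same path (for example, all titles) tend to resemble one another. Keeping them together gives the general-purpose compressor more local redundancy to exploit than the interleaved serialized document does.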
Optimizing Regular Path Expressions Using Graph Schemas
1998
Cited by 145 (5 self)
Several languages, such as LOREL and UnQL, support querying of semistructured data. Others, such as WebSQL and WebLog, query Web sites. All these languages model data as labeled graphs and use regular path expressions to express queries that traverse arbitrary paths in graphs. Naive execution of path expressions is inefficient, however, because it often requires exhaustive graph search. We describe two optimization techniques for queries with regular path expressions, which we call regular queries. Both rely on graph schemas, which specify partial knowledge of a graph's structure. Query pruning restricts search to a fragment of the graph; we give an efficient algorithm for rewriting any regular query into a pruned one. Query rewriting using state extents can entirely eliminate or substantially reduce graph traversal; it is reminiscent of optimizing relational queries using indices. There may be several ways to optimize a query using state extents; we give an exponential-time algorith...
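For reference, the naive evaluation that these optimizations aim to avoid can be sketched as a breadth-first search over pairs (graph node, automaton state). This sketch assumes the regular path expression has already been compiled into an NFA without ε-transitions. The encoding and the name `eval_regular_path` are hypothetical, and the sketch implements neither query pruning nor state extents.

```python
from collections import deque


def eval_regular_path(graph, start, nfa, nfa_start, nfa_finals):
    """Nodes reachable from start along a path whose label sequence the NFA accepts.

    graph: dict node -> list of (label, target)
    nfa: dict (state, label) -> set of next states
    Explores every reachable (node, state) pair, i.e. an exhaustive search.
    """
    seen = {(start, nfa_start)}
    queue = deque(seen)
    answers = set()
    while queue:
        node, q = queue.popleft()
        if q in nfa_finals:
            answers.add(node)
        for label, target in graph.get(node, []):
            for q2 in nfa.get((q, label), ()):
                if (target, q2) not in seen:
                    seen.add((target, q2))
                    queue.append((target, q2))
    return answers
```

Tracking visited (node, state) pairs guarantees termination on cyclic graphs. It also means the cost grows with the product of graph size and automaton size, which is the search a schema can help cut down.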
One-Unambiguous Regular Languages
Information and Computation, 1997
Cited by 101 (9 self)
The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic metalanguage for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Book, Even, Greibach, and Ott, each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous language if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regu...
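A regular expression is 1-unambiguous exactly when its Glushkov (position) automaton is deterministic, meaning no two competing positions carry the same symbol. The sketch below checks that condition by computing first and follow sets. The tuple-based AST and the function names are hypothetical. Note that it tests a given expression, not whether the language is denoted by some 1-unambiguous expression.

```python
def glushkov_sets(regex):
    """Positions, first set, and follow sets of a regex AST.

    AST nodes: ("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r).
    Each "sym" occurrence becomes a distinct position 0, 1, 2, ...
    """
    symbols = []  # symbols[p] is the letter at position p
    follow = {}   # follow[p]: positions that may come right after p

    def go(node):
        # Returns (nullable, first, last) for the subexpression node.
        kind = node[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            p = len(symbols)
            symbols.append(node[1])
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, f, l = go(node[1])
            for p in l:          # loop back: last positions -> first positions
                follow[p] |= f
            return True, f, l
        n1, f1, l1 = go(node[1])
        n2, f2, l2 = go(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:             # "cat": last of r -> first of s
            follow[p] |= f2
        return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2

    _, first, _ = go(regex)
    return symbols, first, follow


def is_one_unambiguous_expr(regex):
    """True iff the Glushkov automaton of regex is deterministic."""
    symbols, first, follow = glushkov_sets(regex)

    def distinct(positions):
        # Competing positions must carry pairwise different letters.
        letters = [symbols[p] for p in positions]
        return len(letters) == len(set(letters))

    return distinct(first) and all(distinct(f) for f in follow.values())
```

For example, (a+b)*a fails the test because two different a-positions can both start a word. By contrast, b*a(b*a)*, which denotes the same language, passes.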
Regular Expressions into Finite Automata
Theoretical Computer Science, 1996
Cited by 64 (5 self)
It is a well-established fact that each regular expression can be transformed into a nondeterministic finite automaton (NFA) with or without ε-transitions, and all authors seem to provide their own variant of the construction. Of these, Berry and Sethi [BS86] have shown that the construction of an ε-free NFA due to Glushkov [Glu61] is a natural representation of the regular expression, because it can be described in terms of the Brzozowski derivatives [Brz64] of the expression. Moreover, the Glushkov construction also plays a significant role in the document processing area: The SGML standard [ISO86], now widely adopted by publishing houses and government agencies for the syntactic specification of textual markup systems, uses deterministic regular expressions, i.e. expressions whose Glushkov automaton is deterministic, as a description language for document types. In this paper, we first show that the Glushkov automaton can be constructed in time quadratic in the size of the...
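The Glushkov automaton has one state per symbol occurrence plus an initial state, and every transition into position p is labeled with p's symbol. A minimal sketch follows, using a hypothetical tuple-based AST and hypothetical function names. This straightforward version does not attempt to match any particular time bound.

```python
def glushkov_nfa(regex):
    """Build the Glushkov (position) automaton of a regex AST.

    AST: ("sym", a), ("eps",), ("cat", r, s), ("alt", r, s), ("star", r).
    State 0 is initial; state p >= 1 is the p-th symbol occurrence.
    Returns (delta, finals) with delta[(state, letter)] = set of next states.
    """
    letter = {}  # position -> its letter
    follow = {}  # position -> set of following positions

    def go(node):  # returns (nullable, first, last)
        kind = node[0]
        if kind == "eps":
            return True, set(), set()
        if kind == "sym":
            p = len(letter) + 1
            letter[p] = node[1]
            follow[p] = set()
            return False, {p}, {p}
        if kind == "star":
            _, f, l = go(node[1])
            for p in l:
                follow[p] |= f
            return True, f, l
        n1, f1, l1 = go(node[1])
        n2, f2, l2 = go(node[2])
        if kind == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:  # concatenation: last of r -> first of s
            follow[p] |= f2
        return n1 and n2, (f1 | f2) if n1 else f1, (l1 | l2) if n2 else l2

    nullable, first, last = go(regex)
    delta = {}
    # A transition into position p is always labeled with p's letter.
    for src, targets in [(0, first)] + list(follow.items()):
        for p in targets:
            delta.setdefault((src, letter[p]), set()).add(p)
    finals = set(last) | ({0} if nullable else set())
    return delta, finals


def accepts(nfa, word):
    """Simulate the NFA on word by tracking the set of active states."""
    delta, finals = nfa
    current = {0}
    for ch in word:
        current = {q for s in current for q in delta.get((s, ch), ())}
    return bool(current & finals)
```

Because the construction introduces no ε-transitions, simulating the automaton only requires following the set of active positions.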
Partial Derivatives of Regular Expressions and Finite Automata Constructions
Theoretical Computer Science, 1995
Cited by 59 (0 self)
We introduce a notion of a partial derivative of a regular expression. It is a generalization to the nondeterministic case of the known notion of a derivative invented by Brzozowski. We give a constructive definition of partial derivatives, study their properties, and employ them to develop a new algorithm for turning regular expressions into relatively small NFA and to provide certain improvements to Brzozowski's algorithm constructing DFA. We report on a prototype implementation of our algorithm constructing NFA and present some examples.
Introduction
In 1964 Janusz Brzozowski introduced word derivatives of regular expressions and suggested an elegant algorithm turning a regular expression r into a deterministic finite automaton (DFA); the main point of the algorithm is that the word derivatives of r serve as states of the resulting DFA [5]. In the following years derivatives were recognized as a quite useful and productive tool. Conway [8] uses derivatives to present various comp...
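Partial derivatives can be computed with a few structural rules, and the distinct partial derivatives reachable from r serve as the states of an NFA. This sketch uses a hypothetical tuple-based AST and applies only the simplification ε·s = s. The function names are illustrative, and this is not the prototype implementation mentioned in the abstract.

```python
EPS = ("eps",)


def nullable(r):
    """Does r accept the empty word?"""
    kind = r[0]
    if kind in ("eps", "star"):
        return True
    if kind == "cat":
        return nullable(r[1]) and nullable(r[2])
    if kind == "alt":
        return nullable(r[1]) or nullable(r[2])
    return False  # "sym"


def cat(r, s):
    # The only simplification used: eps . s == s.
    return s if r == EPS else ("cat", r, s)


def pd(a, r):
    """Set of partial derivatives of r with respect to the letter a."""
    kind = r[0]
    if kind == "sym":
        return {EPS} if r[1] == a else set()
    if kind == "alt":
        return pd(a, r[1]) | pd(a, r[2])
    if kind == "cat":
        result = {cat(d, r[2]) for d in pd(a, r[1])}
        if nullable(r[1]):
            result |= pd(a, r[2])
        return result
    if kind == "star":
        return {cat(d, r) for d in pd(a, r[1])}
    return set()  # "eps"


def letters(r):
    """Letters occurring in r."""
    if r[0] == "sym":
        return {r[1]}
    return set().union(*(letters(c) for c in r[1:]))


def partial_derivative_nfa(r):
    """NFA whose states are the partial derivatives reachable from r."""
    alphabet = letters(r)
    states, todo, delta = {r}, [r], {}
    while todo:
        s = todo.pop()
        for a in alphabet:
            targets = pd(a, s)
            if targets:
                delta[(s, a)] = targets
            for t in targets - states:
                states.add(t)
                todo.append(t)
    finals = {s for s in states if nullable(s)}
    return states, delta, r, finals


def nfa_accepts(nfa, word):
    _, delta, start, finals = nfa
    current = {start}
    for ch in word:
        current = set().union(*(delta.get((s, ch), set()) for s in current))
    return bool(current & finals)
```

For (a+b)*abb this construction yields four states: the expression itself, bb, b, and ε.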
Minimization Algorithms for Sequential Transducers
2000
Cited by 55 (12 self)
We present general algorithms for minimizing sequential finite-state transducers that output strings or numbers. The algorithms are shown to be efficient since in the case of acyclic transducers and for output strings they operate in $O(|S| + |E| + |V| + (|E| - |V| + |F|) \cdot (|P_{\max}| + 1))$ steps, where $|S|$ is the sum of the lengths of all output labels of the resulting transducer, $E$ the set of transitions of the given transducer, $V$ the set of its states, $F$ the set of final states, and $P_{\max}$ one of the longest of the longest common prefixes of the output paths leaving each state of the transducer. The algorithms apply to a larger class of transducers which includes subsequential transducers.
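For the acyclic, string-output case, a common approach can be sketched in two steps. First, push outputs toward the initial state using longest common prefixes. Second, merge states whose pushed behavior is identical. The encoding, the function names, and the restriction to trim, acyclic transducers are assumptions of this sketch. It does not reproduce the paper's general algorithm or its complexity bound.

```python
import os


def minimize_acyclic_transducer(trans, finals, initial):
    """Push outputs toward the initial state, then merge equivalent states.

    trans:  dict state -> dict input_letter -> (output_string, next_state)
    finals: dict final_state -> final output string
    Assumes a sequential (input-deterministic), acyclic, trim transducer.
    Returns (new_trans, new_finals, new_initial, initial_output).
    """
    prefix = {}

    def P(q):
        # Longest common prefix of every output string from q to a final state.
        if q not in prefix:
            outs = [o + P(n) for o, n in trans.get(q, {}).values()]
            if q in finals:
                outs.append(finals[q])
            prefix[q] = os.path.commonprefix(outs)
        return prefix[q]

    cls, registry = {}, {}

    def canon(q):
        # After pushing, equivalent states have identical signatures, so
        # hashing signatures bottom-up merges them (valid because acyclic).
        if q not in cls:
            p = len(P(q))
            edges = tuple(sorted((a, (o + P(n))[p:], canon(n))
                                 for a, (o, n) in trans.get(q, {}).items()))
            final = finals[q][p:] if q in finals else None
            cls[q] = registry.setdefault((final, edges), len(registry))
        return cls[q]

    canon(initial)
    new_trans, new_finals = {}, {}
    for (final, edges), c in registry.items():
        new_trans[c] = {a: (o, n) for a, o, n in edges}
        if final is not None:
            new_finals[c] = final
    return new_trans, new_finals, cls[initial], P(initial)


def run(trans, finals, state, initial_output, word):
    """Output of the transducer on word; raises KeyError if word is rejected."""
    out = initial_output
    for ch in word:
        o, state = trans[state][ch]
        out += o
    return out + finals[state]
```

Pushing is what makes the merge step sound. Two states that produce the same outputs, but emit them at different points along their paths, only receive equal signatures once common prefixes have been moved as early as possible.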
Characterizations of 1-Way Quantum Finite Automata
SIAM Journal on Computing
Cited by 33 (0 self)
The 2-way quantum finite automaton introduced by Kondacs and Watrous [KW97] can accept nonregular languages with bounded error in polynomial time. If we restrict the head of the automaton to moving classically and to moving only in one direction, the acceptance power of this 1-way quantum finite automaton is reduced to a proper subset of the regular languages. In this paper we study two different models of 1-way quantum finite automata. The first model, termed measure-once quantum finite automata, was introduced by Moore and Crutchfield [MCar], and the second model, termed measure-many quantum finite automata, was introduced by Kondacs and Watrous [KW97]. We characterize the measure-once model when it is restricted to accepting with bounded error and show that, without that restriction, it can solve the word problem over the free group. We also show that it can be simulated by a probabilistic finite automaton and describe an algorithm that determines if two measure-once automata are equiv...
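In the measure-once model, the state vector evolves by one unitary per input symbol and is measured only once, after the last symbol. The acceptance probability is therefore the squared norm of the state's projection onto the accepting subspace. A minimal sketch follows. It assumes real rotation matrices for illustration, omits end-markers, and uses hypothetical function names.

```python
import math


def mo_qfa_accept_prob(unitaries, start, accepting, word):
    """Acceptance probability of a measure-once 1-way QFA.

    unitaries: dict letter -> square matrix (list of rows)
    start: initial amplitude vector; accepting: set of basis indices
    The state evolves unitarily on each letter and is measured once at the end.
    """
    psi = list(start)
    for ch in word:
        u = unitaries[ch]
        psi = [sum(u[i][j] * psi[j] for j in range(len(psi))) for i in range(len(u))]
    return sum(abs(psi[i]) ** 2 for i in accepting)


def rotation(theta):
    """2x2 real rotation matrix (a unitary) by angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]
```

With a rotation by π/4 per a, starting in basis state 0 and accepting on basis state 0, the word aⁿ is accepted with probability cos²(nπ/4).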