Regular Functions, Cost Register Automata, and Generalized Min-Cost Problems
2012
Cited by 9 (6 self)
Motivated by the successful application of the theory of regular languages to formal verification of finite-state systems, there is a renewed interest in developing a theory of analyzable functions from strings to numerical values that can provide a foundation for analyzing quantitative properties of finite-state systems. In this paper, we propose a deterministic model for associating costs with strings that is parameterized by operations of interest (such as addition, scaling, and min), a notion of regularity that provides a yardstick to measure expressiveness, and study decision problems and theoretical properties of resulting classes of cost functions. Our definition of regularity relies on the theory of string-to-tree transducers, and allows associating costs with events that are conditional upon regular properties of future events. Our model of cost register automata allows computation of regular functions using multiple “write-only” registers whose values can be combined using the allowed set of operations. We show that classical shortest-path algorithms, as well as algorithms designed for computing discounted costs, can be adapted for solving the min-cost problems for the more general classes of functions specified in our model. Cost register automata with min and increment give a deterministic model that is equivalent to weighted automata, an extensively studied nondeterministic model, and this ...
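The idea of write-only registers in the abstract above can be illustrated with a minimal sketch. It is not the paper's formal definition: the class name `CostRegisterAutomaton`, the register names `x` and `y`, and the example cost function are all assumptions made for illustration. The example charges 1 per `a` if the word ends in `b` and 2 per `a` otherwise, so the cost depends on a regular property of future input. The automaton tracks both candidate totals in registers and picks one at the end, without ever testing a register value.

```python
from typing import Callable, Dict, Tuple

Registers = Dict[str, int]
Update = Callable[[Registers], Registers]


class CostRegisterAutomaton:
    """Hypothetical sketch: a deterministic automaton with write-only integer registers."""

    def __init__(
        self,
        initial_state: str,
        initial_registers: Registers,
        transitions: Dict[Tuple[str, str], Tuple[str, Update]],
        output: Dict[str, Callable[[Registers], int]],
    ) -> None:
        self.initial_state = initial_state
        self.initial_registers = initial_registers
        self.transitions = transitions
        self.output = output

    def cost(self, word: str) -> int:
        state = self.initial_state
        regs = dict(self.initial_registers)
        for symbol in word:
            # The next state depends only on (state, symbol); registers are
            # updated but never inspected, which is what "write-only" means here.
            state, update = self.transitions[(state, symbol)]
            regs = update(regs)
        return self.output[state](regs)


def charge_a(regs: Registers) -> Registers:
    # x: total if the word turns out to end in 'b'; y: total otherwise.
    return {"x": regs["x"] + 1, "y": regs["y"] + 2}


def keep(regs: Registers) -> Registers:
    return dict(regs)


example = CostRegisterAutomaton(
    initial_state="other",
    initial_registers={"x": 0, "y": 0},
    transitions={
        ("other", "a"): ("other", charge_a),
        ("other", "b"): ("ends_b", keep),
        ("ends_b", "a"): ("other", charge_a),
        ("ends_b", "b"): ("ends_b", keep),
    },
    output={"ends_b": lambda r: r["x"], "other": lambda r: r["y"]},
)
```

For instance, `example.cost("aab")` reads register `x` because the run ends in state `ends_b`, while `example.cost("aba")` reads register `y`.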
FAST: A transducer-based language for tree manipulation
2012
Cited by 5 (4 self)
Tree automata and tree transducers are used in a wide range of applications in software engineering, from XML processing to language type-checking. While these formalisms are of immense practical use, they can only model finite alphabets, and since many real-world applications operate over infinite domains such as integers, this is often a limitation. To overcome this problem we augment tree automata and transducers with symbolic alphabets represented as parametric theories. Admitting infinite alphabets makes these models more general and succinct than their classical counterparts. Despite this, we show how the main operations, such as composition and language equivalence, remain computable given a decision procedure for the alphabet theory. We introduce a high-level language called Fast that acts as a front end for the above formalisms. Fast supports symbolic alphabets through tight integration with state-of-the-art satisfiability modulo theories (SMT) solvers. We demonstrate our techniques on practical case studies, covering a wide range of applications.
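As a rough illustration of a symbolic alphabet, the sketch below runs a bottom-up tree automaton whose rules carry predicates over integer labels instead of listing individual symbols. It is not Fast syntax and does not call an SMT solver: guards are plain Python predicates, and the names `Node`, `RULES`, and `evaluate` are hypothetical. The automaton assigns state `pos` to a tree when every label in it is positive and `bad` otherwise.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    label: int  # labels range over all integers, an infinite alphabet
    children: Tuple["Node", ...] = ()


# A rule is (guard on the label, expected child states, resulting state).
Rule = Tuple[Callable[[int], bool], Tuple[str, ...], str]


def is_positive(n: int) -> bool:
    return n > 0


def not_positive(n: int) -> bool:
    return n <= 0


def anything(n: int) -> bool:
    return True


RULES: List[Rule] = [
    (is_positive, (), "pos"),
    (not_positive, (), "bad"),
    (is_positive, ("pos", "pos"), "pos"),
    (not_positive, ("pos", "pos"), "bad"),
    (anything, ("pos", "bad"), "bad"),
    (anything, ("bad", "pos"), "bad"),
    (anything, ("bad", "bad"), "bad"),
]


def evaluate(node: Node, rules: List[Rule]) -> Optional[str]:
    """Return the state reached at the root, or None if no rule applies."""
    child_states = tuple(evaluate(child, rules) for child in node.children)
    for guard, expected, target in rules:
        if expected == child_states and guard(node.label):
            return target
    return None
```

Seven rules cover every integer label here, whereas a classical automaton would need one rule per symbol. Deciding properties such as emptiness would additionally require checking guard satisfiability, which is where a decision procedure for the label theory comes in; this sketch only evaluates guards on concrete labels.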
Regular Combinators for String Transformations
Cited by 3 (1 self)
We focus on (partial) functions that map input strings to a monoid such as the set of integers with addition and the set of output strings with concatenation. The notion of regularity for such functions has been defined using two-way finite-state transducers, (one-way) cost register automata, and MSO-definable graph transformations. In this paper, we give an algebraic and machine-independent characterization of this class analogous to the definition of regular languages by regular expressions. When the monoid is commutative, we prove that every regular function can be constructed from constant functions using the combinators of choice, split sum, and iterated sum, that are analogs of union, concatenation, and Kleene-*, respectively, but enforce unique (or unambiguous) parsing. Our main result is for the general case of non-commutative monoids, which is of particular interest for capturing regular string-to-string transformations for document processing. We prove that the following additional combinators suffice for constructing all regular functions: (1) the left-additive versions of split sum and iterated sum, which allow transformations such as string reversal; (2) sum of functions, which allows transformations such as copying of strings; and (3) function composition, or alternatively, a new concept of chained sum, which allows output values from adjacent blocks to mix.
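The commutative case described above can be sketched for the monoid of integers under addition. This is an illustrative reading of the abstract, not the paper's formal definitions: partial functions are modeled as Python functions returning `None` when undefined, `const` takes a single word rather than a regular language, `choice` is assumed to be undefined when both operands are defined, and `iterated_sum` only considers non-empty pieces. All function names are hypothetical.

```python
from typing import Callable, Optional

Func = Callable[[str], Optional[int]]


def const(word: str, value: int) -> Func:
    """Constant function defined only on the single string `word`."""
    return lambda w: value if w == word else None


def choice(f: Func, g: Func) -> Func:
    """Analog of union; assumed undefined where both f and g are defined."""
    def h(w: str) -> Optional[int]:
        a, b = f(w), g(w)
        if a is not None and b is not None:
            return None
        return a if a is not None else b
    return h


def split_sum(f: Func, g: Func) -> Func:
    """Analog of concatenation; defined only if exactly one split w = u v works."""
    def h(w: str) -> Optional[int]:
        results = []
        for i in range(len(w) + 1):
            a, b = f(w[:i]), g(w[i:])
            if a is not None and b is not None:
                results.append(a + b)
        return results[0] if len(results) == 1 else None
    return h


def iterated_sum(f: Func) -> Func:
    """Analog of Kleene star; defined only if w has a unique decomposition."""
    def h(w: str) -> Optional[int]:
        n = len(w)
        ways = [0] * (n + 1)   # number of decompositions of w[:i]
        value = [0] * (n + 1)  # summed value, meaningful only when ways[i] == 1
        ways[0] = 1
        for i in range(1, n + 1):
            for j in range(i):
                piece = f(w[j:i])
                if ways[j] and piece is not None:
                    ways[i] += ways[j]
                    value[i] = value[j] + piece
        return value[n] if ways[n] == 1 else None
    return h


# Count the occurrences of 'a' in a word over {a, b}.
count_a = iterated_sum(choice(const("a", 1), const("b", 0)))

# Ambiguous: "aa" can be split between the two operands in three ways.
only_a = iterated_sum(const("a", 1))
ambiguous = split_sum(only_a, only_a)
```

Here `count_a("abab")` yields 2, while `ambiguous("aa")` is undefined because the split point is not unique. This naive evaluation tries every split and is meant only to show the unambiguity requirement, not an efficient algorithm.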
Look-Ahead Removal for Top-Down Tree Transducers. CoRR abs/1311.2400 (2013)
Cited by 3 (1 self)
Top-down tree transducers are a convenient formalism for describing tree transformations. They can be equipped with regular look-ahead, which allows them to inspect a subtree before processing it. In certain cases, such a look-ahead can be avoided and the transformation can be realized by a transducer without look-ahead. Removing the look-ahead from a transducer, if possible, is technically highly challenging. For a restricted class of transducers with look-ahead, namely those that are total, deterministic, ultralinear, and bounded erasing, we present an algorithm that, for a given transducer from that class, (1) decides whether it is equivalent to a total deterministic transducer without look-ahead, and (2) constructs such a transducer if the answer is positive. For the whole class of total deterministic transducers with look-ahead we present a similar algorithm, which assumes that a so-called difference bound is known for the given transducer. The designer of a transducer can usually also determine a difference bound for it.
From Monadic Second-Order Definable String Transformations to Transducers
Cited by 2 (1 self)
Courcelle (1992) proposed the idea of using logic, in particular monadic second-order logic (MSO), to define graph-to-graph transformations. Transducers, on the other hand, are executable machine models to define transformations, and are typically studied in the context of string-to-string transformations. Engelfriet and Hoogeboom (2001) studied two-way finite-state string-to-string transducers and showed that their expressiveness matches MSO-definable transformations (MSOT). Alur and Černý (2011) presented streaming transducers, one-way transducers equipped with multiple registers that can store output strings, as an equi-expressive model. Natural generalizations of streaming transducers to string-to-tree (Alur and D’Antoni, 2012) and infinite-string-to-string (Alur, Filiot, and Trivedi, 2012) cases preserve MSO-expressiveness. While earlier reductions from MSOT to streaming transducers used two-way transducers as the intermediate model, we revisit the earlier reductions in a more general, and previously unexplored, setting of infinite-string-to-tree transformations, and provide a direct reduction. Proof techniques used for this new reduction exploit the conceptual tools (composition theorem and finite additive coloring theorem) presented by Shelah (1975) in his alternative proof of Büchi’s theorem. Using such streaming string-to-tree transducers we show the decidability of functional equivalence for MSO-definable infinite-string-to-tree transducers. Index Terms: Streaming string transducers, monadic second-order logic, ω-regular transformations, tree transducers.
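The register-based model mentioned above can be illustrated for the simplest setting of finite strings (not the infinite-string-to-tree setting this paper studies). The sketch below is a hypothetical single-state streaming string transducer with two string registers `x` and `y`; it reads the input once, left to right, and outputs the reversed input followed by the input itself. The function name `reverse_then_copy` is an assumption made for illustration.

```python
def reverse_then_copy(word: str) -> str:
    """Single left-to-right pass with two string registers (illustrative sketch)."""
    x = ""  # accumulates the input in its original order
    y = ""  # accumulates the input in reversed order
    for symbol in word:
        # Each register appears at most once on the right-hand side,
        # so no register contents are duplicated by an update.
        x, y = x + symbol, symbol + y
    # The final output combines the registers once the input is exhausted.
    return y + x
```

For example, `reverse_then_copy("abc")` produces `"cbaabc"`. Prepending to a register is what lets a one-way machine produce reversed output, which a one-way transducer that only appends output as it reads cannot do.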
DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations
Cited by 2 (1 self)
We present DReX, a declarative language that can express all regular string-to-string transformations, and can still be efficiently evaluated. The class of regular string transformations has a robust theoretical foundation including multiple characterizations, closure properties, and decidable analysis questions, and admits a number of string operations such as insertion, deletion, substring swap, and reversal. Recent research has led to a characterization of regular string transformations using a primitive set of function combinators analogous to the definition of regular languages using regular expressions. While these combinators form the basis for the language DReX proposed in this paper, our main technical focus is on the complexity of evaluating the output of a DReX program on a given input string. It turns out that the natural evaluation algorithm involves dynamic programming, leading to complexity that is cubic in the length of the input string. Our main contribution is identifying a consistency restriction on the use of combinators in DReX programs, and a single-pass evaluation algorithm for consistent programs with time complexity that is linear in the length of the input string and polynomial in the size of the program. We show that the consistency restriction does not limit the expressiveness, and whether a DReX program is consistent can be checked efficiently. We report on a prototype implementation, and evaluate it using a representative set of text processing tasks.
Streamability of Nested Word Transductions
Cited by 1 (1 self)
We consider the problem of evaluating in streaming (i.e., in a single left-to-right pass) a nested word transduction with a limited amount of memory. A transduction T is said to be height-bounded memory (HBM) if it can be evaluated with a memory that depends only on the size of T and on the height of the input word. We show that it is decidable in coNPTime, for a nested word transduction defined by a visibly pushdown transducer (VPT), whether it is HBM. In this case, the required amount of memory may depend exponentially on the height of the word. We exhibit a sufficient, decidable condition for a VPT to be evaluated with a memory that depends quadratically on the height of the word. This condition defines a class of transductions that strictly contains all determinizable VPTs.
First-order definable string transformations
2014
Cited by 1 (1 self)
The connection between languages defined by computational models and logic for languages is well studied. Monadic second-order logic and finite automata are shown to closely correspond to each other for the languages of strings, trees, and partial orders. Similar connections are shown for first-order logic and finite automata with a certain aperiodicity restriction. Courcelle in 1994 proposed a way to use logic to define functions over structures where the output structure is defined using logical formulas interpreted over the input structure. Engelfriet and Hoogeboom discovered the corresponding "automata connection" by showing that two-way generalised sequential machines capture the class of monadic second-order definable transformations. Alur and Černý further refined the result by proposing a one-way deterministic transducer model with string variables, called streaming string transducers, to capture the same class of transformations. In this paper we establish a transducer-logic correspondence for Courcelle’s first-order definable string transformations. We propose a new notion of transition monoid for streaming string transducers that involves structural properties of both underlying input automata and variable dependencies. By putting an aperiodicity restriction on the transition monoids, we define a class of streaming string transducers that captures exactly the class of first-order definable transformations.
ACM Student Member: 4585012
Web Scraping. The last decade has seen a proliferation of programs that operate on data collected from the Internet. In general the database containing such data cannot be accessed directly due to permission restrictions; however, these programs can still “scrape” many useful pieces of information directly from HTML pages. A Web Scraper is a program that, given an HTML document, extracts some data from it and stores it in a table (a relation). An example of such a scraper is a program that, given an Amazon search page, builds a table containing the names and prices of each product (Fig. 1). Even though these transformations are fairly simple, they are often written by non-expert programmers who are not able to provide an efficient implementation. Several tools, like Outwit and Mozenda, have been proposed for helping these users in the task of writing such scrapers; however, no clear theory has been developed for supporting them. A formal model for Web Scraping. In our work we propose a novel model that is able to describe many interesting web scraping tasks while enjoying many decidability and efficiency properties. We build on tree automata and transducers, which have been extensively studied in the context of program analysis [HP03] and XML transformations [MBPS05]. In particular we extend Streaming Tree Transducers [AD12] to output relations instead of trees and to allow ...