Symbolic Finite State Transducers: Algorithms and Applications
 POPL'12
, 2012
Cited by 28 (12 self)
Finite automata and finite transducers are used in a wide range of applications in software engineering, from regular expressions to specification languages. We extend these classic objects with symbolic alphabets represented as parametric theories. Admitting potentially infinite alphabets makes this representation strictly more general and succinct than classical finite transducers and automata over strings. Despite this, the main operations, including composition, checking that a transducer is single-valued, and equivalence checking for single-valued symbolic finite transducers are effective given a decision procedure for the background theory. We provide novel algorithms for these operations and extend composition to symbolic transducers augmented with registers. Our base algorithms are unusual in that they are
Data-parallel string-manipulating programs
, 2012
Cited by 8 (5 self)
Applications ranging from malware detection to graphics to Web security sanitization depend on string transformations, but writing such transformations is a challenge. Making these transformations run in parallel on a cluster of machines or special hardware is an even greater challenge. We answer this challenge with fast, parallel string-manipulating code compiled from Bek, a domain-specific language for writing complex string manipulation routines [9]. First, our new compilation pipeline maps a Bek program to an intermediate format consisting of symbolic finite state transducers, which extend classical transducers with symbolic predicates. We present a novel algorithm that we call exploration, which performs a symbolic partial evaluation of these transducers to obtain simplified, stateless versions of the original program. These simplified versions can be lifted back to Bek, and from there compiled to C#, C, or JavaScript. Next, we show how the resulting transducers, post-exploration, fit into a recent advance in data-parallel compilation of finite state machines. Finally, we describe a concrete implementation built on the Windows High Performance Computing framework in a cluster. We have implemented our code generation pipeline for Bek code corresponding to several real string-manipulating functions, such as security sanitizers for Web applications. We use an automatic testing approach to compare our generated code to the original C# implementations and found no semantic deviations. Our generated C# code outperforms the previous handwritten code by a factor of up to 3, and we generate code in C that is a factor of 5 faster. For a cluster with 32 nodes, we see speedups of 13.7 times compared to sequential C# code for an HTML sanitizer over 32GB of data.
Symbolic transducers
, 2011
Cited by 4 (3 self)
Symbolic Finite Transducers, or SFTs, are a representation of finite transducers that annotates transitions with logical formulae to denote sets of concrete transitions. This representation has practical advantages in applications for web security analysis, as it provides ways to succinctly represent web sanitizers that operate on large alphabets. More importantly, the representation is also conducive to efficient analysis using state-of-the-art theorem proving techniques. Besides introducing SFTs, we provide algorithms for various closure properties including composition and domain restriction. A central result is that equivalence of SFTs is decidable when there is a fixed bound on how many different values can be generated for arbitrary inputs. In practice, we use a semi-decision algorithm, encoded axiomatically, for non-equivalence of arbitrary SFTs. We show that several of the main results lift to a more expressive version of SFTs with Registers, SFTRs. They admit a fixed set of registers that can be referenced in the logical formulae, updated by input characters, or used to generate output.
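To make the idea of predicate-labeled transitions concrete, here is a minimal sketch of how such a transducer might be executed on an input string. It is not the authors' implementation: the names Transition, SFT, run and escape are hypothetical, and guards are plain Python callables rather than formulae in a decidable background theory, so the sketch illustrates only execution, not the composition or equivalence analyses the abstract refers to.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class Transition:
    """One symbolic transition (hypothetical structure for illustration)."""
    guard: Callable[[str], bool]               # predicate over input characters
    outputs: Tuple[Callable[[str], str], ...]  # each maps the input char to output text
    target: int                                # destination state


class SFT:
    """Illustrative transducer whose transitions carry predicates, not letters."""

    def __init__(self, initial: int, finals: Set[int],
                 delta: Dict[int, List[Transition]]):
        self.initial = initial
        self.finals = finals
        self.delta = delta

    def run(self, text: str) -> List[str]:
        """Return every output produced along an accepting path, sorted."""
        configs = {(self.initial, "")}  # reachable (state, output so far) pairs
        for ch in text:
            nxt = set()
            for state, out in configs:
                for t in self.delta.get(state, []):
                    if t.guard(ch):  # the guard stands for a set of concrete letters
                        emitted = "".join(f(ch) for f in t.outputs)
                        nxt.add((t.target, out + emitted))
            configs = nxt
        return sorted(out for state, out in configs if state in self.finals)


# A one-state HTML-escaping sanitizer (assumed example). The last guard covers
# every remaining character with a single transition instead of one per letter.
escape = SFT(
    initial=0,
    finals={0},
    delta={0: [
        Transition(lambda c: c == "<", (lambda c: "&lt;",), 0),
        Transition(lambda c: c == ">", (lambda c: "&gt;",), 0),
        Transition(lambda c: c == "&", (lambda c: "&amp;",), 0),
        Transition(lambda c: c not in "<>&", (lambda c: c,), 0),
    ]},
)

print(escape.run("a<b&c"))  # ['a&lt;b&amp;c']
```

Because the four guards are mutually exclusive, each input yields exactly one output; with overlapping guards, run would return several outputs for the same input.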
Decision procedures for composition and equivalence of symbolic finite state transducers
, 2011
Cited by 3 (1 self)
Finite automata model a wide array of applications in software engineering, from regular expressions to specification languages. Finite transducers are an extension of finite automata to model functions on lists of elements, which in turn have uses in fields as diverse as computational linguistics and model-based testing. Symbolic finite transducers are a further generalization of finite transducers where transitions are labeled with formulas in a given background theory. Compared to classical finite transducers, symbolic transducers are far more succinct in the case of finite alphabets, because they have no need to enumerate all cases of a transition; symbolic transducers can also use theories, such as the theory of linear arithmetic over integers or reals, with infinite alphabets.
Generating Fast String Manipulating Code Through Transducer Exploration and SIMD Integration
, 2011
Cited by 2 (0 self)
Security sanitizers have long been known to be very difficult to implement correctly. Moreover, with the rise of the web, developers need string-manipulating functions in both “server” and “client” languages. Handwriting these functions separately is an open invitation to bugs. At the same time, auto-generated code will not be accepted unless it is significantly faster than previous handwritten code. We address this problem with two complementary approaches centered around Bek, a domain-specific language for writing complex string manipulation routines [8]. First, Bek compiles the input domain-specific program into an intermediate format consisting of symbolic finite state transducers, which extend classical transducers with symbolic predicates. In this paper, we present a novel algorithm that we call exploration, which performs a symbolic partial evaluation of these transducers to obtain simplified, stateless versions of the original program. These simplified versions can be lifted back to Bek, and from there compiled to C#, C, or JavaScript. Second, we explore how SIMD instructions can be combined with Bek compilation to C and C, enabling developers to access parallel features of modern architectures without needing to tweak the C compiler or handwrite assembly. We have implemented our code generation pipeline for Bek code corresponding to several real string sanitizers. We use an automatic testing approach to compare our generated code to the original C# implementations and found no semantic deviations. Our generated C# code outperforms the previous hand-tuned code by a factor of up to 2.5. For C code with SIMD, we see speedups of 2.5 times compared to native C code for the same sanitizer.
Tree Regular Model Checking for Lattice-Based Automata
 in "CIAA - 18th International Conference on Implementation and Application of Automata"
Grenoble Rhône-Alpes THEME Embedded and Real Time Systems
2. Overall Objectives
Tree Regular Model Checking for
, 2013
HAL is a multidisciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.